With the new product versions Business Automation Workflow on Containers (BAW on containers) 18.104.22.168 and IBM Cloud Pak for Automation (ICP4A) 20.0.3, the use of PostgreSQL databases is now supported as well. In this blog we will give a short overview of PostgreSQL basics which might come in handy during the setup process with any of the workflow related products. Especially in public cloud scenarios a PostgreSQL database can be an attractive option. One always needs to keep in mind that network latency between the database and the Workflow product can have a significant impact on throughput; in some cases it can even make the system unusable. Therefore the database should be reachable with network latency in the low millisecond range from the Workflow product (we are testing with network latencies around 0.5 ms).
In this first part of the blog we will focus on a non-container PostgreSQL server installation. The setup with a container based PostgreSQL server in the same namespace as the Workflow product will be covered in a second part.
1.1 How to install a PostgreSQL database
If you are new to PostgreSQL, the PostgreSQL homepage provides rather simple steps for a basic installation. The following examples are based on an installation on Red Hat Enterprise Linux Server 8 (RHEL8), but should in general also apply to other platforms. Only a few commands need to be executed to get a PostgreSQL server up and running that starts automatically after the machine is rebooted. We will not go into the details of setting up the PostgreSQL server itself, as there are a lot of different options and this is already covered by the PostgreSQL documentation, although we might later give some tips on further tuning options.
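As a quick orientation, the few commands mentioned above look roughly like the following sketch. This assumes the PGDG yum repository for PostgreSQL 13 is already configured and root privileges are available; package and service names may differ for other versions:

```shell
# Sketch of a basic PostgreSQL 13 installation on RHEL8 (assumes PGDG repo, requires root)
sudo dnf install -y postgresql13-server

# Initialize the data directory (default: /var/lib/pgsql/13/data)
sudo /usr/pgsql-13/bin/postgresql-13-setup initdb

# Start the server now and enable it to start automatically after a reboot
sudo systemctl enable --now postgresql-13
```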
One thing you should already think about before the installation is: what hardware do I need? Of course this will depend on the workload of the future system. The PostgreSQL server itself is not resource hungry to get up and running, but this can change with the workload. The IBM Knowledge Center lists some very basic resource requirements. For production loads you will definitely want to increase these values. In particular, a production database server should have extremely fast storage for the transaction logs (earlier one would have said SSD, today I would rather say NVMe or similar speeds). In our current performance tests we can see bottlenecks when simply relying on SSD storage, and I can observe similar patterns with customers. The fio tool can be used to get an idea of the disk performance available.
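To get a rough idea of what your storage can deliver, a simple fio job file can be used. The job below is only an illustrative sketch (file name, size and target directory are arbitrary choices, not recommendations) that mimics the small synchronous writes of the transaction log:

```ini
; wal-like.fio - illustrative fio job, run with: fio wal-like.fio
[wal-sim]
ioengine=sync              ; synchronous writes, similar to WAL flushes
rw=write                   ; sequential write pattern
bs=8k                      ; small block size, close to a WAL page
size=1G
fdatasync=1                ; flush after every write, like a commit
directory=/var/lib/pgsql   ; point this at the disk you want to test
```

The interesting numbers in the fio output are the sync latency percentiles; if the 99th percentile is already in the multi-millisecond range, commit-heavy workloads will suffer.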
Another component is memory. As the Workflow products have some central tables which can suffer under high load scenarios, I normally tend to pin some of these tables in memory for better performance. This requires, however, that sufficient memory is available; depending on the involved tables this can consume more or less memory. In a later section we will go into further detail on the tables for which this procedure can make sense. PostgreSQL is also a nice platform to easily dig deeper into resource consumption and long running SQL statements.
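One way to preload tables into memory is the pg_prewarm extension that ships with PostgreSQL. As a hedged sketch (the table name lsw_lock is just an example from the Workflow schema; strictly speaking pg_prewarm preloads the table into the buffer cache rather than hard-pinning it there):

```sql
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the whole table into the shared buffer cache;
-- the function returns the number of blocks read
SELECT pg_prewarm('lsw_lock');
```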
1.2 PostgreSQL Server Extensions
Besides the core server there exist a number of extensions which can bring additional benefits. The most prominent extension might be pg_stat_statements, which we will cover later. For the beginning we should already be set. In the later sections we will see some more extensions and apply them where needed.
2.1 Administration tools
There exist a number of tools to interact with the database system. We will give a short overview of a small selection; there are certainly more tools, and the selection is arbitrary, based on observations made in the past, and not complete. The most basic one is the psql command line interface, which has a number of nice options, some of which we will see as we go along.
To give a short overview of graphical user interfaces, please have a look at the following tools:
- pgAdmin3 - Retiring standalone client, recommendation to move to pgAdmin4
- pgAdmin4 - The successor of pgAdmin3, which now uses a webserver to provide a web interface. A screenshot is shared below for reference
- PHP based client
- PHP based client
The following shows a screenshot of pgAdmin4. One can see a number of different tabs which can be helpful for monitoring the system. This is just a pointer to what is available there. The concrete usage of all options cannot be covered as part of this blog; where applicable, functions will be used to show concrete examples.
2.2 Restarting/Reloading the PostgreSQL server
While setting up the system or doing performance tuning, one wants the new parameters to become effective immediately. One needs to distinguish between properties which are effective immediately after a reload and those which only become effective after a restart. To determine which is the case, there exist two methods which complement each other:
- Method 1: Run the SQL statement
SELECT name, context FROM pg_settings WHERE name ILIKE '%log%';
where you replace the 'log' search term with your parameter name. If the result shows sighup, a reload is sufficient; for postmaster a restart is required. There are other results like superuser, therefore method 2 might bring clearer insights.
- Method 2: Change all required parameters and execute a reload (for example with SELECT pg_reload_conf(); or pg_ctl reload). Afterwards you can run the following statement to see any parameter requiring a restart:
SELECT name FROM pg_settings WHERE pending_restart IS true;
One can state that PostgreSQL restarts rather quickly. I would however recommend checking your BAW container logs after such an operation, as sometimes things can get stuck when the database server becomes absent for some seconds.
2.3 Backup and Restore
Backup and restore is a basic activity for production usage, and you most likely have a professional DBA to guard your business data. The commands provided here are for your convenience in testing environments, if you want to be able to recover quickly (more advanced features certainly exist as well):
- Backup a single database: pg_dump <dbname> > dbname.bak
- Backup all databases: pg_dumpall > pg_backup.bak
- Restore a single database: createdb <dbname>
psql -h localhost -U pgsql -d <dbname> < dbname.bak
- Restore all databases: psql -h localhost -U pgsql < pg_backup.bak
3.1 Central configuration files
There exist 3 main configuration files one should be aware of:
- postgresql.conf - Central file for database configuration settings. Default path on RHEL Linux: /var/lib/pgsql/13/data
- pg_hba.conf - Client authentication, e.g. required during setup to allow access to the system from the outside
- pg_ident.conf - Mapping of external authentication to database usernames
3.2 OS specific required settings
This is a reminder that some of the configuration settings of PostgreSQL will depend on the underlying operating system configuration. We will later list some configuration parameters of potential interest here. Be aware that there also exist advanced tuning tools, e.g. the tuned daemon you can find on RHEL.
3.3 Parameters of postgresql.conf which can be of interest for production loads
This section describes some configuration parameters which can have an impact on your system (additional parameters might be added over time). Be aware that the required settings depend on the workloads being executed, thus this collection can only give you some ideas of what exists and what could be worth adjusting.
- max_connections (Default 100): This parameter might need to be adjusted depending on the attempted workload. Also pooling might be a topic (see later)
- wal_buffers (Default 1/32 of shared_buffers): The default is capped at 16MB; recommendations for 32MB exist
- bgwriter_lru_maxpages (Default 100): This limits the number of pages written by the bgwriter each turn; the value might need to be raised to be large enough
- bgwriter_lru_multiplier (Default 2): A slight increase to values of 3 or 4 can also make sense for heavily loaded systems
- effective_io_concurrency: For magnetic disks this scales with the number of spindles, for SSDs it should be in the region of hundreds
- checkpoint_completion_target (Default 0.5): To reduce the IO impact it can make sense to increase the value to 0.8 or 0.9 (closer to the checkpoint timeout)
- autovacuum_max_workers (Default 3): For larger database systems it can be beneficial to increase this value to the region of around 10, depending on load
- autovacuum_naptime (Default 1 min): On busy systems or systems with a large number of databases this most likely needs to be increased
- autovacuum_vacuum_cost_delay (Default 20 ms): Rather conservative setting; depending on load it should be lowered to allow VACUUM to keep pace with the system
- Recommendation: increase at least to 200
What this actually means is that for a well performing system, measurements need to be made with a characteristic production workload to determine the right settings for these parameters. Especially the background writer and WAL logging can generate significant IO load when the parameters do not match the required settings.
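To illustrate, a postgresql.conf fragment collecting the parameters above might look as follows. The concrete values are placeholders for illustration only, not recommendations, and must be validated against your own workload:

```
# Illustrative values only - measure with your own workload before adopting
max_connections = 200
wal_buffers = 32MB
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 3.0
effective_io_concurrency = 200
checkpoint_completion_target = 0.9
autovacuum_max_workers = 10
autovacuum_vacuum_cost_delay = 2ms
```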
4. Debugging performance issues
4.1 The pg_stat_statements extension
In this section some pre-configuration is required. PostgreSQL already ships with some extensions which can be enabled. For performance related debugging it is highly recommended to enable the pg_stat_statements extension:
- In your postgresql.conf file find the shared_preload_libraries entry and add the pg_stat_statements module:
shared_preload_libraries = 'pg_stat_statements'
Hint: Make sure that there is only one postgresql.conf file in your configuration directory, otherwise your changes might not become effective. In that case an error like the following can be seen:
"ERROR: pg_stat_statements must be loaded via shared_preload_libraries".
- After the change to the postgresql.conf file a server restart is required
- The extension can be enabled on the database by
CREATE EXTENSION pg_stat_statements;
Now we can have a look at the actual debugging. The following two SQL statements can come in very handy to better understand potential performance problems on the database (slightly modified from the following presentation at PostgresConf 2019: https://www.youtube.com/watch?v=5M2FFbVeLSs - highly recommended for friends of sarcastic humor).
And here is our first SQL statement:
SELECT substring(query, 1, 60) AS short_query,
       round(total_exec_time::numeric, 2) AS total_exec_time,
       calls,
       round(mean_exec_time::numeric, 2) AS mean,
       round((100 * total_exec_time / sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_overall
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
The output of the command will look like the following screenshot. The main focus should be on the most resource consuming parts. This can be a few very expensive queries as well as millions of quickly executing queries which would not show up on their own based on their individual execution times. Thus we see the full picture for the database here. The statement lists the most expensive SQL statements with their resource utilization as well as the number of executions (the output can be adjusted by skipping the substring function to see the full SQL statements):
After you have seen your first output of the statement you might wonder: how can I reset the counters to only see the most recently executed SQL statements? This can be done with the following command (for more granular resets check the PostgreSQL documentation):
SELECT pg_stat_statements_reset();
And here is our second SQL statement (also can be found from the PostgresConf 2019 talk):
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, seq_tup_read / seq_scan AS avg FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC;
The statement will list scan activities and can be a good indicator of potentially missing indexes. A sample output will look like this:
4.2 The log_min_duration_statement parameter
From the pg_stat_statements output we already have a concrete idea which statements are problematic for the system. But there is a second option which can be used standalone or in combination with pg_stat_statements. PostgreSQL allows enabling additional logging; we will enable it and then have a look at how it can give us more insight into the problematic statements:
- Search your postgresql.conf configuration file for the log_min_duration_statement property and change the value to a number of milliseconds (we use 300 ms here as an aggressive example; depending on your trouble scenario you might start with larger values or use the sampling option log_min_duration_sample, not described here)
- Now the question is restart or reload after the change? A reload should be sufficient in this case as can be checked by the methods described in section 2.2.
Then we can recreate our load pattern and check under the data/log directory what information has been logged. Besides the statement "INSERT into LSW_LOCK ...", this time we also get the parameter values of the executed statement:
Thus in this case the problem is related to task activity tracking. You can try to adjust the cache settings for the task activity measurement via the 100Custom.xml properties as in the non-container world, except that in this case you need to modify your cr.yml file, add the required properties under the tag liberty_custom_xml and apply it:
liberty_custom_xml: |
  <server>
    <task-activity-measures>
      <max-cache-size merge="replace">200</max-cache-size>
      <max-cache-entry-update-count>100</max-cache-entry-update-count>
      <cache-entry-removal-policy>lru</cache-entry-removal-policy>
    </task-activity-measures>
  </server>
Further details on the procedure can be found in the Knowledge Center. However in this specific case this does not help and gives us the opportunity to have a closer look at some other tools described in the next section.
4.3 IO bottlenecks and tools for measurement
Fast disks can be essential for a high workload system. How can we check the actual performance of our disks? On a Linux system you can use the command
iostat -x 1, which prints the disk utilization to the terminal every second. A sample output can be seen here:
Based on the CPU load everything looks fine: 81.85 %idle and 0.82 %iowait. But when you check the disk %util, the number can be rather high and can cause bottlenecks, which explains for example the long wait times for the insert operations.
So what can you do? In this case we used a single disk to host the PostgreSQL server system. The first option, besides a faster disk, would be to distribute the load across multiple disks. There are older publications which describe the procedure; only some naming conventions have changed since then. The first focus is the pg_wal directory.
Thus the procedure is the following:
- Stop the PostgreSQL server
- Transfer the directory pg_wal, located in the directory with the databases, to another disk
- Create a symbolic link in the old place
- Start the PostgreSQL server
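The move and symlink steps above can be sketched as follows. This is only an illustration using temporary directories as stand-ins; on a real system you would operate on your actual data directory (e.g. /var/lib/pgsql/13/data) with the server stopped:

```shell
# Stand-ins for the real locations (illustration only):
#   DATADIR  ~ /var/lib/pgsql/13/data   (PostgreSQL data directory)
#   FASTDISK ~ mount point of the faster disk
DATADIR=$(mktemp -d)
FASTDISK=$(mktemp -d)
mkdir -p "$DATADIR/pg_wal"
echo demo > "$DATADIR/pg_wal/000000010000000000000001"

# 1. (server stopped) move pg_wal to the other disk
mv "$DATADIR/pg_wal" "$FASTDISK/pg_wal"

# 2. create a symbolic link in the old place
ln -s "$FASTDISK/pg_wal" "$DATADIR/pg_wal"

# 3. the server will transparently follow the link after the restart
ls -l "$DATADIR/pg_wal"
```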
In the result you can see that the load is now distributed between the two disks (sda and sdb) currently being used:
We did not focus on the log directory here, as in our case there was no significant amount of logging, therefore there was no need to move that data to a different disk. The procedure would have been the same though.
Did this solve the problem we observed? Of course not, but it was a good example to present some very useful tools for debugging performance problems. The solution to our high disk load will be discovered in the Performance Tuning section.
5. Performance Tuning
This section will soon cover, for example:
- Are my configuration settings for WAL right?
- What tables might be good candidates to be pinned in memory?
So stay tuned, further details will follow over the next weeks.
5.1 Performance monitoring
For performance tuning it is also important to monitor the results of the changes being applied; therefore we will have a look at some further tools for this in this section.
5.2 Write Ahead Log (WAL) considerations
We talked in section 4.3 about large amounts of IO activities. Especially for high workloads the configuration of the Write Ahead Log (WAL) can have a significant impact on IO utilization and thereby also on the general system performance. Therefore it is important to check the configuration as well as the logging.
One important configuration parameter is the wal_compression setting. By default it is disabled, however to reduce IO utilization it makes sense to consider enabling it.
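Enabling it is a one line change in postgresql.conf, and a reload is sufficient to make it effective:

```
# postgresql.conf
wal_compression = on
```

The trade-off is some extra CPU for compressing full page images written to the WAL, in exchange for a reduced WAL volume and thereby less IO.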
5.3 Index advisors
When running SQL statements the existence of indexes can make the difference. However, one needs to weigh the advantages for read operations against the extra costs for update/insert operations when adding a new index. Product shipped indexes should not be touched. PostgreSQL has the option to run an index advisor based on the database log files. We will have a look at how the dexter index advisor can be enabled.
5.3.1 Preparation for index advisors
5.3.2 The dexter index advisor
5.3.3 The pg_qualstats index advisor
6. Advanced configuration
From the WebSphere Application Server world you might already be aware of connection pooling on the application server side. A similar mechanism can also be used in front of PostgreSQL databases. There exist a few implementations which differ in their feature sets; two options to mention are pgpool and PgBouncer. More details will be added at a later point in time, as I have not yet performed tests with this kind of configuration.
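To give an impression of what such a setup looks like, here is a minimal PgBouncer configuration sketch. The database name, user list path, ports and pool sizes are made up for illustration; see the PgBouncer documentation for the authoritative option list:

```ini
; pgbouncer.ini - minimal illustrative sketch, values are placeholders
[databases]
; clients connecting to "bpmdb" on port 6432 are pooled onto the real server
bpmdb = host=127.0.0.1 port=5432 dbname=bpmdb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 50
```

Note that transaction pooling imposes restrictions on session state (e.g. prepared statements and temporary tables), so session pooling may be the safer starting point for an application server workload.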
Z. Last remarks
I hope the information shared can be of value to you. If you are missing some kind of information feel free to reach out to me and we might extend the blog entry.
Thanks for your interest and have fun with your Workflow product.