0. Introduction
With the new product versions Business Automation Workflow on Containers (BAW on containers) 20.0.0.2 and IBM Cloud Pak for Automation (ICP4A) 20.0.3, the use of PostgreSQL databases is now also supported. In this blog we will give a short overview of PostgreSQL basics which might come in handy during the setup process with any of the workflow-related products. Especially in public cloud scenarios a PostgreSQL database can be an attractive option. One always needs to keep in mind that network latency between the database and the Workflow product can have a significant impact on throughput, and in some cases can make the system unusable. The database should therefore have network latency in the millisecond range when connecting to the Workflow product (we are testing with network latencies around 0.5 ms).
In this first part of the blog we will focus on a non-container PostgreSQL server installation. The setup with a container based PostgreSQL server in the same namespace as the Workflow product will be covered in a second part.
1. Installation
1.1 How to install a PostgreSQL database
If you are new to PostgreSQL, the PostgreSQL homepage describes rather simple steps for a basic installation. The following examples are based on an installation on Red Hat Enterprise Linux 8 (RHEL 8), but should in general also apply to other platforms. Only a few commands need to be executed to get a PostgreSQL server up and running that is started automatically after a machine reboot. We will not go into the details of setting up the PostgreSQL server itself, as there are many different options and this is already covered by the PostgreSQL documentation, although we might give some tips on further tuning options later.
One thing you should think about before the installation is: what hardware do I need? This of course depends on the workload of the future system. The PostgreSQL server itself is not resource hungry to get up and running, but this can change with the workload. The IBM Knowledge Center lists some very basic resource requirements; for production loads you definitely want to increase these values. In particular, a production database server should aim for extremely fast storage for the transaction logs (earlier recommendations stated SSD, today I would state NVMe or storage of similar speed). In our current performance tests we see bottlenecks when relying on SSD storage alone, and I see similar patterns with customers. The fio tool can be used to get an idea of the available disk performance, as sketched below.
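As an example, a small random-write test with fio could look like the following sketch (the job name, test file path and size are placeholders; make sure the test file is located on the disk you want to measure and that there is enough free space):
fio --name=pgdisk-test --filename=/var/lib/pgsql/fio-testfile --rw=randwrite --bs=8k --size=1G --ioengine=libaio --iodepth=16 --direct=1 --runtime=60 --time_based --group_reporting
The reported IOPS and completion latencies for 8 KB blocks give a rough feeling for how well the disk will handle WAL and data file writes; remember to remove the test file afterwards.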
Another component is memory. As the Workflow products have some central tables which can suffer under high-load scenarios, I normally tend to pin some of these tables in memory for better performance. This requires, however, that sufficient memory is available; depending on the involved tables this can consume more or less memory. In a later section we will go into further detail on the tables for which this procedure can make sense. PostgreSQL is also a nice platform to easily dig deeper into resource consumption and long-running SQL statements.
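As a first pointer: PostgreSQL does not offer true pinning out of the box, but the bundled pg_prewarm extension can pre-load a table into the buffer cache so that it is memory resident right after a restart. A minimal sketch, assuming the Workflow table LSW_LOCK as an example and a sufficiently large shared_buffers setting:
CREATE EXTENSION IF NOT EXISTS pg_prewarm;
SELECT pg_prewarm('lsw_lock');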
1.2 PostgreSQL Server Extensions
Besides the PostgreSQL core there exist a number of extensions which can bring additional benefits. The most prominent extension might be pg_stat_statements, which we will cover later. For the beginning we should already be set. In the further sections we will see some more extensions and apply them where needed.
2. Operations
2.1 Administration tools
There exist a number of tools to interact with the database system. We will have a short overview of a small selection; there are surely more tools, and this selection is arbitrary, based on observations made in the past, and not complete. The most basic is the psql command line interface, which has a number of nice options, some of which we will see as we go along.
To give a short overview of graphical user interfaces, please have a look at the following table:
Name / Link | Description
pgAdmin3 | Retiring standalone client; the recommendation is to move to pgAdmin4
pgAdmin4 | The successor of pgAdmin3, which now uses a web server to provide a web interface. A screenshot is shared below for reference
phpPgAdmin | PHP-based client
Adminer | PHP-based client
The following shows a screenshot of pgAdmin4. One can see a number of different tabs which can be helpful to monitor the system. This is just a pointer to what is available there; the concrete usage of all options cannot be covered as part of this blog, but where applicable, functions will be used to show concrete examples.
[Screenshot: pgAdmin4 dashboard with its monitoring tabs]
2.2 Restarting/Reloading the PostgreSQL server
While setting up the system or doing performance tuning, one wants new parameters to become effective immediately. One needs to distinguish between properties which are effective immediately after a reload and those which only become effective after a restart. To determine which is the case, there exist two methods which complement each other:
- Method1: You can run this SQL statement
SELECT name, context FROM pg_settings WHERE name ilike '%log%';
where you replace the 'log' search term with your parameter name. If the result shows sighup, a reload is sufficient; for postmaster a restart is required. There are other results like superuser, therefore method 2 might bring clearer insights.
- Method2: Change all required parameters and execute a reload. Afterwards you can run the following statement to see any parameter requiring a restart:
SELECT name FROM pg_settings WHERE pending_restart IS true;
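For reference, the reload itself can be triggered directly from SQL (alternatively via pg_ctl reload or the operating system service):
SELECT pg_reload_conf();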
One can state that PostgreSQL performs a restart rather quickly. I would however recommend checking your BAW container logs after such an operation, as things can sometimes hang when the database server becomes unavailable for a few seconds.
2.3 Backup and Restore
Backup and restore is a basic activity for production usage, and you most likely have a professional DBA to guard your business data. The commands provided here are for your convenience in testing environments if you want to be able to recover quickly (more advanced features exist as well):
- Backup a single database: pg_dump <dbname> > dbname.bak
- Backup all databases: pg_dumpall > pg_backup.bak
- Restore a single database: createdb <dbname>
psql -h localhost -U pgsql -d <dbname> < dbname.bak
- Restore all databases: psql -h localhost -U pgsql < pg_backup.bak
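If you prefer a compressed dump that also allows selective restores, the custom format is an alternative sketch (create the empty database first, as above):
pg_dump -Fc <dbname> > dbname.dump
pg_restore -d <dbname> dbname.dump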
3. Configuration
3.1 Central configuration files
There exist 3 main configuration files one should be aware of:
- postgresql.conf - Central file for database configuration settings. Default path on RHEL Linux: /var/lib/pgsql/13/data
- pg_hba.conf - Client authentication, e.g. required during setup to allow access to the system from the outside
- pg_ident.conf - Mapping of external authentication to database usernames
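If you are unsure where these files are located on your installation, the server can tell you directly:
SHOW config_file;
SHOW hba_file;
SHOW ident_file;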
3.2 OS specific required settings
This is a reminder that some of the configuration settings of PostgreSQL depend on the underlying operating system configuration. We will later list some configuration parameters of potential interest here. Be aware that there are also advanced tuning tools, such as tuned, which you can find on RHEL.
3.3 Parameters of postgresql.conf which can be of interest for production loads
This section describes some configuration parameters which can have an impact on your system (additional parameters might be added over time). Be aware that the required configuration settings depend on the workloads being executed, thus this collection can only give you some ideas of what exists and what could be worth adjusting.
Parameter | Comment
max_connections (Default 100) | This parameter might need to be adjusted depending on the intended workload. Connection pooling might also be a topic (see later)
wal_buffers (Default 1/32 of shared_buffers) | The default is capped at 16MB; recommendations for 32MB exist
bgwriter_lru_maxpages (Default 100) | This limits the number of pages written by the background writer in each round; the value might need to be raised to be large enough
bgwriter_lru_multiplier (Default 2) | A slight increase to values of 3 or 4 can make sense for more heavily loaded systems
effective_io_concurrency | For magnetic disks this scales with the number of spindles, for SSDs it should be in the region of hundreds
checkpoint_completion_target (Default 0.5) | To reduce the IO impact it can make sense to increase the value to 0.8 or 0.9 (closer to the checkpoint timeout)
autovacuum_max_workers (Default 3) | For larger database systems it can be beneficial to increase this value to the region of around 10, depending on load
autovacuum_naptime (Default 1 min) | On busy systems or systems with a large number of databases this most likely needs to be increased
autovacuum_vacuum_cost_delay (Default 20 ms) | A rather conservative setting; depending on load it should be lowered to allow VACUUM to keep up with the system
max_prepared_transactions | Recommendation: increase to at least 200
What this actually means is that for a well-performing system, measurements need to be made with a characteristic production workload to determine the right settings for these parameters. Especially the background write activity can generate significant IO load when the parameters do not match the required settings.
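A quick way to check the currently active values (and, as described in section 2.2, whether changing them requires a reload or a restart) is again the pg_settings view:
SELECT name, setting, unit, context FROM pg_settings
WHERE name IN ('max_connections', 'wal_buffers', 'checkpoint_completion_target', 'autovacuum_max_workers', 'max_prepared_transactions');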
4. Debugging performance issues
Performance issues can have multiple causes. The following sections will show ways to debug specific aspects. While section 4.1 uses the pg_stat_statements extension, which is very useful to get a combined view of execution time and the number of executions, the pg_stat_activity view of section 4.3 is a quick out-of-the-box solution to identify long-running SQL statements if the described configuration changes have not been or cannot be applied.
4.1 The pg_stat_statements extension
In this section some pre-configuration is required. PostgreSQL already ships with some extensions which can be enabled. For performance-related debugging, applying the pg_stat_statements extension is highly recommended:
- In your postgresql.conf file find the shared_preload_libraries entry and add the pg_stat_statements module:
shared_preload_libraries = 'pg_stat_statements'
Hint: Make sure that there is only one postgresql.conf file in your configuration directory, otherwise your changes might not be effective. In that case an error like the following can be seen:
"ERROR: pg_stat_statements must be loaded via shared_preload_libraries".
- After the change to the postgresql.conf file a server restart is required
- The extension can be enabled on the database by
CREATE EXTENSION pg_stat_statements;
Now we can have a look at the actual debugging. The following two SQL statements can come in very handy to better understand potential performance problems on the database (slightly modified from the following presentation at PostgresConf 2019: https://www.youtube.com/watch?v=5M2FFbVeLSs - highly recommended for friends of sarcastic humor).
And here is our first SQL statement:
SELECT substring(query, 1, 60) AS short_query, round(total_exec_time::numeric, 2) AS total_exec_time, calls, round(mean_exec_time::numeric, 2) AS mean,
round((100 * total_exec_time / sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_overall FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
The output of the command will look like the following screenshot. The main focus should be on the most resource-consuming parts: this can be a few very expensive queries as well as millions of quickly executing queries which would not stand out by their individual execution time. Thus we see the full picture for the database here. The statement lists the most expensive SQL statements with their resource utilization as well as the number of executions (the output can be adjusted by dropping the substring call to see the full SQL statement):
[Screenshot: pg_stat_statements output with short_query, total_exec_time, calls, mean and percentage_overall columns]
After you have seen your first output of the statement you might wonder: how can I reset the counters to only see the most recently executed SQL statements? This can be done with the following command (for more granular resets check the PostgreSQL documentation):
select pg_stat_statements_reset();
And here is our second SQL statement (also can be found from the PostgresConf 2019 talk):
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, seq_tup_read / seq_scan AS avg FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC ;
The statement will list scan activities and can be a good indicator of potentially missing indexes. A sample output will look like this:
[Screenshot: pg_stat_user_tables output with sequential scan counts per table]
4.2 The log_min_duration_statement
From the pg_stat_statements output we already have a concrete idea which statements are problematic for the system. But there is a second option which can be used standalone or in combination with pg_stat_statements: PostgreSQL allows additional logging to be enabled, and we will enable it and then have a look at how this can give us more insights into the problematic statements:
- Search your postgresql.conf configuration file for the log_min_duration_statement property and change the value to a number of milliseconds (we use 300 ms here as an aggressive example; depending on your trouble scenario you might start with larger values or use the sampling option log_min_duration_sample, not described here). The resulting entry is shown below the list.
- Now the question is: restart or reload after the change? A reload should be sufficient in this case, as can be checked with the methods described in section 2.2.
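For the 300 ms example used here, the entry in postgresql.conf would simply look like this:
log_min_duration_statement = 300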
Then we can recreate our load pattern and check under the data/log directory what information has been logged. Besides the statement "INSERT into LSW_LOCK ..." we this time also get the parameters of the executed statement:
[Screenshot: PostgreSQL log excerpt showing the slow INSERT into LSW_LOCK statement together with its parameters]
Thus in this case the problem is related to task activity tracking. You can try to adjust the cache settings for the task activity measurement via the 100Custom.xml properties as used in the non-container world, except that in this case you need to modify your cr.yml file, add the required properties under the tag liberty_custom_xml and apply it:
liberty_custom_xml: <server><task-activity-measures><max-cache-size merge="replace">200</max-cache-size><max-cache-entry-update-count>100</max-cache-entry-update-count><cache-entry-removal-policy>lru</cache-entry-removal-policy></task-activity-measures></server>
Further details on the procedure can be found in the Knowledge Center. In this specific case, however, this does not help, which gives us the opportunity to have a closer look at some other tools described in the next sections.
4.3 Long running SQL statements
If your system does not have the two earlier mentioned configuration settings applied, there is another way to identify long-running SQL statements out of the box; the pg_stat_activity view is the central piece here. The time threshold can be adjusted, the example lists statements running for more than 2 minutes:
SELECT pid, now() - pg_stat_activity.query_start AS duration, query, state FROM pg_stat_activity WHERE (now() - pg_stat_activity.query_start) > interval '2 minutes';
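If such a statement needs to be stopped, the pid from the output can be used; pg_cancel_backend() cancels only the running query, while pg_terminate_backend() terminates the whole session (the pid below is just an example value):
SELECT pg_cancel_backend(12345);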
4.4 IO bottlenecks and tools for measurement
Fast disks can be essential for a highly loaded system. How can we check the actual performance of our disks? On a Linux system you can use the command iostat -x 1, which prints the disk utilization to the terminal every second. A sample output can be seen here:
[Screenshot: iostat -x 1 sample output]
Based on the CPU load everything looks fine: 81.85 %idle and 0.82 %iowait. But when you check the disk %util, the number can be rather high and can cause bottlenecks, which explains for example the long wait times for the insert operations.
So what can you do? In this case we used a single disk to host the PostgreSQL server system. The first option, besides a faster disk, is to distribute the load over multiple disks. There are older publications which describe the procedure; only some naming conventions have changed since then. The first focus is the pg_wal directory.
The procedure is the following (a command sketch follows after the list):
- Stop the PostgreSQL server
- Transfer the directory pg_wal, located in the directory with the databases, to another disk
- Create a symbolic link in the old place
- Start the PostgreSQL server
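A minimal sketch of these steps on RHEL, assuming the service name postgresql-13 and /fastdisk as the mount point of the additional disk (both are placeholders, adjust them to your installation):
sudo systemctl stop postgresql-13
sudo mv /var/lib/pgsql/13/data/pg_wal /fastdisk/pg_wal
sudo ln -s /fastdisk/pg_wal /var/lib/pgsql/13/data/pg_wal
sudo systemctl start postgresql-13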
As a result you can see that the load is now distributed between the two disks (sda and sdb) currently being used:
[Screenshot: iostat output showing the load distributed across sda and sdb]
We did not focus on the log directory here, as in our case there was no significant amount of logging and therefore no need to move that data to a different disk. The procedure would have been the same though.
Did this solve the problem we observed? Surely not, but it was a good example to present some very useful tools for debugging performance problems. The solution to our high disk load will be covered in the Performance Tuning section.
4.5 Some more words on Locking
In some environments the usage of additional libraries might not be possible, thus there are also PostgreSQL-specific means to debug lock contention. The following SQL statement shows blocked statements:
SELECT pid, pg_blocking_pids(pid), wait_event, wait_event_type, query FROM pg_stat_activity WHERE backend_type = 'client backend' AND wait_event_type ~ 'Lock';
A second statement lists the relations, the associated queries and the lock details currently held:
select
relname as relation_name,
query,
pg_locks.*
from pg_locks
join pg_class on pg_locks.relation = pg_class.oid
join pg_stat_activity on pg_locks.pid = pg_stat_activity.pid;
The output gives insights into the SQL statements involved in a lock contention.
4.6 Some diagnostic SQL statements
As mentioned in the last paragraph, in some environments administrative access can be limited. Therefore I will list here some SQL queries which can give some insights into the product data. However, this can only provide a partial picture.
The following SQL statement collects some data on the table statistics; run it against the BPM database. Besides the number of rows in each table, there is also some data on scans as well as on statistics collection and vacuum cleanup:
select schemaname, relname, seq_scan, idx_scan, n_live_tup, n_mod_since_analyze, n_ins_since_vacuum, last_vacuum, last_autovacuum, last_analyze from pg_stat_all_tables ;
This SQL statement gives an overview of snapshots and associated process instances. It is a good indicator for missing housekeeping activities, but it can also show the number of snapshots in active use.
select substr(snap.name,1,50) as "name",
proj.short_name as "container_Acronym",
snap.acronym as "container_SnapshotAcronym",
bpd.snapshot_id as "snapshot id",
substr(code.NAME,1,20) as "status",
count(bpd.EXECUTION_STATUS) as "number of instances"
from LSW_BPD_INSTANCE bpd
left outer join lsw_bpd_status_codes code on code.STATUS_ID = bpd.execution_status
left outer join lsw_snapshot snap on bpd.snapshot_id = snap.snapshot_id
left outer join lsw_project proj on snap.PROJECT_ID = proj.PROJECT_ID
group by snap.name, proj.short_name, snap.acronym, bpd.snapshot_id, code.NAME
order by snap.name, proj.short_name, snap.acronym, bpd.snapshot_id, code.NAME
;
4.7 Bloat and what you want to know about it
During one of my last performance tests I observed a pattern which I think is of general interest, especially when coming from other databases. As you might already know, PostgreSQL writes an additional row version for an update, while other database systems update the row in place. Therefore the VACUUM cleanup job runs in the background and takes care of these extra row versions. When you end up with transactions in a blocked state, this can prevent the VACUUM process from doing its job successfully. This leads to dead rows not being removed and the table growing in size. When querying the table you will see nothing unusual, while execution times and CPU utilization can increase dramatically.
So how can you identify your bloat? There is a SQL statement available from https://wiki.postgresql.org/wiki/Show_database_bloat :
SELECT
current_database(), schemaname, tablename, /*reltuples::bigint, relpages::bigint, otta,*/
ROUND((CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages::float/otta END)::numeric,1) AS tbloat,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::BIGINT END AS wastedbytes,
iname, /*ituples::bigint, ipages::bigint, iotta,*/
ROUND((CASE WHEN iotta=0 OR ipages=0 THEN 0.0 ELSE ipages::float/iotta END)::numeric,1) AS ibloat,
CASE WHEN ipages < iotta THEN 0 ELSE bs*(ipages-iotta) END AS wastedibytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta,
COALESCE(c2.relname,'?') AS iname, COALESCE(c2.reltuples,0) AS ituples, COALESCE(c2.relpages,0) AS ipages,
COALESCE(CEIL((c2.reltuples*(datahdr-12))/(bs-20::float)),0) AS iotta -- very rough approximation, assumes all cols
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
LEFT JOIN pg_index i ON indrelid = cc.oid
LEFT JOIN pg_class c2 ON c2.oid = i.indexrelid
) AS sml
ORDER BY wastedbytes DESC;
This generates a nice overview for all tables and indexes. As the sample output (wastedbytes and wastedibytes columns) shows, the bloat can be quite significant.
The central question now is: what can you do to remove the bloat? Normally you can run VACUUM or VACUUM FULL on the affected tables, but as in this case this might not work, especially when the automatic job failed earlier. There exist four cases which can block a successful execution. For the moment I will just link external sources on what can be done (as always be cautious, as content can change over time): https://www.cybertec-postgresql.com/en/reasons-why-vacuum-wont-remove-dead-rows/ and https://medium.com/compass-true-north/dealing-with-significant-postgres-database-bloat-what-are-your-options-a6c1814a03a5
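Once the blocking cause has been resolved, a manual cleanup of a single affected table could look like the following sketch (the table name is just an example; note that VACUUM FULL takes an exclusive lock and rewrites the whole table, so schedule it carefully):
VACUUM (VERBOSE, ANALYZE) lsw_lock;
VACUUM FULL lsw_lock;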
4.8 How to identify prepared transactions blocking the system
ICP4BA utilizes two-phase commit transactions and PREPARE TRANSACTION on PostgreSQL. Due to the blocking nature of PREPARE TRANSACTION, the PostgreSQL documentation points out that timely processing with COMMIT PREPARED or ROLLBACK PREPARED is required so that resources are not blocked. A more detailed background description can be found here: https://www.dbi-services.com/blog/be-careful-with-prepared-transactions-in-postgresql/. The most important statement for us in this context is the following:
select * from pg_prepared_xacts;
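If an orphaned prepared transaction needs to be cleaned up manually, the gid column from the query above can be used (the identifier below is a placeholder; make sure the owning application has really abandoned the transaction before rolling it back):
ROLLBACK PREPARED 'example-gid-from-pg_prepared_xacts';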
5. Performance Tuning
This section will soon cover, for example:
- Are my configuration settings for WAL right?
- What tables might be good candidates to be pinned in memory?
- ...
Thus stay tuned, further details will follow over the next weeks.
5.1 Performance monitoring
For performance tuning it is also important to monitor the results of the changes being applied, therefore we will have a look at some further tools for this in this section.
5.2 Write Ahead Log (WAL) considerations
We talked in section 4.4 about large amounts of IO activity. Especially for high workloads, the configuration of the Write Ahead Log (WAL) can have a significant impact on IO utilization and thereby also on general system performance. Therefore it is important to check the configuration as well as the logging.
One important configuration parameter is the wal_compression setting. By default it is disabled; to reduce IO utilization it makes sense to consider enabling wal_compression.
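On PostgreSQL 13 this is a simple on/off switch in postgresql.conf, effective after a reload:
wal_compression = on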
5.3 Index advisors
When running SQL statements, the existence of indexes can make the difference. However, one needs to weigh the advantages for read operations against the extra costs for update/insert operations when adding a new index. Product-shipped indexes should not be touched. For PostgreSQL there is the option to run an index advisor based on the database log files. We will have a look at how the dexter index advisor can be enabled.
5.3.1 Preparation for index advisors
5.3.2 The dexter index advisor
5.3.3 The pg_qualstats index advisor
6. Advanced configuration
From the WebSphere Application Server world you might already be familiar with connection pooling on the application server side. A similar mechanism can also be used for PostgreSQL databases. There exist a few options which differ in their feature sets; two worth mentioning are pgpool and PgBouncer. More details will be added at a later point in time, as I have not yet performed tests with this kind of configuration.
7. Resource utilization
7.1 Is it possible to limit CPU and memory on a database or user level?
Lastly the question came up whether one can limit CPU and memory resources in the context of multi-tenancy on the same database server instance. When talking about a container environment like OpenShift, limits can be set for the instance pod as for any other pod, but there is currently no feature to limit central resources like CPU or memory for individual databases or users running on the same instance. What one can do is limit the number of connections for each individual database.
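Such a per-database (or per-user) connection limit can be set with standard SQL; the database name, role name and the limit values below are just illustrative:
ALTER DATABASE bpmdb CONNECTION LIMIT 50;
ALTER ROLE bpmuser CONNECTION LIMIT 20;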
When talking about limitations, there is also the GitHub project https://github.com/cybertec-postgresql/pg_cgroups. But this is more of interest for limiting the PostgreSQL instance via cgroups in a non-container environment, something which is already in place for OpenShift or Kubernetes.
Z. Last remarks
I hope the information shared is of value to you. If you are missing some information, feel free to reach out to me and we might extend this blog entry.
Thanks for your interest and have fun with your Workflow product.