With the new product versions Business Automation Workflow on Containers (BAW on containers) 18.104.22.168 and IBM Cloud Pak for Automation (ICP4A) 20.0.3, the use of PostgreSQL databases is now supported as well. In this blog we will give a short overview of PostgreSQL basics which might come in handy during the setup process with any of the workflow related products. Especially in public cloud scenarios a PostgreSQL database can be an attractive option. One always needs to keep in mind that network latency between the database and the Workflow product can have a significant impact on throughput; in some cases it can even make the system unusable. Therefore the database should be reachable with network latency in the low millisecond range from the Workflow product (we are testing with network latencies around 0.5 ms).
In this first part of the blog we will focus on a non-container PostgreSQL server installation. The setup with a container based PostgreSQL server in the same namespace as the Workflow product will be covered in a second part.
1.1 How to install a PostgreSQL database
If you are new to PostgreSQL, the PostgreSQL homepage provides rather simple steps for a basic installation. The following examples are based on an installation on Red Hat Enterprise Linux Server 8 (RHEL8), but should in general also apply to other platforms. Only a few commands need to be executed to get a PostgreSQL server up and running that starts automatically after the machine is rebooted. We will not go into the details of setting up the PostgreSQL server itself, as there are a lot of different options and this is already covered by the PostgreSQL documentation, although we might later give some tips on further tuning options.
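As a quick orientation, the few commands mentioned above look roughly like the following sketch. This assumes the PGDG yum repository for PostgreSQL 13 is already configured and root privileges are available; package and service names may differ for other versions:

```shell
# Sketch of a basic PostgreSQL 13 installation on RHEL8 (assumes PGDG repo, requires root)
sudo dnf install -y postgresql13-server

# Initialize the data directory (default: /var/lib/pgsql/13/data)
sudo /usr/pgsql-13/bin/postgresql-13-setup initdb

# Start the server now and enable it to start automatically after a reboot
sudo systemctl enable --now postgresql-13
```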
One thing you should already think about before the installation is: what hardware do I need? Of course this will depend on the workload of the future system. The PostgreSQL server itself is not resource hungry to get up and running, but this can change with the workload. The IBM Knowledge Center lists some very basic resource requirements. For production loads you will definitely want to increase these values. In particular, a production database server should have extremely fast storage for the transaction logs (earlier one would have said SSD, today I would rather say NVMe or similar speeds). In our current performance tests we can see bottlenecks when simply relying on SSD storage, and I can observe similar patterns with customers. The fio tool can be used to get an idea of the disk performance available.
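To get a rough idea of what your storage can deliver, a simple fio job file can be used. The job below is only an illustrative sketch (file name, size and target directory are arbitrary choices, not recommendations) that mimics the small synchronous writes of the transaction log:

```ini
; wal-like.fio - illustrative fio job, run with: fio wal-like.fio
[wal-sim]
ioengine=sync              ; synchronous writes, similar to WAL flushes
rw=write                   ; sequential write pattern
bs=8k                      ; small block size, close to a WAL page
size=1G
fdatasync=1                ; flush after every write, like a commit
directory=/var/lib/pgsql   ; point this at the disk you want to test
```

The interesting numbers in the fio output are the sync latency percentiles; if the 99th percentile is already in the multi-millisecond range, commit-heavy workloads will suffer.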
Another component is memory. As the Workflow products have some central tables which can suffer under high load scenarios, I normally tend to pin some of these tables in memory for better performance. This requires, however, that sufficient memory is available; depending on the involved tables this can consume more or less memory. In a later section we will go into further detail on the tables for which this procedure can make sense. PostgreSQL is also a nice platform to easily dig deeper into resource consumption and long running SQL statements.
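One way to preload tables into memory is the pg_prewarm extension that ships with PostgreSQL. As a hedged sketch (the table name lsw_lock is just an example from the Workflow schema; strictly speaking pg_prewarm preloads the table into the buffer cache rather than hard-pinning it there):

```sql
-- Enable the extension once per database
CREATE EXTENSION IF NOT EXISTS pg_prewarm;

-- Read the whole table into the shared buffer cache;
-- the function returns the number of blocks read
SELECT pg_prewarm('lsw_lock');
```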
1.2 PostgreSQL Server Extensions
Besides the core server there exist a number of extensions which can bring additional benefits. The most prominent extension might be pg_stat_statements, which we will cover later. For the beginning we should already be set. In the later sections we will see some more extensions and apply them where needed.
2.1 Administration tools
There exist a number of tools to interact with the database system. We will give a short overview of a small selection; there are certainly more tools, and the selection is arbitrary, based on observations made in the past, and not complete. The most basic one is the psql command line interface, which has a number of nice options, some of which we will see as we go along.
To give a short overview of graphical user interfaces, please have a look at the following tools:
- pgAdmin3 - Retiring standalone client, recommendation to move to pgAdmin4
- pgAdmin4 - The successor of pgAdmin3, which now uses a webserver to provide a web interface. A screenshot is shared below for reference
- PHP based client
- PHP based client
The following shows a screenshot of pgAdmin4. One can see a number of different tabs which can be helpful for monitoring the system. This is just a pointer to what is available there. The concrete usage of all options cannot be covered as part of this blog; where applicable, functions will be used to show concrete examples.
2.2 Restarting/Reloading the PostgreSQL server
While setting up the system or doing performance tuning, one wants the new parameters to become effective immediately. One needs to distinguish between properties which are effective immediately after a reload and those which only become effective after a restart. To determine which is the case, there exist two methods which complement each other:
- Method 1: Run the SQL statement
SELECT name, context FROM pg_settings WHERE name ILIKE '%log%';
where you replace the 'log' search term with your parameter name. If the result shows sighup, a reload is sufficient; for postmaster a restart is required. There are other results like superuser, therefore method 2 might bring clearer insights.
- Method 2: Change all required parameters and execute a reload (for example with SELECT pg_reload_conf(); or pg_ctl reload). Afterwards you can run the following statement to see any parameter requiring a restart:
SELECT name FROM pg_settings WHERE pending_restart IS true;
One can state that PostgreSQL restarts rather quickly. I would however recommend checking your BAW container logs after such an operation, as sometimes things can get stuck when the database server becomes absent for some seconds.
2.3 Backup and Restore
Backup and restore is a basic activity for production usage, and you most likely have a professional DBA to guard your business data. The commands provided here are for your convenience in testing environments, if you want to be able to recover quickly (more advanced features certainly exist as well):
- Backup a single database: pg_dump <dbname> > dbname.bak
- Backup all databases: pg_dumpall > pg_backup.bak
- Restore a single database: createdb <dbname>
psql -h localhost -U pgsql -d <dbname> < dbname.bak
- Restore all databases: psql -h localhost -U pgsql < pg_backup.bak
3.1 Central configuration files
There exist 3 main configuration files one should be aware of:
- postgresql.conf - Central file for database configuration settings. Default path on RHEL Linux: /var/lib/pgsql/13/data
- pg_hba.conf - Client authentication, e.g. required during setup to allow access to the system from the outside
- pg_ident.conf - Mapping of external authentication to database usernames
3.2 OS specific required settings
This is a reminder that some of the configuration settings of PostgreSQL will depend on the underlying operating system configuration. We will later list some configuration parameters of potential interest here. Be aware that there also exist advanced tuning tools, e.g. the tuned daemon you can find on RHEL.
3.3 Parameters of postgresql.conf which can be of interest for production loads
This section describes some configuration parameters which can have an impact on your system (additional parameters might be added over time). Be aware that the required settings depend on the workloads being executed, thus this collection can only give you some ideas of what exists and what could be worth adjusting.
- max_connections (Default 100): This parameter might need to be adjusted depending on the attempted workload. Also pooling might be a topic (see later)
- wal_buffers (Default 1/32 of shared_buffers): The default is capped at 16MB; recommendations for 32MB exist
- bgwriter_lru_maxpages (Default 100): This limits the number of pages written by the bgwriter each turn; the value might need to be raised to be large enough
- bgwriter_lru_multiplier (Default 2): A slight increase to values of 3 or 4 can also make sense for heavily loaded systems
- effective_io_concurrency: For magnetic disks this scales with the number of spindles, for SSDs it should be in the region of hundreds
- checkpoint_completion_target (Default 0.5): To reduce the IO impact it can make sense to increase the value to 0.8 or 0.9 (closer to the checkpoint timeout)
- autovacuum_max_workers (Default 3): For larger database systems it can be beneficial to increase this value to the region of around 10, depending on load
- autovacuum_naptime (Default 1 min): On busy systems or systems with a large number of databases this most likely needs to be increased
- autovacuum_vacuum_cost_delay (Default 20 ms): Rather conservative setting; depending on load it should be lowered to allow VACUUM to keep pace with the system
- Recommendation: increase at least to 200
What this actually means is that for a well performing system, measurements need to be made with a characteristic production workload to determine the right settings for these parameters. Especially the background writer and WAL logging can generate significant IO load when the parameters do not match the required settings.
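To illustrate, a postgresql.conf fragment collecting the parameters above might look as follows. The concrete values are placeholders for illustration only, not recommendations, and must be validated against your own workload:

```
# Illustrative values only - measure with your own workload before adopting
max_connections = 200
wal_buffers = 32MB
bgwriter_lru_maxpages = 400
bgwriter_lru_multiplier = 3.0
effective_io_concurrency = 200
checkpoint_completion_target = 0.9
autovacuum_max_workers = 10
autovacuum_vacuum_cost_delay = 2ms
```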
4. Debugging performance issues
4.1 The pg_stat_statements extension
In this section some pre-configuration is required. PostgreSQL already ships with some extensions which can be enabled. For performance related debugging it is highly recommended to enable the pg_stat_statements extension:
- In your postgresql.conf file find the shared_preload_libraries entry and add the pg_stat_statements module:
shared_preload_libraries = 'pg_stat_statements'
Hint: Make sure that there is only one postgresql.conf file in your configuration directory, otherwise your changes might not become effective. In that case an error like the following can be seen:
"ERROR: pg_stat_statements must be loaded via shared_preload_libraries".
- After the change to the postgresql.conf file a server restart is required
- The extension can be enabled on the database by
CREATE EXTENSION pg_stat_statements;
Now we can have a look at the actual debugging. The following two SQL statements can come in very handy to better understand potential performance problems on the database (slightly modified from the following presentation at PostgresConf 2019: https://www.youtube.com/watch?v=5M2FFbVeLSs - highly recommended for friends of sarcastic humor).
And here is our first SQL statement:
SELECT substring(query, 1, 60) AS short_query,
       round(total_exec_time::numeric, 2) AS total_exec_time,
       calls,
       round(mean_exec_time::numeric, 2) AS mean,
       round((100 * total_exec_time / sum(total_exec_time::numeric) OVER ())::numeric, 2) AS percentage_overall
FROM pg_stat_statements
ORDER BY total_exec_time DESC LIMIT 20;
The output of the command will look like the following screenshot. The main focus should be on the most resource consuming parts. This can be a few very expensive queries as well as millions of quickly executing queries which would not show up on their own based on their individual execution times. Thus we see the full picture for the database here. The statement lists the most expensive SQL statements with their resource utilization as well as the number of executions (the output can be adjusted by skipping the substring function to see the full SQL statements):
After you have seen your first output of the statement you might wonder: how can I reset the counters to only see the most recently executed SQL statements? This can be done with the following command (for more granular resets check the PostgreSQL documentation):
SELECT pg_stat_statements_reset();
And here is our second SQL statement (also can be found from the PostgresConf 2019 talk):
SELECT schemaname, relname, seq_scan, seq_tup_read, idx_scan, seq_tup_read / seq_scan AS avg FROM pg_stat_user_tables WHERE seq_scan > 0 ORDER BY seq_tup_read DESC;
The statement will list scan activities and can be a good indicator of potentially missing indexes. A sample output will look like this:
4.2 The log_min_duration_statement parameter
From the pg_stat_statements output we already have a concrete idea which statements are problematic for the system. But there is a second option which can be used standalone or in combination with pg_stat_statements. PostgreSQL allows enabling additional logging; we will enable it and then have a look at how it can give us more insight into the problematic statements:
- Search your postgresql.conf configuration file for the log_min_duration_statement property and change the value to a number of milliseconds (we use 300 ms here as an aggressive example; depending on your trouble scenario you might start with larger values or use the sampling option log_min_duration_sample, not described here)
- Now the question is restart or reload after the change? A reload should be sufficient in this case as can be checked by the methods described in section 2.2.
Then we can recreate our load pattern and check under the data/log directory what information has been logged. Besides the statement "INSERT into LSW_LOCK ...", this time we also get the parameter values of the executed statement:
Thus in this case the problem is related to task activity tracking. You can try to adjust the cache settings for the task activity measurement via the 100Custom.xml properties as in the non-container world, except that in this case you need to modify your cr.yml file, add the required properties under the tag liberty_custom_xml and apply it:
liberty_custom_xml: |
  <server>
    <task-activity-measures>
      <max-cache-size merge="replace">200</max-cache-size>
      <max-cache-entry-update-count>100</max-cache-entry-update-count>
      <cache-entry-removal-policy>lru</cache-entry-removal-policy>
    </task-activity-measures>
  </server>
Further details on the procedure can be found in the Knowledge Center. However in this specific case this does not help and gives us the opportunity to have a closer look at some other tools described in the next section.
4.3 IO bottlenecks and tools for measurement
Fast disks can be essential for a high workload system. How can we check the actual performance of our disks? On a Linux system you can use the command
iostat -x 1, which prints the disk utilization to the terminal every second. A sample output can be seen here:
Based on the CPU load everything looks fine: 81.85 %idle and 0.82 %iowait. But when you check the disk %util, the number can be rather high and can cause bottlenecks, which explains for example the long wait times for the insert operations.
So what can you do? In this case we used a single disk to host the PostgreSQL server system. The first option, besides a faster disk, would be to distribute the load across multiple disks. There are older publications which describe the procedure; only some naming conventions have changed since then. The first focus is the pg_wal directory.
Thus the procedure is the following:
- Stop the PostgreSQL server
- Transfer the directory pg_wal, located in the directory with the databases, to another disk
- Create a symbolic link in the old place
- Start the PostgreSQL server
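The move and symlink steps above can be sketched as follows. This is only an illustration using temporary directories as stand-ins; on a real system you would operate on your actual data directory (e.g. /var/lib/pgsql/13/data) with the server stopped:

```shell
# Stand-ins for the real locations (illustration only):
#   DATADIR  ~ /var/lib/pgsql/13/data   (PostgreSQL data directory)
#   FASTDISK ~ mount point of the faster disk
DATADIR=$(mktemp -d)
FASTDISK=$(mktemp -d)
mkdir -p "$DATADIR/pg_wal"
echo demo > "$DATADIR/pg_wal/000000010000000000000001"

# 1. (server stopped) move pg_wal to the other disk
mv "$DATADIR/pg_wal" "$FASTDISK/pg_wal"

# 2. create a symbolic link in the old place
ln -s "$FASTDISK/pg_wal" "$DATADIR/pg_wal"

# 3. the server will transparently follow the link after the restart
ls -l "$DATADIR/pg_wal"
```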
In the result you can see that the load is now distributed between the two disks (sda and sdb) currently being used:
We did not focus on the log directory here, as in our case there was no significant amount of logging, therefore there was no need to move that data to a different disk. The procedure would have been the same though.
Did this solve the problem we observed? Of course not, but it was a good example to present some very useful tools for debugging performance problems. The solution to our high disk load will be discovered in the Performance Tuning section.
5. Performance Tuning
This section will soon cover, for example:
- Are my configuration settings for WAL right?
- What tables might be good candidates to be pinned in memory?
So stay tuned, further details will follow over the next weeks.
5.1 Performance monitoring
For performance tuning it is also important to monitor the results of the changes being applied; therefore we will have a look at some further tools for this in this section.
5.2 Write Ahead Log (WAL) considerations
We talked in section 4.3 about large amounts of IO activities. Especially for high workloads the configuration of the Write Ahead Log (WAL) can have a significant impact on IO utilization and thereby also on the general system performance. Therefore it is important to check the configuration as well as the logging.
One important configuration parameter is the wal_compression setting. By default it is disabled, however to reduce IO utilization it makes sense to consider enabling it.
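Enabling it is a one line change in postgresql.conf, and a reload is sufficient to make it effective:

```
# postgresql.conf
wal_compression = on
```

The trade-off is some extra CPU for compressing full page images written to the WAL, in exchange for a reduced WAL volume and thereby less IO.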
5.3 Index advisors
When running SQL statements the existence of indexes can make the difference. However, one needs to weigh the advantages for read operations against the extra costs for update/insert operations when adding a new index. Product shipped indexes should not be touched. PostgreSQL has the option to run an index advisor based on the database log files. We will have a look at how the dexter index advisor can be enabled.
5.3.1 Preparation for index advisors
5.3.2 The dexter index advisor
5.3.3 The pg_qualstats index advisor
6. Advanced configuration
From the WebSphere Application Server world you might already be aware of connection pooling on the application server side. A similar mechanism can also be used in front of PostgreSQL databases. There exist a few implementations which differ in their feature sets; two options to mention are pgpool and PgBouncer. More details will be added at a later point in time, as I have not yet performed tests with this kind of configuration.
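To give an impression of what such a setup looks like, here is a minimal PgBouncer configuration sketch. The database name, user list path, ports and pool sizes are made up for illustration; see the PgBouncer documentation for the authoritative option list:

```ini
; pgbouncer.ini - minimal illustrative sketch, values are placeholders
[databases]
; clients connecting to "bpmdb" on port 6432 are pooled onto the real server
bpmdb = host=127.0.0.1 port=5432 dbname=bpmdb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 500
default_pool_size = 50
```

Note that transaction pooling imposes restrictions on session state (e.g. prepared statements and temporary tables), so session pooling may be the safer starting point for an application server workload.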
Z. Last remarks
I hope the information shared can be of value to you. If you are missing some kind of information feel free to reach out to me and we might extend the blog entry.
Thanks for your interest and have fun with your Workflow product.