Business Automation Workflow: Saved searches optimization and considerations for management

By Stephan Volz posted Thu June 22, 2023 04:19 PM

1. Introduction - Why do we need saved searches and acceleration tools?

This article describes performance tuning options for saved searches in the context of the IBM Business Automation Workflow and IBM Cloud Pak for Business Automation products.
When you develop a business workflow application, in many cases you want to search some of the entered data or give your end users the option to search it. The characteristics that can be queried are the so-called searchable variables, which are defined in the application design. Because the letter case of the stored entries is not always known, a case-insensitive search is required in many scenarios.

1.1 Database - Tables and indexes

Our data is stored in a database, so how does the database handle this? Note that the description in this section is slightly simplified to make the background easier to understand; we will look at some of the technical details later in this article. To implement case insensitivity, the query applies the UPPER() function, which converts every entry to upper case before it can be compared. Because every entry has to be converted, the database reads the data requested by the application or end users with a full table scan, which cannot be avoided. A full table scan is not cost effective, because all data of the table has to be read, in many cases directly from disk, which is even worse for performance.

This can be improved with special additional database indexes. In Oracle these are called function-based indexes, in Db2 they are called expression-based indexes; other database systems have their own naming conventions but partly use these terms as well. These indexes are created in addition to the table data and store the already converted column entries according to their index definition; in our case the UPPER() function is applied to the data. When a query is executed, only an index lookup for the corresponding value is needed to find the result. The drawback of these indexes is that they need to be maintained: for every insert or update operation the index also has to be updated. It is therefore a trade-off between the faster execution of read requests and the additional cost of insert/update operations. This explains why it is not useful to blindly create indexes for all columns and why such indexes are not shipped with the product out of the box.
If the comparison were not done on the database side, we would need to transfer all data to the application server and compare the results there, which would be even more costly. Therefore this is not done.
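To illustrate, a minimal sketch of such an index, assuming a hypothetical table and column (placeholders, not product table names):

-- Db2: expression-based index for case-insensitive search
CREATE INDEX IDX_CUSTNAME_UPPER ON MY_SEARCH_TABLE (UPPER(CUSTOMER_NAME));

-- Oracle: function-based index, same idea
CREATE INDEX IDX_CUSTNAME_UPPER ON MY_SEARCH_TABLE (UPPER(CUSTOMER_NAME));

With such an index in place, a predicate like WHERE UPPER(CUSTOMER_NAME) = UPPER(?) can be answered by an index lookup instead of a full table scan.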

1.2 Acceleration tools and additional optimization

Search queries can get complex, especially when security constraints on data access are involved. With the acceleration tools the product therefore allows you to simplify the execution of saved searches by creating two additional database tables that are joined in the database query and help to reduce the data volume processed during execution. The details of how to do this are already described in the product documentation (see for example https://www.ibm.com/docs/en/baw/22.x?topic=portal-tuning-process-searches), so we will not repeat them here.

1.3 What to do when saved search performance is poor?

There are three ways to approach the typical performance-related execution problems, and we will cover each of them in the following sections.

a) Process Federation Server (PFS)
b) Acceleration tools (pivot tables), as mentioned in the last section. Make sure that the required indexes are in place as well.
c) Pin the database tables involved in saved searches into memory. 

Be aware that application design can also affect saved search performance; I will not cover this in this article.

2. Technical background and what can be done to improve performance

2.1 Process Federation Server (PFS)

The Process Federation Server was originally introduced to allow replacing an existing system with a newly set up system while running both systems in parallel. It uses an Elasticsearch-based index. It has advantages and disadvantages that need to be evaluated when deciding on this technology. One advantage is quick response times for data that is already indexed. The disadvantages are two-fold: BAW uses the database to guarantee transactionality, which is not available for Elasticsearch. This leads to synchronization overhead (two data sources), which can become large for large workloads. In addition there is a delay for the synchronization period, which can be annoying for end users who work on processes or tasks they have just created themselves; it can take a few seconds until these appear in the client interface. In Cloud Pak for Business Automation, PFS is configured out of the box as the default and does not require an additional installation as it does on-premises. The synchronization frequency between the database and the Elasticsearch index used by PFS has a default setting of 1 second in Cloud Pak for Business Automation; additional delays due to end-user client synchronization apply. We will not go into more detail here and refer to the PFS documentation.

2.2 Accelerator tools (Pivot tables) and new deployments

I assume here that the accelerator tables are already in place or are being considered. As soon as a new application version with new or modified searchable variables is deployed, the accelerator tables need to be regenerated. This consists of multiple steps (described in detail, for example, here: https://www.ibm.com/docs/en/baw/22.x?topic=portal-tuning-process-searches). The short version consists of these three steps:
a) Remove existing accelerator tables
b) Recreate the accelerator tables and fill these with existing data
c) Recreate existing and potentially new indexes - keep in mind that especially case-insensitive searches can get very costly without proper indexes.

2.2.1 Identifying existing indexes

To identify already existing indexes, the following SQL can be used:

Db2:
select tabschema, tabname, indname, colnames, stats_time, indcard, lastused
from syscat.indexes;

Oracle:
select substr(ui.index_name,1,35) as index_name,
       substr(ui.table_name,1,35) as table_name,
       substr(ui.tablespace_name,1,30) as tablespace_name,
       substr(uic.column_name,1,35) as column_name,
       uic.column_position, ui.status, ui.clustering_factor, ui.last_analyzed
from USER_INDEXES ui, USER_IND_COLUMNS uic
where ui.index_name = uic.index_name
order by ui.index_name, uic.column_position;

Warning: Be aware that there are limits on the maximum number of columns a database table can have; this also limits the maximum number of searchable variables when using the accelerator tables. On Oracle, for example, the limit is 1000 columns.

2.2.2 Database indexes and how to create them right

Because the accelerator/pivot tables need to be regenerated (and existing indexes are dropped) with any searchable variable change, the index creation can get tedious. I would therefore recommend a script-based approach where all required indexes are included, as sketched below. For changes, the additional indexes that could be required can often already be predicted from the expected load and the queries generated during testing. This prevents indexes from being forgotten. It is also advisable to keep an eye on the database monitoring after a new application deployment to immediately spot potential new queries.
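A minimal sketch of such a script for Db2 (table, column, and index names are hypothetical placeholders and need to be adapted to the generated pivot tables in your environment; run it with the Db2 command line processor, for example db2 -tvf recreate_indexes.sql):

-- recreate_indexes.sql: reapply after every regeneration of the accelerator tables
CREATE INDEX IDX_PIVOT_ORDERNO ON PIVOT_TABLE_A (ORDER_NUMBER);
CREATE INDEX IDX_PIVOT_CUST_UPPER ON PIVOT_TABLE_A (UPPER(CUSTOMER_NAME));
CREATE INDEX IDX_PIVOT_STATUS ON PIVOT_TABLE_B (STATUS, CREATION_DATE);
-- refresh statistics so the optimizer can make use of the new indexes
RUNSTATS ON TABLE DB2INST1.PIVOT_TABLE_A WITH DISTRIBUTION AND DETAILED INDEXES ALL;
RUNSTATS ON TABLE DB2INST1.PIVOT_TABLE_B WITH DISTRIBUTION AND DETAILED INDEXES ALL;

Keeping all index definitions in one version-controlled script makes it easy to reapply them after each regeneration.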

Database indexes can have a fundamental impact on performance. If you run a saved search with and without a function-/expression-based index, the execution time without the index could be, for example, 2 minutes, while it is below 1 second when the right index is in place. This affects not only your direct query but also the general database system performance and could lead to unwanted contention when a large number of users execute sub-optimal queries.

Therefore it is advisable to keep a close eye on the database monitoring data for long-running queries after a new deployment. On Oracle you can do this via the AWR reports, on Db2 via the in-memory monitoring table functions.
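As a sketch for Db2, long-running statements can be listed with the MON_GET_PKG_CACHE_STMT table function (the column selection and ordering below are just one possible choice):

db2 "SELECT NUM_EXECUTIONS, TOTAL_ACT_TIME, SUBSTR(STMT_TEXT, 1, 200) AS STMT FROM TABLE(MON_GET_PKG_CACHE_STMT('D', NULL, NULL, -2)) ORDER BY TOTAL_ACT_TIME DESC FETCH FIRST 20 ROWS ONLY"

On Oracle, the AWR report (for example generated with the awrrpt.sql script) gives a comparable overview for a chosen time window.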

2.3 Keep data in memory

Another alternative is to keep the saved-search data in memory. This practice is not well liked in some circles, as you might find in database discussions on the internet, but it has since been overtaken by database vendor offerings doing exactly this. Moreover, if your end users would otherwise wait 2 minutes and the same query can be answered in under a second, good operational practice matters more here than database theory. Available memory sizes have increased over time; what used to be impossible is now possible. Of course you cannot just take large amounts of memory away from the database system and hope that everything will run fine. I have therefore listed example statements below for some database systems that report the used bytes (including BLOBs) of tables; the different database systems take different approaches to this. Once you know the space consumption of the involved tables, you should add this amount as additional memory on top of the operational memory, so that you do not create a memory shortage in other areas. A typical production system could require around 20 gigabytes of additional memory for this task, depending on the data volume of the system and what is intended to be achieved; treat this as an example, you might need more or less.
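For Db2, a minimal sketch to estimate the physical size of a table including index and LOB objects uses the SYSIBMADM.ADMINTABINFO administrative view (schema and table name are placeholders; sizes are reported in KB):

db2 "SELECT TABNAME, (DATA_OBJECT_P_SIZE + INDEX_OBJECT_P_SIZE + LONG_OBJECT_P_SIZE + LOB_OBJECT_P_SIZE + XML_OBJECT_P_SIZE) AS TOTAL_SIZE_KB FROM SYSIBMADM.ADMINTABINFO WHERE TABSCHEMA = 'DB2INST1' AND TABNAME = 'TABLE_NAME_TO_REPLACE'"

The corresponding Oracle statement is shown in section 2.3.2 below.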

Side note: Be aware that pinning tables in memory is not limited to saved searches. I have also used this technique in other contention scenarios, for example for the event manager. If you do not see contention, pinning is most likely not required, as it needs additional resources without generating much more value.

2.3.1 Db2

The approach used here for Db2 creates a dedicated buffer pool for the tables in question. You then modify the table definition to use this buffer pool. For a smooth experience it is advisable to preload the data into memory with a full table scan. Once the data is in the buffer pool it should not be evicted again, as there are no competing operations for this pool.

a) Create a dedicated buffer pool and move a table to it.

First you need to create a new buffer pool. Make sure the page size used is 32K. Check what data is intended to be pinned, so that the size matches the data plus some additional safety buffer for future growth:

db2 create bufferpool indexbs immediate size 20000 pagesize 32K

Then you need to create a corresponding tablespace that refers to the just created buffer pool and will store the table to be pinned:

db2 "create large tablespace indexerspace pagesize 32K managed by database using (file 'indexerspace1' 10000, file 'indexerspace2' 10000) extentsize 32 bufferpool indexbs autoresize yes maxsize 20G"

After these artifacts have been created, the table to be pinned needs to be moved to the new tablespace. We use the ADMIN_MOVE_TABLE procedure here. Be aware that a number of parameters are required and we use the minimum here (all parameters: tabschema, tabname, data_tbsp, index_tbsp, lob_tbsp, organize_by_clause, partkey_cols, data_part, coldef, target_tabname, options, operation - for details check the Db2 product documentation).

db2 "CALL SYSPROC.ADMIN_MOVE_TABLE( 'DB2INST1', 'TABLE_NAME_TO_REPLACE', 'INDEXERSPACE', 'INDEXERSPACE', 'INDEXERSPACE', '', '', '', '', '', 'MOVE')  "

Preload the data into the buffer pool (TABLE_NAME_TO_REPLACE needs to be replaced by the name of the table that is to be kept in memory):
db2 "select * from TABLE_NAME_TO_REPLACE" > /dev/null

The process can be more complicated when foreign key constraints exist on the tables. These need to be recreated when such a table is affected. It is advisable to document the required steps in advance so that no constraint is forgotten and the downtime of the system is not extended.
To check which tables have a foreign key constraint, you can run a command like the following (TABLE_NAME_TO_REPLACE needs to be replaced with the table name to be pinned):

db2 "select substr(TABNAME,1,20) tab, substr(REFTABNAME,1,20) reftab from syscat.references where TABNAME='TABLE_NAME_TO_REPLACE'"

Conclusion: In-memory pinning for Db2 has already been used in practice and showed good results in database utilization and performance, especially in contention scenarios.

2.3.2 Oracle

Before applying in-memory pinning of tables, you need to check that sufficient memory is available. This can be done, for example, with the query below (replace the three occurrences of 'table_name_to_replace' with the table you want to pin in memory). Be aware that some safety buffer for data growth needs to be added as well; the actual setting depends on local policies. For CPU one could stay in a 60% to 70% usage window, and similar values can be used for memory, but this depends on the environment and is best monitored so you can react to significant changes in both directions.

select sum(bytes) from (
  select bytes from dba_segments where segment_name='table_name_to_replace'
  union
  select bytes from dba_segments where segment_name in
    (select segment_name from dba_lobs where table_name='table_name_to_replace')
  union
  select bytes from dba_segments where segment_name in
    (select index_name from dba_lobs where table_name='table_name_to_replace')
);

Once this data is available, you can plan which method you want to use to pin the tables into memory (we did not do a comparison measurement for these methods; it is however highly recommended to test the behavior in a testing environment before using it in production).

a) Traditional way with KEEP Pool

Ensure that DB_KEEP_CACHE_SIZE is large enough based on the table size plus some safety buffer for future growth.
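As a sketch, the KEEP pool size could be set as follows (the 4G value is purely illustrative and must be derived from the size estimate above; an SPFILE is assumed for scope=both):

alter system set db_keep_cache_size = 4G scope=both;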

Then run the following command against the table (table_name_to_replace) that should be pinned into memory:
alter table table_name_to_replace storage (buffer_pool keep);
If you decide to revert the change and remove the table from the keep cache, this can be done with this command:
alter table table_name_to_replace storage (buffer_pool default);
Further details on the keep cache can be found in Oracle's own performance guide: https://docs.oracle.com/en/database/oracle/oracle-database/21/tgdba/tuning-system-global-area.html#GUID-4FD3AFB7-BFC5-49F2-8C6D-917E9B30D045
There is a second buffer pool, the RECYCLE pool, which can also be used in this context; its size is configured with DB_RECYCLE_CACHE_SIZE. You will also find it in the Oracle performance guide referenced above. Unfortunately, no further additional buffer pools appear to be configurable.
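As a sketch, assigning a table to the RECYCLE pool works analogously (the size value is again illustrative):

alter system set db_recycle_cache_size = 2G scope=both;
alter table table_name_to_replace storage (buffer_pool recycle);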

b) An alternative way is via the in-memory option, however as this requires an additional license, I will skip it here. More details can be found in the book "Oracle Database 12c Release In-Memory - Tips and Techniques for Maximum Performance".

3. Cleanup and administration considerations

3.1 Administration considerations

It is advisable to plan ahead. If a new application is going to be deployed, you can already estimate how many end users might access and use the corresponding new searchable variables. For any case-insensitive search with a large number of executions, you should already prepare to create function-based indexes (Oracle) or expression-based indexes (Db2).

3.2 How to check process instance numbers without Database Administrator (DBA) assistance

To know the current housekeeping status, you sometimes want to know the number of process instances, snapshots, and so on in the system. The easiest way is the "System Maintenance Status" page in the Process Admin Console, where central statistics are listed to give a general overview of the system. More direct SQL statements can be found on my GitHub page: https://github.com/stephan-volz/database-scripts; some queries may require advanced permissions, but most should run with the datasource user.
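As a minimal sketch, the number of process instances per state can also be counted directly on the BPM database, assuming the standard LSW_BPD_INSTANCE table of the Process Server schema (verify table and column names against your version before use):

select EXECUTION_STATUS, count(*) as INSTANCES from LSW_BPD_INSTANCE group by EXECUTION_STATUS;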

3.3 General cleanup considerations

When we talk about cleanup, this normally affects a larger number of process instances. Warning: database maintenance is required after large delete operations to achieve the maximum benefit. What does this mean? Deleting a large number of process instances frees up space on the database pages. When reading data, however, the whole page is still read, even though it now contains newly freed space. It therefore makes sense to compact the remaining data onto fewer pages; this can be done with database reorg operations, depending on your database system, and should be triggered after large-scale delete operations. Warning: Be aware that on all database systems, reclaiming BLOB data space requires special database methods; this space is not freed by the normal reorg operation, and in some rare cases this can result in large amounts of disk space being wasted, sometimes tens of gigabytes.
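As a sketch for Db2, a reorg that also reclaims LOB space could look like this (schema and table name are placeholders; on Oracle a comparable effect is achieved with segment reorganization features such as ALTER TABLE ... MOVE):

db2 "REORG TABLE DB2INST1.TABLE_NAME_TO_REPLACE LONGLOBDATA"
db2 "RUNSTATS ON TABLE DB2INST1.TABLE_NAME_TO_REPLACE WITH DISTRIBUTION AND DETAILED INDEXES ALL"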

3.4 Cleanup one by one vs. in batches

If we assume that you cannot delete data directly because of compliance requirements, the intuitive approach would be to pick a completed process instance, verify with some back-end system that it really can be deleted, and then finally trigger the deletion call. Although this procedure makes a lot of sense and works from a functional perspective, it has a performance drawback: for each of these process instances you need to open a new transaction, get a new database connection, and so on. This results in a lot of overhead, which can become an issue when you do this not once but 10,000 times. As backups are normally also scheduled during the off-shift hours, the time window for the cleanup operation is limited. So think about the design and whether a deletion list can reach the same result in a shorter amount of time. The product itself, with the newest APIs or command line tools, should be able to clean the system rather quickly when used in the right way.

Final remarks

I hope the shared content was useful and helps you in your daily business. If you think I forgot something, feel free to contact me. Enjoy your product.
