This article is the fifth in a series about the enhancements delivered in IBM DB2 12, and focuses on improved features that improve performance for analytical queries. Read part one, part two, part three and part four.
Many new applications are focused on analytics and are characterized by more use of sort, joins, stage 2 predicates and other more complex SQL patterns. This article looks at how DB2 12 for z/OS enables these complex applications by improving their performance. I start by outlining the high-level objectives for DB2 12, the performance improvements compared to DB2 11 for the different types of workloads tested, and move on to discuss the technical details.
High-level Performance Objectives for Complex SQL Applications
Query performance continues to increase in importance to customers, as they seek cost-effective ways to discover valuable information hidden in the vast amounts of their business and operational data. Additionally, improved analytical query performance enables them to make business decisions faster and at less cost.
DB2 12 has over twice the number of performance enhancements in this area than DB2 11, which was itself known for impressive query performance improvements. Many of the enhancements are targeted at SQL constructs seen both in new analytical and in complex transactional workloads.
To summarize the three high-level objectives for analytical queries in this release of DB2:
- Up to a 25 percent CPU improvement for traditional query workloads through optimizations for DISTINCT, GROUP BY, reduced work-file usage, multiple index access and list prefetch
- Up to a two times improvement for modern SQL applications, focusing on performance improvements for next generation SAP applications, for real-time analytics and for complex online transaction processing (OLTP) workloads. These optimizations are related to outer join, UNION ALL, stage 2 join predicates, CASE expressions, VARBINARY datatype indexability, DECFLOAT datatype indexability and others.
- Making parallel query child tasks 100 percent IBM z Integrated Information Processor (zIIP) eligible. In previous releases, there was a complicated formula to determine which parts of the parallel query were eligible for zIIP offload. In DB2 12 this becomes much easier with all child tasks associated with the queries being zIIP eligible.
It’s important to note that modern applications contain more complex SQL patterns, which are specifically targeted in DB2 12. They tend to use more sorting, joins, stage 2 predicates and others. These complex patterns are less common in traditional OLTP and batch workloads.
Query Workload CPU and Elapsed Time Improvement from V11
Many DB2 customers help DB2 development to profile various types of customer workloads. The profiles are then used to build workloads to stress test the performance enhancements in each release. These workloads are used to evaluate the performance improvements introduced by new DB2 releases, and for DB2 12 the performance results were compared with those in DB2 11. Db2 for z/OS development also uses standard industry workloads (e.g., SAP) and standard benchmark workloads to evaluate SQL performance. The chart below demonstrates the variability of the DB2 12 performance improvements across different workload types.
The percentage improvement in CPU and elapsed time for workloads involving simple queries or large data scans is relatively small compared to DB2 11. On the other hand, for query workloads involving complex reporting and large sorts, the CPU and elapsed time improvements are much more significant. Very impressive savings are seen for query workloads involving UNION ALL with views, complex outer joins, User Defined Functions (UDF) and so on.
When looking at these figures, you should always bear in mind that these results are isolated measurements and may not be representative of your actual performance results, because of the large degree of variability in real customer workloads.
High-level Performance Focus
DB2 development has focused on two SQL query areas for high-level performance targets:
- Complex query workloads typical of newer applications, including but not restricted to the analytical workloads so important to customers
- More general query performance bottlenecks
The query patterns used by newer workloads tend to use complex views or table UDFs, including functions and operators like UNION ALL, outer joins, and join predicates involving stage 2 predicates. They also tend to use CASE expressions, CAST functions and scalar functions. These query patterns are specifically targeted in DB2 12 for performance improvement.
The SQL UNION ALL construct is used extensively by DB2 itself in system temporal implementations, and in transparent archive. It’s also widely used in new analytics workloads such as SAP Fiori and SAP Simple Financials, and in transactional workloads such as WebSphere Portal and WebSphere Business Process Server, among others. Usefully, it turns out that UNION ALL and outer join share many of the same performance challenges and solutions. These newer workloads also make extensive use of outer joins and other complex query SQL operators and constructs such as CASE expressions, CAST functions and scalar functions.
For general query performance bottlenecks that impact all complex workloads, IBM has focused on reducing sort/work-file usage and therefore contention for prefetch engines, reducing the cost and frequency of PREPARE, and in the area of I/O performance, avoiding the unnecessary scheduling of prefetch engines.
To try to reduce the cost of PREPARE, DB2 development has targeted the case where a table has many indexes defined on it, as this is one of the most expensive query types to PREPARE. To reduce the frequency of PREPARE, as discussed in the previous article
, DB2 12 introduced Dynamic Plan Stability—the capabilities to capture dynamic SQL statement text and runtimes in the Dynamic Statement Cache (DSC) and store them in the DB2 catalog, and to load them from the catalog into the DSC. PREPARE frequency is further reduced by avoiding Global DSC statement invalidation by utilities.
Changes to the RDS will enable customers to deploy and run newer applications that deliver competitive-edge business functions more cost effectively.
Query Performance Focus
To focus in more detail on query performance, we look at four major topics:
1. Improving the performance of UNION ALL and outer join
2. Improving the performance of table UDFs
3. Adaptive Index
4. Other DB2 12 Query Performance Enhancements.
1. Improving the performance of UNION ALL and outer join
UNION ALL and outer joins share similar performance challenges. Improvements to these SQL constructs have largely been realized by avoiding materialization and by avoiding sort. Materialization can result in significant performance degradation and increased work-file resource usage if filtering is not applied before the materialization. The first challenge, therefore, is to address excessive work-file usage due to materialization, and the second is to address the lack of ability to apply predicates earlier.
To reduce materializations and its cost when still required, DB2 12 introduces several changes. DB2 12:
2. Improving table UDF performance
- Reduces the number of situations where work-file usage is required for materialized outer join query blocks or UNION ALL legs. This will reduce CPU and, possibly, I/O overhead.
- Trims columns in a materialized view or table expression that aren’t required by the outer query. Db2 also prunes unique LEFT OUTER JOIN views and table expressions if the columns in those views and table expressions aren’t included in the SELECT list. Pruning of unique LEFT OUTER JOIN tables was already delivered in versions 10 to12 extends that to views and table expressions.
- Pushes predicates into UNION ALL legs or OUTER JOIN query blocks if it’s cost effective for the optimizer to do so
- Pushes ORDER BY and FETCH FIRST into UNION ALL legs
- Reorders the OUTER JOIN tables to avoid materializations if it is cost effective for the optimizer to do so
Where possible, DB2 12 enhances table UDFs to have similar merge capabilities as views, avoiding materialization. It also enables the indexability of join and correlation predicates that are passed into the table UDF as a parameter, reducing the need for sort.
3. Adaptive index
Another major enhancement is the execution-time adaptive index, usually simply referred to as adaptive index. This is a solution for generic search-type queries. These present a challenge for any query optimizer because the degree of filtering of the predicates changes on each and every execution. This makes it impossible for the optimizer to choose the single best access path for all executions—the best access path depends on how filtering each predicate is.
Adaptive index is a DB2 12 enhancement to multi-index and single index list prefetch-based access plans that introduces logic to determine the filtering of each index at execution time. This ensures the optimal execution sequence of indexes, or earlier reversion to a tablespace scan if no filtering indexes exist.
There are two parts to the solution. First, DB2 12 allows RID-based access plans (i.e.,access plans using single index list prefetch or multiple index access) to quickly determine the filtering of the predicates from the index at run-time without requiring the BIND option REOPT(ALWAYS), and to adjust the access path based on the degree of filtering of the predicates.
In the cases of list prefetch and multiple index ORing, this provides an early opportunity to fall back to tablespace scan if a large percentage of the table must be read even when using index access.
In the case of multiple index ANDing, DB2 can consider dynamically reordering the sequence in which the indexes are accessed from most to least filtering. Db2 can optionally eliminate non-filtering indexes stages (known as early-out), or fall back to a table space scan if there is no filtering.
The second part of the solution is to use the degree of uncertainty to determine the risk of a single index plan. A quick evaluation is performed, based upon the host variables used, and any further evaluation of filtering is deferred until after one RID block is retrieved. This ensures that very short running queries don’t incur evaluation overhead.
To provide some more detail, the optimizer will consider a multi-index or list prefetch plan as in prior DB2 releases, but in DB2 12, an additional consideration is the uncertainty associated with the filter factors—the estimated degree of filtering of each predicate. For example, a predicate such as WHERE COL1 < ? has a high degree of uncertainty because the value specified at execution may qualify all of the rows, or none of the rows. If a high degree of uncertainty exists for the indexes chosen, then the optimizer may add additional indexes for a multi-index access access plan.
Then, at execution time, DB2 re-evaluates the estimated filtering for each leg in the multi-index plan or single leg of a list prefetch plan once the literal values are known, to determine if the index legs should be reordered, or if index legs should be discarded or the plan should revert to a tablespace scan.
This enhancement applies to any multi-index or single index list prefetch plan, regardless of whether the optimizer determined there was high uncertainty to the predicate filtering or not. A rebind is required to take advantage of this feature for static SQL.
The following is a simple example of the targeted use case, a generic search query:
SELECT * FROM TAB1 WHERE COL1 < ? AND COL2 < ? AND COL3 < ?;
The query retrieves all the columns from the table TAB1, and the search is based on the values in three columns, COL1, COL2 and COL3. These are range predicates, because in each case the less-than operator causes DB2 to qualify rows where the column has a value less than the host variable referenced by a parameter marker. The table has three single-column indexes: IX1 on COL1, IX2 on COL2 and IX3 on COL3.
Filtering of the above query is entirely dependent on the host variable values for the parameter markers at execution time. This is a common pattern for search screens that generate many range predicates such as BETWEEN, LIKE, less than and so on. It’s also common that one index provides good filtering unless a highly skewed value is searched.
This is a good example of a scenario where the Db2 optimizer can recognize that each WHERE clause predicate has a high degree of uncertainty associated with its filtering estimate, and that the indexes for the query are viable candidates for a multi-index access plan. Predicate patterns seen in search screens are good candidates for multi-index access and will benefit from the adaptive index enhancement, where the degree of filtering of the predicates can potentially change at each execution.
As an aside, this feature isn’t intended to encourage the use of many single-column indexes a table. It’s much better to have a smaller number of carefully chosen multi-column indexes.
4. Other DB2 12 Query Performance Enhancements
DB2 12 introduces several other query performance enhancements.
The first of these relates to join predicates with stage 2 expressions. Typically, these involve arithmetic and scalar functions such as SUBSTR, DATE, INTEGER and CHAR. The solution in DB2 12 is to calculate the result of the expression before the sort, and then to sort on the expression. This allows merge scan join and sparse index to be used with many stage 2 join predicates.
Previously, when using predicates on columns with VARBINARY and BINARY data types, and the length of the operands did not match, DB2 would make these predicates stage 2 (i.e., non-indexable). DB2 12 implicitly uses CAST on these columns, making them stage 1 and indexable.
The VARBINARY data type allows matching index access in DB2 12, and any scalar functions that returns VARBINARY (e.g., COLLATION_KEY) can now use Index on Expression.
Expression evaluation for CASE and SUBSTR has been optimized to improve execution performance.
Expressions (such as CASE expressions, UDFs and scalar functions) that are duplicated in the SELECT list because of a view or table expression merge are now only executed once and the result is shared by all references to that expression. This only applies if the duplication was due to a DB2 merge, and not if the SQL was originally coded with duplicate expressions.
UDFs defined as DETERMINISTIC will now have their execution results cached within the life of a single statement, such that repeat calls with the same input values can return the result from the cache rather than re-executing the UDF.
To enable more use of parallelism, 100 percent zIIP offload is available for parallel child tasks; optimizations have been made to improve the access path choices available for parallelism; and work is more efficiently distributed across the child tasks, reducing cost and resource consumption.
The enhancements to sort include:
Improvements for Modern Applications
- Improved performance for GROUP BY and DISTINCT, to improve duplicate removal as input to sort
- Reducing work-file usage for GROUP BY and DISTINCT. The previous maximum number of nodes in the sort tree was 32,000, and has been raised to 512,000. This reduces the number of merge-passes in the sort.
- Reducing the key length for GROUP BY and DISTINCT and for sparse index. As a result, sorts involving fixed length columns may see a reduction in work-file usage where the sort key and the data share the same columns.
- Continuing the progress towards more use of in-memory sort for smaller sorts that began in DB2 9. This has been extended to intermediate sorts for all types of query blocks.
In this article, we looked at how DB2 12 delivered performance improvements targeted at modern applications using more complex SQL statements. These improvements included optimizations for features like GROUP BY, UNION ALL, reduced resource usage, particularly for statements which sort data. DB2 12 also delivered run-time optimization with the introduction of the adaptive index. By happy coincidence, many of these improvements also benefit traditional analytical applications.
In the next article, we look at how DB2 12 enables modern application development paradigms by providing support, for example, for RESTful service and JSON data formats.
Gareth Z. Jones has worked in IT since 1985. Until 2000, he was an IBM client, with experience as a systems programmer and DBA. He now works in DB2 for z/OS development as a member of the SWAT Team, which is led by John Campbell. He has worked with many customers around the world to help them succeed in their use of DB2. Gareth has written several technical papers and presented at many conferences and group meetings. He can be contacted via email at firstname.lastname@example.org.