Significant Performance Improvements
with AI Join Cardinality Estimates
By Lori Strain, Tam Minh Tran and Nicholas Ostan
We are excited to announce another significant advancement in the Db2 AI Query Optimizer: join cardinality prediction. This new capability allows the optimizer to accurately predict the cardinalities of equality joins between tables, accounting for correlation between local and join predicates. While mileage may vary, tests of the AI Query Optimizer in Db2 version 12.1.3 have shown significant query performance improvements, with some individual queries executing 100 times faster! AI improves the overall performance of all 99 queries in the TPC-DS Benchmark by 17%.
In our previous blog posts and videos (1, 2, 3, 4, 5, 6) we've explored how the Db2 AI Query Optimizer revolutionizes database query optimization using neural networks. We've discussed our approach to gradually infusing AI into the optimizer and our goals, including:
- reliable and stable query performance,
- minimal resource requirements and simple query optimization,
- the importance and challenges of cardinality estimation, and how neural networks are used to improve cardinality estimates for single tables after applying local predicates.
In this blog post, we'll review the concept of cardinality estimation and why it is critical to query performance. We'll then delve into how the Db2 AI Query Optimizer significantly improves join cardinality predictions, access plans and thus query performance, all without training any new models, while also reducing the need for query tuning.
Optimal access plans are key to query performance and resource consumption. An access plan is a sequence of operators, such as index scans, joins or group bys, used to compute the results of a query. Cardinality is the number of rows produced by a query or an operator in an access plan. AI single table cardinality estimation has already improved the optimizer's ability to select efficient join orders; AI join cardinality estimation takes this to the next level. For example, it is more efficient to execute joins which produce fewer rows before joins which generate more rows. Join cardinality estimates are also key to decisions regarding operations further up the plan. Query performance improves significantly when a large number of rows produced by a join is not repartitioned for a subsequent join, or used to build hash tables for subsequent joins that cannot be contained in memory.
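The effect of join order on intermediate result sizes can be illustrated with a small, hypothetical example (plain Python, not Db2 internals): the same three-table join materializes far fewer rows when the most selective join runs first, even though the final result is identical.

```python
# Hypothetical three-table example: the join order changes how many
# intermediate rows are materialized, not the final result.
orders = [(i, i % 10) for i in range(1000)]    # (order_id, cust_id)
customers = [(i, i % 2) for i in range(10)]    # (cust_id, region_id)
regions = [(0, 'NA')]                          # keep only region 0

def join(left, right, lkey, rkey):
    """Naive nested-loop equality join on tuple positions lkey and rkey."""
    return [l + r for l in left for r in right if l[lkey] == r[rkey]]

# Order A: run the small, selective join first.
a1 = join(customers, regions, 1, 0)            # 5 rows
a2 = join(orders, a1, 1, 0)                    # 500 rows
# Order B: run the large join first.
b1 = join(orders, customers, 1, 0)             # 1,000 rows
b2 = join(b1, regions, 3, 0)                   # 500 rows (same final size)
```

Order A materializes 505 intermediate rows while order B materializes 1,500; accurate cardinality estimates are what let the optimizer pick order A.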
Traditional query optimizers rely on basic statistics and simplifying assumptions for join cardinality estimation. The simplifying assumptions typically involve:
- Independence: Assuming that column values in different tables are independent of each other
- Uniformity: Assuming uniform distribution of values
- Inclusion: Assuming that all of the values in the join column of the table with fewer distinct values appear in the join column of the table with more distinct values.
A traditional optimizer might estimate the number of rows (cardinality) that satisfy the local predicates on the tables to be joined and then apply a formula based on distinct values in the join column to estimate the join cardinality such as:
(cardinality of the first table after applying local predicates)
× (cardinality of the second table after applying local predicates)
× 1 / (maximum number of distinct values in either join column)
Consider the following query:
SELECT * FROM Orders O JOIN Customers C ON O.custid = C.custid
WHERE O.orderdate >= '2022-01-01' AND C.region = 'NA'
Assume the Orders table contains 54,995 rows spanning 10 years of order dates, and the Customers table contains 1,000 rows with 1,000 distinct customer ids. Three of the 10 years, or 30%, of the 54,995 orders would be expected to satisfy the local predicate on orderdate, i.e. 16,498.5 rows. Similarly, the cardinality estimate after applying the local predicate on region would be 500, and the join cardinality estimate would be:
(54,995 × 0.3) × (1,000 × 0.5) × 1/1,000 ≈ 8,249
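The arithmetic above can be written out step by step (the table sizes and selectivities are the ones assumed in the example):

```python
# The traditional join cardinality estimate from the example.
orders_after_local = 54995 * 0.30   # orderdate predicate covers 3 of 10 years
cust_after_local = 1000 * 0.50      # half the 1,000 customers estimated in 'NA'
max_ndv = 1000                      # distinct custid values in the join column
estimate = orders_after_local * cust_after_local / max_ndv
print(round(estimate))              # 8249
```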
The traditional approach misses critical insights:
- Customers in the NA region might place orders at different frequencies than customers in other regions
- Recent orders might have a different customer distribution than historical orders
- The combination of these predicates might create a distribution pattern that isn't captured by statistics for individual tables.
The actual cardinality of the query is 25,524. The traditional optimizer estimate is off by 68%.
By leveraging AI, Db2 can understand more complex relationships between data across tables such as correlation and non-uniformity and therefore, compute better cardinality estimates. The AI optimizer’s cardinality estimate for the example above is 25,859. It is off by about one percent.
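A quick check of the relative errors quoted above:

```python
# Relative error of each estimate against the actual join cardinality.
actual = 25524
traditional, ai = 8249, 25859
traditional_err = abs(traditional - actual) / actual   # ~0.68, off by 68%
ai_err = abs(ai - actual) / actual                     # ~0.013, about 1%
```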
Db2 version 12.1.3 does not require new AI models to be trained. It uses single table cardinality models available since 12.1 to populate a histogram for each of the tables being joined. The histograms represent the distribution of data in the table after applying the local predicates. The fact that the histograms represent the data distribution after the application of the local predicates is key to understanding the degree of correlation between the join predicate and the local predicates.
The following histogram is generated for the orders table in the example above.
The y-axis of the histograms represents the number of rows from the tables that satisfy the local predicates and have values in the join column which fall between the lower and upper bounds of the histogram bucket boundaries on the x-axis. The cardinality of each histogram bucket is the AI cardinality estimate for the following single table query, where x is the customer id at the lower bound of the histogram bucket and y is the customer id at the upper bound of the bucket.
where orderdate >= '2022-01-01'
and custid between x and y
A second histogram is generated for the customer table. The intersection of the two histograms can then be computed generating insight into relationships between data in the orders table and data in the customer table. The result is a histogram for the resulting join distribution. The total of all the buckets for the join distribution is the estimated join cardinality. The AI cardinality estimate is 25,859 which is only one percent above the actual cardinality of 25,524.
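The histogram intersection can be sketched conceptually as follows. This post doesn't spell out Db2's exact per-bucket combination rule, so this sketch assumes the classic rows_a × rows_b / ndv formula is applied within each aligned bucket; the bucket boundaries, row counts and resulting estimate are illustrative, not the real numbers from the example.

```python
from dataclasses import dataclass

@dataclass
class Bucket:
    lo: int      # lower bound of join-column values covered by this bucket
    hi: int      # upper bound
    rows: float  # rows satisfying the table's local predicates in this range
    ndv: float   # distinct join-column values in this range

def join_estimate(hist_a, hist_b):
    """Combine two per-table histograms bucket by bucket.

    Assumption: within each aligned bucket, the traditional formula
    rows_a * rows_b / max(ndv_a, ndv_b) is applied.
    """
    total = 0.0
    for a, b in zip(hist_a, hist_b):
        assert (a.lo, a.hi) == (b.lo, b.hi), "bucket boundaries must align"
        total += a.rows * b.rows / max(a.ndv, b.ndv)
    return total

# Illustrative histograms over custid after local predicates (made-up values).
hist_orders = [Bucket(1, 500, 12000, 500), Bucket(501, 1000, 4500, 500)]
hist_cust = [Bucket(1, 500, 400, 400), Bucket(501, 1000, 100, 100)]
est = join_estimate(hist_orders, hist_cust)   # 9600 + 900 = 10500 rows
```

Because each bucket's row count already reflects the local predicates, any correlation between the local and join predicates is captured in the shape of the two distributions being intersected.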
The next section summarizes our results for tests comparing the relative accuracy of the new AI join cardinality estimates to the traditional optimizer's estimates.
Improved Cardinality Estimates
The following box plots depict the errors associated with cardinality estimates for thousands of automatically generated join queries. In each plot, the box on the left shows the error for the AI optimizer, the box on the right shows the error for the traditional optimizer, and the middle box shows the error for the traditional optimizer when using automatically generated column group statistics. Column group statistics capture correlation information for pairs of columns by recording the number of distinct combinations of values. An error of 0 signifies a perfect cardinality estimate, whereas an error of 1 means the estimate is 1 order of magnitude different from the actual. The error is negative if the estimate is too low and positive if it is too high.

The queries join tables in the TPC-DS schema: the plot on the left covers joins between the item and store_sales tables, and the plot on the right covers joins between the catalog_sales and date_dim tables. The join predicates for each pair of tables are always the same, while the local predicates are randomly generated on 1 or more of the columns defined in the model and can include equality, BETWEEN and IN predicates.

Note that the error in the AI estimates is tightly clustered around 0, while the traditional estimates, even with column group statistics, are often much further from the actual cardinality and tend to be too low. The error in the traditional estimates also varies significantly more from one query to the next. The impact of cardinality estimation errors on plan selection, and thus performance, is amplified when the relative error for 2 joins in the same query differs significantly, e.g. when a query joins store_sales to both item and date_dim, and the cardinality estimate for the join between store_sales and date_dim is much lower relative to its actual cardinality than the estimate for the join between store_sales and item.
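The error scale described above (0 = perfect, ±1 = one order of magnitude off) corresponds to a signed log10 ratio. A small sketch of that metric, assuming this is the exact definition used for the plots:

```python
import math

# Signed order-of-magnitude error: 0 = perfect estimate,
# +1 = 10x too high, -1 = 10x too low.
def oom_error(estimate, actual):
    return math.log10(estimate / actual)

print(oom_error(1000, 1000))    # 0.0 (perfect)
print(oom_error(10000, 1000))   # 1.0 (one order of magnitude too high)
print(oom_error(100, 1000))     # negative: the estimate is too low
```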
The next sections explain in detail how more accurate join cardinality estimates lead to superior plans and significant improvements in performance.
TPC-DS Benchmark Query Execution Times Improve
The total execution time for all 99 standard queries in the 3TB TPC-DS Benchmark is reduced by 17 percent using AI cardinality estimation in Db2 version 12.1.3. This includes 70 to almost 90 percent improvements for queries 22, 72 and 84.
SAP BW Benchmark Query Execution Times Improve
Internal testing showed the AI Query Optimizer improved the execution times of 4 of the 22 queries by 44 to 67% and did not cause any queries to regress.
Superior Cardinality Estimates, Plans and Performance
Improved cardinality estimates result in better plans which deliver superior performance. This section explains in detail how improved AI cardinality estimates impact the plan and performance of TPC-DS query 84. Similar improvements in join cardinality estimates result in partial early aggregation (PEA) and early out join plans which improve the performance of queries 22 and 72.
The following are annotated fragments of the traditional and AI plans for TPC-DS query 84:
select c_customer_id as customer_id,
       coalesce(c_last_name, '') || ', ' || coalesce(c_first_name, '') as customername
from customer, customer_address, customer_demographics,
     household_demographics, income_band, store_returns
where ca_city = 'Greenfield' and
      c_current_addr_sk = ca_address_sk and
      ib_lower_bound >= 11660 and
      ib_upper_bound <= 11660 + 50000 and
      ib_income_band_sk = hd_income_band_sk and
      cd_demo_sk = c_current_cdemo_sk and
      hd_demo_sk = c_current_hdemo_sk and
      sr_cdemo_sk = cd_demo_sk
order by c_customer_id fetch first 100 rows only
The following table summarizes the traditional and AI estimates for the cardinality of key joins along with the actual cardinality and relative errors.
[Table: traditional and AI cardinality estimates for the key joins, with the actual cardinality and each estimate as a % of the actual]
Unlike the traditional optimizer, the AI optimizer accounts for the impact of the local customer address city predicate on the selectivity of the join predicate between customer_address and customer. The error in the traditional estimate for HSJOIN(12) propagates to HSJOIN(9). As a result, the traditional plan contains two joins (HSJOIN(9) and HSJOIN(8)) with more than 10 million rows on the probe side. The AI optimizer recognizes that HSJOIN(9) produces significantly more rows, so it avoids this join. Instead, the AI optimizer chooses to join store_returns last as shown below. Unlike the traditional plan, the AI plan performs only a single join with more than 10 million rows on the probe side. HSJOIN(11) in the AI plan is small and therefore quick relative to the HSJOINs in the original plan. The following chart summarizes the join costs for the AI and traditional plans.
[Chart: join costs in the AI and traditional plans, including the join with household_demographics]
100 Times Performance Improvements
With AI cardinality estimates, some queries execute 100 times faster. For example, the following query runs 120 times faster using AI. It is a custom query run on a 3TB TPC-DS database. It takes 5.12 seconds using AI compared to 623.11 seconds without AI.
select c_customer_id as customer_id,
       coalesce(c_last_name, '') || ', ' || coalesce(c_first_name, '') as customername
from customer, customer_address, customer_demographics, household_demographics,
     income_band, store_returns, item, inventory, date_dim
where ca_city = 'Greenfield' and
      c_current_addr_sk = ca_address_sk and
      ib_lower_bound >= 11660 and
      ib_upper_bound <= 11660 + 50000 and
      ib_income_band_sk = hd_income_band_sk and
      cd_demo_sk = c_current_cdemo_sk and
      hd_demo_sk = c_current_hdemo_sk and
      sr_cdemo_sk = cd_demo_sk and
      sr_item_sk = i_item_sk and
      inv_item_sk = i_item_sk and
      inv_date_sk = d_date_sk selectivity 2.818459e-05 and
      d_month_seq between 1193 and 2000
group by c_customer_id, c_last_name, c_first_name
fetch first 100 rows only
Portions of the traditional and AI plans are shown below. There are two differences between the plans. The difference at the bottom of the plan, shown in less detail, greyed out and in a smaller font, is the same as the plan difference for query 84 discussed above; it reduces the execution time by ~40%. The more significant plan difference is the reversal of the probe and build sides of the topmost hash join. Swapping the probe and build sides of HSJOIN(13) on its own reduces query execution time by 98%, without any other plan changes.
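Why the build/probe swap matters can be seen from the shape of a hash join (a generic sketch, not Db2's implementation): the build side is fully materialized into a hash table, so building on the smaller input keeps the hash table small and memory-resident while the larger input is streamed past it.

```python
# Generic in-memory hash join sketch. The build input is materialized into
# a hash table; the probe input is streamed against it one row at a time.
def hash_join(build, probe, build_key, probe_key):
    table = {}
    for row in build:                                  # materialized in memory
        table.setdefault(row[build_key], []).append(row)
    out = []
    for row in probe:                                  # streamed row by row
        for match in table.get(row[probe_key], []):
            out.append(match + row)
    return out

small = [(i,) for i in range(10)]                      # 10-row input
big = [(i % 10, i) for i in range(100_000)]            # 100,000-row input
# Same join result either way, but building on `small` keeps the hash
# table at 10 entries instead of 100,000.
rows = hash_join(small, big, 0, 0)
```

A cardinality estimate that is far too low can make the optimizer build on what is actually the much larger input, which is exactly the mistake the improved estimate avoids here.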
The following table compares the traditional and AI estimates to the actual cardinalities for the joins shown in the plan fragments above. Even though the AI estimate is still significantly too low, it is improved enough to result in a far superior plan.
[Table: actual join cardinalities with the traditional estimate as % of actual and the AI estimate as % of actual]
The key difference between the plans is that the AI plan is early out and the traditional plan is not. HSJOIN(13) can only be early out if the customer table is accessed on the outer because the only columns referenced above the join are from the customer table. The error in the cardinality estimate for HSJOIN(13) prevents the traditional optimizer from recognizing the true cost of the non-early out join. Like the PEA in query 22, the early out join improves runtime performance by reducing the number of rows output by the join and processed by the subsequent group by. The following table summarizes the differences between the traditional and AI plans. Note:
- the large difference between the traditional cost estimate and the AI cost estimate for the traditional plan, highlighted in yellow. Cost estimates are heavily influenced by cardinality estimates, so like the traditional cardinality estimate, the traditional cost estimate is too low. This is why the traditional cost estimate for one plan should not be compared to the AI cost estimate for another to determine which plan will execute faster.
- the large difference in the AI costs of the group by in the two plans highlighted in green.
[Table: differences between the traditional and AI plans, including 1. customer on the outer of HSJOIN(13) and 3. rows output by HSJOIN(13)]
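The early out behavior described above can be sketched generically (an assumption about the mechanism, not Db2's actual code): when only outer-side columns are referenced above the join, probing can stop at the first match, so the join emits at most one row per outer row.

```python
# Early out join sketch: the build side is reduced to an existence set, and
# each outer row is emitted at most once, regardless of duplicate inner keys.
def early_out_join(outer, inner_keys):
    seen = set(inner_keys)                 # build side: existence check only
    return [row for row in outer if row[0] in seen]

customers = [(c,) for c in range(5)]       # outer: one row per customer
returns = [0, 0, 0, 2, 2]                  # inner keys with duplicates
out = early_out_join(customers, returns)   # [(0,), (2,)]
# A regular join would emit 5 rows here (one per matching inner row);
# the early out join emits 2, shrinking the input to the later GROUP BY.
```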
Db2 version 12.1.3 reuses the single table cardinality models already available since version 12.1 to improve join cardinality estimates, access plans and query performance. Predicting join cardinalities without any new models keeps the resources required by the Db2 AI Query Optimizer low. Delivering significantly improved query performance and minimizing resources are two of the goals of the Db2 AI Query Optimizer as stated in the introductory blog in this series.
In addition to significantly improving query performance while minimizing the resources required, another goal of the Db2 AI Query Optimizer is to minimize the query tuning needed to achieve stable performance. Traditionally, statistical views have been used to improve query performance. However, defining such views and collecting statistics on them is a time-consuming process. Without statistical views, the AI total execution time for all 99 queries in our 3TB TPC-DS test was 5% faster than the traditional execution time with statistical views, as depicted in the following charts. The results with the AI optimizer are shown on the right, compared with the traditional optimizer with statistical views in the middle; the bar on the left represents the untuned traditional optimizer discussed above. Not only can AI eliminate the need to spend time defining and implementing statistical views, it can also improve the performance of workloads which have already been tuned using statistical views. In our tests, AI improved the performance of queries 64 and 84 by 63 and 70 percent respectively when compared with the performance of the traditional optimizer with statistical views.
Furthermore, the traditional optimizer's extra 5% of execution time does not include either the roughly 14 hours it took to collect statistics on the views or the approximately 40 hours it took a database expert to define them. The time required to collect statistics on statistical views often prohibits customers from being able to use them. In comparison, AI delivers better performance and takes only minutes to train: the AI training time for the 3TB TPC-DS database used in our tests was 3 minutes.
[Table: tuning time comparison — approximately 40 hours to define the statistical views and 14 hours to collect statistics on them, versus 3 minutes (for model training) with AI]
- The runstats times in the table above do not include the 102 minutes required to collect statistics on the TPC-DS tables themselves (without model training), which applies to both the statistical view and AI scenarios.
- The statistical views used were specifically tailored to the TPC-DS workload. Less time would be required to define general statistical views which might also be more useful for workloads as they evolve or for ad-hoc queries. However, general statistical views would have included significantly more tables and therefore, more time would have been required to collect statistics on them.
AI vs Column Group Statistics
At the time of writing this blog, multiple equality join predicates are not yet handled by the AI models. However, once they are, collecting column group statistics (CGS) for join cardinality estimation will also not be required. Note that column group statistics are not needed for correcting local predicate estimates as of Db2 version 12.1.
Experience the enhanced capabilities of the AI Query Optimizer by upgrading to Db2 version 12.1.3. If you're already using the Db2 AI Query Optimizer, you'll automatically benefit from the join cardinality enhancement: no additional configuration or tuning is required, and the feature works seamlessly with your existing workloads. If you haven't yet enabled the Db2 AI Query Optimizer, now is the perfect time to do so. To enable AI query optimization, simply set the database configuration parameter AUTO_AI_OPTIMIZER to ON. The addition of join cardinality prediction makes the Db2 AI Query Optimizer even more powerful and effective at improving query performance. These improvements translate directly to business benefits: faster insights, lower infrastructure costs, and reduced need for performance tuning expertise.
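For example, assuming a database named SAMPLEDB (a placeholder name), the parameter can be set from the Db2 command line processor:

```shell
# Enable the AI query optimizer for a database.
db2 connect to SAMPLEDB
db2 update db cfg for SAMPLEDB using AUTO_AI_OPTIMIZER ON
# Verify the setting:
db2 get db cfg for SAMPLEDB | grep -i auto_ai_optimizer
```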
Lori Strain is an expert in Db2 Optimization. She has worked in database optimization since joining IBM in 1989, was instrumental in the development of the original cost model for Db2 version 2.0 and is very excited to be part of the team working to build the next generation of database optimization technology based on AI.
Tam Minh Dai Tran worked in the Db2 Optimizer from 1999 to 2017. Since then, she has been working in the Db2 Query Performance team on various projects in the areas of runtime and query compiler.
Nick Ostan is a software developer in the Db2 AI Optimizer group. Nick studied biochemistry and biophysics at the University of Toronto before transitioning into data science for biotechnology/drug discovery, and then ultimately towards pure deep learning.