Figure 3: BDI 100TB 100-Simple user Run Performance
This second concurrent scenario has 100 users each running a set of 70 simple queries concurrently. Db2 on IBM’s P10 PCR shows a factor 3x performance improvement over the previous generation IIAS.
Scenario #3: 20-Intermediate User Test
Figure 4: BDI 100TB 20-intermediate user Run Performance
This next concurrency scenario has 20 clients running the 25 intermediate queries each and in this case Db2 on IBM’s P10 PCR achieves a 7.78x performance improvement over the previous generation IIAS! The improvement margin is much higher than the simple user run and is attributed to not only the increased core count but also the bigger available Db2 memory working set which allows more operations to be run in memory and avoiding use of partial result temp tables at this high scale factor which impacts the IIAS much more.
Scenario #4: 5-Complex User Test
The last query concurrency test features 5 clients each running the 5 most complex queries in the workload. In this case Db2 on IBM’s P10 PCR shows a performance improvement of factor 3x over the previous generation IIAS! Temp spilling can’t be avoided at this scale factor, but it is partially reduced, and the storage is faster on the P10 PCR which aids Db2 in faring much better than the predecessor solution.
Figure 5: BDI 100TB: 5-complex user Run Performance
The NMON system utilization graphs displayed in the image below illustrate how the five BDI concurrency user tests impose varying levels of stress on different system resources. These graphs provide insights into CPU, disk I/O, and network usage patterns, illustrating which components are most impacted under a specific concurrent workload test scenario.
Figure 6: P10 PCR System Utilization during BDI 100TB Concurrency Runs
Performance Test Set #3: Insert, Update, Delete, and Miscellaneous
Figure 7: BDI 100TB IUD Performance
This next set of tests target write-heavy operations like insert, update, and delete operations that require write IOs to storage. Insert sub-select (2 variations, one with predicate, one without) of approximately 10.7B rows is improved by ~58% to ~76%. The update operation is improved by over 44% and the delete operation is improved by over 13%. Finally, the truncate table operation is improved by over 90%. Various factors at play here including the superior storage and the new default Db2 partition layout (more logical nodes per server) help these operations over the predecessor solution.
One other notable test result is the “Unload Table Partition ALL” test. This is where every partition dumps out its data into an external table flat file simultaneously. Db2 on IBM’s P10 PCR finished this operation on 100TB in about 25 minutes, while on IIAS it took 1hr 37m. This represents an almost 4x speedup!
Sizing and Configurations
The Db2 Warehouse on IBM’s P10 PCR solution is offered in various sizes including multiple rack configurations to fit your IIAS replacement needs or your new modern data warehouse needs.
The performance comparisons earlier in this paper showed the characteristics between a full rack IIAS and the full rack (BRL) Db2 Warehouse on IBM’s P10 PCR. As noted in the charts the performance improvements for the Db2 Warehouse on IBM's P10 PCR solution seen were anywhere from 13% (slightly better) to almost 8x (a lot better) than the IIAS with most operations observed being between 2 to 3x factor.
Certainly the performance observed in our lab can help in sizing considerations, but performance characteristics are not linear when it comes to data warehouse processing. Many factors such as relative storage performance, amount of memory for data buffer, optimizer access plans and engine working set can play outsized roles in determining the performance. Great care should be taken in sizing, especially if a smaller form factor Db2 Warehouse on IBM’s P10 PCR is considered from a bigger form factor IIAS.
The basic form factor equivalency
IIAS Form Factors
|
Db2 Warehouse on P10CR Form Factors
|
M4002-003
|
Base Rack Small (BRS)
|
M4002-006
|
Base Rack Medium (BRM)
|
M4002-010
|
Base Rack Large (BRL)
|
M4002-020
|
Base Rack Large + Expansion Rack Large (BRL+ERL)
|
Keeping with the form factor equivalency will give you the confidence that your workload will improve. Going with a smaller form factor than before may also see performance being maintained or even improved, but this should be validated with your own validation testing and in consultation with your IBM representative.
Scaling of Bulk Ingest Operations
The first comparison will be a build of a ~100TB database from raw csv files split and pre-partitioned by the number of expected Db2 logical nodes in the configuration.
Operation
|
Base Rack Large (BRL)
|
Base Rack Large +
Expansion Rack Large (BRL+ERL)
|
Improvement
Factor
|
External Table Load Partition ALL 100TB (from raw CSV data files)
|
1hr, 32m, 21s (65 TB/hr)
|
0hr, 34m, 43s (173 TB/hr)
|
2.66x
|
Runstats
(statistic collection)
|
1hr, 13m, 46s
|
0hr, 29m, 12s
|
2.53x
|
For both the pre-partitioned load and subsequent statistics collection improved by a factor over 2.5x going from a base rack large (BRL) configuration to a base rack large plus expansion rack large (BRL+ERL). These are linear operations and are in line with the compute resource increase and demonstrate almost perfect scaling.
Once the 100TB database was built, a similar set of test operations were run just like in the IIAS vs Db2 Warehouse on IBM's P10 PCR (Base Rack Large).
Figure 8: BDI 100TB Scalability Performance BRL to BRL+ERL
All operations observed improved performance with most seeing between 2 to 3x improvement. With these set of tests at these volumes we are confident that the scaling capabilities of the Db2 Warehouse on IBM’s P10 PCR solution will meet the needs of the most demanding workloads.
Conclusion and Final Thoughts
The IBM Db2 12.1 Warehouse on IBM’s P10 PCR solution is the latest offering available to meet the most demanding data warehouse workloads. It is a reference architecture built upon the trusted and foundational pieces of IBM Db2 12.1 and the IBM Power 10 processor and Flash System storage that is prescribed and validated by IBM. This solution reference architecture represents a significant upgrade over the previous IBM Integrated Analytics System (IIAS) and PDOA appliances and will have a familiar feel. Designed for heavy analytics and operational analytics workloads, it also has built-in high availability, and variety of monitoring and disaster recovery (DR) options.
It has been tested extensively for performance and integration both as a comparison to IIAS and in scaling from BRL to BRL+ERL with improvements in operations measured from 13% to almost 8x with excellent scaling.
Welcome to the next generation of Data Warehousing!
About the Authors
Peter Kokosielis is the manager of Db2 Performance Quality Assurance, Db2 Warehouse on P10CR QA and Big Data and Data Virtualization QA. He has extensive experience at IBM in Db2 LUW database performance both in OLTP and Data Warehouse settings along with deep experience in platform exploitation on Power and Intel based processing architectures, hardware accelerators, virtualization and operating systems.
Jana Wong is the principal performance focal for Data Warehouse on-premise solutions at the IBM Silicon Valley Lab, with over 15 years of experience in Databases, SQL, QA, and Project Management. She holds a Master’s in Computer Science from the University of Rostock. Recently, she led the development and automation of a benchmark kit for validating IBM's Power10 Private Cloud Rack and played a key role in evaluating the performance of reference architectures such as IIAS/Sailfish and P10 PCR. Jana can be reached at jfitzge@us.ibm.com.