By C Raghavendra and Toby Haynes
Performance tuning in Db2 LUW can be challenging, especially when database responsiveness degrades without an obvious root cause. This article provides an overview of diagnosing and resolving performance issues by focusing on one of the most critical aspects of database performance: wait times.
As the first installment in a series on understanding and analyzing Db2 LUW wait time metrics, this post explains why wait times matter, reviews many types of database waits, and examines how to measure them effectively using modern tools (such as db2mon leveraging the MON_GET_* monitoring functions).
Whether you're investigating a sudden performance drop or proactively fine-tuning your environment, this series will help you uncover where time is being spent and how to improve performance.
In Db2 LUW performance analysis, understanding where time is spent is fundamental to effective tuning. Unlike higher-level metrics (for example, commit rate), which may only indicate that a performance problem exists, wait time metrics provide direct insight into the delays that impact throughput and responsiveness. Wait time metrics are generally low-level measurements, closer to system and database fundamentals such as storage performance or locking, so they help pinpoint the layer where performance is hindered.
The following table shows the many wait time metrics returned by the MON_GET_DATABASE table function, ordered into groups by the area of the database engine being measured.
| Wait Time Monitoring Metric | Description |
| --- | --- |
| TOTAL_ACT_WAIT_TIME | Total time spent waiting within the database server, while processing an activity |
| TOTAL_WAIT_TIME | Total time spent waiting within the database server |
| Communicating between client and server | |
| AGENT_WAIT_TIME | Application queued to wait for an agent under concentrator configurations |
| CLIENT_IDLE_WAIT_TIME | Waiting for the client to send its next request |
| TCPIP_RECV_WAIT_TIME | Waiting for an incoming client request over TCP/IP, excluding idle time |
| TCPIP_SEND_WAIT_TIME | Blocking on a TCP/IP send to the client |
| IPC_RECV_WAIT_TIME | Agent waiting to receive an incoming client request using the IPC communications protocol |
| IPC_SEND_WAIT_TIME | Agent blocking on an IPC send to the client |
| COMM_EXIT_WAIT_TIME | Waiting for the return from a communication exit library API function |
| Internal database engine operations | |
| LOG_BUFFER_WAIT_TIME | Agent waiting for space in the log buffer |
| LOG_DISK_WAIT_TIME | Agent waiting for log records to be flushed to disk |
| PREFETCH_WAIT_TIME | Waiting for an I/O server (prefetcher) to finish loading pages into the buffer pool |
| LOB_PREFETCH_WAIT_TIME | Waiting for an I/O server (prefetcher) to read or write a LOB |
| TOTAL_EXTENDED_LATCH_WAIT_TIME | Waiting for a latch where the requester yielded the CPU |
| AUDIT_FILE_WRITE_WAIT_TIME | Waiting to write an audit record |
| AUDIT_SUBSYSTEM_WAIT_TIME | Waiting for space in the audit buffer |
| DIAGLOG_WRITE_WAIT_TIME | Agent waiting on a write to the db2diag log file |
| EVMON_WAIT_TIME | Agent waiting for an event monitor record to become available |
| EXT_TABLE_RECV_WAIT_TIME | Agent waiting for the external table readers to read and process data from external tables |
| EXT_TABLE_SEND_WAIT_TIME | Agent waiting for the sent data to be processed and written by the external table writers |
| FED_WAIT_TIME | Waiting for the federation server to run a statement and get results from a remote data source |
| Communication between agents | |
| FCM_MESSAGE_RECV_WAIT_TIME | Agent waiting for an FCM reply message containing the results of a previously sent FCM request message |
| FCM_MESSAGE_SEND_WAIT_TIME | Agent blocking on an FCM message send |
| FCM_RECV_WAIT_TIME | Waiting to receive data through FCM |
| FCM_SEND_WAIT_TIME | Blocking on an FCM send operation |
| FCM_TQ_RECV_WAIT_TIME | Waiting to receive the next buffer from a table queue |
| FCM_TQ_SEND_WAIT_TIME | Waiting to send the next buffer through a table queue |
| IDA_RECV_WAIT_TIME | Waiting to receive data from an in-database analytics process |
| IDA_SEND_WAIT_TIME | Waiting to send data to an in-database analytics process |
| Data sharing (Db2 pureScale) | |
| CF_WAIT_TIME | Agent communicating with the cluster caching facility |
| DATA_SHARING_REMOTE_LOCKWAIT_TIME | Remote applications waiting while the table transitions out of the NOT_SHARED data sharing state |
| LOCK_WAIT_TIME_GLOBAL | Waiting on global lock waits |
| RECLAIM_WAIT_TIME | Agent waiting on page locks, where the lock request caused a page to be reclaimed |
| SPACEMAPPAGE_RECLAIM_WAIT_TIME | Agent waiting on page locks for pages related to internally maintained object space management, where the lock request caused a reclaim from another member |
All of these monitoring counters are of BIGINT data type, and all corresponding columns report values in milliseconds.
Table 1: Wait Time Metrics available through MON_GET_DATABASE
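To make this concrete, here is a minimal sketch that samples a few of these columns directly. MON_GET_DATABASE takes a single member argument, where -2 means "all members"; the selected columns are a small subset chosen for illustration.

```sql
-- Sample a few database-level wait time metrics (values in milliseconds).
-- MON_GET_DATABASE(-2) returns one row per member.
SELECT MEMBER,
       TOTAL_WAIT_TIME,       -- total time spent waiting in the server
       CLIENT_IDLE_WAIT_TIME, -- waiting for the next client request
       LOCK_WAIT_TIME,        -- waiting on locks
       LOG_DISK_WAIT_TIME,    -- waiting for log flushes to disk
       PREFETCH_WAIT_TIME     -- waiting for prefetchers
FROM TABLE(MON_GET_DATABASE(-2)) AS T
ORDER BY MEMBER;
```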
Tools for Monitoring Performance
Tools like db2mon use the MON_GET_* table functions to offer visibility into many database metrics, enabling experienced database administrators to examine time spent actively processing database work and time spent waiting on one or more database or system resources. The MON_GET_* table functions provide broad coverage of diverse database metrics, building on the high-performance, low-impact monitoring capabilities originally delivered in Db2 v9.7 and continuously extended through the current v12.1 release. The use of the old SNAPSHOT metrics is strongly discouraged, especially on databases driving substantial CPU workloads on highly parallel systems, where the performance impact of data collection can be severe.
Strategies for performance analysis
By focusing on the areas with the highest wait times, database teams can prioritize efforts where they will have the greatest impact, rather than chasing metrics with complex dependencies on lower-level performance. This approach suits top-down performance analysis, but it can also be applied at the SQL statement level for more specific insights.
When addressing performance issues in Db2 LUW, it is essential to begin with a clear objective. Whether the goal is to restore previous performance levels, reduce CPU load, or identify the most critical bottleneck—such as disk, network, or CPU—understanding where time is being spent is the key to effective tuning.
Figure 1: Methodical Approach to Performance Analysis
Db2 provides a rich set of monitoring tools and interfaces that allow administrators to drill down into performance data and isolate the root causes of inefficiencies.
Monitoring table functions
At the core of Db2’s performance monitoring capabilities are the MON_GET_* table functions. These functions provide detailed metrics on various aspects of database activity, including time spent and the number of occurrences of specific events. These metrics are invaluable for identifying whether performance issues stem from excessive processing or from time spent waiting on resources. The table below summarizes a few of the many available functions.
| Table Function | Description |
| --- | --- |
| MON_GET_DATABASE | Offers a high-level overview of database activity, including I/O, logging, and wait times |
| MON_GET_WORKLOAD | Provides metrics at the workload level, useful for analyzing application behavior and resource usage |
| MON_GET_CONNECTION | Connection-level metrics, helpful in establishing whether all the connections in a workload class or application are performing and executing equivalently |
| MON_GET_PKG_CACHE_STMT | Detailed statistics on SQL statements found in the SQL package cache, covering row metrics and wait times |
| MON_GET_BUFFERPOOL | Reports on buffer pool metrics, useful for calculating hit ratios and measuring read/write activity |
| MON_GET_TABLESPACE | Delivers detailed statistics on tablespace usage, I/O performance, and wait times |
| MON_GET_TRANSACTION_LOG | Provides insights into transaction log activity, including log write times and I/O statistics |
Table 2: Commonly used MON_GET_* table functions.
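As a sketch of how these functions are typically queried, the following ranks cached SQL statements by accumulated wait time using MON_GET_PKG_CACHE_STMT; the three NULL arguments and -2 request all statements across all members.

```sql
-- Top 10 cached statements by total wait time (times in milliseconds).
SELECT SUBSTR(STMT_TEXT, 1, 60) AS STATEMENT,
       NUM_EXECUTIONS,
       TOTAL_ACT_TIME,     -- total time spent executing the statement
       TOTAL_ACT_WAIT_TIME -- portion of that time spent waiting
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
ORDER BY TOTAL_ACT_WAIT_TIME DESC
FETCH FIRST 10 ROWS ONLY;
```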
Handling cumulative monitoring data
The MON_GET_* table functions are a detailed source of performance data in Db2 LUW. However, because these functions report cumulative metrics from the time the database was activated, the counters cannot be reset. Most DBAs are interested in the current activity, so we need to find the difference between two data collections taken over a defined interval to accurately assess activity during a specific period.
The technique used in db2mon involves capturing a baseline dataset in a global temporary table for each of the table functions, waiting for a short duration (typically under 300 seconds), and then subtracting the baseline from a new data collection. This approach, often referred to as computing “delta values”, enables precise tracking of performance trends and helps isolate short-lived events that might otherwise go unnoticed in the cumulative data.
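A minimal sketch of the delta technique is shown below; the SESSION.MON_BASELINE table name and the two sampled columns are illustrative choices, not db2mon's actual internals.

```sql
-- Declare a session temporary table shaped like the monitoring output.
DECLARE GLOBAL TEMPORARY TABLE SESSION.MON_BASELINE
  AS (SELECT MEMBER, TOTAL_WAIT_TIME, LOCK_WAIT_TIME
        FROM TABLE(MON_GET_DATABASE(-2)) AS T)
  DEFINITION ONLY ON COMMIT PRESERVE ROWS;

-- Capture the baseline.
INSERT INTO SESSION.MON_BASELINE
  SELECT MEMBER, TOTAL_WAIT_TIME, LOCK_WAIT_TIME
    FROM TABLE(MON_GET_DATABASE(-2)) AS T;

-- ... wait for the collection interval (e.g., 30 seconds) ...

-- Subtract the baseline from a fresh collection to obtain delta values.
SELECT CUR.MEMBER,
       CUR.TOTAL_WAIT_TIME - BASE.TOTAL_WAIT_TIME AS DELTA_TOTAL_WAIT_TIME,
       CUR.LOCK_WAIT_TIME  - BASE.LOCK_WAIT_TIME  AS DELTA_LOCK_WAIT_TIME
  FROM TABLE(MON_GET_DATABASE(-2)) AS CUR
  JOIN SESSION.MON_BASELINE AS BASE
    ON BASE.MEMBER = CUR.MEMBER;
```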
Note: The MON_GET_* table functions accumulate counts and times in BIGINT data types. For the busiest monitors, these values can exceed the maximum BIGINT value and overflow back to a small value again. Collecting MON_GET_* data at longer intervals exposes more of these overflow events, leading to “odd” or negative values in the db2mon report. It is unlikely that a 30-second db2mon collection will include an overflow, but a 3000-second collection is at much higher risk.
Other performance utilities
An older but sometimes useful tool for quick performance analysis is monreport.dbsummary(). This stored procedure provides a high-level summary of key database performance metrics. By executing CALL monreport.dbsummary(60), users can generate a report that highlights critical metrics over a 60-second interval, including values like CLIENT_IDLE_WAIT_TIME and TOTAL_RQST_TIME, which help determine whether delays are occurring inside Db2 or in external layers such as the application or network. However, monreport.dbsummary() does not include all wait categories, some of which are essential for deep-dive performance analysis.
The db2mon utility provides a complete breakdown of wait times in the DB#WAITT section (database-wide scope) and the SQL#TOPWAIT section (SQL statement scope). To explore how db2mon works and how to get started, refer to the article An Introduction to db2mon.
Before diving into internal Db2 server diagnostics, it is crucial to determine whether performance delays originate mostly within Db2 or from external sources such as the application implementation, the Db2 client, or the network transport. By comparing CLIENT_IDLE_WAIT_TIME (time Db2 spends waiting for the next request from the client after the previous request has completed) with TOTAL_RQST_TIME (time Db2 spends processing those requests on the server), we can assess where the majority of the time seen by the end user is spent. If the idle wait time is more than five times the server-side request processing time, the greatest tuning opportunities lie outside the Db2 server (e.g. network congestion or application proximity to the database server). In such cases, tuning efforts should shift toward optimizing the application logic, improving client-server communication, or addressing network latency. This distinction ensures that performance tuning is focused on the area that will most improve performance from the user’s perspective.
Figure 2: Wait time outside of Db2
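A sketch of that comparison, summed across current connections with MON_GET_CONNECTION (both columns are in milliseconds; the five-times threshold encodes the rule of thumb above):

```sql
-- Compare idle time waiting on clients with server-side request time.
SELECT SUM(CLIENT_IDLE_WAIT_TIME) AS CLIENT_IDLE_MS,
       SUM(TOTAL_RQST_TIME)       AS RQST_MS,
       CASE WHEN SUM(CLIENT_IDLE_WAIT_TIME) > 5 * SUM(TOTAL_RQST_TIME)
            THEN 'Focus outside Db2 (application, client, network)'
            ELSE 'Focus inside the Db2 server'
       END AS TUNING_FOCUS
FROM TABLE(MON_GET_CONNECTION(CAST(NULL AS BIGINT), -2)) AS T;
```

As with all MON_GET_* data, these counters are cumulative since activation, so in practice compare deltas over an interval rather than raw values.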
Digging into time spent on the database server
If the majority of the time is being spent within the database server, the next step is to analyze how that time is spent inside the Db2 engine. The key question becomes: is the time dominated by active processing, or by waiting on resources?
The answer will determine the direction of further investigation. Fortunately, stored procedures like monreport.dbsummary() can provide a quick and accessible breakdown of time spent across the various wait categories.
Figure 3: Wait within Db2 itself
If the report reveals that wait times account for a significant portion of the total request time, it strongly indicates that internal Db2 waits are the primary performance bottleneck. In such cases, a deeper dive into specific wait categories is warranted, and it is worth collecting db2mon reports.
Note: If the database is spending most of its time processing requests and the host is running above 70-80% CPU utilization, it may be worth investigating the SQL statements that are consuming the most CPU or provisioning more CPU for this host.
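The same package cache function supports that CPU check; here is a sketch ordered by CPU consumption (TOTAL_CPU_TIME is reported in microseconds):

```sql
-- Top 5 statements by CPU consumption.
SELECT SUBSTR(STMT_TEXT, 1, 60) AS STATEMENT,
       NUM_EXECUTIONS,
       TOTAL_CPU_TIME -- microseconds
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
ORDER BY TOTAL_CPU_TIME DESC
FETCH FIRST 5 ROWS ONLY;
```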
Introduction to Database Wait Categories
If a database request spends a significant percentage of its time waiting, the next step is to analyze the various wait categories that contribute to overall request latency.
Each wait category (e.g. disk I/O, locks, latches, log write, etc.) offers unique insights into system behavior and potential areas for improvement. By examining these categories individually, we can identify which category of wait is consuming the most time, assess what other information needs to be studied, and determine the appropriate corrective actions.
The following table provides a structured overview of key wait categories, how to measure them, what elevated values might indicate, and recommended steps to mitigate their impact.
| Wait Category | What the Wait Time Represents | How to Measure | What High Values Might Indicate | Recommended Actions |
| --- | --- | --- | --- | --- |
| Buffer pool reads | Time spent reading pages into the buffer pool | Use MON_GET_TABLESPACE to check average read time | Indicates storage performance issues or excessive I/O | Optimize high-I/O statements; increase buffer pool size |
| Transaction log writes | Time spent writing log records to disk | Divide LOG_WRITE_TIME by NUM_LOG_WRITE_IO from MON_GET_TRANSACTION_LOG | Suggests slow log device performance | Ensure log write time is under 2–4 ms; optimize log device throughput |
| HADR synchronization | Time spent waiting for HADR log transaction synchronization | Compare average log disk wait time (MON_GET_DATABASE) with HADR wait time (MON_GET_HADR) | Indicates network latency or standby log performance issues | Improve network performance; check standby log I/O |
| Direct I/O (LOBs) | Time spent on direct I/O operations for LOBs | Use MON_GET_TABLESPACE to check DIRECT_READ_REQUESTS and DIRECT_WRITE_REQUESTS | Indicates heavy LOB usage or slow disk I/O | Optimize LOB access patterns; evaluate disk subsystem performance |
| Prefetching | Time spent waiting for prefetch operations to complete | Calculate prefetch size and average time using MON_GET_TABLESPACE | May indicate poor prefetch configuration or slow storage | Tune prefetch settings (extent size, prefetchers); assess storage performance |
| Latch contention | Time spent waiting for latches controlling access to shared resources | Use MON_GET_EXTENDED_LATCH_WAIT to check LATCH_NAME and TOTAL_EXTENDED_LATCH_WAIT_TIME | Indicates contention on shared memory structures like buffer pool pages | Identify and partition hot tables/indexes; reduce concurrency on shared resources |
Table 3: Wait time categories
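As a worked example of the "How to Measure" column, this sketch derives average log write latency (for comparison against the 2–4 ms guideline) and average physical read time per tablespace; the CASE guards avoid division by zero on idle systems.

```sql
-- Average log write latency in milliseconds.
SELECT MEMBER,
       CASE WHEN NUM_LOG_WRITE_IO > 0
            THEN DECIMAL(LOG_WRITE_TIME, 31, 2) / NUM_LOG_WRITE_IO
       END AS AVG_LOG_WRITE_MS
FROM TABLE(MON_GET_TRANSACTION_LOG(-2)) AS T;

-- Approximate average physical read time per tablespace in milliseconds
-- (ignores temp and XML reads for simplicity).
SELECT VARCHAR(TBSP_NAME, 30) AS TBSP_NAME,
       CASE WHEN (POOL_DATA_P_READS + POOL_INDEX_P_READS) > 0
            THEN DECIMAL(POOL_READ_TIME, 31, 2)
                 / (POOL_DATA_P_READS + POOL_INDEX_P_READS)
       END AS AVG_READ_MS
FROM TABLE(MON_GET_TABLESPACE(CAST(NULL AS VARCHAR(128)), -2)) AS T
ORDER BY AVG_READ_MS DESC NULLS LAST;
```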
Basic Best Practices for Wait Time Analysis
- Top-Down: Begin with database-level metrics, then narrow your focus to the most impacted area (buffer pools, tablespaces, individual SQL statements, and so forth).
- Use the Right Tools: Choose the tool that best fits the depth and scope of your analysis: db2mon, monreport.dbsummary(), or the MON_GET_* table functions.
- Prioritize Common Waits: Focus on lock waits, I/O waits, and latch contention before exploring less common issues.
- Always Use Delta Metrics: Avoid relying on cumulative metrics. Use temporary tables and time-based snapshots to calculate deltas for accurate insights.
Db2 LUW performance tuning is not just about identifying slow queries—it is about understanding where time is being spent and why. By focusing on wait times and leveraging modern monitoring tools like db2mon and MON_GET_* table functions, you can follow a data-driven diagnostic strategy.
The concepts introduced in this post are designed to help you start to pinpoint performance root cause. Distinguishing between application and database server delays allows effort to be spent in the right area.
Ultimately, performance tuning is an iterative process and may have many stages of change, testing and validation to reach the chosen goal. With the right tools, methodology, and mindset, you can turn performance measurements into actionable insights that drive real improvements in system responsiveness and user satisfaction.
C Raghavendra is a Senior Staff Software Engineer and Db2 LUW Product Engineer at IBM ISL Labs, Bangalore. Raghavendra has over 13 years of experience at IBM and brings a wealth of ideas, suggestions, and best practices for solving complex Db2 performance issues. He is also an automation expert in Korn shell and Ansible playbooks, and for his efforts he has received the prestigious "Hall of Fame Award", "IBM Rockstar Award", and "Outstanding Technical Achievement Award" for automation work for the Michelin client. Raghavendra was instrumental in identifying design issues in the db2 list utilities and db2 list utility details commands, where Reorg is not displayed, and raised a Db2 Aha idea for it. He is committed to improving Db2 in ways that help all our enterprise clients. He can be reached at C.Raghavendra@ibm.com.
Toby Haynes: With a background in Observational Astronomy at the University of Cambridge, England, Toby joined IBM Canada in 1999 as a software developer in the Db2 Runtime engine. After eight years of development experience, he moved over to the Db2 Performance team to start work on what would become Db2 pureScale v9.8, working under Steve Rees. In the late summer of 2018, he stepped up to the role of Technical Manager for Db2 pureScale, looking after an active team of developers and support analysts. Now a Senior Software Developer at IBM, Toby is examining many areas of Db2 for robustness, usability and performance excellence.