Discover the runbook for the IBM Z OMEGAMON AI for Db2 CPU Time use case, where predictive AI and machine learning detect and alert on significant divergence, revealing potential resource constraints or unexpected workload shifts and helping protect the performance of the transactional workload.
OMEGAMON AI Insights GA Version 2.2.0 - November 14, 2025
Purpose & Scope
Purpose: Rapidly determine the root cause when a Db2 data sharing group or member shows CPU divergence for CICS and/or DDF connections, sustained over a window of at least 30 to 120 minutes, versus the historical baseline for the same day and time.
Data:
- SMF 101 (Db2 Accounting) with Classes 1/2/3; include DDF zIIP accounting.
- SMF 100 (Db2 Statistics) per member & connection type.
Tools: Your analytics platform, OMEGAMON AI/Web UI, SMF Records, Db2 Accounting & Statistics.
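As a rough illustration of the "same day/time historical baseline" comparison named in the purpose, the sketch below groups past intervals by weekday and hour and scores a new interval against its slot. The record layout, the z-score rule and the threshold of 3.0 are assumptions made for this example; they do not describe how the OMEGAMON AI Insights models work.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, stdev


def build_baseline(history):
    """Group historical CPU seconds by (weekday, hour) slot.

    `history` is a list of (timestamp, cpu_seconds) tuples (hypothetical layout).
    """
    slots = defaultdict(list)
    for ts, cpu_seconds in history:
        slots[(ts.weekday(), ts.hour)].append(cpu_seconds)
    return slots


def divergence_score(baseline, ts, cpu_seconds):
    """Return how many standard deviations the value sits from its slot mean.

    Returns None when the slot has too little history to judge.
    """
    samples = baseline.get((ts.weekday(), ts.hour), [])
    if len(samples) < 2:
        return None
    sigma = stdev(samples)
    if sigma == 0:
        return None
    return (cpu_seconds - mean(samples)) / sigma


# Four previous Fridays at 14:00 (made-up values), then the interval to check.
history = [
    (datetime(2025, 10, 3, 14), 100.0),
    (datetime(2025, 10, 10, 14), 110.0),
    (datetime(2025, 10, 17, 14), 90.0),
    (datetime(2025, 10, 24, 14), 100.0),
]
baseline = build_baseline(history)
score = divergence_score(baseline, datetime(2025, 10, 31, 14), 150.0)

THRESHOLD = 3.0  # assumed alerting threshold, not a product default
if score is not None and abs(score) > THRESHOLD:
    print(f"CPU divergence: {score:.1f} standard deviations from baseline")
```

In practice the comparison would run over the whole 30 to 120 minute window rather than a single interval, so that one noisy sample does not raise an alert.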
When to Use
You received a notification, or noticed, that Db2 CPU time has significantly diverged for the workloads you monitor, that it is impacting performance, and that it risks disrupting the transactional workload.
Expected Outcomes
Identify whether the cause is volume-driven or contention-driven (locks/latches), inside or outside Db2, and skewed to one member or group-wide. Then determine who to dispatch to for further analysis of GBP/CF stress, zIIP offload loss, application commit/rollback churn, and similar causes. Roles involved:
- Site Reliability Engineer — Cross system overview, trend charts, first‑pass analysis (Steps 0–4).
- Db2 Systems Programmer — Routing imbalance, IRLM dispatch, BPs, etc. (Steps 2, 4, Next).
- Application DBA / CICS or DDF App Team — Application activity surge, commit discipline, SQL/package strategy, etc. (Steps 2-4, Next).
- Capacity/WLM/CF Specialist — Service Class/importance, zIIP entitlement/offload (Step 2, Next).
Context: As a Site Reliability Engineer (SRE), you received a notification of an anomaly detected on a data sharing group for connection type DDF: an abnormal CPU divergence.
Step 1 — Scope Definition - Define “who diverged” and “from what” (by member & connection type)
We first need to pinpoint the scope of the divergence (by Db2 member and connection type) and define the baseline we’re diverging from.
At the subsystem/system level, SMF 100 Statistics consolidate CPU and wait time by connection type, which helps you see member‑to‑member skew and whether an increase is isolated to one member or group‑wide.
Visibility by member matters: it lets you rule out a single hot member, a misrouted workload, or GBP/CF locality effects. Comparing activity by member and connection type helps detect imbalances.
Dispatch: SRE validates SMF 101/100 data; if a routing imbalance is suspected, involve the Db2 Systems Programmer.
Reference: IBM Db2 for z/OS - Statistic traces
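The member-by-member comparison in Step 1 could be sketched as follows: compute each member's share of CPU time per connection type and flag members that carry noticeably more than an even split. The row layout, the member names and the 15-point tolerance are hypothetical, and the even-split expectation is itself an assumption; members with different capacity or routing weights would need their own historical share as the reference.

```python
from collections import defaultdict


def member_shares(rows):
    """Return {connection_type: {member: share_of_cpu}}.

    `rows` is a list of (member, connection_type, cpu_seconds) tuples,
    a hypothetical summary derived from statistics data.
    """
    per_member = defaultdict(lambda: defaultdict(float))
    for member, conn_type, cpu_seconds in rows:
        per_member[conn_type][member] += cpu_seconds

    shares = {}
    for conn_type, members in per_member.items():
        total = sum(members.values())
        shares[conn_type] = {
            member: (cpu / total if total else 0.0)
            for member, cpu in members.items()
        }
    return shares


def skewed_members(shares, tolerance=0.15):
    """Flag members whose share exceeds an even split by more than `tolerance`."""
    flagged = []
    for conn_type, members in shares.items():
        even_split = 1.0 / len(members)
        for member, share in members.items():
            if share - even_split > tolerance:
                flagged.append((conn_type, member))
    return sorted(flagged)


# Made-up figures for a two-member group.
rows = [
    ("DB2A", "DDF", 300.0),
    ("DB2B", "DDF", 100.0),
    ("DB2A", "CICS", 200.0),
    ("DB2B", "CICS", 210.0),
]
shares = member_shares(rows)
flagged = skewed_members(shares)
print(flagged)
```

With these figures only the DDF work on DB2A is flagged, which would point to an isolated member rather than a group-wide increase.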
Example:
- Confirmed: Transaction count doubled compared to previous Fridays and CPU surge correlated → volume surge primary driver.
- Confirmed: Stable → no SQL regression.
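One way to separate a volume surge from a per-transaction regression is to compare transaction counts and CPU per transaction against the baseline, as in this minimal sketch. It assumes that "stable" above refers to CPU per transaction. The function name, the 10% tolerance and the sample figures are hypothetical.

```python
def classify_cpu_increase(base_txn, base_cpu, cur_txn, cur_cpu, tolerance=0.10):
    """Classify a CPU increase as volume-driven, cost-driven, or both.

    Inputs are transaction counts and total CPU seconds for the baseline
    and current windows. `tolerance` is an assumed noise band.
    """
    volume_ratio = cur_txn / base_txn
    cost_ratio = (cur_cpu / cur_txn) / (base_cpu / base_txn)

    volume_up = volume_ratio > 1 + tolerance
    cost_up = cost_ratio > 1 + tolerance

    if volume_up and cost_up:
        return "volume and per-transaction cost"
    if volume_up:
        return "volume-driven"
    if cost_up:
        return "per-transaction cost (possible SQL regression)"
    return "no significant change"


# Transactions doubled while CPU per transaction stayed roughly flat.
verdict = classify_cpu_increase(
    base_txn=10_000, base_cpu=500.0, cur_txn=20_000, cur_cpu=1_020.0
)
print(verdict)
```

A result of "volume-driven" matches the example above: the surge in CPU follows the surge in transactions, and the cost of each transaction has not changed enough to suggest a regression.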
Step 4 — Suspension & Not Accounted Time - Investigate the drivers of suspension and not accounted time
We need to see if the extra waits are due to Coupling Facility (GBP), lock contention, or I/O bottlenecks.
Suspected Causes:
- IRLM lock contention (commit/rollback churn).
- Buffer pool latch contention (I/O increase).
Dispatch: Db2 Systems Programmer for IRLM priority and latch analysis.
Reference: Investigating Class 3 Suspension Time
- Confirmed: Lock/latch spike correlates with Not Accounted Time
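To make the Step 4 breakdown concrete, the sketch below uses the common approximation that not accounted time is the Class 2 elapsed time minus the Class 2 CPU time (general purpose plus zIIP) and the Class 3 suspension time, then ranks the suspension categories. The function names, category labels and figures are illustrative assumptions, not actual SMF 101 field names.

```python
def time_breakdown(class2_elapsed, class2_cpu, class3_suspension):
    """Split in-Db2 elapsed time into CPU, suspension and not accounted time.

    `class2_cpu` is assumed to already include zIIP time. All values are
    in seconds.
    """
    not_accounted = max(class2_elapsed - class2_cpu - class3_suspension, 0.0)
    return {
        "cpu": class2_cpu,
        "suspension": class3_suspension,
        "not_accounted": not_accounted,
        "cpu_share": class2_cpu / class2_elapsed,
        "suspension_share": class3_suspension / class2_elapsed,
        "not_accounted_share": not_accounted / class2_elapsed,
    }


def top_suspensions(categories, n=2):
    """Return the `n` largest suspension categories, largest first."""
    ranked = sorted(categories.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Made-up figures for one connection type over the divergence window.
breakdown = time_breakdown(
    class2_elapsed=100.0, class2_cpu=30.0, class3_suspension=50.0
)
suspensions = {
    "lock/latch": 30.0,
    "sync I/O": 12.0,
    "global contention": 6.0,
    "other": 2.0,
}
leaders = top_suspensions(suspensions)
print(breakdown["not_accounted"], leaders)
```

A large not accounted share alongside a dominant lock/latch category is the kind of pattern that would justify handing over to the Db2 Systems Programmer for IRLM and latch analysis.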
What Next?
For a Site Reliability Engineer, the investigation would stop here. Db2 and application subject matter experts would take over, using the OMEGAMON Web UI Db2 dashboards to look at packages, application metrics, buffer pools, CF stress, and so on.
Further lock/latch analysis would reveal a runaway SYSLH200 dynamic SQL package with heavy commit/rollback activity.
This dynamic SQL flood, with possible deadlocks, shows that the transaction volume itself was unusual and anomalous in this case.
After rolling back to the previous version, the Application DBA and Developer would find that a new JDBC call with a 9-way join and a bad access path was the cause.
Highlighted by the AI models 3 days before!
Based on a real customer dataset, used with authorization, the AI models would have detected the divergence 3 days before the obvious spike, saving 45k CPU seconds of overconsumption.
We want to hear from you!
Have you faced hidden performance issues? Curious how AI could help?
👉 Share your story on the IBM Ideas portal or request a demo today.
📖 Read how OMEGAMON AI makes it possible to solve problems before they impact the end-user experience
🛠️ Explore the product: IBM Z OMEGAMON AI Insights official documentation and release notes
#monitoring, #ArtificialIntelligence(AI), #IBMZ, #OMEGAMON, #Db2, #AnomalyDetection
@Matthias Tschaffler, @Ash Mahay, @Jim Porell, @Anna Murray, @Fabien Gautreault