Discover the runbook for the IBM Z OMEGAMON AI for CICS Transactional Response Time use case, where predictive AI and machine learning help detect and alert on significant divergence that reveals potential resource constraints, helping ensure the performance and protection of the transactional workload.
OMEGAMON AI Insights GA Version 2.2.0 - November 14, 2025
(10 min read)
Find all episodes of the podcast here: IBM Z® OMEGAMON® product demos!
Purpose & Scope
Purpose: Rapidly determine the root cause when the average transaction response time in a CICS region has significantly increased over the last hour compared to the historical baseline.
Scope:
- SMF 110 subtype 1 Performance Class (transaction-level timing: CPU, dispatch, suspend).
- CICS Statistics (SMF 110 subtype 2) for region-level health.
Tools: your analytics platform, OMEGAMON AI/Web UI, SMF records, CICS Performance Analyzer (CICS PA).
When to Use
You received a notification, or noticed, that the response time of one or more CICS regions has significantly diverged and is impacting the performance of the transactional workload.
Expected Outcomes
Because CICS supports critical business workloads, quickly identify whether the cause is volume‑driven, outlier‑driven (a small set of transactions skewing the average), or systemic. Assess the blast radius and decide who to dispatch for further analysis:
- Site Reliability Engineer - Cross-system overview, trend charts, first‑pass analysis.
- Application Team - Volume-driven workload shift or unplanned spike, MaxTask pressure.
- CICS Systems Programmer - Broad resource contention, suspend/dispatch delays, subsystem latency.
- Subsystem Team - Db2/MQ RMI delays, file string shortages, surge in abends.
Context: As a Site Reliability Engineer (SRE), you received a single notification of anomalies detected simultaneously on several CICS regions, indicating an abnormal transactional response time divergence.
Step 1 - Scope Definition - Define “who diverged” and “from what” (by region)
We first need to pinpoint the scope of the divergence (by region) and define the baseline we’re diverging from.
- Check the average response time trend - Compare the last 2 hours against the historical baseline for the same day of week and the same hour.
- Identify high‑volume vs. high‑impact drivers:
  - If the number of transactions increased significantly, the divergence is volume‑driven.
  - If volume is stable but response time increases, the cause is likely CPU, dispatch, I/O, or RMI waits.
- Assess skew contributors - A handful of failing or long‑running transactions can inflate the average. Checking the distribution, transaction mix, or dictionary would require a CICS SME.
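The three checks above can be sketched in Python. This is a minimal illustration, not OMEGAMON's detection model. It makes three assumptions:

- Per-region aggregates have already been exported (for example, from SMF 110 data) into a hypothetical `Window` shape.
- The current window is scored as a z-score against same day-of-week, same-hour history.
- A volume increase of 1.5x or more counts as volume-driven. This threshold is hypothetical.

It uses the mean-to-median ratio of individual response times as a simple skew indicator.

```python
import math
from dataclasses import dataclass
from statistics import mean, median, pstdev


@dataclass
class Window:
    """Hypothetical per-region aggregate for one time window."""
    tx_count: int           # transactions completed in the window
    avg_response_ms: float  # average response time, milliseconds


def divergence_z(current: Window, history: list[Window]) -> float:
    """z-score of the current average response time against the same
    day-of-week / same-hour windows taken from previous weeks."""
    baseline = [w.avg_response_ms for w in history]
    mu, sd = mean(baseline), pstdev(baseline)
    diff = current.avg_response_ms - mu
    if sd == 0:
        # Flat baseline: any difference is treated as unbounded divergence.
        return 0.0 if diff == 0 else math.copysign(math.inf, diff)
    return diff / sd


def is_volume_driven(current: Window, history: list[Window],
                     volume_ratio: float = 1.5) -> bool:
    """True when volume reached volume_ratio (hypothetical threshold)
    times the baseline average volume."""
    base_volume = mean(w.tx_count for w in history)
    return current.tx_count >= volume_ratio * base_volume


def mean_to_median(response_times_ms: list[float]) -> float:
    """A ratio well above 1 suggests a few slow or failing transactions
    are inflating the average (outlier-driven skew)."""
    return mean(response_times_ms) / median(response_times_ms)
```

A z-score above roughly 3 is a common rule of thumb for a significant divergence. A mean-to-median ratio well above 1 points to an outlier-driven average, which a CICS SME should still confirm.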
Dispatch:
- Volume-driven - Application Team, to validate workload shifts or unplanned spikes.
- Volume stable, or more detail needed - CICS Systems Programmer, as the degradation is likely internal (dispatch, suspend, resource contention).
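The dispatch rule, combined with the team roles listed under Expected Outcomes, can be expressed as a small routing table. In this sketch the symptom labels (such as `db2_mq_rmi_delay`) are hypothetical tags that your analytics platform would attach. They are not OMEGAMON or SMF field names.

```python
from collections.abc import Iterable

# Symptom -> owning team, transcribed from the Expected Outcomes roles.
# Symptom keys are hypothetical labels, not OMEGAMON or SMF field names.
TEAM_FOR_SYMPTOM = {
    "volume_spike": "Application Team",
    "workload_shift": "Application Team",
    "maxtask_pressure": "Application Team",
    "resource_contention": "CICS Systems Programmer",
    "suspend_dispatch_delay": "CICS Systems Programmer",
    "subsystem_latency": "CICS Systems Programmer",
    "db2_mq_rmi_delay": "Subsystem Team",
    "file_string_shortage": "Subsystem Team",
    "abend_surge": "Subsystem Team",
}


def route(volume_driven: bool, symptoms: Iterable[str] = ()) -> list[str]:
    """Step 1 dispatch rule first, then any extra teams implied by symptoms.
    Unknown symptom labels are ignored; the order is deterministic."""
    teams = ["Application Team" if volume_driven else "CICS Systems Programmer"]
    for symptom in sorted(set(symptoms)):
        team = TEAM_FOR_SYMPTOM.get(symptom)
        if team is not None and team not in teams:
            teams.append(team)
    return teams
```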
Reference: IBM Z OMEGAMON AI for CICS - Response Time Analysis
Example:
- Confirmed: Simultaneous response time anomalies on several regions, volume-driven. 1 notification received for more than 15 anomalies. Confirm the impact and involve the SWAT team.
- Baseline: A different baseline for each region, and very high divergence compared to the previous week.
Extraordinary Step - Size the blast radius
The blast radius refers to the extent of a performance "contagion". We first need to determine how large the performance impact of this event is and, if warranted, raise the severity and dispatch to the SWAT team.
Dispatch:
- SWAT team or CICS Systems Programmer, depending on impact
Reference: ---
Example:
- Confirmed: Impact on 2 of 3 LPARs and more than 15 regions. Involve the SWAT team and raise this to a critical severity event for immediate action.
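Sizing the blast radius amounts to counting anomalous regions per LPAR. The sketch below assumes a hypothetical mapping of region names to LPARs. It uses illustrative thresholds loosely drawn from the example: more than 15 regions, or more than half of the LPARs. These are assumptions to tune to your own escalation policy. The sketch also reports which LPARs stayed healthy, which matters for Step 2.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class BlastRadius:
    regions_hit: int
    lpars_hit: set[str]
    healthy_lpars: set[str]
    escalate_to: str


def size_blast_radius(anomalous: set[str], region_to_lpar: dict[str, str],
                      region_threshold: int = 15,
                      lpar_fraction: float = 0.5) -> BlastRadius:
    """Count affected regions and LPARs; escalate past hypothetical thresholds."""
    by_lpar: dict[str, set[str]] = defaultdict(set)
    for region in anomalous:
        by_lpar[region_to_lpar[region]].add(region)
    all_lpars = set(region_to_lpar.values())
    lpars_hit = set(by_lpar)
    # "Wide" impact: too many regions, or too large a share of the LPARs.
    wide = (len(anomalous) > region_threshold
            or len(lpars_hit) / len(all_lpars) > lpar_fraction)
    return BlastRadius(
        regions_hit=len(anomalous),
        lpars_hit=lpars_hit,
        healthy_lpars=all_lpars - lpars_hit,
        escalate_to=("SWAT team (critical severity)" if wide
                     else "CICS Systems Programmer"),
    )
```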
Step 2 - Gathering clues
At this point, there is not much the SRE can do alone beyond looking for more symptoms.
When 15 regions across 2 of 3 LPARs are affected, you are most likely not looking at a coding bug in one program. You are looking at a shared resource or infrastructure bottleneck that is common to those 15 regions but absent, or healthy, on the third LPAR.
- Thousands of abends following the response time spike, on the 2 affected LPARs only.
- Most of the time is spent waiting for first dispatch, before Db2 RMI or file control (FC) read time starts to accumulate.
- Several regions are hitting their maximum tasks (MXT) limit.
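The reasoning above can be sketched by finding the dominant response time component in each region: look for what the affected regions share and the healthy LPAR lacks. The component names, such as `first_dispatch_wait` and `db2_rmi`, are hypothetical labels loosely modelled on SMF 110 performance class timings. They are not actual field names.

```python
from collections import Counter

# Hypothetical per-region breakdown of average response time (ms) into
# components loosely modelled on SMF 110 performance class timings.
Breakdown = dict[str, float]


def dominant_component(breakdown: Breakdown) -> str:
    """Component contributing the most time to the average response."""
    return max(breakdown, key=lambda name: breakdown[name])


def shared_bottleneck(affected: list[Breakdown],
                      healthy: list[Breakdown]) -> tuple[str, float, float]:
    """Most common dominant component across affected regions, the share of
    affected regions where it dominates, and the same share on healthy ones.
    A high affected share with a low healthy share points at a shared resource."""
    counts = Counter(dominant_component(b) for b in affected)
    top, n = counts.most_common(1)[0]
    healthy_share = (sum(dominant_component(b) == top for b in healthy)
                     / len(healthy)) if healthy else 0.0
    return top, n / len(affected), healthy_share
```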
What Next?
The Site Reliability Engineer's investigation stops here. Subject Matter Experts for each subsystem and application take over from this point. They use OMEGAMON Web UI dashboards to examine more detailed information, isolate the root cause, and apply corrective measures quickly:
- Capacity Planning / Network Team (to find the source of the traffic surge)
- CICS SME (to suppress non-critical dumps, stabilize the regions, and investigate with OMEGAMON AI for CICS, for example by looking at the standard deviation of CPU time and response time)
- Application Dev / DevOps (to stop the retry loop at its source, as thousands of abends over an hour suggest)
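For the standard deviation check mentioned for the CICS SME, one simple heuristic compares the current spread of CPU time and response time with their baseline standard deviations. This heuristic is an assumption, not an OMEGAMON AI algorithm. If response time spreads out while CPU time does not, the extra time is most likely spent waiting rather than processing. The sensitivity `factor` is hypothetical.

```python
from statistics import pstdev


def variability_verdict(cpu_ms: list[float], resp_ms: list[float],
                        base_cpu_sd: float, base_resp_sd: float,
                        factor: float = 2.0) -> str:
    """Compare this hour's spread of CPU time and response time with baseline
    standard deviations; factor is a hypothetical sensitivity threshold."""
    cpu_up = pstdev(cpu_ms) > factor * base_cpu_sd
    resp_up = pstdev(resp_ms) > factor * base_resp_sd
    if resp_up and not cpu_up:
        # Response time spreads out while CPU stays steady: time goes to waits.
        return "wait-driven"
    if resp_up and cpu_up:
        return "cpu-driven or mixed"
    if cpu_up:
        return "cpu variability only"
    return "no significant change"
```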
---
Based on a real, authorized customer dataset, the AI models would have detected genuine performance anomalies before they became outages.
The AI models continuously learn each region’s normal behavior and surface only true deviations, eliminating the fatigue and noise created by static thresholds.
This gives SREs earlier warning, clearer context, and drastically faster triage, which directly reduces Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR).
For customers, the return is simple: less downtime, faster recovery, fewer false alerts, and better use of expert time, at enterprise scale.
OMEGAMON AI does not replace SMEs; it amplifies their impact by providing clean, high‑quality signals instead of raw data streams. It ensures they spend their time solving real problems, not finding them.
We want to hear from you!
Have you faced hidden performance issues? Curious how AI could help?
👉 Share your story on IBM Idea portal or request a demo today.
📖 Read how OMEGAMON AI makes it possible to solve problems before they impact the end-user experience
🛠️ Explore the product: IBM Z OMEGAMON AI Insights official documentation and release notes
🎥 See other Product Runbooks for Db2, CICS or z/OS.
#monitoring, #ArtificialIntelligence(AI), #IBMZ, #OMEGAMON, #CICS, #AnomalyDetection, #IBMAI, #OMEGAMONAIInsights
@Mick Harris, @John Hancy, @Aleksandr Charcikov, @Ezriel Gross, @Ash Mahay, @Jim Porell, @Anna Murray, @Fabien Gautreault