AIOps: Monitoring and Observability

Does my z/OS LPAR have a CPU issue?

  
Recently I noticed slow response times from my started tasks in my RSB2 LPAR on a z15. So let's use IBM z OMEGAMON Monitor on z/OS to check whether the RSB2 LPAR has a CPU issue.

First, in the Enhanced 3270UI, let's look at CPU usage over the last few hours for the PLEX0K Sysplex, which RSB2 belongs to on this z15.

Near Term History CPU Usage

We can see that the z15's whole general-purpose processor pool is very busy, with usage in the 97%-99% range in every 5-minute summary period. The zIIP pool is busy as well, running in the 76%-83% range. Let's now drill down to our Sysplex view to see how it is handling the work during one of the heaviest 5-minute intervals.

Plex0K workload view

The red highlighting quickly shows that the CICSxxxx address spaces on RSB2 were delayed waiting for CPU for 100% of that 5-minute interval. We can also see that their service class is STCLO, which ranks below other, higher-priority work. WLM is doing what we expect: because RSB2 appears to be CPU constrained, it gives the CPU to higher-priority address spaces. This is a great quick view of the whole Sysplex workload, built from RMF Monitor III data fed into OMEGAMON through the GPMSERVE API.
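A figure such as "delayed 100% of the interval" comes from sampling: each percentage is the share of samples in which the address space was found in a given state. A minimal sketch, assuming one state sample per second (RMF Monitor III's default gatherer cycle is 1000 ms, so a 5-minute range holds about 300 samples). The state labels such as `DELAY_CPU` and the function name `state_percentages` are hypothetical illustrations, not RMF or OMEGAMON field names or APIs.

```python
from __future__ import annotations

from collections import Counter


def state_percentages(samples: list[str]) -> dict[str, float]:
    """Percentage of samples spent in each state.

    `samples` holds one hypothetical state label per sampling cycle,
    e.g. "USING_CPU", "DELAY_CPU" or "IDLE". The labels are illustrative,
    not RMF field names.
    """
    if not samples:
        return {}
    counts = Counter(samples)
    total = len(samples)
    return {state: 100.0 * n / total for state, n in counts.items()}


if __name__ == "__main__":
    # Hypothetical 5-minute range at one sample per second: 300 samples.
    starved = ["DELAY_CPU"] * 300  # never dispatched, always waiting for CPU
    mixed = ["USING_CPU"] * 240 + ["DELAY_CPU"] * 60

    print(state_percentages(starved))  # {'DELAY_CPU': 100.0}
    print(state_percentages(mixed))    # {'USING_CPU': 80.0, 'DELAY_CPU': 20.0}
```

An address space like the CICSxxxx regions here, found waiting for CPU in every sample, shows as 100% CPU delay no matter how many samples the range contains.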

If we now sort this view by Velocity, we can see which address spaces in the Sysplex are getting CPU and doing work.

Plex0K Sysplex Velocity

Here we see that the SYSTEM and SYSSTC address spaces in the Sysplex are getting whatever CPU is available. These address spaces need that CPU to keep z/OS handling the highest-priority work. We could move our CICSxxxx address spaces to a higher-priority service class, and that might help. Overall, though, RSB2 and the Plex0K Sysplex look like they need access to more CPU to handle this workload.
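WLM execution velocity is defined as using samples divided by the sum of using and delay samples, times 100. An address space that is delayed for CPU in every sample therefore has a velocity of 0 and sinks to the bottom of a Velocity sort. The sketch below computes and sorts that ratio. The address space names, the counts and the `AddressSpace` type are hypothetical, and for simplicity it lumps every resource WLM samples into single `using` and `delayed` totals.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class AddressSpace:
    """Hypothetical sample counts for one address space over one interval."""
    name: str
    service_class: str
    using: int    # samples where the address space was using a resource
    delayed: int  # samples where it was delayed for a resource WLM tracks


def velocity(asp: AddressSpace) -> float:
    """WLM execution velocity: using / (using + delayed) * 100."""
    total = asp.using + asp.delayed
    # No using or delay samples (e.g. idle): report 0 so the row still sorts.
    return 0.0 if total == 0 else 100.0 * asp.using / total


def sort_by_velocity(rows: list[AddressSpace]) -> list[AddressSpace]:
    """Highest velocity first, like sorting the workload view on Velocity."""
    return sorted(rows, key=velocity, reverse=True)


if __name__ == "__main__":
    rows = [  # hypothetical address spaces and sample counts
        AddressSpace("CICS0001", "STCLO", using=0, delayed=300),
        AddressSpace("SYSAS01", "SYSTEM", using=270, delayed=30),
        AddressSpace("STCAS01", "SYSSTC", using=180, delayed=120),
    ]
    for asp in sort_by_velocity(rows):
        print(f"{asp.name:8} {asp.service_class:6} {velocity(asp):5.1f}")
```

With these made-up counts the SYSTEM address space sorts first at 90.0, the SYSSTC one next at 60.0, and the STCLO CICS region last at 0.0, the same ordering pattern the Velocity-sorted view shows.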

Let's now look at the Tivoli Enterprise Portal (TEP) user interface for IBM z OMEGAMON Monitor on z/OS to get a better view of processor use across the whole z15 and of possible delays to LPARs.

TEP LPAR View

If we focus on the CPC LPARs Status view at the bottom, in the second row (our RSB2 LPAR), what do we see? Look at CPU %Ready: 63.3%. This metric tells us that 63.3% of the time the LPAR needs CPU and cannot get a processor to run its work.

We can therefore conclude that the Plex0K Sysplex and the RSB2 LPAR need more CPU to run their workload. We can look at how to give more processor capacity to this LPAR Cluster and LPAR to help the workload. But remember that the whole z15 is running at about 97%-99%, so we will have to take processor capacity away from some other workload in order to improve the workloads in this Sysplex and LPAR.
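Why does helping RSB2 have to cost some other workload? When the CPC is fully busy, PR/SM divides the shared processor pool roughly in proportion to LPAR weights: an LPAR's share is its weight divided by the sum of the weights of the LPARs sharing the pool, times the number of shared physical processors. A minimal sketch of that arithmetic follows. The pool size, the other LPAR names and all weights are hypothetical, and the sketch ignores capping and each LPAR's logical processor count, both of which also limit what an LPAR can actually use.

```python
from __future__ import annotations


def weight_shares(weights: dict[str, int], shared_cps: int) -> dict[str, float]:
    """Share of the shared processor pool, in physical processors, per LPAR.

    share = shared_cps * weight / sum(weights). Capping and logical
    processor counts are deliberately ignored in this sketch.
    """
    total = sum(weights.values())
    return {lpar: shared_cps * w / total for lpar, w in weights.items()}


if __name__ == "__main__":
    pool = 10  # hypothetical number of shared general-purpose processors
    before = {"RSB2": 200, "LPARA": 500, "LPARB": 300}  # hypothetical weights
    after = dict(before, RSB2=300)  # raise only RSB2's weight

    for label, weights in (("before", before), ("after", after)):
        shares = weight_shares(weights, pool)
        print(label, {lpar: round(cp, 2) for lpar, cp in shares.items()})
```

In this made-up configuration, raising only RSB2's weight from 200 to 300 lifts its share from 2.0 to about 2.73 processors, while the other two LPARs lose that same capacity even though their own weights did not change.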

This shows how helpful IBM z OMEGAMON Monitor on z/OS is in revealing the need for more processing power to get the work done in LPAR RSB2.

Joe Winterton
Rocket Software