AIOps: Performance and Capacity Management


Potential Infrastructure CPU Reduction Opportunities in an Enterprise Consumption Environment

By Camila Vasquez posted Tue April 29, 2025 06:39 AM

  

Written by Todd Havekost on February 16, 2021.

On z processors since the z13, machine cycles spent waiting for data and instructions to be staged into Level 1 processor cache represent a significant portion of overall CPU consumption, typically 30-50% of the total. Tuning actions that reduce those waiting cycles can achieve significant CPU savings. 

Due to the 1:1 relationship between Vertical High (VH) logical CPs and physical CPs, work that executes on VHs typically spends fewer cycles waiting on processor cache than work executing on Vertical Medium or Vertical Low CPs. Figure 1 shows the reduction in waiting cycles (“Finite CPI”) for work running on VHs on one system. 

Figure 1 - Finite CPI by logical CP (© IntelliMagic Vision)

In this chart you can see that the VH CPs (the lowest set of lines) averaged 1.5 cycles per instruction, while the VM and VL CPs (the middle and top sets of lines) averaged 2.41 cycles per instruction - a 61% increase in cycles, and thus in CPU time, for the same work. Workloads with ‘High’ or ‘Average’ LSPR workload characterizations (as derived from the Level 1 Miss Percentage and Relative Nest Intensity metrics) tend to offer more potential for savings from processor cache efficiency. 

As shown in the Engine Dispatch Analysis in Figure 2, setting LPAR weights so that the guaranteed LPAR share (the green line) is sufficient to support actual CPU consumption (red line) is an important step toward maximizing work executing on VHs. 

Figure 2 - Engine Dispatch Analysis (© IntelliMagic Vision)
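
To make the weight arithmetic concrete (using purely illustrative numbers): on a CPC with 10 shared general purpose CPs, an LPAR with a weight of 650 out of a total weight of 1000 has a guaranteed share of 650/1000 x 10 = 6.5 engines. Under HiperDispatch that share would typically be polarized into 6 Vertical Highs plus a Vertical Medium. As long as the LPAR's actual consumption stays at or below roughly its guaranteed share, the bulk of its work can run on VHs and benefit from the better cache efficiency described above; consumption that regularly exceeds the share spills onto VMs and VLs, with correspondingly worse Finite CPI.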

You can find more information about this topic in the IBM Technote Number of Logical CPs Defined for an LPAR, by Kathy Walsh, and in SHARE Fort Worth 2020 Session 27021, Understanding LPAR Controls for Better Performance, also by Kathy.

Running CPCs at lower utilizations or deploying sub-capacity models where feasible are other potential ways to increase processor cache efficiency. See ‘Customer Sub-Capacity CPC Experience’ in Tuning Letter 2018 No. 3 for a case study reflecting the magnitude of potential savings, and ‘CPU MF Part 3 – Optimizing CPC Cache’ in Tuning Letter 2017 No. 4 for additional detail on these and other potential actions.

z/OS Operating System Efficiencies

The operating system also benefits from running at lower utilizations. The impacts are largely related to queueing theory. As the system gets busier, there are more units of work to be managed (meaning longer control block chains to be scanned and managed), more serialization (which means more opportunity for contention), and an increased likelihood of shared resources being in use by some other tasks (which means more queueing, and yet longer control block chains).

Between the additional stress on the CPC caches and the additional work for the operating system and subsystems (CICS, IMS, and Db2), IBM presentations estimate that online workloads experience a 3-5% increase in CPU per transaction for each 10% growth in physical CPU utilization (see Figure 3). You can find more information in the IBM White Paper Running IBM System z at High Utilization, by Gary King.

Figure 3 - Growth in CPU Time per Transaction as CPU Utilization Increases (IBM, Hutton)
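
To put that rule of thumb into (purely illustrative) numbers: growing average utilization from 70% to 90% represents two 10% steps, so you would expect each transaction to cost roughly 6-10% more CPU at 90% busy than it did at 70% busy, before any workload growth is even considered.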

Traditional mainframe capacity planning has been predicated on the assumption that running mainframes at high utilizations is the most cost-effective way to operate. But now that software typically represents a much larger expense than hardware, the potential CPU savings from both operating system and processor efficiencies when running at lower utilizations suggest that Enterprise Consumption (EC) customers should re-examine that approach.

System Capture Ratio

The capture ratio is a key indicator of system health for a z/OS system. Poor capture ratio values may indicate that the system is spending too much time performing non-productive work, time that, if recovered by tuning actions, translates directly into cost savings. This is especially true with Tailored Fit Pricing (TFP), where every CPU cycle counts.

In z/OS, the capture ratio is the ratio between the processor time the system was able to attribute to end-user work and the total processor time consumed, including time spent on general system management activities. From this definition it should be clear that the capture ratio is always less than one, since there are several housekeeping and resource management activities that the operating system can't really charge to any specific end user. For this reason, the capture ratio is an important system health indicator: too much internal processing can be a sign of less than optimal efficiency. 

Technically, the capture ratio is calculated by dividing the CPU utilization figures reported by SMF record type 72 subtype 3 (workload activity) by the CPU utilization figures reported by SMF record type 70 subtype 1 (CPU activity). In a typical mainframe configuration with General Purpose processors and zIIP specialty engines, it is possible, and in fact recommended, to calculate individual capture ratio values for the different processor pools, since they do very different things. For zIIPs, if you use Symmetric Multi-Threading (SMT), the calculation must take the SMT multi-threading productivity metric into account. The WSC white paper z/OS Capture Ratio Calculations explains how to measure capture ratio in the different configurations.
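
As a simplified, purely illustrative example: if the workload activity data (SMF 72.3) for an interval accounts for 540 GP CPU seconds summed across all service classes, while the CPU activity data (SMF 70.1) shows 600 GP CPU seconds of total busy time, then the GP capture ratio for that interval is 540 / 600 = 90%. The same calculation, performed separately with the zIIP figures, gives the zIIP capture ratio.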

Capture ratio is important for GPs because of its impact on software cost; you want to make sure your system is not wasting CPU cycles performing housekeeping activities that could be avoided. This is less of a concern for zIIPs because any inefficiency there doesn't increase your software cost.

Before specialty engines, all processing activity was performed by General Purpose processors. When IBM introduced specialty engines, they were designed to execute only specific parts of the users' workloads. This means that while zIIP processors also need to be managed (for example, there is an instance of the dispatcher running on each processor, including zIIPs), most global system management activities still execute on the GPs. 

As an example, while zIIPs can initiate I/O operations, I/O interrupt management is only done by General Purpose processors. This means that the capture ratio measured for GPs is inherently lower than that measured for zIIPs, because the zIIPs perform far fewer global management activities. 

To give you a guideline, we expect to see GP capture ratios above 90% for production systems. For zIIPs, it is common to see capture ratios in the 95% range. Of course your mileage may vary: systems with homogeneous workloads usually show better capture ratio values. That's one reason why development systems, which usually host more heterogeneous workloads with development activities mixed with online and batch testing, normally experience lower capture ratios. This is very clearly illustrated in Figure 4. In this case, SYSA and SYSD are two development systems, as you can see from the more irregular pattern of their lines, whereas SYSB and SYSC (the blue and yellow lines) are both closer to the 90% mark and also much flatter.

Figure 4 - Sample Capture Ratios (© IntelliMagic Vision)

Finally, capture ratios typically decrease at lower utilizations because the relative contribution of uncaptured, time-based housekeeping activities increases. For this reason, when analyzing the capture ratio you should only take into account periods of sustained work. 

You should routinely measure and track your systems' capture ratio, and have a process in place to detect changes. If you think you are experiencing a poor capture ratio, your first step is to try to understand why. This can be due to multiple reasons, and identifying the one(s) affecting you is not easy because, by definition, the system doesn't track what uncaptured CPU time is used for.

Information about these reasons is scattered throughout the literature, and so far the best source we could find to explain uncaptured time is a Tuning Letter article written for us by Greg Dyck from IBM development 23 years ago - ‘Uncaptured Time’, in Tuning Letter 1998, No. 3. By the way, this is a testament to the value of the goldmine of technical information Cheryl accumulated over the years, most of which is still very relevant after all this time.

When trying to understand why your capture ratio is poor, you should compare your systems' behavior with that of similar, well-behaving systems. Working with customers, we created a checklist of possible causes with associated metrics to be measured. The aim of the process is to help identify one or more deviating metrics which may explain the poor capture ratio, and to determine whether the causes can be eliminated. 

  • Sometimes the reason lies in how your application programs are designed, which makes it difficult to find a solution.
  • Sometimes it lies in how the operating system is designed.
  • Sometimes all you can achieve by tuning is to translate uncaptured time into captured time that can be attributed back to some piece of work. This won’t make any difference to your software bill; however, it is still a valuable exercise because it positions you to better understand who is using your system’s capacity.

In a future article in this series we hope to dig into the technical details of how to identify the reasons for a poor capture ratio.

LLA / LNKLIST / VLF Tuning 

Many sites view the Library LookAside (LLA) started task as part-and-parcel of the system LNKLST function. Based on the installation instructions for your various products, you define those products' libraries to the system LNKLST and LLA does its ‘magic’. And, voila, everything just ‘works’!

However, we all know the old saying that “just because something isn’t broken, that doesn’t mean that it can’t be fixed”. And that is the case for LLA and its cousin, Virtual Lookaside Facility (VLF). While it is true that LLA caches the directory for LNKLST data sets, and selects some of the LNKLST load modules to be cached in VLF, it can do a lot more than that.

One fact that many people are unaware of is that LLA can cache the directory of any PDS. For example, the directories of your ISPF libraries can be added to LLA.

This function is best suited for data sets that are rarely updated, and it does take a little time and effort to identify good candidates and then define them to LLA. However, doing so could reduce the number of I/Os to those data sets by up to 50%.

Adding a PDS to LLA can potentially reduce I/Os to that data set by up to 50%.
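
To illustrate what this involves, here is a hypothetical CSVLLAxx parmlib fragment (the data set names are placeholders, and you should verify the exact statement syntax in the MVS Initialization and Tuning Reference for your release):

    LIBRARIES(ISP.SISPPENU,YOURHLQ.PROD.ISPFLIB)
    FREEZE(ISP.SISPPENU,YOURHLQ.PROD.ISPFLIB)

The LIBRARIES statement names the additional PDSs whose directories LLA should manage, and FREEZE tells LLA to use its in-storage copy of those directories rather than searching the directory on DASD. An updated member can then be activated with a command such as F LLA,UPDATE=xx.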

Another capability that many are not aware of is that VLF is not limited to caching load modules that exist in LNKLST data sets. Any load library that is defined to LLA can potentially have its load modules cached in VLF. 

Another oft-forgotten capability of VLF is that it can cache TSO Clists and Rexx execs. The data sets containing the Rexx execs or Clists must be explicitly defined using EDSN statements in the IKJEXEC class in the COFVLFxx member – an example is shown in Figure 5. 

Figure 5 - Sample COFVLFxx definitions

I would be willing to bet a pint of Guinness that your COFVLFxx member hasn’t been updated in years, so this might be another ‘low hanging fruit’ that you can easily address. Note that there are some restrictions – for example, the Clists or Execs must be loaded from SYSPROC or an application-level library (defined using the TSO/E ALTLIB command). See the section titled ‘Using SYSPROC and SYSEXEC for REXX execs’ in the IBM manual z/OS TSO/E REXX Reference for more information about including Rexx execs in SYSPROC libraries.
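
For reference, a minimal COFVLFxx fragment for Clist/Rexx caching might look something like the following (the IKJEXEC class name is the real one used for TSO/E execs and Clists; the data set names and MAXVIRT value are placeholders only):

    CLASS NAME(IKJEXEC)            /* TSO/E Clists and Rexx execs      */
          EDSN(SYS1.SYSPROC)       /* placeholder exec/Clist libraries */
          EDSN(YOURHLQ.REXX.EXEC)
          MAXVIRT(4096)            /* 4096 4K blocks = 16 MB           */

Remember that VLF must be stopped and restarted to pick up COFVLFxx changes.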

Few of these changes are likely to save you millions of I/Os per day. But remember that every I/O takes some amount of CPU time to drive it, so the more I/Os you can eliminate, the more CPU time you can save. 

Security Manager Tuning

Security is a HOT TOPIC. Hardly a day passes that we don’t hear about some company or government department being hacked. Sad to say, many security people now say that the biggest risk comes from ‘Insider threats’ – in other words, your own staff. These people have an immediate advantage in that they probably already have a userid on your systems. As a result, we are seeing much more interest in using security profiles that have existed for many years, but were ignored by most businesses.

What does this have to do with saving CPU time? If you just accept the default settings in your security manager, most of those security calls are likely to result in I/Os to the security database. And I/Os mean – that’s right, more CPU time. 

The topic of the performance of your security product is a no-man’s land in many installations. The security staff’s primary responsibility is to lock everything down as securely as possible (which, of course, means more I/Os to the security database). They frequently do not have a technical background, and the documentation they use typically doesn’t say anything about the performance implications of using a particular security capability. The system programmers and performance analysts, for their part, don’t really want anything to do with the security system. Most of the options that can improve performance are controlled from inside the security product, so the techies don’t have access to make the changes. And it is in the nature of security administrators to be suspicious of any ‘outsider’ who comes along and starts recommending changes to their security environment.

So, is it worth the hassle? We ran a performance health check for a client last year, and the second busiest data set in their entire sysplex was the security database. And we are increasingly seeing that one of the highest-volume SMF record types is the type 80 (‘Security Product Processing’) record. That should give you some indication of the volume of security calls taking place on your system. If you can work up the courage to approach your security colleagues, we recommend first reviewing our RACF Tuning article in Tuning Letter 2019 No. 4. As a starting point, to get some insight into how much CPU time is being consumed by your security product, we recommend monitoring the volume of SMF Type 80 records being produced on your system and the I/O rate to the disk volume that contains your security databases.

If your security colleagues eye you suspiciously when you approach them with your tuning ideas, explain to them that if you can improve the efficiency of the security manager, then they can start locking down even more resources.

zFS Tuning 

An aspect of system performance that tends to fly under the radar is the z/OS File System (zFS). zFS probably only has a fraction of the number of users that your Db2 systems have, so it doesn’t have a large user volume to draw attention to it. And many experienced performance analysts know all about VSAM tuning, and sequential data sets and partitioned data sets. But when you ask them about zFS? You get “er, well, that’s that weird Unix stuff, and probably no one is using it much, and it doesn’t get the coverage in my performance monitors that the ‘normal’ data sets do, and oh, wow, look at the time, I’m already late for a meeting”.

All of these sentiments are true to some extent, but times are changing. Can you name a major IBM software product that doesn’t deliver a zFS data set? Even IMS has its own zFS data sets! Whether you like it or not, the world is becoming more ‘Unixy’, and that applies to z/OS as well.

One of the challenges of performing a tuning exercise on zFS is the lack of documentation. And to be fair, it is complex. A typical zFS-using application has multiple layers, with each layer potentially doing its own caching. 

Take z/OSMF as an example. You have z/OSMF Java programs executing under Liberty. Liberty is running under OMVS. And zFS is sitting under OMVS. When we were working with the IBM team in China to improve the startup performance of z/OSMF, one of the first challenges we had was trying to reconcile the number of EXCPs that the z/OSMF IZUSVR1 started task was reporting with the (MUCH smaller) number of I/Os we were seeing on the volumes containing the z/OSMF zFS files.

We are still some distance from being able to provide useful information about how to interpret and benefit from the metrics that are already available. We hope to address that in a Tuning Letter later this year. But in the interim, if this is a topic that you think is, or will be, important in your installation, there are a few actions that you can take now:

  • Make sure that you are collecting and keeping the SMF Type 92 (‘File System Activity’) records. Note that, due to the potential high volume, IBM recommends that you do not enable the type 92 subtype 59 records unless you are investigating a specific problem.
  • Get familiar with the RMF (or your equivalent product) HFS and ZFS reports. At a minimum, use the reports to get an insight into which are your busier file systems. The ZFS information is stored in RMF type 74 subtype 6 records.
  • Try out the F ZFS,QUERY,ALL command if you run ZFS in its own address space, or F OMVS,PFS=ZFS,QUERY,ALL if you run ZFS in the OMVS address space. You will likely get hundreds of lines of very interesting looking information in return. What is missing is good guidance on interpreting that information and on how to address ‘bad’ numbers. That is the void that we hope to address later this year.
  • If you are still running ZFS in its own address space, consider changing to run the ZFS function in the OMVS address space. This is only possible in z/OS 2.2 and later (a BPXPRMxx sketch is shown below). We made that change in our little zPDT system and it took minutes of CPU time off the z/OSMF startup. You can easily check whether you are running ZFS in its own address space by issuing a D A,ZFS command. If the response says “ZFS NOT FOUND”, then it is running in the OMVS address space. If you get a response providing information about the ZFS address space, then it is running in its own address space, and you can find more information about the configuration options in the zFS documentation.

Running zFS in the OMVS address space can reduce CPU consumption for any intensive user of zFS files.
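
The mode zFS runs in is determined by the FILESYSTYPE statement for ZFS in your BPXPRMxx member. As a hedged sketch (the two statements below are alternatives, not coded together, the ASNAME value is a placeholder, and you should check z/OS UNIX System Services Planning before making the change):

    FILESYSTYPE TYPE(ZFS) ENTRYPOINT(IOEFSCM) ASNAME(ZFS)   /* zFS in its own address space           */
    FILESYSTYPE TYPE(ZFS) ENTRYPOINT(IOEFSCM)               /* no ASNAME: zFS runs inside OMVS (2.2+) */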

Figure 6 illustrates the potential impact of zFS tuning. The spike in XCFAS CPU was driven by SYSBPX XCF traffic arising from mounting a zFS file system on a system in the sysplex other than the one responsible for most of the I/O activity to that file system. In this example, 15,000 MSUs were consumed over a six-week period by this additional XCF message traffic. 

Figure 6 - Weekly Enterprise Consumption MSUs (© IntelliMagic Vision)

We believe that this is a topic that will become more important in the future, so we want to arm our readers with the information they will need to ensure that their systems are set up optimally before this becomes a larger part of their workloads. If you have any experiences, questions, or hints and tips that you would like to share, please let us know and we can integrate them into our article.

Opportunities with Benefits for Many Address Spaces

There are many other types of infrastructure CPU tuning opportunities that can potentially achieve benefits for many address spaces.

Reduce zIIP Overflow

Work that executes on zIIP engines does not incur software expense (except in selected System Boost scenarios). But to safeguard responsiveness, z/OS makes provisions to allow zIIP-eligible work to be dispatched on General Purpose CPs (GCPs) if the zIIP processors consider themselves to be in need of help - this is referred to as ‘zIIP overflow’. Db2 effectively requires this overflow capability to be turned ON at the system level through specifying IIPHONORPRIORITY=YES in the IEAOPTxx member.
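
For reference, the system-level control is a single IEAOPTxx keyword (shown here with the value Db2 expects):

    IIPHONORPRIORITY=YES

The finer-grained control discussed below - the Honor Priority=NO attribute for individual service classes - is specified in the WLM service definition rather than in parmlib.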

Under rolling four-hour average (R4HA) based licensing models, sites typically gave minimal attention to overflow of zIIP-eligible work unless it occurred during peak intervals. But since in EC environments all overflow work is chargeable, analysts will want to have good visibility and effective management to minimize this largely avoidable expense.

Figure 7 presents a high-level view of zIIP overflow, broken out by WLM workload. This chart can be used to illustrate the magnitude of the potential saving opportunity. The typical total weekly overflow of 5000 MSUs here may reflect a very small percentage of overall consumption – you would need to know your $ per MSU price to make a judgment about whether this is a concern. But at a minimum the spike in the week of 11/8 might be considered worthy of investigation. You know what they say - “A million here, a million there, and pretty soon you are talking about real money.” 

Figure 7 - zIIP Eligible Work Executing on GCPs (IntelliMagic Vision)

Drilldowns in this scenario identified the overflow CPU as resulting from a spike in a DDF service class that occurred in a 24-hour period over the weekend. It was driven by what appeared to be a runaway DDF task that consumed a total of 19,000 consumption MSUs. Since DDF is often a heavy zIIP consumer, it also tends to be a primary contributor to zIIP overflow.

There are several considerations to keep in mind relative to managing zIIP overflow in general, and DDF in particular.

  • If significant overflow is coming from high priority service classes (e.g., Db2 system tasks), consider adding zIIP hardware capacity. This can be financially attractive since the acquisition cost for zIIP capacity is lower than GCPs. Also, on sub-capacity processors, zIIPs execute at full speed and thus may provide faster performance than if the work overflowed to a slower general purpose CP.
  • If the overflow is being driven by work that is not highly response time sensitive, and thus could afford to wait what in most environments is likely to be only a few microseconds, the capability to manage zIIP overflow at the service class level with the WLM policy Honor Priority=NO option is likely to be an attractive approach. This prevents that work from generating software expense.
  • Finally, Figure 7 highlights the importance of having effective controls to minimize the costs generated by runaway work of various types, whether manifested through zIIP overflow or general purpose CPU consumption. For DDF work, Db2 resource limits or other controls or tooling may be utilized. You may also want to revisit limits previously set for user-generated ad hoc work (e.g., CPU time limits before looping jobs abend with S322 abends). (See ‘Proactively Manage “Rogue” Workloads’ later in this article for more ideas on this subject.)

It is likely that a production system will always have some level of zIIP overflow, depending on how spiky your zIIP-enabled workloads are. Unfortunately, there is no magic number that we can provide to say what is ‘acceptable’. One of the advantages of TFP is that you know precisely how much one MSU costs. Using that information, you can determine the monthly cost of work that is overflowing from your zIIPs and then make a business decision about whether the cost is high enough to justify doing more to reduce it further.
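
To illustrate with purely hypothetical numbers: at $2 per consumption MSU, the roughly 5,000 MSUs of weekly overflow shown in Figure 7 would represent about $10,000 per week, or in the region of $43,000 per month. Whether that justifies adding zIIP capacity or adjusting WLM controls is a business decision, but at least the cost is now visible and quantified.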

Reduce Coupling Facility Synchronous Service Times

Today’s parallel sysplex data-sharing environments often generate tens, if not hundreds, of thousands of requests per second accessing Coupling Facility (CF) structures. Many of those are synchronous requests that complete in a matter of microseconds. They are classified as synchronous requests because the user task is not interrupted but remains in control of the CPU while the CF operation is completed. The longer the response time from the CF, the more CPU time is consumed by these synchronous CF requests. Anything that you can do to eliminate unnecessary requests (by reducing lock contention, for example), or to reduce the service time, will be reflected in reduced z/OS CPU consumption.
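
Some illustrative arithmetic shows why this matters: because the requesting CP spins for the duration of a synchronous request, 100,000 synchronous requests per second with a 10 microsecond service time consume 100,000 x 10 = 1,000,000 microseconds of CPU per second, the equivalent of one entire engine. Shaving that service time to 5 microseconds would hand back roughly half an engine of z/OS capacity. (The numbers are made up, but the orders of magnitude are realistic for a busy data sharing sysplex.)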

Enhancements to the CF infrastructure may provide an opportunity to improve service times. Since z processor engine speeds have been largely flat for several generations (since the zEC12), upgrading CFs to newer CPC models is not likely to be of much assistance here. In terms of engine configurations, ensure that production CF engines are dedicated (not shared) and that a sufficient number of engines is defined to minimize any queueing. For non-production sysplexes that don’t generate enough of a load to justify dedicated engines, ensure that Coupling Thin Interrupts are enabled on the CF, and in z/OS.

Another potential saving opportunity relates to link technology and distance. Using the latest available CF link technology optimizes service times. Even though the difference between one link technology and another might only be a few microseconds per request, all those microseconds add up, especially if you are doing hundreds of thousands of requests per second. CS5 links (also referred to as ICA-SR) deliver significantly better performance than CL5 links, which are designed for long distance connections, so you should always use CS5 links if the cable distance to the CF is less than 150 meters. And though such opportunities do not come along frequently, take advantage of any changes in data center configuration to reduce the distances between CPCs and external CFs.

Finally, sites that are duplexing Db2 CF lock structures can achieve CPU reductions (along with significant performance gains) by implementing asynchronous duplexing. This capability was delivered in z/OS 2.2, Db2 V12, and z13 CFs, and customer feedback has been very positive. See ‘Asynchronous CF Duplexing’ in Tuning Letter 2018 No. 4 for details on the Db2 IRLM CPU savings realized by one customer. If you are using external CFs today to avoid the cost of duplexing your Db2 lock structure, Asynchronous CF Duplexing might allow you to move those structures to internal CFs instead. This would not only be less expensive from an acquisition and maintenance perspective, it would also allow you to use Internal Coupling links (ICPs), which are the fastest of all coupling options.

Leverage Large Memory (1MB and 2GB) Pages 

z/OS support for 64-bit addressing in both physical and virtual memory has led to dramatic increases in application memory sizes. Yet Translation Lookaside Buffer (TLB) hardware sizes have remained relatively small due to low access time requirements and chip real estate limitations. As a result, the capacity of the TLB to maintain reference information for 4K pages represents a much smaller fraction of application working set sizes, leading to a larger number of TLB misses and the associated performance penalties and CPU inefficiencies.

z/OS 1.9 introduced support for large 1MB pages (with 2GB pages following later, on zEC12 and subsequent processors), defined through the LFAREA parameter in the IEASYSxx parmlib member. These large pages enable a single TLB entry to support a much greater range of virtual storage address translations and thus improve performance and efficiency. And each new version of z/OS expands operating system exploitation of large 1MB pages.

z/OS 2.3 delivered a valuable enhancement for the use of large pages by changing the allocation of fixed 1MB pages. Prior to z/OS 2.3, the system would reserve whatever number of 1MB pages you specified on the LFAREA parameter, regardless of whether any product would exploit them or not. As a result, sites were hesitant to specify large values - you don’t want to have a bunch of expensive memory sitting there doing nothing. But starting with z/OS 2.3, the LFAREA value is treated as an upper limit. So, if you specify 1000 1MB pages on the LFAREA parameter and no products use fixed 1MB pages, then zero pages will be allocated. If products request 500 pages, then 500 pages will be allocated. This should allow sites to be more aggressive about specifying larger LFAREA values.

LFAREA in z/OS 2.3 becomes an upper limit and doesn't reserve pages until needed.
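
As a hedged illustration of the IEASYSxx syntax (the values are placeholders, and you should verify the exact 1M/2G keyword format in the MVS Initialization and Tuning Reference for your release):

    LFAREA=(1M=2048,2G=4)

would make up to 2048 fixed 1MB pages and four 2GB pages available. With the z/OS 2.3 behavior described above, those values act as ceilings rather than up-front reservations.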

Db2 is a major exploiter of large pages, which can be put to good use supporting large buffer pools. IBM has published studies showing CPU efficiencies of 5% or more achieved by using large pages in conjunction with page-fixing Db2 buffer pools and increasing buffer pool sizes. Readers are encouraged to leverage the Db2 Buffer Pool Simulation tool that is part of Db2 V12 to determine how much value you would get from larger Db2 buffer pools.
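
On the Db2 side, the relevant buffer pool attributes are PGFIX and FRAMESIZE. A hypothetical command (the buffer pool name and size are placeholders) would look something like:

    -ALTER BUFFERPOOL(BP10) VPSIZE(200000) PGFIX(YES) FRAMESIZE(1M)

PGFIX(YES) takes effect the next time the pool is allocated, and 1MB frames will only be used if the pool is page-fixed and sufficient LFAREA storage is available.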

Leverage zEDC Compression

Prior to the z15 CPCs, zEDC required both chargeable zEDC cards and a chargeable z/OS feature. However, starting with the z15s, the zEDC function is now built into each chip, and is included at no additional charge. Prior to this, all clients that we spoke to that had zEDC were delighted with it – and that was back when they had to pay for the cards. We expect that they will be even happier when they move to z15, because the hardware capability is free, and IBM claim that it supports up to 11x the throughput of the old zEDC cards. 

From a software cost perspective, the more mainstream users of zEDC (QSAM, BSAM, SMF, DFSMSdss, and so on) still require the chargeable z/OS feature on the z15. However there is an increased list of products that can use zEDC without requiring that feature, including products like MQ and Connect Direct.

If you are using ‘traditional’ compression of sequential data sets, or in DFSMSdss dumps, or in DFSMShsm, switching to zEDC is practically guaranteed to reduce your CPU consumption. If you are like many sites and your peak R4HA is during the online day, saving MIPS during your batch window might not have been deemed important in the past. However, when you move to TFP, the potential savings are significant. When we perform address space-level analysis for our clients, we regularly find DFSMShsm in the top 10 CPU consumers.
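
To give a feel for what ‘switching to zEDC’ involves, two common places to request it are the SMS system default in IGDSMSxx and the DFSMSdss ZCOMPRESS keyword. The fragments below are a hedged sketch with illustrative values - verify the keywords against the DFSMS documentation for your release:

    IGDSMSxx:        COMPRESS(ZEDC_P)
    DFSMSdss DUMP:   DUMP DATASET(INCLUDE(**)) ... ZCOMPRESS(PREFERRED)

Our understanding is that the ‘preferred’ forms fall back gracefully when zEDC is unavailable, while the ‘required’ forms (ZEDC_R, ZCOMPRESS(REQUIRED)) do not; check the documentation for the exact behavior.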

If you do not perform any compression today (and in our experience that would be quite unusual), turning on zEDC might or might not save you CPU time – it very much depends on how well your data compresses. The better compression ratios you get, the more CPU savings you should expect. However, if you are not doing any compression today, enabling zEDC should result in an immediate savings in disk space, a reduction in channel and disk subsystem port utilization, and better memory exploitation (because the buffers will hold more data).

Another consideration related to zEDC is encryption. If your company is considering implementing data set encryption, remember that the CPU cost of encryption is proportional to the amount of data to be encrypted. If you compress the data prior to encrypting it, the total cost, including the cost of compression with zEDC, will be less than if you encrypted the uncompressed data. And, of course, you still get all the benefits of compression as described above. 

IBM’s zBNA tool can help you identify candidates for zEDC compression, and provides you with a rough estimate of the cost or savings from implementing zEDC. You can find more information about zEDC in our zEDC article in Tuning Letter 2014 No. 2.

Leverage PDSE Opportunities 

PDSE (Partitioned Data Set Extended) has been with us for many years. However, due to some reliability issues in their early days, many sites steered clear of them. Those days are long behind us, though, and multiple IBM products now ship their partitioned data sets in PDSE format. Perhaps more important is the fact that the current versions of the COBOL and PL/I compilers create program objects, which must reside in PDSEs. So, like it or not, you are going to have more exposure to PDSEs.

But you should view this as an opportunity to enhance the performance of your system. With traditional PDS data sets, LLA and VLF can only cache load modules. You can use LLA to cache the directory of any PDS; however, only load libraries can benefit from LLA’s use of VLF to cache the members of the partitioned data set.

On the other hand, both the directory and the members of a PDSE can potentially be cached using the SMSPDSE and SMSPDSE1 address space buffers. Note that member caching is turned off by default. So it is very possible that you have existing PDSE data sets that could benefit from member caching, but are missing out on that capability. If you are not sure whether it is worth the effort, create a report like the one shown in Figure 8 that shows the top 20 partitioned data sets on your system. If those data sets are not being cached today, adding them to LLA or converting them to PDSEs would result in the directory blocks being cached, which could reduce the I/O rate to those data sets by up to 50%. That sounds like a pretty good use of a few days of your time.

PDSE member caching is turned off by default. Do you really want that?

Figure 8 - Top 20 Partitioned Data Sets by I/O Rate (© IntelliMagic Vision)
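
If you decide to enable PDSE member caching, the controls live in the IGDSMSxx parmlib member. The keywords below reflect our understanding of the relevant ones - the values are illustrative, and you should confirm names, units, and defaults in the DFSMS documentation for your release:

    PDSE1_HSP_SIZE(256)              /* MB of hiperspace used by SMSPDSE1 for member caching */
    PDSE1_BUFFER_BEYOND_CLOSE(YES)   /* retain cached data after the data set is closed      */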

Proactively Manage “Rogue” Workloads

One of the great strengths of the mainframe platform is its ability to effectively support countless diverse workloads of all types and sizes, arriving from all conceivable sources. But with that “blessing” comes the “curse” of trying to manage an environment with such extensive and unpredictable variability. 

Particularly in EC environments, where every MSU is chargeable, effective management necessitates visibility that provides timely notifications of significant changes in CPU consumption. This capability is essential to enable investigation and potential remediation of what may ultimately be “rogue” workloads that are not providing business value commensurate with their consumption.

The following customer example may help illustrate both the challenge and a potential approach to meeting that challenge. Figure 9 illustrates how IntelliMagic Vision detected significant changes in CPU consumption, initially at the system level. In this example, CPU consumption for SYS2 increased on a given day to a level more than two standard deviations higher than its consumption over the prior 30 days.

Figure 9 - WLM by System Change Detection (© IntelliMagic Vision)

Drilling into that consumption by Service Class (Figure 10) identified that the primary driver of the CPU increase was service class DDFTEST. That service class was more than five standard deviations above its 30-day average. 

Figure 10 - WLM Importance Rating Change Detection (© IntelliMagic Vision)

Further analysis to view the CPU consumption of this service class indicated an every-other-day spike exceeding 5000 consumption MSUs, as shown in Figure 11. Drilling down one additional level to isolate the work by Db2 authorization ID led to a quick identification and elimination of the offending workload. 

As humans, we can look at that chart and immediately see that something is not normal. Building that intelligence into software is not easy, but it is required if you are to have any hope of successfully managing a large, complex environment.

Figure 11 - Daily Enterprise Consumption MSUs (© IntelliMagic Vision)

References

This article is not intended to provide detailed implementation guidance. Rather, our objective is to raise awareness of system functions and capabilities that can improve the overall efficiency of your systems. If you would like more information about the topics covered in this article, the following documents might prove helpful:

  • ‘Introduction to CPU Measurement Facility’ in Tuning Letter 2016 No. 4.
  • ‘CPU MF Part 2 - Concepts’ in Tuning Letter 2017 No. 1.
  • ‘CPU MF Part 3 - Optimizing CPC Cache’ in Tuning Letter 2017 No. 4.
  • ‘Customer Sub-Capacity CPC Experience’ in Tuning Letter 2018 No. 3.
  • IBM Technote Number of Logical CPs Defined for an LPAR by Kathy Walsh.
  • IBM Point-of-View, z/OS Infrastructure Optimization using Large Memory, by Peter Sutton.
  • SHARE in Atlanta 2016 Session 19878, How to Leverage Large Memory on z, by Elpida Tzortzatos.
  • SHARE Virtual 2020 Session 25874, z/OS Large Memory Considerations, by David Betten.
  • SHARE in Fort Worth 2020 Session 26994, The Hitchhiker’s Guide to MIPS and Capacity Planning, by Dave Hutton.
  • ‘zIIP Capacity Planning’ article by Kathy Walsh, in Tuning Letter 2015 No. 2.
  • ‘Asynchronous CF Duplexing’ article in Tuning Letter 2018 No. 4.

Summary

Now that all CPU consumption is chargeable under EC, teams responsible for managing these environments will need to have powerful visibility and tooling to enable them to quickly identify sources of increased CPU usage and isolate the driving causes. 

This article introduced several potential approaches to reducing CPU consumption that can be implemented without requiring assistance or involvement from outside the infrastructure support groups. EC analysts will want to remain vigilant for newly identified opportunities for CPU efficiencies across the breadth of the z/OS platform. As these types of opportunities are discovered across the mainframe community, they often appear in places like Tuning Letter articles and in conference presentations like Kathy Walsh’s “WSC Hot Topics” and others.

Stay tuned for the next article in this series which will explore potential CPU savings opportunities applicable to selected address spaces. Though initiatives in some of these areas may involve collaboration with application teams and thus take longer to implement, they can also be a source of significant CPU reductions and happier application owners. Topics under consideration for that next article include:

  • Tuning data set buffers to reduce I/O and CPU (including LSR and BUFNI).
  • Optimizing data set block sizes to reduce I/O and CPU.
  • Tuning DFSMShsm, which is commonly high on the list of started task CPU consumers.
  • Reducing job abends and the associated wasted CPU.
  • Application profiling and tuning of high-CPU jobs and transactions.
  • Recompiling programs with newer ARCH values.
  • Db2 tuning, including adding indices and tuning SQL statements.


If you have any suggestions or tips that you would like to share with your peers, please let us know and we will be delighted to pass that along. 
