The Expanding Role of Sub-capacity Processors in Today's Mainframe Configurations

By Camila Vasquez posted Mon April 07, 2025 08:15 AM

  

Written by Todd Havekost on February 21, 2024.

Sub-capacity mainframe models are processors that by design execute at less than full speed. IBM originally introduced sub-capacity processors in 2005 (with the z9 model) to provide customers more flexibility in sizing the capacity of the general-purpose processors (GCPs). This granularity was welcomed by smaller customers who were particularly sensitive to mainframe hardware and software expenses driven by incremental increases in capacity. 

IBM offers four speed ranges on all modern enterprise-class mainframes. 7-series models execute at full speed. There are three sub-capacity model series. The relative speeds of these series are determined by IBM and can change from one central processor complex (CPC) generation to the next.

  • 6-series models: 66% of full speed on the z16 generation (56% on z15)
  • 5-series models: 43% of full speed on z16 (39% on z15)
  • 4-series models: 13% of full speed on z16 (same on z15)

On the current z16, each additional physical GCP on a full speed 7-series model adds about 2000 MIPS for the first few GCPs. Historically, sub-capacity models were only of interest to smaller customers who due to financial considerations needed more granularity in the capacity they were acquiring.
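To make the arithmetic concrete, the sketch below combines the relative speeds listed above with the approximate 2,000 MIPS per full-speed GCP figure. It is purely illustrative: per-CP capacity declines as CPs are added (the MP effect), so real sizing should rely on IBM's published capacity ratings and tools.

```python
# Illustrative sketch only: rough capacity of z16 models using the approximate
# figures from this article. Per-CP MIPS shrinks as CPs are added (the "MP
# effect"), so this linear estimate is indicative, not a capacity plan.

FULL_SPEED_MIPS_PER_GCP = 2000   # approximate z16 7-series figure, first few GCPs

# Relative speeds by model series for the z16 generation (from the article)
RELATIVE_SPEED = {"7xx": 1.00, "6xx": 0.66, "5xx": 0.43, "4xx": 0.13}

def rough_capacity_mips(series: str, n_gcps: int) -> float:
    """Linear estimate of capacity for n_gcps GCPs in the given speed series."""
    return n_gcps * FULL_SPEED_MIPS_PER_GCP * RELATIVE_SPEED[series]

# A 6-series model needs roughly 1/0.66 (about 1.5x) the physical GCPs of a
# 7-series model for the same capacity -- and each physical GCP brings its
# own processor cache with it.
print(rough_capacity_mips("7xx", 6))   # ~12000 MIPS from 6 full-speed GCPs
print(rough_capacity_mips("6xx", 9))   # ~11880 MIPS from 9 sub-capacity GCPs
```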

Today's Expanded Role for Sub-capacity Models 

Several factors have combined to broaden the appeal of sub-capacity models far beyond the original intended audience of small sites. A primary driver has been the profound change to processor cache architecture introduced with the z13 CPC. The manufacturing-driven packaging change from multi-chip modules (MCMs) on the zEC12 and its predecessors to single-chip modules (SCMs) on the z13 necessitated a Non-Uniform Memory Access (NUMA) topology for the drawer interconnects. This architectural change introduced significantly more variability into latency times for cache accesses, depending on the location of the data relative to the chip the instruction is executing on.

Since the z13, processor cache efficiency has played a much larger role in CPU usage for many business workloads. Actions (including hardware configuration decisions) that reduce the number of cycles spent waiting for data and instructions to be staged into Level 1 (L1) cache can significantly reduce overall CPU consumption. 

How is this relevant to a discussion of sub-capacity processors? Sub-capacity models are configured with more physical CPs for the same capacity, and those additional CPs provide more processor cache. The bottom row of Table 1 illustrates how much more processor cache is available to workloads running on a sub-capacity z14-523 compared to full-speed z13 and z14 models with similar capacities. 

Table 1 - Comparative Cache Sizes for z13 and z14 Models with Very Similar Capacities

Larger cache sizes often translate directly into fewer waiting cycles (lower Finite CPI) and thus lower CPU consumption. Now that many business workloads in the post-z13 world benefit from this improved processor cache efficiency, a new and significant justification for considering sub-capacity models has emerged.

Other Factors Expanding Adoption of Sub-capacity Models 

Historically, due to inherent capacity limitations, larger sites have often considered sub-capacity models to be appropriate only for smaller sites and not applicable to their environments. This arises from the fact that the configuration of a sub-capacity CPC is limited to the number of GCPs that can fit within a single drawer. But recent z CPC generations have expanded the number of CPs supported by a single drawer. This has grown the potential capacity of a sub-capacity processor by 85%, from 21,109 MIPS on a z13 (which supported 30 GCPs in a drawer) to 39,180 MIPS on a z16 (which supports 39 GCPs in a drawer).

Another factor contributing to the increased maximum capacity available from z16 sub-capacity models is that IBM increased the relative speed of 6-series models from 56% of full speed on the z15 to 66% on the z16, which also contributed to the 39,180 MIPS maximum capacity cited above. To put this in context, the capacity of the largest available sub-capacity model today (z16-639) exceeds that of a full speed z13-737 from a few years ago, and a z16-6xx sub-capacity CPC has 75% of the capacity of a full speed z15 with the same number of physical CPs. The potential applicability of sub-capacity CPCs has expanded significantly in recent years.

With this additional awareness, many larger sites are also identifying roles for sub-capacity models for their CPCs dedicated to development and/or test (DevTest) LPARs. CPCs with large numbers of LPARs typically have few, if any, Vertical High logical CPs and thus experience notoriously poor processor cache efficiency. The increased number of physical GCPs provided by sub-capacity models often significantly improves the Vertical CP configuration and thus reduces CPU consumption on these CPCs. 

Sites performing lateral capacity upgrades to new CPC generations are prone to encountering an adverse impact on their Vertical CP configurations and percent of work executing on Vertical High logical CPs. These lateral capacity upgrades result in fewer physical CPs because each CP in newer generations delivers more capacity. This reduction in physical CPs is compounded when upgrades skip a generation (as is often the case). Sub-capacity models may be evaluated as an alternative in these situations to reduce or even eliminate this potential exposure.

The cumulative impact of these factors has put sub-capacity models on the radar of many larger sites that had formerly ignored them. This expanding adoption of sub-capacity models was confirmed by Brad Snyder from the IBM Washington Systems Center at last year's SHARE Performance Hot Topics session in Atlanta. 

Additional Benefits of Sub-capacity Models 

In addition to potential processor cache and CPU efficiency improvements, there are other possible benefits to be aware of. One is that zIIPs (along with all non-GCP processor types) always run at full speed, independent of whether the CPC is a sub-capacity model. So if you have a large and/or growing zIIP workload, you may have the potential for much of your work to continue to run at full speed while you reduce your hardware spend on GCPs. 

Additionally, GCPs run at full speed when System Recovery Boost is active, alleviating concerns that slower CPs might have a negative impact on recovery times. In fact, with more physical GCPs all running at full speed, recovery actions have the potential of completing in less time on a sub-capacity model.

Frank Kyne also lists a benefit he describes as ‘taking advantage of the speed of light'. On a full-speed CPC, the cores are available to perform work in every clock cycle. On sub-capacity CPCs, the core ‘goes to sleep' for some percentage of cycles. For example, if a 601 ran at exactly half the speed of a 701, it would conceptually be ‘asleep' for every other cycle.

CPU accounting handles this by not counting the asleep cycles. So, in theory, an instruction that takes 4 cycles on a 701 would also be reported as taking 4 cycles on a 601. However, electrons don't stop moving when the core is asleep. Assume that the elapsed time to retrieve something from memory would be 1000 cycles on the 701. Because of the speed adjustment, that would be reported as 500 cycles on the 601. 

Accordingly, for requests that need to travel a ‘long’ distance in the cache nest, the number of adjusted cycles on a 601 will be lower than on a 701. This results in a small additional bonus for High RNI workloads (that use the nest more heavily). 
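The short sketch below works through this accounting, assuming for illustration a 601 that runs at exactly half the speed of a 701, as in the example above.

```python
# Worked example of the cycle accounting described above, assuming for
# illustration that a 601 runs at exactly half the speed of a 701.
# Core-bound work reports the same cycle count on both models because the
# 'asleep' cycles on the sub-capacity model are not counted. A memory access,
# however, takes a fixed elapsed time, so it spans fewer *counted* cycles on
# the slower model.

SPEED_601 = 0.5                     # hypothetical relative speed of the 601

core_cycles_701 = 4                 # instruction executing in 4 cycles on a 701
core_cycles_601 = 4                 # reported identically on the 601

memory_elapsed_cycles_701 = 1000    # elapsed time of a memory fetch, in 701 cycles
memory_reported_601 = memory_elapsed_cycles_701 * SPEED_601   # 500 counted cycles

print(core_cycles_601)        # 4   -- no change for core-bound work
print(memory_reported_601)    # 500 -- 'long-distance' nest accesses look cheaper
```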

Can I Expect My Workload to Execute More Efficiently on Sub-capacity Models? 

For sites not driven by the need for greater granularity in installed capacity, the driving factor for considering a sub-cap model is likely to be potential CPU savings (and thus reduced expense) from processor cache efficiency. So how can you assess in advance whether your workload is likely to experience such a benefit? Let's examine how processor cache metrics can help answer that question. 

LSPR Workload Characterization 

A logical place to start is to determine the potential size of the savings opportunity, i.e., the magnitude of CPU cycles currently spent waiting for data and instructions to be staged into L1 cache. This can be readily assessed by applying the LSPR workload categorization rules in Table 2, which indicate how demanding a workload is on processor cache using the SMF type 113 metrics ‘Level 1 Miss Percentage' (L1MP) and ‘Relative Nest Intensity' (RNI). ‘Low' workloads are already operating efficiently with the existing cache topology, so there are minimal waiting cycles to reduce, which means less opportunity for savings. If your interest in sub-cap models is driven by considerations other than CPU savings, such as those listed under ‘Additional Benefits of Sub-capacity Models' above, those benefits may still be realized.

Table 2 - LSPR Workload Characterization, IBM 

On the other hand, ‘High' and ‘Average' workloads have more potential to benefit from the enhanced cache configuration of sub-cap CPCs. Previewing the customer case study we will discuss in the next section, that workload had an L1MP of 3.87% and an RNI of 0.97, resulting in a workload categorization of Average (nearly High; an RNI of 0.97 is only 0.03 from the High threshold). That indicates a workload that is relatively demanding on processor cache, with many waiting cycles and thus significant potential for savings.
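For readers who want to apply this categorization to their own SMF 113 data, here is a minimal sketch of the rules summarized in Table 2. The thresholds follow one commonly published form of IBM's LSPR rules; consult the LSPR documentation for the exact rules for your CPC generation.

```python
# Minimal sketch of the LSPR workload categorization rules summarized in
# Table 2. These thresholds follow one commonly published form of IBM's
# rules; verify against the LSPR documentation for your CPC generation.

def lspr_category(l1mp: float, rni: float) -> str:
    """Categorize a workload from L1 Miss Percentage and RNI (SMF type 113)."""
    if l1mp < 3.0:
        return "Average" if rni >= 0.75 else "Low"
    if l1mp <= 6.0:
        if rni >= 1.0:
            return "High"
        return "Average" if rni >= 0.6 else "Low"
    return "High" if rni >= 0.75 else "Average"

# The case study workload: L1MP 3.87%, RNI 0.97 -> Average, 0.03 below High
print(lspr_category(3.87, 0.97))   # 'Average'
```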

Vertical High Benefit and Quantity of Eligible Work 

Since sub-capacity models have more physical CPs, which provide the opportunity for more Vertical High CPs, there are two considerations for assessing the magnitude of the potential CPU benefit: 

  • How much does your workload benefit from executing on Vertical High CPs?
  • How much of your work currently is not executing on Vertical High CPs, and therefore could potentially benefit from having more Vertical High CPs? 

Viewing Finite CPI (waiting cycles) by logical CP enables us to determine whether there is a significant difference between work executing on Vertical Low (VL) CPs and work running on Vertical High (VH) and Vertical Medium (VM) CPs, and if so, to quantify that penalty. Figure 1 illustrates this for a sample system where the vertical CP configuration consists of 4 VHs, 1 VM, and 5 VLs. (This analysis typically focuses on VLs, though in some cases work on VMs also incurs a measurable penalty.)

Figure 1 - Finite CPI by Logical CP (© IntelliMagic Vision) 

A noticeable gap between the two sets of lines (highlighted by the yellow arrow) is a visual indicator of a sizable Finite CPI penalty for the VL CPs, calculated in this example to be 39%. To translate this into CPU savings, it needs to be combined with the contribution of Finite CPI (waiting cycles) to the overall CPI, which can also be derived from SMF 113 metrics. The table on the left in Figure 1 indicates that on this CPC, the Finite CPI (waiting cycles) makes up 51% of the total CPI. This provides the other information we need to calculate the additional CPU consumption for the work running on the VLs. 

Multiplying the 39% Finite CPI penalty for VLs on this system by the 51% contribution of Finite CPI to total CPI results in a total CPU penalty of 20%. Expressed another way, on this system, changes that reduce the amount of work executing on VLs would reduce the CPU consumption for that work by up to 20%. 
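The arithmetic is simple enough to show directly; the figures below are the ones from this example.

```python
# The arithmetic behind the 20% figure above: the Finite CPI penalty observed
# on the VL CPs only applies to the fraction of total CPI that consists of
# waiting (Finite) cycles.

finite_cpi_penalty_vl = 0.39    # VLs show 39% higher Finite CPI than VHs/VMs
finite_share_of_cpi = 0.51      # Finite CPI is 51% of total CPI on this CPC

total_cpu_penalty = finite_cpi_penalty_vl * finite_share_of_cpi
print(f"{total_cpu_penalty:.0%}")   # ~20% extra CPU for work dispatched on VLs
```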

Cache performance on VLs is typically less efficient because work executing there represents activity that exceeds the LPAR's guaranteed share derived from its LPAR weight, and thus relies on capacity ‘donated' by other LPARs that are consuming less CPU than their guaranteed shares. These VLs are normally located elsewhere in the cache nest, not co-located with the work running on that system's VHs, but instead on cores that ‘belong' to the VHs of other LPARs. So not only does work running on the VLs suffer from their poorer CPI, it can also negatively impact work running on the other LPARs' VHs by flushing their data and instructions out of the L1 and L2 caches. Figure 2 illustrates the LPAR topology for this system, with the arrow highlighting that the VLs are located on a different chip than that system's VH and VM CPs.

Figure 2 - LPAR Topology - Placement of VLs (IntelliMagic Vision) 

If there is (as in this case) a sizable Finite CPI penalty on VLs, the remaining question is whether there is a substantial volume of work executing on VLs that is experiencing that penalty. 

Figure 3 indicates that in this example there are thousands of MIPS of work executing on VLs during numerous time intervals. Thus, this system represents a potential opportunity for CPU savings if much of that work were to execute on VHs instead, which could be facilitated through the additional physical CPs provided by a sub-cap model. [Note that in some cases reallocating weights across LPARs can be leveraged to increase the amount of work executing on VHs, though the magnitude of that benefit is typically less than that potentially provided by sub-cap models.]

Figure 3 - MIPS Dispatched on Vertical Low CPs (IntelliMagic Vision) 

Another way to visualize the size and timing of the potential processor cache efficiency opportunities is through an Engine Dispatch Analysis, as seen in Figure 4. This chart compares the CPU usage of an LPAR over time to its guaranteed share based on the LPAR's weight.

Figure 4 - Engine Dispatch Analysis (IntelliMagic Vision) 

This view presents four variables: 

  1. The number of physical CPs on the CPC (in yellow, 18 here). 
  2. The total number of logical CPs for the selected LPAR (in blue, 9 here). 
  3. The LPAR’s guaranteed share (in green), which is a function of the LPAR’s weight and the number of shared physical CPs; note that it is variable in this example because IRD (Intelligent Resource Director) weight management is dynamically changing the LPAR’s weight. 
  4. The actual CPU consumption in units of CPs (in red). 

Whenever CPU consumption (in red) is significantly higher than the guaranteed share (in green), as highlighted by the arrows during two intervals, most of that surplus work is executing on VLs. 
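For reference, here is a simplified sketch of how the guaranteed share (the green line) is derived. The weights are hypothetical, and the calculation ignores dedicated CPs and the IRD weight changes visible in this example.

```python
# Simplified sketch of the green line in Figure 4: an LPAR's guaranteed share,
# in units of physical CPs, derived from LPAR weights. This ignores dedicated
# CPs and (as in the example above) IRD dynamically adjusting the weights.

def guaranteed_share_cps(lpar_weight: int, all_weights: list[int],
                         shared_physical_cps: int) -> float:
    """Guaranteed share = (this LPAR's weight / total weight) * shared CPs."""
    return lpar_weight / sum(all_weights) * shared_physical_cps

# Hypothetical weights for the LPARs sharing the 18-way CPC in Figure 4
share = guaranteed_share_cps(400, [400, 300, 200, 100], 18)
print(round(share, 1))   # 7.2 CPs guaranteed; usage above this lands on VLs
```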

Detailed Customer Case Study 

Frank and I have interacted with many customers who have experienced double-digit percentage CPU savings through migrations to sub-cap CPCs. Hardly a quarter goes by without one or both of us hearing of another site that made the change and had a positive experience to report. We are aware of only one site that had to fall back from a sub-cap to a full speed CPC, and that was due to their having a sizable amount of Store Into Instruction Stream (SIIS) in their workload.

For this article I have selected a sub-cap migration example where I had access to before and after SMF data, giving us visibility into how these key metrics and thought processes played out. For the customer's perspective on this migration, you are invited to read the ‘Customer Sub-capacity CPC Experience' article in Tuning Letter 2018 No. 3.

As identified above, the starting point for analysis was to evaluate the potential size of the savings opportunity. As indicated earlier, the LSPR workload category for this customer was Average but nearly High (RNI of 0.97), indicating many machine cycles waiting for cache and thus a sizable opportunity for savings.

Table 3 describes the before and after configurations in this case study, involving a migration from a full capacity z13 (with 9 physical CPs) to a sub-capacity z14 (with 23 physical CPs). Note that both models have very similar MSU capacity ratings, so this is effectively a lateral migration from a hardware capacity perspective.

Table 3 - CPC Configurations

The HiperDispatch benefits of the new configuration are immediately apparent in Table 3. Having many more physical CPs means many more VHs (with their 1-1 relationships between logical and physical CPs), and a much higher percentage of the workload executing on VHs.

In Table 1 we saw the significant increase in the total amount of processor cache by level between the prior configuration (top row of that table), what they would have had if they had implemented a full capacity z14 with equivalent capacity (middle row), and the sub-capacity model they selected (bottom row). Increasing the number of physical CPs in the configuration from 8 or 9 to 23 significantly increased the total amount of processor cache available to their workloads. Increased cache sizes result in increased residency time for data in caches and thus better hit ratios.

Table 4 shows the benefit of the near tripling in size of the L1 cache: a 9% reduction in the frequency of L1 misses between the before and after measurement intervals. Peak interval data for other days showed even larger improvements, up to 15%. When data and instructions are present in L1 cache, the processor stays productively busy executing those instructions and no accesses to other levels of processor cache are required.

Table 4 - LSPR L1MP and RNI Metrics and Workload Categorization

The vertical CP configuration for the primary z/OS system on the full-speed z13-709 prior to the upgrade was 6 VHs, 1 VM, and 2 VLs. Figure 5 shows the Finite CPI penalty for the non-VHs when the workload was executing on the z13-709 prior to the migration. The work executing on the VM and one of the VL logical CPs, CPs 0C and 0E (the pink and purple lines across the top), experienced a 13% higher Finite CPI. Comparatively speaking, this is not a huge Finite CPI penalty, and it illustrates that sub-cap migrations can produce significant savings even with a moderate penalty like the one seen here.

Figure 5 - Finite CPI by Logical CP (IntelliMagic Vision)

As indicated in Table 3, the migration to the sub-cap model, with its increased number of physical CPs and VH logical CPs, reduced the quantity of work not executing on VHs from 18% to 4% across the day shift intervals (from 1500 MIPS to 300 MIPS in absolute terms). The increased cache efficiency and reduced waiting cycles, now that 96% of the workload was running on VHs on the z14-523, reduced CPU busy from 73% to 52% (see Figure 6) and reduced their monthly peak 4HRA by 22%, which generated MLC savings exceeding $1M annually.

Figure 6 - Before and After CPC Utilization (IntelliMagic Vision)

To recap, the combination of the LSPR workload category (Average, nearly High), a sizable amount of work not executing on VHs, and a measurable Finite CPI penalty indicated a potentially sizable opportunity for CPU savings from improved processor cache efficiency. By optimizing their CPC configuration to achieve processor cache efficiencies, this site realized far more effective capacity from the sub-capacity model, even though its rated capacity was essentially the same as the full capacity models.

Why Migration to a Sub-cap CPC Can Take More Effort 

Validating that a sub-cap processor could be a viable option in your environment requires ensuring that the slower CP speed will not negatively impact business-critical transaction response and batch elapsed times. For example, a sub-cap model that runs at 50% of full speed can be expected to double CPU times. However, keep in mind that increases in CPU time are often offset by decreased ‘waiting for dispatch' times due to the presence of more physical CPs on which work can be dispatched; a first-order sketch of this trade-off follows the list below.

Possible candidates for additional analysis include:
  • Response times for high CPU single-TCB transactions where doubling CPU per transaction could noticeably degrade response times.
  • Large CICS regions that do not use threadsafe and have high CPU consumption by the QR TCB.
  • Elapsed times for single-TCB, single-threaded, critical path batch jobs that execute in constrained batch windows. (Again, if ‘waiting for dispatch’ delays are a significant component of elapsed time, then the impact of increased CPU may be offset to some degree by the presence of additional dispatch points.) 
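As promised above, here is a first-order sketch of that trade-off. The delay components are hypothetical, and the sketch is no substitute for proper modeling of your own workload.

```python
# First-order sketch of the response time trade-off discussed above: CPU time
# scales inversely with CP speed, while 'waiting for dispatch' time often
# shrinks because a sub-capacity model provides more physical CPs. All the
# numbers here are hypothetical.

def scaled_cpu_time(cpu_seconds: float, relative_speed: float) -> float:
    """CPU component of a transaction or job on a slower (sub-capacity) CP."""
    return cpu_seconds / relative_speed

tx_cpu = 0.010                           # hypothetical 10 ms of CPU per transaction
print(scaled_cpu_time(tx_cpu, 0.50))     # 0.02 -- CPU time doubles at 50% speed

# Whether end-to-end response time degrades depends on how much of it was CPU
# versus wait-for-dispatch, I/O, and other delays:
wait, io = 0.015, 0.025                  # hypothetical delay components (seconds)
before = tx_cpu + wait + io                          # 50 ms total
after = scaled_cpu_time(tx_cpu, 0.50) + wait + io    # 60 ms, and wait may well shrink
print(round(before, 3), round(after, 3))
```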
There are many potential considerations here. Readers are encouraged to consult David Hutton's ‘Pitfalls of Non-Traditional Migrations' presentation for a detailed treatment of things to consider when evaluating the ‘fewer, faster' to ‘more, slower' change inherent with a sub-cap CPC.

Other Considerations 

IBM's zPCR capacity planning tool is not designed to predict how a large increase or decrease in the number of cores or amount of cache could impact a workload. Instead, it addresses the ‘typical' case where a customer upgrades to a newer CPC but stays in the same speed range. As a result, zPCR projections should not be expected to capture the dynamics of a migration to sub-cap CPCs.

Also, it is important to remember that any savings from a migration from a full speed to a sub-cap CPC will be a one-time occurrence. If you subsequently upgrade to a new CPC in the same speed range (e.g., z15 6xx to z16 6xx), you should not expect further sub-capacity-related CPU savings. Only if you migrate to a slower speed range again in a future upgrade, e.g., from a 6xx to a 5xx model, would you be a candidate for potential additional savings.

Because sizable benefits from sub-cap migrations have been widely experienced, and the applications of sub-cap CPCs have continued to expand, we encourage every site to be aware of sub-cap offerings and to evaluate whether they have a place in their configuration. In some cases, the answer will be yes, in other cases the answer will be no. But hopefully the days of large sites ignoring sub-cap options and dismissing them without any consideration are becoming a thing of the past. 

References 

If this article has piqued your interest in this topic, the following documents and presentations might provide valuable additional information: 
  • ‘Customer Sub-capacity CPC Experience’ article in Tuning Letter 2018 No. 3.
  • ‘Introduction to CPU Measurement Facility’ article by Todd Havekost and David Hutton in Tuning Letter 2016 No. 4.
  • ‘CPU MF Part 2 - Concepts’ article by Todd Havekost and David Hutton in Tuning Letter 2017 No. 1.
  • ‘CPU MF Part 3 - Optimizing the CPC Cache’ article by Todd Havekost and David Hutton in Tuning Letter 2017 No. 4.
  • ‘HiperDispatch Questions and Answers’ article by Alain Maneville in Tuning Letter 2015 No. 4.
  • GSE UK October 2023 session ‘Should There Be a Sub-capacity Processor Model in Your Future?’, by Todd Havekost and Frank Kyne.
  • SHARE 2022 in Columbus session ‘Pitfalls of Non-Traditional Migrations’, by David Hutton.
  • SHARE 2023 in Atlanta Session 35309, ‘MVSP Project Opening and WSC Hot Topics', by Brad Snyder and Meral Temel. 

Summary 

We want to thank Todd for another excellent article. All of Todd's articles impart invaluable information to our readers, but this topic is particularly close to our hearts. As Todd said, sub-capacity CPCs are not necessarily right for every customer and every situation. However, there are many customers in that sweet spot where they could fit all their work on a sub-cap CPC, with plenty of room for growth. If you are among that set of customers and you don't include a sub-cap CPC among your potential upgrade targets, we believe you are potentially ignoring a valuable opportunity to not only save money, but also to deliver better-than-expected performance and flexibility.

There is no question that upgrading to a sub-capacity CPC is potentially more work than simply moving to the latest version of whatever you have today. But Todd's article does a great job of helping you determine if the potential benefits are sufficient to justify the additional effort.

If you have an experience with upgrading to a sub-capacity CPC that you would like to share with your peers, please let us know. Of the customers we have worked with, all but one had a very positive experience; but every site is different, and we are sure we and our readers can learn something new from every customer's experience. 
