One common way to quantify and characterize workloads is to use measurements of I/O activity. This is illustrated by the fact that one metric often relied upon when calculating Internal Throughput Rates to measure the impact of processor upgrades is I/O interrupt rate divided by CPU consumption. In the methodology described in the Kathy Walsh “How to Measure that New z15” presentation, one of the two metrics in the “sweet spot” (combining higher accuracy and lower effort) on her chart about maximizing accuracy and minimizing data collection effort is IRATE/GCP USED, i.e., total rate of I/O interrupts divided by consumed CPU capacity expressed in terms of CPC cores.
However, there is a plethora of SMF metrics that report I/O activity across numerous record types, including SMF types 70, 72, 74, 78, 42, 23, 30, 6x (VSAM), 92 (zFS), Db2 SMF records, and probably others. We don't have the space or time to cover them all in this article, so we will limit ourselves to the most widely used ones (types 7x and 42) here.
Many of the fields we will be looking at have similar descriptions in the SMF manual, such as “total number of I/Os” or “start subchannel count”. However, when you look at those fields for the same interval, they can contain significantly different values.
To assist with our understanding of the counts, we analyzed the values of these metrics from multiple systems in two sites and listed the fields in relative order of the values. This resulting order was consistent across the two sites, confirming that the apparent disparities were not an anomaly. This approach also helped us identify the fields that appeared to be reporting consistent values, and the ones reporting significantly different values (indicating that different categories of I/Os were being included or excluded from the different fields).
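The ordering check described above can be sketched in a few lines of Python. This is only an illustration: the rates below are invented placeholder values (not the data behind Figure 1), and `rank_fields` is a hypothetical helper name.

```python
def rank_fields(rates):
    """Return field names ordered from the highest rate to the lowest."""
    return sorted(rates, key=rates.get, reverse=True)

# Hypothetical per-second rates for two sites (illustrative values only).
site_a = {
    "SMF70SLH+SMF70TPI": 1520.0,
    "SMF70NIO": 1000.0,
    "SMF74SSC": 900.0,
    "R723CIRC": 700.0,
}
site_b = {
    "SMF70NIO": 2400.0,
    "R723CIRC": 1500.0,
    "SMF70SLH+SMF70TPI": 3900.0,
    "SMF74SSC": 2100.0,
}

order_a = rank_fields(site_a)
order_b = rank_fields(site_b)
print(order_a)
print("Same relative order at both sites:", order_a == order_b)
```

If the two orderings match even though the absolute rates differ, that suggests the differences between fields are systematic rather than a quirk of one site.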
The categories of I/Os we have identified so far include:
- Traditional non-zHPF Disk I/Os.
- zHPF Disk I/Os.
- Synchronous I/Os (via zHyperLink).
- Paging Disk I/Os.
- zHyperWrite I/Os to a Secondary Device, and Consistent Reads from a Secondary. These may appear a little different to 'normal' I/Os because the device is not online to the reporting system. However, for the SMF fields we investigated for this article, these I/Os are treated the same as ‘normal’ I/Os.
- “Other” I/Os (Tape, Communication Devices, Graphics Devices, Unit Record Devices, and Character Reader Devices).
- 'I/Os' to PCIe-attached devices such as OSA adapters, zEDC cards (prior to z15), PCIe Cryptographic Co-processors, RoCE Express cards, and so on.
The values shown in Figure 1 and throughout the remainder of this article represent cumulative rates across multiple CPCs from one of the sites over a one-day time interval.
Figure 1 - I/O Count Fields from SMF 42, 70, 72, 74, and 78 Records
As you can see, there are significant differences between the different metrics. This immediately raises the question: when someone asks you for the “system-level I/O rate”, which of the more than 10 I/O-related fields are they referring to? In this example, the total I/O interrupt rate was nearly 52% higher than the I/O rate reported in the SMF70NIO field, and more than double the rate reported in the R723CIRC field in the SMF 72.3 records, so which of those fields should you use? The remainder of this article steps through those fields and explains, as best we are able to, why they contain significantly different values.
SMF 70 Fields
We will begin with the SMF 70.1 record, because that is a core record for many types of analysis, and because it contains the fields used in the IBM processor capacity evaluation methodology mentioned above. Most sections of SMF 70 records are generated at the z/OS system level, though some sections are at other levels (e.g., all LPARs on that CPC, logical CP).
The first fields mentioned in Figure 1 are SMF70SLH and SMF70TPI. SMF70SLH reports the number of times the Second Level Interrupt Handler (SLIH) was entered. This information is retrieved from field PCCASLIH in the PCCA (Physical Configuration Communication Area) control block of each logical processor. If you are not familiar with MVS interrupt handling and would like to know more, sections 1.17.2, “Types of interrupts”, and 1.34, “I/O Interrupt processing”, of the IBM Redbook ABCs of z/OS System Programming Volume 10, SG24-6990, will tell you all you want to know.
The SMF70TPI field reports the number of Test Pending Interruption (TPI) instructions. The SLIH always issues a TPI instruction just before it completes, to determine if there are more interrupts waiting to be processed. If the TPI ends with completion code 1, that means there is at least one more interrupt queued, so rather than ending, the SLIH immediately retrieves that interrupt and starts processing it (without incrementing the SMF70SLH counter). This is illustrated nicely in the chart in Figure 2 from Robert Vaupel's outstanding High Availability and Scalability of Mainframe Environments textbook. The SMF70TPI value is retrieved from the PCCASTPI field in the PCCA.
Figure 2 - MVS Interrupt Processing (© Robert Vaupel)
As a result, to determine the total number of I/O interrupts processed by a z/OS system, you need to sum the SMF70SLH and SMF70TPI fields. The sum of these provides the IRATE component used in the IBM IRATE/GCP USED formula referenced above.
IBM ITRR calculation uses SMF70SLH and SMF70TPI values.
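As a rough illustration of the IRATE/GCP USED arithmetic, the sketch below sums the two interrupt counters, converts them to a per-second rate, and divides by the average number of general purpose CPs consumed. The function name, parameter names, and sample values are all assumptions made for this example; in particular, how you derive the GCP busy time from your own SMF 70 data is not shown here.

```python
def irate_per_gcp_used(slih_count, tpi_count, interval_seconds, gcp_busy_seconds):
    """Return (I/O interrupts per second) divided by (average GCPs consumed).

    slih_count       -- SMF70SLH summed over the logical processors
    tpi_count        -- SMF70TPI summed over the logical processors
    interval_seconds -- length of the measurement interval
    gcp_busy_seconds -- total GCP busy time in the interval (assumed input)
    """
    if interval_seconds <= 0 or gcp_busy_seconds <= 0:
        raise ValueError("interval and GCP busy time must be positive")
    irate = (slih_count + tpi_count) / interval_seconds
    gcp_used = gcp_busy_seconds / interval_seconds  # e.g. 4.0 means four CPs' worth
    return irate / gcp_used

# Hypothetical 15-minute interval: 13.5 million interrupts, 4 CPs' worth of busy time.
ratio = irate_per_gcp_used(9_000_000, 4_500_000, 900, 3_600)
print(ratio)
```

Calculating the same value for comparable intervals before and after a processor upgrade gives the two numbers that would feed a before-and-after comparison.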
Synchronous I/Os
In late 2017, IBM introduced zHyperLink technology that enables qualifying I/Os for supported (short distance point-to-point) configurations to be completed up to ten times faster than “standard” (zHPF, z High Performance FICON) I/Os (in under 30 microseconds, depending on the distance). These I/Os are completed synchronously, maintaining control of the CPU and not resulting in an I/O interrupt. Thus, synchronous I/Os are not reflected in any measurements of I/O interrupts such as those in SMF70SLH and SMF70TPI. Additionally, as you will see, synchronous I/Os are included in some other SMF fields that report I/O counts, but not in others.
In keeping with the best IT traditions, there are (at least) two completely unrelated uses of the term 'Synchronous I/O':
- There are the I/Os issued on a zHyperLink (those are the ones we mention in this article).
- Db2 also uses the term for I/Os that result when a Getpage cannot be satisfied from a buffer pool, resulting in an I/O to the database that is ‘synchronous’ to the unit of work.
When we talk about Synchronous I/Os in this article, we will always be referring to I/Os issued over a zHyperLink. From the perspective of this article, the database reads that Db2 calls 'Synchronous I/Os' are the same as any other disk I/O, and we treat them accordingly.
For more information about Synchronous I/Os and zHyperLink, refer to the 'Meet the Future of I/O - zHyperLink' article in Tuning Letter 2018 No. 2.
Another interesting field from the SMF type 70 records is SMF70NIO which is described in the SMF manual as “the number of I/Os for this CPU.” It is based on information from an Object Code Only (OCO) IOS control block.
Figure 3 - I/O Count Fields from SMF 70 Records
The source for the SMF70NIO field is always updated for every Start Subchannel instruction (and only Start Subchannel instructions) the I/O Supervisor issues, without regard to what that device is. Therefore, it includes I/Os to channel-attached devices such as disk, tape, unit record equipment, and so on. Despite this, as you can see in Figure 3, there is a significant difference between the value presented in the SMF70NIO field and the total interrupt rate as reported by the SMF70SLH + SMF70TPI fields. Some possible reasons for the difference are:
- The SMF70SLH+SMF70TPI fields include interrupts from PCIe devices (OSA adapters, for example), but the 'I/Os' to those devices do not use the Start Subchannel (SSCH) instruction, and therefore are not included in the SMF70NIO field.
- I/Os issued by programs that use Program Controlled Interrupts (PCI), such as the MVS Program Loader, paging I/Os, XCF CTC communication, and other essential system tasks, can generate multiple interrupts per Start Subchannel.
- Many types of activities performed by modern storage subsystems (e.g., flash copies, state changes) generate interrupts that are not associated with I/Os from programs. Given the increasing amount of replicated data that exists in customer configurations (with Metro Mirror, Global Mirror, Metro Global Mirror, FlashCopy, SafeGuarded Copy, and so on), these types of interrupts are likely to grow as a percent of the overall number of interrupts as time goes on.
– An example of a scenario where you might have much larger numbers of interrupts than there are started I/Os is a small Sandbox system. Such a system might not issue many I/Os, but it will still receive state change interrupts from volumes that are offline (but physically connected) to that system.
Note: Synchronous I/Os are not included in SMF70NIO, but because they do not generate interrupts, they are not included in SMF70SLH+SMF70TPI either. Accordingly, Synchronous I/O activity is unrelated to any potential difference between the number of I/Os and the number of interrupts.
Unfortunately, there are no detailed metrics about the types of interrupts that are being handled by the SLIH, so there is no easy way to definitively explain the difference between the number of interrupts and the number of started I/Os. The information above should at least help you narrow down the likely causes.
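Although the difference cannot be broken down by cause, it is easy to quantify. The following sketch computes the number of interrupts per started I/O for each system, which makes outliers such as a small Sandbox system stand out. The system names and counts are invented for illustration, and `interrupts_per_started_io` is a hypothetical helper.

```python
def interrupts_per_started_io(slih_count, tpi_count, started_ios):
    """Return (SMF70SLH + SMF70TPI) / SMF70NIO for one system and interval."""
    if started_ios <= 0:
        raise ValueError("started I/O count must be positive")
    return (slih_count + tpi_count) / started_ios

# Hypothetical (SMF70SLH, SMF70TPI, SMF70NIO) totals per system.
samples = {
    "PROD": (1_000_000, 520_000, 1_000_000),
    "SANDBOX": (45_000, 5_000, 1_000),
}

ratios = {
    name: interrupts_per_started_io(slh, tpi, nio)
    for name, (slh, tpi, nio) in samples.items()
}
for name, value in ratios.items():
    print(f"{name}: {value:.2f} interrupts per started I/O")
```

A ratio only slightly above 1 would be consistent with a mostly conventional I/O workload, while a very large ratio points at interrupts that are not tied to I/Os the system itself started.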
You also need to consider how important it is to know why the numbers are different. If you are looking for an indicator of the total volume of work being processed by a system, particularly for performing before-and-after comparisons, the number of interrupts (as used by IBM in their ITRR calculations) might be the better number to use. If you are looking for an I/O metric from SMF 70 records that correlates to business workloads, SMF70NIO might be a better choice than SMF70SLH+SMF70TPI.
In any case, understanding that there are differences between the total number of 'I/O interrupts' and the number of started I/Os, and some of the reasons for those differences, might help you select the metric that is the most appropriate for the use you have in mind.
SMF Type 23 Fields
SMF record type 23 is titled “SMF Status” and, as the name suggests, it provides general information about SMF processing for both legacy and logstream approaches (e.g., records written, buffers used). But it also has some overall system information that some sites use to compare relative CPC capacities. Two of those fields are interesting in the context of this article: the SMF23NIO and SMF23NID fields (see note a below). These fields get their data from the same source as SMF70NIO.
If you compare the SMF23NID and SMF70NIO fields you might see differences in some intervals. In our experience, the differences we saw were due to the type 23 and type 70 records being produced at different times. By default, the type 23 records are produced every hour, based on the time of the IPL. Type 70 records are generally produced every INTVAL minutes, on the SYNCVAL time (where INTVAL and SYNCVAL are both defined in the SMFPRMxx member). However, you can tell SMF to create the type 23 records based on the INTVAL and SYNCVAL settings - you do this by specifying STATUS(SMF,SYNC) in your SMFPRMxx member. Setting up your SMFPRMxx in this way should result in nearly identical values in the corresponding fields in the type 23 and type 70 records.
a. Both of these fields are currently described in the SMF manual as “total number of I/Os for this interval”. In fact, the SMF23NIO field contains cumulative values, so the description of that field is a little inaccurate. Hopefully that will be addressed in a future update to the SMF manual.
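Because SMF23NIO is cumulative, it has to be converted to per-interval values before it can be compared with an interval-based field. A minimal Python sketch of that conversion follows. The sample values are invented, and treating a decreasing value as a counter reset (for example, after an IPL) is our assumption rather than documented behavior.

```python
def interval_deltas(cumulative_counts):
    """Convert successive cumulative counts into per-interval deltas.

    A value lower than its predecessor is assumed to mean the counter was
    reset, so no reliable delta exists for that interval and None is used.
    """
    deltas = []
    for previous, current in zip(cumulative_counts, cumulative_counts[1:]):
        if current < previous:
            deltas.append(None)
        else:
            deltas.append(current - previous)
    return deltas

# Hypothetical cumulative values taken from consecutive type 23 records.
print(interval_deltas([100, 250, 400, 50, 90]))
```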
SMF 78 Fields
SMF 78.3 records are described as reporting “I/O Queueing Activity”. The section with global I/O measurement counts is generated at the IOP (Input Output Processor) level.
Terminology
Depending on where you look in IBM documentation, you will see the term IOP (I/O Processor) or SAP (System Assist Processor). They are two names for the same thing. Because the term 'IOP' appears to be used more frequently in the RMF and SMF documentation, we will use that term in all cases in this article.
The IOP is an IBM Z Processor Unit (PU) mostly dedicated to handling the starting and completion of I/O requests (see note a below). In the 'old days', the functions carried out by IOPs today were collectively called the 'Channel Subsystem'. There is limited IBM documentation on exactly which 'I/O' requests are handled by the IOPs. From the perspective of this article, however, the important thing is that information about the total number of Start Subchannel requests and I/O interrupts handled by each IOP is reported in SMF type 78.3 records.
If you are really interested in the role of the IOP, refer to IBM Redbook ABCs of z/OS System Programming, Volume 10, SG24-6990, last updated in 2018, or an even older IBM paper from 2007, Input / Output: A White Paper, by John Kettner.
a. The IOP is also involved in Asynchronous zEDC requests, some Storage Class Memory (SCM) requests, and Server Time Protocol (STP) processing.
The R783IPII field (“Number of processed I/O interrupts”) is based on data retrieved directly from each IOP, and reflects the number of I/O interrupts handled by the IOP. As seen in Figure 4, the values it reports are very close to the sum of the SMF70SLH and SMF70TPI fields seen above. We've also seen a few examples where the R783IPII number was a little higher than the SMF70SLH + SMF70TPI values, but always very close.
Figure 4 - I/O Count Fields from SMF 70 and 78 Records
Another interesting field in the 78.3 record, R783IIFS, is also based on data retrieved directly from each IOP. It is the number of start-subchannel (SSCH) requests that were started for the reporting system on that IOP. As you can see in Figure 4, the number of started I/Os as reported by the IOP(s) (field R783IIFS) is very close to the SMF70NIO field. The values in the SMF70NIO field come from an IOS control block while the R783* numbers come directly from the IOP, so some small difference is to be expected. They are also retrieved by the Data Gatherer at slightly different times - another possible explanation of small differences. It is also possible that there are some requests (a very small number, based on our data) that are included by either IOS in its SLH and TPI counters, or by the IOP, but not included by the other. We were unable to identify exactly which types of requests those might be.
Since synchronous I/Os are driven directly by the general purpose CP or the zIIP, and they do not result in I/O interrupts, synchronous I/Os are also not included in either of these IOP fields.
SMF 72 Fields
The R723CIRC field in the SMF 72.3 record reports the total non-paging DASD I/O Start Subchannel count at the WLM service class level. The Data Gatherer uses a WLM service to gather this information, and WLM in turn gathers that information from the OUXB and ENCB control blocks, both of which are maintained by IOS.
As you can see in Figure 5, there is a significant difference between the sum of the R723CIRC fields for every service class and the value of the SMF70NIO field. We mentioned previously that the SMF70NIO field contains information for many I/O categories (for example, tape I/Os, channel-attached communications devices, paging I/Os, and so on), while the R723CIRC field does not include paging I/O or I/Os to non-DASD devices. While the SMF70NIO field is intended to provide a system-level view of activity, the R723CIRC field provides a view that is more application/workload centric. These are factors in explaining why the R723CIRC value is lower in Figure 5 than the SMF70NIO or R783IIFS fields.
Figure 5 - I/O Count Fields from SMF 70, 72, and 78 Records
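The comparison between the service class view and the system view can be sketched as follows. The service class names and all counts are hypothetical. The point is simply that R723CIRC must be summed across every service class before it is compared with SMF70NIO.

```python
# Hypothetical R723CIRC totals per WLM service class for one interval.
service_class_circ = {
    "ONLINE": 420_000,
    "BATCH": 250_000,
    "STC": 80_000,
}
smf70nio = 1_000_000  # hypothetical SMF70NIO for the same system and interval

total_circ = sum(service_class_circ.values())
gap = smf70nio - total_circ        # I/Os counted by SMF70NIO but not by R723CIRC
share = total_circ / smf70nio      # fraction of started I/Os seen at service class level

print(f"Sum of R723CIRC: {total_circ}")
print(f"Not reflected in R723CIRC: {gap} ({1 - share:.0%} of SMF70NIO)")
```

The gap in this sketch corresponds to the categories the text lists as excluded from R723CIRC, such as paging I/Os and I/Os to non-DASD devices.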
SMF 74 Fields
SMF 74.1 records report a broad range of I/O metrics including counts at the device level. The SMF74SSC field reports the start subchannel count, including SSCH and RSCH (resume) instructions. It is populated from field ECMBSschRschCount in the device ECMB (Extended Channel Measurement Block). By default, ECMBs are only created for disk and tape devices. The CMB parameter in the IEASYSxx member of Parmlib can be used to extend this to include other (mainly older) types of devices such as Unit Record devices, Character Reader devices, and so on. SMF74SSC only captures I/Os for devices that have associated ECMBs, which may help explain why its value is less than SMF70NIO.
Traditional SMF 74 fields like SMF74SSC only include metrics for (legacy) asynchronous I/Os. To cater for the 'new' synchronous I/Os, an entire set of SMF74S* fields (including SMF74SQR and SMF74SQW) that report exclusively on synchronous I/Os has been added to the end of the Device data section of the SMF 74.1 records. SMF74SQR reports the number of successfully completed synchronous I/O read requests. “Successfully” distinguishes these from reads that were initiated as synchronous I/Os but had to be re-driven as standard zHPF asynchronous I/Os because the requested data was not found in the disk subsystem cache (as required for a successful synchronous I/O).
SMF74SQR (reads) and its companion SMF74SQW (for writes) are populated from fields ECMXSynReadCnt and ECMXSynWriteCnt in the ECMX (Extended Channel Measurement Block extension). The ECMX control block is a control block extension introduced for synchronous I/Os.
RMF only includes metrics in the SMF 74.1 records for the types of devices specified in the ERBRMF00 member (see the sample member in Figure 6).
Figure 6 - Excerpt from Sample ERBRMF00 Parmlib Member
In addition to adding these SMF 74.1 fields, Figure 7 now also includes a column indicating whether metrics include synchronous I/Os. (The site from which I captured this sample data has not yet implemented synchronous write I/Os.)
Figure 7 - I/O Count Fields from SMF 70, 72, 74, and 78 Records
SMF 42 Fields
As you might expect, the DFSMS-owned type 42 records are more focused on data set, storage class, and volume-level activity than the Data Gatherer-owned type 7x records which are generally more focused on system-level activity. Also, while most of the type 7x fields described above include I/Os to multiple types of devices, the type 42.5 and 42.6 records contain information about only disk activity. This helps explain the reason SMF74SSC (which includes at least tape and potentially other non-disk devices) reports higher values than the SMF 42 metrics, as you can see in Figure 8.
Figure 8 - I/O Count Fields from SMF 42, 70, 72, 74, and 78 Records
The SMF 42.5 records contain sections with I/O metrics at the storage class and volume levels and are created at the end of each SMF interval. The SMF 42.6 records contain similar information at the data set level. For purposes of this discussion, we are combining data from both 42.5 and 42.6 sources to present a more complete picture of total disk I/Os. These metrics are derived when IOS (I/O Supervisor) in combination with the device driver (which, for synchronous I/Os, is Media Manager) requests an SSCH I/O operation to a device.
These SMF 42 metrics differ from the “legacy” SMF 74 metrics described above in that synchronous I/Os are included in many of the metrics. Additionally, the SMF 42.6 records include zHyperWrite I/Os to, and Consistent Reads from, Metro Mirror secondary volumes.
The primary field of interest to us is S42DSION (“total number of I/Os”) for each data set. To get a complete count of I/Os, this value is added to S42VDION (VTOC), S42VXION (VTOC Index), and S42VVION (VVDS).
S42VSION - System (or, ‘Uncaptured’) I/Os
One other metric that can be included to derive a complete total is S42VSION, “system I/Os”. This field and a related section in the 42.5 record were created by APARs OA55709 and OA55710 (in 2019). These can also be thought of as “uncaptured” I/Os, as they reflect atypical situations where the control blocks DFSMS uses to categorize the I/Os do not exist. Examples include RMF requests to retrieve DSS or FICON director metrics, the standalone “device release” to end the hardware reserve for a device, and copy services commands to control advanced function operations that are unrelated to data stored on the device. z/OS systems staff will recognize the parallels with CPU capture ratios, which quantify CPU that cannot be assigned to a specific service class.
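Putting the pieces together, the total disk I/O count from the type 42 records is a straightforward sum. The sketch below assumes that the S42DSION values have already been extracted per data set and that the S42V* values are available per volume. The layout of the input (a list and a list of dictionaries) and all numbers are hypothetical.

```python
def total_disk_ios(dataset_ios, volume_sections):
    """Sum data set I/Os (S42DSION) plus VTOC, VTOC Index, VVDS, and system I/Os."""
    total = sum(dataset_ios)
    for vol in volume_sections:
        total += (
            vol["S42VDION"]    # VTOC I/Os
            + vol["S42VXION"]  # VTOC Index I/Os
            + vol["S42VVION"]  # VVDS I/Os
            + vol["S42VSION"]  # system ("uncaptured") I/Os
        )
    return total

# Hypothetical values for one interval.
dataset_ios = [120_000, 45_000, 5_000]
volume_sections = [
    {"S42VDION": 300, "S42VXION": 150, "S42VVION": 500, "S42VSION": 50},
    {"S42VDION": 100, "S42VXION": 50, "S42VVION": 200, "S42VSION": 0},
]

total = total_disk_ios(dataset_ios, volume_sections)
print(total)
```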
Synchronous I/Os are included in the S42__ION metrics described above and are also tracked separately in the Synchronous I/O section of the SMF 42.5 and 42.6 records. S42SNRDT and S42SNWTT capture the Synchronous I/O Read and Write Attempts. As explained above, for a synchronous I/O to be “successful” the data must be present in the cache of the storage controller (“cache hit”). S42SNROS and S42SNWOS capture the numbers of successful synchronous I/O reads and writes, respectively. The differences between those values and S42SNRDT and S42SNWTT (“attempts”) quantify the I/Os that had to be re-driven as standard zHPF asynchronous I/Os. In my sample data, 86% of synchronous I/O attempts were successful.
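The success rate and the number of re-driven I/Os follow directly from the attempt and success counters. Here is a minimal sketch. The counts are invented (chosen so that the read success rate works out to 86%), and `sync_io_summary` is a hypothetical helper name.

```python
def sync_io_summary(attempts, successes):
    """Return re-driven count and success rate for synchronous I/O attempts."""
    if successes > attempts:
        raise ValueError("successes cannot exceed attempts")
    redriven = attempts - successes  # re-driven as asynchronous zHPF I/Os
    success_rate = successes / attempts if attempts else None
    return {"redriven": redriven, "success_rate": success_rate}

# Hypothetical totals: S42SNRDT / S42SNROS for reads, S42SNWTT / S42SNWOS for writes.
reads = sync_io_summary(attempts=50_000, successes=43_000)
writes = sync_io_summary(attempts=0, successes=0)  # synchronous writes not in use

print(reads)
print(writes)
```

Returning None when there are no attempts avoids reporting a misleading 0% success rate for a site that has not implemented synchronous writes.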
In the midst of all the challenges with accounting for differences in the various I/O counts, it was very encouraging to observe that the counts of synchronous reads reported by SMF74SQR and S42SNROS were effectively identical!
References
As you have probably concluded by now, we wish there was more documentation to help you understand exactly what is or is not included in the SMF fields discussed in this article.
We found the following documents to be helpful:
- IBM Manual, z/OS MVS System Management Facilities, SA38-0667.
- IBM Redbook, ABCs of z/OS System Programming Volume 10, SG24-6990.
- IBM Paper, Input Output: A White Paper, by John Kettner.
There is also helpful information in the text of some related APARs. And the SMF mapping macros in the SYS1.MACLIB data set sometimes contain a little more information than is provided in the SMF manual.
Editor's Note: We are all used to seeing fields in SMF records that are specifically for IBM's own use - in fact, there are even entire subtypes that are described as “This information is for IBM internal use only” in the SMF manual. However, for SMF fields that are labeled something other than Reserved or IBM internal use, I would encourage IBM and all vendors to provide the information customers need to be able to use that data correctly. Information that is obvious to the developer that is responsible for a particular SMF record type isn't always as clear for those of us that are limited to the description in the SMF manual or the mapping macro.
Summary
I (Frank) want to start by thanking Todd for his really outstanding work on this article. Every time I sent Todd a rhetorical question, he immediately came back with an answer or a suggestion based on his analysis of real customer data. I have encountered situations in the past where the values of two fields didn't make sense, but I've never seen so many I/O-related fields laid out in such a structured manner, and based on real world SMF data.
Both Todd and I would have loved to be able to explain, down to the single I/O, every difference between every pair of fields. We didn't achieve that, but we did learn a great deal from the data and the insights provided by IBM's IOS and DFSMS experts. Even if we can't definitively explain every last difference, just knowing that there are differences is very valuable. As someone who loves playing with SMF data, and helping readers understand the values they see, I believe this article will be invaluable.
Apart from the intended purpose of helping readers better understand these related fields, I believe this article holds another very important lesson for us. Regardless of whether you have 4 years or 40 years of experience, actually looking into your SMF data can deliver invaluable insights and lessons. It is all well and fine to know the description in the SMF manual, but to really understand what is happening under the covers, you need to look at the data and consider what it is telling you.
At the start of this exercise, I just assumed that the number of interrupts seen by a system would be roughly the same as the number of I/Os that were started by the system. But when Todd showed me data from different systems, where the number of interrupts ranged from being 7% more than the number of I/Os to being nearly 50 times more, I was forced to realize that my 'understanding' of interrupt processing was far too simplistic. I am still far from being any sort of expert on this aspect of z/OS performance, but at least now I know what I don't know, and I'm on a mission to address that shortcoming. And that wouldn't have happened without Todd's enthusiasm and hard work (and patience!) and the great support we received from the Development folks in IBM.
If you want to become the next Todd or the next Cheryl, this article holds a very valuable lesson for you - never stop digging and asking 'why' until you have an answer that makes sense. I hope you enjoyed it and found it valuable.