Written by Todd Havekost on October 24, 2022.
As parallel sysplexes have evolved, the amount of communication and sharing across the systems has increased dramatically. This communication is facilitated by two key components:
- The Coupling Facility, which maintains shared control information (e.g., database locks), and
- XCF (Cross-System Coupling Facility), which facilitates point-to-point communication by sending messages to other members of a group (or expressed another way, to peer address spaces on the same or a different system).
Coupling Facility metrics are relatively well known, but XCF metrics (from RMF 74.2 records) have typically received less visibility. However, in modern sysplexes, dozens of z/OS components and subsystems leverage XCF, so these metrics will be the focus of this article and one in the next Tuning Letter issue.
We suggest that this be considered the second article in a 3-part series on XCF. Article #1 was ‘XCF Transport Class Simplification’ in Tuning Letter 2020 No. 2, where Frank provided detailed information about the nuts and bolts of how XCF signaling works. Readers will find that material provides a strong foundation for the content to follow in the rest of this series.
In the past, XCF required a good bit of configuration attention, especially in the area of defining transport classes. But in z/OS 2.4, IBM implemented intelligence that eliminated the need for separately managed transport classes at most sites and minimized the overall care and feeding that XCF requires from systems programmers. Article #1 also explained that enhancement in detail (thus its article title).
Customer experiences with those transport class enhancements have been very positive, with customers finding that the changes deliver performance at least as good as before and eliminate issues arising from out-of-date class definitions. There are two related items that we want to draw your attention to:
- HIPER APAR OA60480 addresses real storage shortages by moving some XCF messages into 64-bit (above-the-bar) storage.
- IBM recommends using MAXMSG values not exceeding 6000.
Frank mentioned in article #1 that new Path Usage Statistics fields had been added to the RMF 74.2 records, and indicated his intention to address those in a future Tuning Letter article. RMF support for that data was added with APAR OA61101. The next article (#3) in this series will focus on understanding and interpreting those new metrics using actual data from multiple customers.
Benefits of Monitoring XCF Message Volumes
This article will focus on potential CPU optimization opportunities that can arise from effectively managing XCF message volumes. XCF is exceptionally good at reliably delivering messages at very high volumes. Frank and I have seen XCF message rates in a sysplex exceeding 250K messages per second.
But XCF is so good at its job that it is often taken for granted. The downside of that is that inefficient or unusual behavior often goes unnoticed. Sending and receiving high volumes of messages drives CPU consumption, both in the XCF Address Space (XCFAS) and in the address spaces processing the messages. (Note that unnecessary messages also drive Coupling Facility CPU consumption and increase utilization of the XCF signaling infrastructure, which typically uses Coupling Facility paths.) As a result, system configuration decisions (or system bugs) that generate unnecessarily high message volumes can waste a significant amount of CPU, as well as drive the need for additional XCF signaling path capacity.
XCFAS is often one of the highest CPU-consuming system tasks, so we recommend tracking its CPU consumption. The most common way to do this is via SMF type 30 records; however, GDPS sites may need to assign XCFAS to a dedicated WLM report class because GDPS suppresses creation of SMF type 30 records for XCFAS. (Since the RMF 72 records generated by WLM for report classes are less voluminous and thus easier to manage than SMF 30 address space interval records, defining dedicated report classes for other
high-CPU-consuming started tasks (e.g., GRS, DFHSM, TCPIP) is a practice commonly utilized by sites that effectively optimize CPU consumption.)
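If you want to automate this kind of tracking, a small amount of scripting is all that is needed. The following Python sketch assumes the relevant SMF 30 interval (or RMF 72 report class) fields have already been extracted to a CSV; the file name and column names are illustrative assumptions, not part of any product.

```python
# A minimal sketch, assuming SMF 30 interval (or RMF 72 report class) data has
# already been extracted to a CSV with hypothetical columns:
# system, interval_start, jobname, cpu_seconds
import csv
from collections import defaultdict

TRACKED = {"XCFAS", "GRS", "DFHSM", "TCPIP"}   # started tasks worth trending

cpu_by_interval = defaultdict(float)           # (system, interval, jobname) -> CPU seconds

with open("smf30_intervals.csv", newline="") as f:
    for row in csv.DictReader(f):
        if row["jobname"] in TRACKED:
            key = (row["system"], row["interval_start"], row["jobname"])
            cpu_by_interval[key] += float(row["cpu_seconds"])

for (system, interval, jobname), cpu in sorted(cpu_by_interval.items()):
    print(f"{system} {interval} {jobname:<8} {cpu:8.2f} CPU seconds")
```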
Another reason that XCF message volumes are often ignored is that the RMF XCF reports are prime examples of the limitations of static reports. Comprehensive reporting on point-to-point message volumes between many systems often produces unwieldy and effectively unusable reports (Frank's description is “user-hostile”). Frank reported he has seen a customer's RMF XCF report for a single day that was nearly five million lines long.
Reporting that provides sysplex-level views and dynamic navigation from those higher-level views down into the underlying data can help sites easily identify any opportunities relating to XCF message volumes that exist. Examples in this article will be illustrated using IntelliMagic Vision.
Identifying Drivers of High Message Volumes
A good starting point for this analysis is a sysplex view of message volumes sent by each XCF group - XCF groups represent related instances of a software component or function (e.g., GRS or CONSOLE). Address spaces that connect to a given XCF group in order to communicate with their peers are called ‘members’ of the group.
From an XCF perspective, all communication in an XCF group is point-to-point. For example, if member AAA1 of group GRPAAA wants to send a message to three other members of that group (AAA2, AAA3, and AAA4), that would be reported by RMF as three Message Sent events for AAA1, and one Message Received event for each of AAA2, AAA3, and AAA4.
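To make that accounting concrete, here is a tiny Python sketch of the same example. The group and member names come from the example above; everything else is illustrative.

```python
# A minimal sketch of how XCF point-to-point accounting works, using the
# GRPAAA example above. The message list and member names are illustrative.
from collections import Counter

# Each element is (sender, receiver); one broadcast from AAA1 to its three
# peers is reported as three separate point-to-point signals.
signals = [("AAA1", "AAA2"), ("AAA1", "AAA3"), ("AAA1", "AAA4")]

sent = Counter(s for s, _ in signals)
received = Counter(r for _, r in signals)

for member in sorted(set(sent) | set(received)):
    print(f"{member}: sent={sent[member]}, received={received[member]}")
# AAA1: sent=3, received=0
# AAA2: sent=0, received=1  (and likewise for AAA3 and AAA4)
```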
Based on this structure, and with the right reporting tools, you can view XCF activity at the sysplex level, at the individual system level, at the group level, and down to the individual member level. You can also see how many messages were sent and how many were received - this provides invaluable insight into the role of each member within an XCF group. Let's start with a look at which groups are generating the most messages across the entire sysplex, as seen in Figure 1 on page 5.
Figure 1 - XCF Messages Sent by XCF Group name (© IntelliMagic Vision)
Some group names are self-explanatory - for example, SYSGRS reflects messages sent by the GRS (Global Resource Serialization) address spaces. Other groups can be understood by accessing the address space name field in the RMF 74.2 records. The highest message-generating group in Figure 1, SYSBPX, correlates to the OMVS address space (UNIX System Services). In this case, you can see that the SYSBPX group peaks at about 53,000 messages sent per second.
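If you prefer to build this view yourself rather than rely on a reporting product, the following Python sketch shows the idea. It assumes the RMF 74.2 data has already been summarized to one row per XCF group per interval (sysplex totals) in a CSV; the file name and column names are assumptions for illustration only.

```python
# A minimal sketch, assuming RMF 74.2 data has already been summarized to one
# row per XCF group per RMF interval (sysplex totals), in a CSV with
# hypothetical columns: group, interval_start, interval_seconds, messages_sent
import csv
from collections import defaultdict

rates = defaultdict(list)                      # group -> list of msgs/sec per interval

with open("xcf_group_activity.csv", newline="") as f:
    for row in csv.DictReader(f):
        rate = float(row["messages_sent"]) / float(row["interval_seconds"])
        rates[row["group"]].append(rate)

# Rank groups (SYSBPX, SYSGRS, IXCLO*, ...) by their peak send rate.
for group, r in sorted(rates.items(), key=lambda kv: max(kv[1]), reverse=True)[:10]:
    print(f"{group:<10} peak {max(r):10.0f} msgs/sec  avg {sum(r)/len(r):10.0f} msgs/sec")
```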
The next step would be to understand the distribution of messages across the systems in the sysplex - this is shown in Figure 2 on page 6. You can immediately see that there are big spikes in the number of messages sent from two of the systems at 6:00 AM on both days. OMVS uses the system ID as its member name when connecting to the SYSBPX group, thereby making it easier to identify each member.
Figure 2 - XCF Messages Sent by XCF Member name (© IntelliMagic Vision)
The two systems that are spiking at 6:00 AM are H009 and H019. Selecting only those two systems and narrowing the time interval to a single day makes it easier to see that each system is generating about 14,000 signals per second during that interval as shown in Figure 3.
Figure 3 - XCF Messages Sent by XCF Member name for selected systems (© IntelliMagic Vision)
If you are going to perform a tuning exercise, your next step would probably be to confirm our theory that large message rates drive higher CPU consumption in the XCFAS address space. Figure 4 (extended back over 2 days) shows that CPU usage for the XCFAS address spaces on these two systems (particularly during the big spikes) maps very closely to the
message volumes seen in Figure 2 on page 6 and Figure 3 on page 6 above.
Figure 4 - XCFAS CPU usage by System ID (© IntelliMagic Vision)
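For readers who like to quantify 'maps very closely', a quick correlation between the two time series makes the relationship explicit. This Python sketch uses illustrative numbers; in practice, the two aligned lists would be built from the per-interval RMF 74.2 message counts and the XCFAS CPU figures for the same system.

```python
# A minimal sketch of the confirmation step: correlate per-interval SYSBPX
# message rates with XCFAS CPU on the same system. Both lists are illustrative.
from statistics import correlation          # available in Python 3.10+

msgs_per_sec = [1200, 1350, 14000, 13800, 1500, 1250]     # illustrative values
xcfas_cpu_sec = [4.1, 4.3, 38.5, 37.9, 4.6, 4.2]           # illustrative values

r = correlation(msgs_per_sec, xcfas_cpu_sec)
print(f"Pearson correlation between message rate and XCFAS CPU: {r:.2f}")
```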
Figure 5 on page 8 combines OMVS and XCFAS CPU on these two systems (again narrowed to a single day) and shows an increase of 40 MSUs over steady state, reflecting the CPU consumed processing the spike in messages. Prior experience indicates that one possible driver of SYSBPX message spikes (and the accompanying XCFAS and OMVS CPU) is having a file system mounted on one system while file maintenance activity runs on a different system.
Figure 5 - CPU Usage by address space name for selected System IDs (© IntelliMagic Vision)
The next step would be to work with the team responsible for OMVS to determine what activity is occurring at 6 AM and which file systems are involved, and whether those activities could be directed to the system that owns those file systems. Alternatively, you might want to investigate why those file systems are not exploiting the zFS capability to be mounted in R/W mode on every system in the sysplex.
Special Cases: IXCLO* Groups
Along with the potential for CPU reduction, visibility into XCF message volumes can also provide insights into other behaviors in your environment. Such is the case for the group responsible for the second highest message volume from Figure 1 on page 5, IXCLO006. Exploring the associated address space names for this XCF group indicates these messages are being generated by the IRLMs belonging to a high-volume Db2 data sharing group. Martin Packer explains in a blog post titled False Contention Isn't A Matter Of Life And Death that all lock structures have an IXCLO* group associated with them, and that the purpose of the group is to resolve potential lock contention situations. The correlation between the volume of XCF messages for that group (Figure 6 on page 9) and CF requests encountering lock contention for its corresponding Db2 lock structure (shown in Figure 7 on page 9) is apparent.
Figure 6 - XCF Messages Sent (© IntelliMagic Vision)
Figure 7 - Requests encountering lock contention (© IntelliMagic Vision)
Viewing address spaces sending XCF signals within IXCLO* groups, and excluding the common Db2 IRLM and MQ MSTR address spaces (Figure 8 on page 10), identifies other address spaces utilizing lock structures, including the higher volume GRS (mapping to structure ISGLOCK) and SMSVSAM (structure IGWLOCK*).
Figure 8 - XCF Messages Sent by address space name for Group IXCLO* (© IntelliMagic Vision)
Frank encountered a scenario where messages sent by one of the IXCLO* groups (IXCLO06B) accounted for 45% of all XCF messages - hundreds of millions of messages per day (Figure 9). That message volume was roughly 7 times the volume for the IRLM XCF group for that Db2 data sharing group, a relationship that diverged widely from the other Db2 data sharing groups (where the two volumes were roughly equal).
Figure 9 - XCF Messages Sent by XCF Group
Comparing the IXCLO* message volumes to the rates of contention in the lock structures showed 3 XCF messages per contention in the other Db2 groups, but 300 messages per contention for the IXCLO06B group. Eventually, this was determined to be a bug and was addressed. Frank points out that in this case, the largest XCF consumer was generating roughly 100 times more messages per contention than its peers in the other two Db2 data sharing groups. This level of activity was consuming more z/OS and CF CPU capacity than it should have been, which in turn increased the software bill. However, without good visibility into XCF message volumes, this activity went unnoticed for a long time, wasting both CPU time and money.
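The messages-per-contention ratio that exposed this problem is trivial to calculate once you have the two inputs. The Python sketch below uses made-up daily totals chosen to reproduce the 3-versus-300 relationship described above; the second group name is hypothetical, and the threshold of 10 is an arbitrary starting point.

```python
# A minimal sketch of the messages-per-contention check described above.
# All numbers are illustrative; in practice they would come from the RMF 74.2
# (messages per IXCLO* group) and CF structure activity (lock contention) data.
ixclo_messages = {"IXCLO06B": 450_000_000, "IXCLO_OTHER": 9_000_000}   # hypothetical daily totals
lock_contentions = {"IXCLO06B": 1_500_000, "IXCLO_OTHER": 3_000_000}   # hypothetical daily totals

for group, msgs in ixclo_messages.items():
    ratio = msgs / lock_contentions[group]
    flag = "  <-- investigate" if ratio > 10 else ""
    print(f"{group}: {ratio:.0f} XCF messages per contention event{flag}")
```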
Special Cases: ISTXCF Group
One other XCF group that frequently generates unnecessarily large message volumes, and thus represents a sizable CPU savings opportunity, is ISTXCF, used by VTAM and TCP/IP networking. On the sysplex in Figure 1 on page 5, the ISTXCF group doesn't make the top 10 (or the top 40 for that matter). But for the sysplex in Figure 10, it generates far more messages than any other group.
Figure 10 - XCF Messages Sent by Group name (© IntelliMagic Vision)
TCP/IP utilizes XCF's messaging capabilities in three ways:
- One is to maintain awareness of the health of other TCP/IP instances within the sysplex.
- A second is to determine workload levels of other systems in the sysplex via WLM services. Both of these reflect low-volume, state-related communication.
- The third possible use by TCP/IP of XCF messaging is to send actual IP traffic between systems. As you might guess, this use is what generates high XCF message volumes.
Figure 11, a slide from Gus Kassimis' SHARE presentation, shows IBM's recommendation to use VIPAROUTE instead of XCF for connection routing because of its much lower CPU overhead.
Figure 11 - IBM recommendation for configuring TCP/IP traffic
The ‘User Experience’ article in Tuning Letter 2015 No. 1 contained a description of our experience at my previous employer where a jump in XCFAS CPU was traced back to a spike in XCF message volume driven by group ISTXCF. Collaboration with the Network team identified that the Sysplex Distributor configuration had been changed to use XCF instead of VIPAROUTE. When corrected, this saved 600 MIPS across the XCFAS and TCP/IP address spaces.
This is an easily implemented configuration change, but we still see sites using XCF instead of OSAs/HiperSockets/SMC, in many cases unaware of the XCF message volumes (and associated CPU) resulting from TCP/IP's use of XCF, or of IBM's recommendation.
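A simple health check is to look at what share of all XCF messages the ISTXCF group accounts for. The Python sketch below uses illustrative message counts, and the 10% threshold is an arbitrary starting point for investigation, not an IBM guideline.

```python
# A minimal sketch of a quick health check: what share of all XCF messages is
# the ISTXCF group (VTAM/TCP/IP) responsible for? A large share usually means
# IP traffic is flowing over XCF rather than VIPAROUTE. Totals are illustrative.
messages_by_group = {"ISTXCF": 800_000_000, "SYSBPX": 120_000_000, "SYSGRS": 40_000_000}

total = sum(messages_by_group.values())
istxcf_share = messages_by_group.get("ISTXCF", 0) / total
print(f"ISTXCF share of XCF messages: {istxcf_share:.1%}")
if istxcf_share > 0.10:                       # arbitrary threshold for illustration
    print("Check whether Sysplex Distributor is configured to use VIPAROUTE.")
```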
Special Cases: GRS Messages
One system in each sysplex is responsible for coordinating GRS lock contention information gathering and dissemination across the sysplex. That system is called the Contention Notification System (CNS). The first system to be IPLed into the sysplex will normally have that role. As a result, in GDPS environments one of the K systems is often the CNS. When that system is subsequently IPLed, the other systems in the sysplex race to take over that role.
However, the IBM recommendation is to assign the CNS role to the system that does the most batch work, which ideally will have plenty of capacity to handle the CNS role in a timely manner. Figure 12 shows a distribution across systems where the GDPS K system is sending a sizable portion of the GRS messages.
Figure 12 - GRS message distribution across systems
Good visibility into XCF message volumes will quickly identify this undesirable situation, and the CNS role can easily be moved to a suitable system by issuing the SETGRS CNS=sysid,NP command. Ideally, you would add this command to the COMMNDxx member of Parmlib so that the CNS function always ends up on the desired system, regardless of which system was IPLed or the sequence of system IPLs.
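Spotting this situation in the data is straightforward: look for the system that sends the lion's share of the SYSGRS messages. The Python sketch below uses illustrative per-system totals; in practice they would come from the RMF 74.2 member-level data.

```python
# A minimal sketch: check which system is sending the bulk of the SYSGRS
# messages, which usually identifies the current CNS. Totals are illustrative.
sysgrs_sent = {"SYSK": 35_000_000, "SYSA": 4_000_000, "SYSB": 3_500_000}

total = sum(sysgrs_sent.values())
cns_candidate, sent = max(sysgrs_sent.items(), key=lambda kv: kv[1])
print(f"{cns_candidate} sends {sent / total:.0%} of SYSGRS messages")
# If this turns out to be a GDPS K system, SETGRS CNS=sysid,NP moves the role.
```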
Frank leverages good visibility into XCF message volumes in his sysplex reviews to identify opportunities for efficiency improvements. He looks for unusual behaviors such as:
- Regularly spaced spikes.
- Exactly the same message rate from every system in every interval.
- Spikes of tens of thousands of messages per second in a group that otherwise generates minimal message volumes.
An example of anomalous behavior is seen in Figure 13 on page 14 with very spiky message volumes for groups IXCLO009 (associated here with the GRS ISGLOCK structure) and SYSENF (Event Notification Facility).
Figure 13 - XCF GRS-related message volumes
In this example, the message rate for the GRS contention XCF group was jumping from around 100 messages per second to around 2000 per second, and the SYSENF message rate was increasing from around 200 per second to around 1000 per second. But what really caught Frank's eye was the consistency - this was happening every hour on the hour. This particular example was discussed in more detail in the ‘Inspector Clouseau Meets GRS’ article in Tuning Letter 2018 No. 1. The root cause here was a batch job that was put in place to help investigate a GRS problem, but when the problem was resolved no one remembered to remove the job. This can easily happen, considering how busy everyone is these days. But the important point is that this anomalous behavior would have been spotted if someone had been looking at the XCF reports.
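Checks like these are easy to script once the message rates are available per interval. The Python sketch below illustrates one very simple approach - flagging intervals where a group's rate jumps to several times its median - using made-up numbers that mimic the hourly SYSENF pattern described above; the spike factor of 4 is arbitrary.

```python
# A minimal sketch of a simple anomaly check for the behaviors listed above:
# flag intervals where a group's message rate jumps to many times its median.
# The series below is illustrative (one value per 15-minute RMF interval).
from statistics import median

rates = [210, 195, 205, 1020, 200, 190, 215, 1015, 205, 198, 202, 1030]  # SYSENF-like pattern

base = median(rates)
spikes = [i for i, r in enumerate(rates) if r > 4 * base]
print(f"Baseline ~{base:.0f} msgs/sec; spikes in intervals {spikes}")
# Evenly spaced spike indexes (here every 4th interval, i.e., hourly) suggest
# a scheduled job or monitor driving the traffic.
```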
References
This article only scratched the surface of the many interesting insights that can be obtained from XCF usage information. For more information on the examples we included, plus many other scenarios, the following documents might prove helpful:
- ‘TCP/IP and XCF’ User Experience article in Tuning Letter 2015 No. 1.
- ‘Inspector Clouseau Meets GRS’ article in Tuning Letter 2018 No. 1.
- SHARE in Seattle March 2015 Session 16742, Sysplex Networking Technologies and Considerations, by Gus Kassimis.
- Martin Packer blog post, False Contention Isn't A Matter Of Life And Death.
- IBM Redbook System z Parallel Sysplex Best Practices, SG24-7817.
- IBM Manual z/OS MVS Setting Up a Sysplex, SA23-1399.
- IBM Manual z/OS RMF Report Analysis, SC34-2665.
Summary
Dozens of critical system components depend on XCF to reliably deliver point-to-point messages to their peers in a sysplex. It is extremely effective at delivering high volumes of messages. In fact, it is so effective that unproductive message volumes and the associated inefficiencies often go unnoticed. Good visibility into this data can enable sites to optimize their environment and reduce CPU consumption and possibly software expense, particularly in an environment that is using the increasingly pervasive Tailored Fit Pricing (TFP) licensing model.
If you have experience extracting information from XCF data that you think would be valuable to your peers in other z/OS installations, please let us know.
In the next Tuning Letter, we will be exploring the new XCF Path Usage Statistics to determine the implications for configuring path capacity and other considerations. Please drop back to see what additional valuable information is lurking in XCF’s SMF records.