AIOps: Performance and Capacity Management

Members of this community will discuss end to end near-time collection, curation and reporting for simplified performance, cost and capacity management

Learning From SMF - MQ Statistics

By Camila Vasquez posted Mon April 28, 2025 08:47 AM

  

Written by Todd Havekost on May 21, 2020.

The mainframe platform produces an incredibly rich set of metrics, richer than that of any other platform. This measurement data can provide a tremendous learning opportunity, both for how various elements of the z/OS ecosystem operate in general, and specifically how those technologies are actually operating in your own environment.

This learning value applies across the entire spectrum of expertise. For novices who are either relatively new to the platform or who are exploring outside their areas of specialization, learning is accelerated through having easy access to SMF data.

To share a personal example as someone who until recently had minimal exposure to cryptography, having easy visibility into those metrics jump-started my learning. I was quickly introduced to the wide variety of types of crypto operations. I also discovered (for example) how to identify the amount of general purpose CPU time consumed waiting for synchronous calls to the CPACF (Central Processor Assist for Cryptographic Function) co-processor to complete.

Experts can also deepen their understanding through examining data. Last summer I attended sessions at the IBM TechU conference in Berlin that were presented by Martin Packer, an IBM performance specialist from the UK. He presented three very informative sessions that were all based on a single idea, namely, “see where the [SMF] data leads us and what it teaches us.” He applied that pursuit to three different areas: Db2 Distributed Data Facility (DDF), Coupling Facility, and CPU performance at the level of individual logical and physical CPs.

For both novices and experts, learning is greatly expedited by having easy visibility into SMF data, ideally with minimal effort expended to mine the data, so that attention can be focused solely on exploring, analyzing, and learning. And recapping an earlier point, no matter what our area(s) of specialization, it is likely that we are all novices in many other aspects of the z/OS infrastructure where we could quickly learn through easy visibility into SMF data.

This is the first of a group of articles designed to introduce you to the types of insights that are available through SMF data into various areas across the z/OS infrastructure. This initial article will focus on MQ, IBM's widely used enterprise messaging solution. Hopefully you will find insights that will help you better understand how MQ functions in general, and you may be motivated to investigate MQ metrics in your environment. I want to thank IBM's Lyn Elkins for reviewing this article and providing very helpful feedback. 

Brief Introduction to MQ and MQ SMF Data

IBM MQ is an enterprise messaging middleware product that enables chunks of data (“messages”) to be sent asynchronously between loosely coupled applications that may reside locally or on separate platforms across a network. MQ manages and organizes these messages into “queues”.

In a messaging environment, each program that makes up part of an application performs a well-defined, self-contained function in response to a specific request. To communicate with another program, a program puts a message on a predefined queue. The other program retrieves the message from the queue and processes the requests and information contained in the message. We can think of message queuing as a highly flexible style of program-to-program communication.

Of course, this just scratches the surface of how MQ operates. Readers not familiar with MQ and interested in more information are encouraged to consult the IBM MQ Product Overview manual (listed in the References section at the end of this article), which provides a general overview of MQ, as well as a section specifically about MQ on z/OS. 

There are two primary types of MQ SMF records, MQ Statistics (SMF type 115) and MQ Accounting (SMF type 116). Statistics records report metrics at the MQ queue manager (or “subsystem”) level, including by buffer pool for buffer-related metrics. Statistics records are produced periodically at a time interval specified by the MQ STATIME system parameter, or at the SMF global accounting interval if you specify zero for STATIME. These records are lightweight and have negligible CPU cost, so sites typically generate MQ Statistics records on an ongoing basis.

MQ Accounting records provide a much greater level of detail, including data by queue name, connection type, and connection (address space) name. This results in a volume of SMF data that leads many sites to only generate this data periodically.

Readers familiar with Db2 SMF data may recognize this organization of MQ SMF data into Statistics (at the subsystem level) and Accounting (by connection type and name). (To further highlight the similarities, the SMF record layouts and field names for Log Manager data in the Db2 SMF type 100 and MQ SMF type 115 records are almost identical.)

The scope of this article is to introduce the types of insights available through MQ Statistics SMF type 115 records. A future Tuning Letter article will examine MQ Accounting SMF type 116 data. Visibility into MQ Statistics data can help profile your workload, and may also point to issues with queue manager components such as buffer pools, logging, and virtual storage. Investigating other types of problems will often require detailed data only found in the MQ Accounting records. 

MQ SMF Record Layouts

Readers used to easily locating SMF record layouts in the z/OS SMF manual will likely be disappointed that the MQ SMF record layouts are currently not available online. Instead, IBM sites (like this one: www.ibm.com/support/knowledgecenter/SSFKSJ_9.1.0/com.ibm.mq.mon.doc/q038240_.htm) contain layouts for the headers and selected portions of some record segments, but refer the reader to Assembler mapping macros and C file headers for the remaining fields. 

The Assembler macros can be found in the CSQDQ* members of the MQ hlqual.SCSQMACS product data set, and C headers are located in hlqual.SCSQC370(CSQD*). As an alternative, Pacific Systems Group has done the community a service by gathering the macro DSECTs on their site at www.pacsys.com/smf/smf115.htm. 
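To give a feel for what processing these records involves, here is a minimal Python sketch that decodes the leading header fields of a single SMF record and checks whether it is an MQ Statistics (type 115) record. It assumes the standard SMF header layout for records with subtypes, with the 4-byte record descriptor word still attached; the offsets, the `SmfHeader` class, and the function names are illustrative assumptions rather than anything taken from the MQ mapping macros, so verify the layout against the macros before relying on it.

```python
import struct
from datetime import date, timedelta
from typing import NamedTuple

# Assumed layout (big-endian): length, segment descriptor, flags, record type,
# time (hundredths of a second since midnight), packed date 0cyydddF,
# system ID, subsystem ID, subtype. Verify against the mapping macros.
HEADER = struct.Struct(">HHBBI4s4s4sH")


class SmfHeader(NamedTuple):
    record_type: int
    subtype: int
    system_id: str
    subsystem_id: str
    record_date: date
    seconds_since_midnight: float


def parse_header(record: bytes) -> SmfHeader:
    """Decode the leading header fields of one SMF record."""
    _len, _seg, _flags, rtype, tme, dte, sid, ssi, subtype = HEADER.unpack_from(record)
    digits = dte.hex()  # packed decimal 0cyydddF becomes the string "0cyydddf"
    year = 1900 + 100 * int(digits[1]) + int(digits[2:4])
    day_of_year = int(digits[4:7])
    return SmfHeader(
        record_type=rtype,
        subtype=subtype,
        system_id=sid.decode("cp037").strip(),      # EBCDIC text
        subsystem_id=ssi.decode("cp037").strip(),   # EBCDIC text
        record_date=date(year, 1, 1) + timedelta(days=day_of_year - 1),
        seconds_since_midnight=tme / 100.0,
    )


def is_mq_statistics(header: SmfHeader) -> bool:
    """MQ Statistics records are SMF type 115."""
    return header.record_type == 115
```

Everything after the header (the self-defining sections and the per-component data) is where the mapping macros and C headers become essential, which is why most sites rely on existing tooling rather than hand-written parsers.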

Unleashing the Learning Value of MQ SMF Data

As we would expect, MQ Statistics records provide hundreds of metrics quantifying the operation of the MQ infrastructure. But there can be significant barriers to unleashing the learning and operational value of this or any type of SMF data, including the need to learn unfamiliar tooling that may be siloed by area, or having the time and expertise to develop in-house programs to process and analyze that data. Instead, deriving insights from this data can be greatly aided by having an intuitive interface to easily explore the data and dynamically drill down to view relationships between various metrics.

The sample reports and views in this article have been created using one such interface, IntelliMagic Vision. Alternative approaches for processing MQ data are also provided by IBM and other vendors. IBM sites that contain resources for formatting and interpreting MQ Statistics and Accounting SMF records include the MQSMFCSV GitHub project and IBM MQ SupportPac MP1B.

Interpreting and Deriving Value from MQ Statistics Data

This article will explore MQ Statistics data from SMF type 115 records using examples from these MQ components: Message Manager, Buffer Manager, Log Manager, and Storage Manager. Other MQ components that also produce SMF data and may be of interest for additional analysis include Data Manager, Coupling Facility Manager, Db2 Manager, Topic Manager, and Lock Manager.

MQ Message Manager

One key set of metrics reported by the Message Manager component is the number of API requests that have been made for each type of MQ command (GET, PUT, etc.). (Note that this may not be the same as the number of successful requests.) This can be a good starting point for identifying a workload baseline, as well as an indicator of any significant workload changes. The sample view of MQ PUT requests over time for a set of queue managers shown in Figure 1, as presented by IntelliMagic Vision, establishes such a baseline.

Figure 1 - MQPUT Requests (IntelliMagic Vision)

By combining MQ GET, PUT, and PUT1 activity into a single chart, as shown in Figure 2, you can easily see if there is an excess of GETs over PUTs and PUT1s (the latter of which in this example are nearly zero). Such an excess may reflect potential GET inefficiencies, such as multiple application instances “competing” to get the messages, or the need to index one or more queues.

Figure 2 - MQ GET, PUT, PUT1 Requests (IntelliMagic Vision)
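The same comparison can be made numerically once the request counts have been extracted. The following Python sketch computes the ratio of GETs to PUTs plus PUT1s per interval and flags intervals where GETs dominate. The `IntervalCounts` class, its field names, and the threshold of 2.0 are hypothetical choices for illustration only; they are not SMF field names or a recommended limit.

```python
import math
from dataclasses import dataclass


@dataclass
class IntervalCounts:
    """Request counts for one queue manager in one statistics interval."""
    queue_manager: str
    gets: int
    puts: int
    put1s: int


def get_to_put_ratio(counts: IntervalCounts) -> float:
    """GET requests per message-producing request (PUT plus PUT1)."""
    produced = counts.puts + counts.put1s
    if produced == 0:
        # No PUT activity at all: any GETs are pure excess.
        return math.inf if counts.gets > 0 else 0.0
    return counts.gets / produced


def flag_excess_gets(intervals, threshold=2.0):
    """Return the intervals whose GET-to-PUT ratio exceeds the threshold.

    The default threshold is an arbitrary illustration; a sensible value
    depends on the application design and should come from your own baseline.
    """
    return [c for c in intervals if get_to_put_ratio(c) > threshold]
```

Because these counts are requests rather than successful operations, a high ratio is a prompt for further investigation, not proof of a problem.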

In addition to message counts at the queue manager level, readers may be curious about other metrics like CPU used, elapsed time (total and by MQ component), and differentiation between MQ work arriving from various types of callers (e.g., CICS, IMS, batch). If so, you can look forward to the article on MQ Accounting data in an upcoming Tuning Letter, where these topics and more will be covered.

MQ Buffer Manager

As is the case for Db2, responsive performance from MQ relies on data residing in memory, so the MQ queue manager uses buffer pools to minimize I/O activity. Thus, buffer pool management is an important aspect of managing MQ performance. 

MQ has a Deferred Write Process (DWP) that is activated when a buffer pool reaches 85% full. When that condition is reached, DWP will start to write the oldest data from the buffer pool out to disk, freeing up buffer pool pages for application activity. This asynchronous activity will continue until the buffer pool usage drops below 75% full. Having messages destaged to disk in this way is particularly undesirable for short-lived messages.

If the rate of data being put onto the buffer pool exceeds the rate that DWP can write to disk and the buffer pool usage reaches 95% full, then synchronous writes occur, which can increase the elapsed time that an application takes to PUT a message. And in any case, messages that have remained in a buffer pool for three checkpoints will be written to disk, no matter what the utilization of the pool may be.

These processes suggest that Buffer Manager metrics of interest will include percent buffer pool utilization, frequency with which the Deferred Write and Synchronous Write thresholds occur, and occurrences when a queue manager reads data from disk (indicating that it was previously destaged from a buffer pool).
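The threshold behavior described above can be sketched as a small state machine. The Python below applies the 85%, 75%, and 95% thresholds from the preceding paragraphs to a series of utilization samples. The function names and the idea of feeding it per-interval samples are assumptions made for illustration; interval data only approximates what happens between samples, and MQ itself reports threshold events directly in the Buffer Manager statistics.

```python
DEFERRED_WRITE_START = 85.0   # DWP is activated at this percent full
DEFERRED_WRITE_STOP = 75.0    # DWP continues until usage drops below this
SYNCHRONOUS_WRITE = 95.0      # synchronous writes occur at this percent full


def percent_used(total_buffers: int, free_buffers: int) -> float:
    """Buffer pool utilization as a percentage of allocated buffers."""
    return 100.0 * (total_buffers - free_buffers) / total_buffers


def deferred_write_states(utilization_pct):
    """Track whether DWP would be active after each utilization sample.

    DWP switches on at 85% and stays on until usage falls below 75%,
    so the state depends on history, not just the current sample.
    """
    active = False
    states = []
    for pct in utilization_pct:
        if pct >= DEFERRED_WRITE_START:
            active = True
        elif pct < DEFERRED_WRITE_STOP:
            active = False
        states.append(active)
    return states


def synchronous_write_risk(utilization_pct):
    """Mark samples at or above the synchronous write threshold."""
    return [pct >= SYNCHRONOUS_WRITE for pct in utilization_pct]
```

Note the sample at 80% in a rising-then-falling series: it is below the 85% trigger, yet DWP would still be running because usage has not yet dropped below 75%.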

As an example, Figure 3 shows the IntelliMagic Vision Health Insights screen that assesses the Buffer Manager metrics against best practices thresholds, initially by Queue Manager.

Figure 3 - MQ Buffer Manager Health Insights by Queue Manager (IntelliMagic Vision)

And Figure 4 shows how you could then drill down by buffer pool within that Queue Manager.

Figure 4 - Health Insights - Drilldown by Buffer Pool for MQTC (IntelliMagic Vision)

Views of buffer pool utilizations over time can indicate when these values are approaching the previously mentioned thresholds. In the example in Figure 5, buffer pool 4 is periodically “flirting” with the 85% threshold that would trigger asynchronous destaging, and may warrant the definition of additional buffers.

Figure 5 - Buffer Usage (IntelliMagic Vision)

MQ Storage Manager

The Storage Manager component manages virtual storage within the MQ address spaces. The enhancement in MQ V8 that moved buffer pools from 31-bit storage to 64-bit storage (above the 2GB “bar”) dramatically increased the amount of storage that can be allocated to buffer pools, and improved MQ reliability by reducing the frequency of out-of-storage conditions.

So within this improved virtual storage framework, a primary value of Storage Manager metrics is that they can provide early warnings when any undesirable “exception” conditions are beginning to occur (see Figure 6). One such metric is “Contractions” (below and above the bar), which indicates that, in response to a moderate “short on storage” condition, MQ is attempting to remove unused storage from internal subpools so that it can be reused in other subpools. Another warning metric is “Short on Storage” (again below and above the bar), reflecting that a critical storage shortage has been encountered.

Figure 6 - MQ Storage Manager Health Insights (IntelliMagic Vision)
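As a rough illustration of treating these two metrics as early warnings, the Python sketch below maps them to a severity level: any contraction is a warning, and any short-on-storage event is critical. The `Severity` enum, the function, and its parameter names are hypothetical; they simply mirror the "moderate" and "critical" distinction drawn above and do not represent any product's rating logic.

```python
from enum import Enum


class Severity(Enum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


def storage_severity(contractions: int, short_on_storage: int) -> Severity:
    """Rate one interval of Storage Manager counters.

    Counts for below and above the bar can be rated separately by
    calling this once for each pair of counters.
    """
    if short_on_storage > 0:
        return Severity.CRITICAL
    if contractions > 0:
        return Severity.WARNING
    return Severity.OK
```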

MQ Log Manager

Another point of similarity with Db2 is that a well-performing MQ logging infrastructure carries out its essential role in supporting recovery and backout without impacting ongoing performance. Logging activity is driven primarily by PUTs and GETs for persistent messages (which actually make up a minority of total messages in many environments). Log Manager metrics can identify any bottlenecks that may be occurring in log processing. One metric that may be useful from a profiling perspective is the volume of data being logged, as shown in Figure 7.

Figure 7 - Log Megabytes Written (IntelliMagic Vision)

Several Log Manager metrics are also excellent candidates for exception analysis, including log writes that wait due to unavailable log buffers, the number of times a log write buffer had to be paged in before it could be used, reads from active or archive log data sets (likely reflecting frequent application backouts), and checkpoints being issued too frequently due to reaching the LOGLOAD value (see Figure 8).

Another reason for frequent checkpoints may be that the log files are too small. Queue managers that have been in production for many years often use the old recommended allocation, which was really designed for a test system. As of MQ V9, the checkpoint count includes those from log switching as well as hitting LOGLOAD, so sites that are seeing a dramatic increase in the number of checkpoints may want to check their log file sizes.

Figure 8 - Log Manager Health Insights by Queue Manager (IntelliMagic Vision)
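The exception-style analysis described in this section can be sketched in a few lines of Python: report which exception counters are nonzero in an interval, and normalize the checkpoint count to an hourly rate so that intervals of different lengths can be compared. The counter names in `EXCEPTION_COUNTERS` are hypothetical labels chosen for readability, not SMF 115 field names, and no particular checkpoint rate is implied to be good or bad.

```python
# Hypothetical labels for the exception counters discussed above.
EXCEPTION_COUNTERS = {
    "buffer_waits",        # log writes that waited for a log buffer
    "buffer_pageins",      # log buffers paged in before use
    "active_log_reads",    # reads from active log data sets
    "archive_log_reads",   # reads from archive log data sets
}


def log_exceptions(counters):
    """Return the names of exception counters that are nonzero."""
    return sorted(
        name
        for name, value in counters.items()
        if name in EXCEPTION_COUNTERS and value > 0
    )


def checkpoints_per_hour(checkpoints: int, interval_seconds: float) -> float:
    """Normalize a checkpoint count to an hourly rate."""
    return checkpoints * 3600.0 / interval_seconds
```

When comparing checkpoint rates across MQ versions, keep in mind the point made above: as of MQ V9 the count also includes checkpoints caused by log switching.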

References

You can find more information about MQ and interpreting MQ SMF data in these reference documents and presentations:

  • The IBM MQ Product Overview manual provides a general overview of MQ, as well as a section that is specifically about MQ on z/OS. This provides a good introduction for those who are not familiar with MQ.
  • Lyn Elkins, “MQ SMF Data: What We Continue to Learn about Statistics and Lies [z/OS]”, SHARE in Fort Worth 2020, Session 27066.
  • IBM MQSMFCSV GitHub project - Resources to assist with analysis of MQ SMF records.
  • IBM SupportPac MP1B - Utilities for interpreting MQ accounting and statistics data.
  • IBM SupportPac MP16 - “Capacity Planning and Tuning for IBM MQ for z/OS”.
  • Neil Johnston, “The Dark Side of Monitoring MQ SMF 115 and 116 Record Reading and Interpretation”, SHARE in San Francisco 2013, Session 12610.
  • Mayur Raja, “WebSphere MQ for z/OS Performance and Accounting”, WebSphere Integration User Group (UK), 2014.

Summary

Though we just scratched the surface of the available MQ Statistics metrics in this introductory article, I hope it has helped illustrate the vast potential of learning provided by mainframe SMF data, and piqued your interest in exploring MQ Statistics data more deeply.

And this is just one example; you may also want to consider how deriving insights from any type of SMF data can be greatly aided by having an intuitive interface to easily explore the metrics and view relationships between them. 

In future articles in this series, we will explore MQ Accounting data as well as SMF data from other components across the z/OS infrastructure. If there are particular z/OS areas you would like us to prioritize in future articles, please send us an email at technical@watsonwalker.com and let us know.
