MQ

 View Only

MQ and SMF - What, when, how much?

By Anthony Sharkey posted Thu November 25, 2021 05:00 AM

  

The first blog “MQ and SMF – Why, which and how?” in this short series, explained how to enable the collection of MQ statistics and accounting data, also referred to SMF 115 and 116 records respectively, as well as how to configure the data collection frequency using the STATIME attribute.

 

The blog also discussed the available destinations for the SMF records, namely datasets, logstreams (including compressed logstreams) and in-memory logstreams.

 

For the purposes of measurements in this blog, we are using logstreams without compression. Performance report MP16 discusses the performance of logstreams over SMF datasets in the section titled “IBM MQ and zEDC with SMF”, and demonstrates that compressed logstreams can support a significantly higher rate of SMF recording than other SMF destination types.

 

 

IBM Documentation “Introduction to monitoring IBM MQ for z/OS” provides an overview of the monitoring facilities available for MQ for z/OS. Of those facilities, this blog looks into the collection of SMF types 115 and 116 with regards to:

  1. What options are required to gather particular data?
  2. When is the data collected, and potentially available for post-processing?
  3. How much data is collected, and what influences the amount of data being collected?
  4. How much will it cost, and who will the cost be attributed to?

 

With regards to SMF data and MQ, there are 2 record types, 115 and 116, which as mentioned earlier correspond to Statistics and Accounting type records.

 

Statistics records are largely at a queue manager level, for example providing information on buffer pool usage. The “Interpreting IBM MQ for z/OS performance statistics” section offers a list of the available components that are collected under SMF 115 records.

 

Accounting records are related more to the messaging workload that is being supported by the queue manager, with options for a high-level summary as well as per task data. This information is detailed in “Interpreting IBM MQ for z/OS accounting data”.

 
The following diagram attempts to show which MQ trace type and classes result in which SMF record and sub-type:

What options are required to gather particular data?

As discussed in the previous blog of this series, SMF data collection can be enabled using the the MQ START TRACE command.

 

STATISTICS

For statistics trace, we start trace using:

/<cpf> START TRACE(STAT) CLASS(<class>[,<class>])

 

Where <class> may be one or more of the following:

Class

Trace type

Subtypes written

01

Subsystem statistics

Queue manager statistics

1 - System information

2 - Messages, paging, QSG

201 - Page set

215 - Buffer pool

02[1]

Queue manager storage summary statistics

 

7 - Storage manager summary

031

Queue manager storage detail statistics

5, 6  - Detailed information about internal storage usage

04

Channel statistics data

231 - System information for channel initiator

 

ACCOUNTING

For accounting trace, we start the trace using:

/<cpf> START TRACE(ACCTG) CLASS(<class>[,<class>])

 

Where <class> may be one or more of the following:

Class

Trace type

Subtypes written

01

The processor time spent processing MQI calls and a count of MQPUT, MQPUT1 and MQGET calls

0 – Message manager accounting records.

 03[2]

Enhanced accounting and statistics data

1 - Accounting data for each task, at thread and queue level

2 – Additional queue level accounting data – if the task used more queues than could fit into subtype 1.

04

Channel accounting data

10 – Accounting data for channels

 

[1] Class 1 statistics must also be enabled to collect this class of data.

[2] The use of ACCTQ(OFF) either at the queue manager or queue level will prevent queue accounting data being reported. Refer to descriptions of the ACCTQ attribute at both queue manager and queue levels for further clarification.

When is the SMF data collected?

MQ statistics and accounting data is typically collected at the SMF interval as specified by the STATIME attribute.

 

STATIME, as a value in minutes, can be specified in the range 0 to 1440 minutes, with the default being set to 30 minutes. A value of zero will mean the data is collected at SMF data collection broadcast.

 

Note: There may be some variability from the SMF interval – on our performance systems we see up to 5% variability when using 1 minute intervals.

 

For all classes of statistics trace, the SMF records will be written at the specified interval.

 

Accounting trace is subtly different:

 

CLASS(01): Writes message manager accounting records on the SMF interval, but will only include the information for tasks ending in that interval, and as such may contain information spanning and summarizing many hours of activity.

If no tasks complete in an SMF interval, then no accounting records with subtype 0 will be written.

 

CLASS(03): For tasks that span SMF intervals, information about the task will be written on the SMF interval for the MQ activity that has occurred during that interval. Should the task complete before an SMF interval ends, any unwritten MQ activity is written as the task ends. As such, you may see class(3) records being written outside of interval boundaries.

 

In the event tjat a task does not complete any MQ API calls in the interval, then there will be no corresponding subtype 1 record written to SMF. For example, if the interval is set to one minute and your application is in a get-wait that does not get resolved for five minutes, then multiple intervals may pass between SMF records being written for the task associated with the application.

 

CLASS(04): Information is written on the SMF interval.

 

Note: Despite CICS applications not needing to issue MQCONN or MQDISC calls, both Accounting class 01 and 03 records are written at task end, and not at the time that the CICS region disconnects from the MQ queue manager.

 

How much data is collected, and what influences the amount of data being collected?

Historically, SMF records are limited to 32KB, so no individual record is particularly large. In many cases, MQ’s SMF records are considerably smaller than 32KB.

 

The volume and rate at which data is written to SMF can be affected by the SMF destination being used.

 

For example in MP16, the section titled “IBM MQ and zEnterprise Data Compression (zEDC) with SMF” shows that when Accounting class(3) trace is enabled, SMF data may be lost, (not written to SMF datasets) when running high transaction volumes. Simply by switching to use logstreams meant that the system was able to handle twice the volume of SMF data being logged per second, and a further 3.8 times the volume of work could be recorded to SMF logstreams where compression was applied.

 

Statistics trace typically collects much less data than accounting trace, and the following tables offer a rough guide to how much data might be written.

STATISTICS

Class

Description

Subtypes

KB / Interval

Notes

1

Subsystem statistics

1

2

201

215

 

Total

1

5.1

1.7

1.7

 

9.5

 

2

QM storage summary

 

7

 

+ Class(1)

 

Total

0.25

 

9.5

 

9.75

 

 

Requires class(1)

 

 

3

QM storage detail  summary

5

6

7

 

+ Class(1)

 

Total

7.5

1.9

0.25

 

9.5

 

19.25

 

 

Superset of class(2)

 

Requires class(1)

 

 

4

Channel initiator tasks statistics

231

4.4

 

 

The numbers of MQ resources such as buffer pools, page sets and channel initiator tasks will affect the volume of data written in each SMF interval. For example, the “KB per interval” reported in the table is based on a queue manager configured with:

  • 16 buffer pools
  • 16 page sets
  • 121 channel initiator tasks – including adaptor, dispatcher, name server and SSL tasks.

 

If running with STATISTICS CLASS(1,3,4) for the above queue manager configuration, we would expect to write 23.65 KB per SMF interval, regardless of the duration of the SMF interval. If that amount of data being written per SMF interval was extrapolated over a full day for a range of SMF intervals, we can predict the total volume of SMF 115 data to be:

 

SMF Interval

MB / day

1 minute

33

15 minutes

2.2

30 minutes

1.11

1 hour

0.55

 

ACCOUNTING 

Class

Description

Subtype

KB / Record

Notes

1

Processor time in MQI calls plus count of MQPUT, MQPUT1 and MQGET

0

0.4

Written at end of task.

 

 

 

3

Enhanced Accounting + Stats data

1

8.3

1 record (8.3KB) per task, per interval.

 

 

3

Enhanced Accounting + Stats data

1

 

ACCTQ(OFF)

2.7

Setting ACCTQ(OFF) results in much less data being written, but this results in much less valuable data being written.

4

Channel accounting

10

32

 

 

Variable length.

 

32KB contains 85 channels per record.

 

Each channel uses approximately 380 bytes.


Note:
Data quantities for class(3) accounting trace have been included when the ACCTQ is set to OFF, either at the queue manager level or for all queues. Whilst less data is collected, there is a corresponding drop in the value of the data. The additional data collected at part of ACCTQ(ON) is discussed later.


Example:

Consider a queue manager supporting 8000 clients connected over SVRCONN channels, each putting and getting 1 message per second. These clients are served by 10 long running batch tasks, that get the request message from a common queue and put a reply to a second common queue.

 

If the queue manager has accounting classes 3 and 4 enabled we would expect data volumes per SMF interval to be: 67.94 MB, based on 65MB[1] of Class(3) and 2.94MB of Class(4) accounting data.

 

When running with ACCOUNTING trace for the workload described above, we would expect to see SMF data being logged at the following rate of gigabytes per day, unless otherwise stated:

 

Class(3)

Class(3) with ACCTQ(OFF)

Class(4)

SMF Interval

GB / day

GB / day

MB / day

1 minute

95.5

29.7

4.1

15 minutes

6.4

1.98

0.27

30 minutes

3.2

0.99

0.14

1 hour

1.6

0.49

0.07

For this particular workload, 91GB of class(3) accounting data appears a lot – and it is, but this is the equivalent of 11,534,400 records over a full 24 hour period, and that quantity of data simply can’t be processed without some additional tooling. To process such quantities of data is discussed in the third blog in the series.

 

[1] Class(3) data volume based on 8010 tasks each using 8.3KB. Class(4) data is based on 8000 channel accounting records each using 380 bytes.

 

Impact of ACCTQ(OFF) on class(3) accounting data

Whilst setting ACCTQ(OFF) either at the queue manager or queue level can significantly reduce the amount of data being written as part of class(3) accounting, it is worth considering the sort of data that would not then be available and whether this can mean that the data collected is useful or not.

 

Data in red is available when using ACCTQ(ON) for the queue manager and ACCTQ(QMGR) at the queue level:

           ====> New task record found     <==========                                     
== Thread type............> BATCH                                                          
== Connection name........> SVR001                                                          
== Operator ID............> PERFTASK                                                       
== User ID................> PERFTASK                                                       
== Channel name...........>                                                                 
== Chl connection.........>                                                                
== Correlator ID..........>                                                                
== Correlator ID.....(HEX)> 404040404040.                                      
== Context token..........>                                                                
== Context token.....(HEX)> 0000000000000000                              
== NID....................>                                                                
== NID...............(HEX)> 4040404040404040                              
== Accounting token.......>                                                                 
== Accounting token..(HEX)> 0000000000000000000000                  
== UOW identifier.........>
== UOW identifier....(HEX)> 404040404040404040404040.              
== Task token : 05-10-2021 00:09:05.77,  7C4976A0,  7C3F9070                               
== Interval   : START 05-10-2021 00:11:22.30                                               
== Interval   : END   05-10-2021 00:12:00.61                                               
== Number of queue blocks for this task          1   
== Other reqs : Count    1021960,  Avg elapsed         27,  Avg CPU         15                                      
== Latch      : Max number         19,  Max wait    5339148 mics                           
 > Latch 11,  Total wait         55 mics, Waits         37, Name SSSCONN |DMCSEGAL         
              Avg wait            1 mics, Max           43                                 
 > Latch 12,  Total wait     106258 mics, Waits       1446, Name DMCNMSPC|XMCHASH          
              Avg wait           73 mics, Max         1679                                 
 > Latch 15,  Total wait         83 mics, Waits         33, Name BMXL1   |CFMXL1           
              Avg wait            2 mics, Max           29                                 
 > Latch 16,  Total wait       6925 mics, Waits        924, Name BMXL2   |RMCRMST |RLMARQC 
              Avg wait            7 mics, Max          154                                 
 > Latch 19,  Total wait    5339148 mics, Waits     147180, Name BMXL3   |CFMTODO |SRH1_L19
              Avg wait           36 mics, Max         1178                                 
 > Latch 24,  Total wait     129913 mics, Waits      29202, Name LMXL1                     
              Avg wait            4 mics, Max          210                                 
 > Latch 30,  Total wait        161 mics, Waits          6, Name IFCTRACE|ASMSAGT |DDFDTM  
              Avg wait           26 mics, Max           59                                 
 > Latch 31,  Total wait        196 mics, Waits         66, Name SMCPVT  |DPSLTCH |BMXLH   
              Avg wait            2 mics, Max           34                                  
 > Latch 32,  Total wait        244 mics, Waits        290, Name SMCPHB                    
              Avg wait            0 mics, Max           70                                 
 > Address of latch for longest wait:         00000000                                     
== Commit     : Count     170947,  Avg elapsed         33,  Avg CPU          3             
== Suspend    : Count     170941,  Avg elapsed         31                                  
== Pages      : New      3080331,  Old    4332618                                          
== Buff steal : LRU requests    7412949                                                    
== Buff steal : LRU wait       5238 us, LRU held          0 us                              
== Task token : 05-10-2021 00:09:05.77,  7C4976A0  
Open name LIXM01                                           Object type: Local Queue                                
Base name LIXM01                                           Base type : Queue                                     
Queue indexed by MSG_ID                                                                                          
First opened 05-10-2021 00:09:05.77                                                                               
Last closed  05-10-2021 00:12:00.60                                                                               
Page set ID             1,  Buffer pool          1                                                                
Current opens           0,  Total requests    1016170                                                             
Generated messages :          0                                                                                   
Persistent messages: GETs          0,  PUTs          0,  PUT1s          0                                         
Put to waiting getter: PUT          0,  PUT1          0                                                           
Put direct to shared queue:          0                                                                            
PUTs: Valid    170945,  Max size     65536,  Min size     65536,  Total bytes  10684 MiB                          
-MQ call-          N         ET         CT       Susp       LOGW       PSET Epages   skip expire                  
Close  :          1          1          1          0                                                              
Put    :     170945         35         22         13          0                                                  
Maximum depth encountered          5                                                                             

                                                                     

 

Note:

Since z/OS v2r4, the SMF dump programs IFASMFDP and IFASMFDL support the option REPORTOPTS(NOSUBTYPE|SUBTYPE)

 

Specifying REPORTOPTS(SUBTYPE) produces a summary activity report of the count and size of the individual sub-types for the selected SMF records, for example:

 

                                           SUMMARY ACTIVITY REPORT                                  

    RECORD         RECORDS         PERCENT      AVG. RECORD   MIN. RECORD   MAX. RECORD       RECORDS
TYPE / SUBTYPE        READ        OF TOTAL           LENGTH        LENGTH        LENGTH       WRITTEN
   2 /     -             1           .02 %            18.00            18            18             1
   3 /     -             1           .02 %            18.00            18            18             1
 115 /     1         1,658         24.98 %         1,012.00         1,012         1,012         1,658
 115 /     2         1,658         24.98 %         5,244.00         5,244         5,244         1,658
 115 /   201         1,659         25.00 %         1,760.00         1,760         1,760         1,659
 115 /   215         1,659         25.00 %         1,760.00         1,760         1,760         1,659
      TOTAL          6,636           100 %         2,443.06            18         5,244         6,636

 

How much will data collection cost and which address spaces are charged? 

Statistics data is relatively straightforward and the costs are insignificant, consisting of less than a 1% overhead on the MQ queue manager and all costs being charged to the queue manager address space.

 

Note:

To give a precise percentage overhead for statistics data collection is relatively difficult.

The majority of MQ costs are generally attributed to the applications connected to the queue manager.

CPU charged to the MQ queue manager tends to occur from actions like the logging of persistent messages, writing to page set or message expiry, and as such a queue manager used for persistent messaging would typically use more CPU than a queue manager used solely for non-persistent messaging. In a case where the queue manager is using larger amounts of CPU, the overhead of statistics trace is significantly lower than the 1% mentioned above..

 

Accounting data costs are discussed in detail in MP16 section “Accounting Trace Costs”, but an updated summary is provided below.

 

Typically accounting costs are primarily attributed to the application address space, but there will also be a small cost in the queue manager address space which is due to management of the storage required for the data collection and invocation of the SMF writer routines.

The following costs have been gathered on IBM z15 mainframe computer.

 

Class(1) accounting data:

  • Data gathering:
    • 0.1-1 microseconds per API call (MQPUT, MQPUT1, MQGET only)
  • Writing to SMF
    • 1-3 microseconds. Only charged when the task ends.

 

Class(3) accounting data:

  • Data gathering:
    • 0.5-1 microseconds per API call (including commit, backout and other)
  • Writing to SMF
    • 4-36 microseconds for the primary SMF record, depending on size. Costs are at the lower end of the range when few queues are accessed.
    • Plus, up to 60% of the cost of writing the primary SMF record for each continuation record, dependent on how many queues the continuation record contains.

  

Why don't we state a percentage overhead for accounting trace? 

The cost of collecting data for accounting trace is relatively fixed, regardless of other factors.

Consider the impact to an MQPUT1 call of enabling class(3) accounting:

Message Size

Cost of MQPUT1

(uSeconds)

Class(3) data gathering

Write to SMF

Total Increase

% Increase

1KB

10

+1

+4

+5

50%

64KB

36

+1

+4

+5

14%

4MB

1000

+1

+4

+5

0.5%

 

For a transaction that just uses an MQPUT1 call to put a 1KB message, the impact of enabling class(3) accounting is 50% to the MQ cost, but for a larger message (4MB) the impact is less than 1%.

Additionally, the transaction issuing the MQPUT1 call will perform other processing, even if just to generate the data to be put to the queue.

The overhead of accounting trace does not apply to that “other processing”, so consider the case where the “other processing” equates to 1 CPU millisecond:

Message Size

Cost of MQPUT1

 

(uSeconds)

Other

Total

Without Accounting

Total

With Accounting

% Overhead

1KB

10

+1000

1010

1015

0.5%

64KB

36

+1000

1036

1041

0.48%

4MB

1000

+1000

2000

2005

0.25%

 

This unfortunately means that when it is asked how much overhead does accounting trace add, the answer is typically “it depends”.

 

The cost of accounting trace no longer sounds intimidating, but…

Queue manager storage usage

In the case of high-volume short-lived transactions, such as CICS transactions, the management of the storage required for the tasks associated with the transactions can result in storage usage increasing and associated cost increase for locating the appropriate storage for the specific task.

This increase in storage usage tends to happen with either very high transaction rates or with many concurrently in-flight transactions, at least until a working set of storage is allocated for the current workload level.


SMF data volumes

We ran a set of simple CICS transactions for 2 minutes and enabled accounting trace.

 

Each CICS transaction made 6 explicit MQ API requests – 2 MQOPENs, 2 MQCLOSEs, 1 MQPUT and 1 MQGET.

Additionally, the transaction issued an EXEC CICS SYNCPOINT after putting a request message and again after getting the corresponding reply message – these are translated to implicit MQCMIT calls.

Finally, on transaction end, there are implicit “other” and “backout” calls to MQ, to finalise the transaction and back out any uncommitted requests (if required) – both of these are minimal cost.

 

This equates to 6 explicit MQ API calls and 4 implicit calls, to give a total of 10 API calls.

 

The following table demonstrates the increased cost (in microseconds per transaction) of enabling accounting trace when compared to a measurement without any accounting trace.

 

ACCOUNTING CLASS(1)

ACCOUNTING CLASS(3)

CICS region (TCB)

 

Data collection cost

+6.5

+9

MQ queue manager (TCB)

 

Incorporating cost of write to SMF

None measurable.

+5

 

Note: Costs shown are the increase in transaction cost compared to the equivalent measurement without any accounting trace enabled.

 

These costs are in-line with the values in the “How much will data collection cost and which address spaces are charged?” section.

 

The difference with the CICS transactions is that for a measurement spanning just 2 minutes and running with 12,000 transactions per second is the volume of SMF data collected.

 In just 2 minutes, the CICS region processed 1.44 million transactions, each resulting in an SMF 116 record being written.

 

For accounting class(1), this equates to 598 MB for the 1.44 million SMF 116 subtype 0 records, or 420GB per day if the 12,000 transactions per second was sustained for a 24 hour period.

For accounting class(3), this equates to 11,671 MB for 1.44 million SMF 116 subtype 1 records, or 8TB per day when achieving 12,000 transactions per second for the full 24 hour period..

 
Note: Since these are short-lived transactions, the SMF interval is irrelevant - typically the transactions are running for less than 1 second, so each transaction results in 1 SMF record being written at end of task.

As such, with high volume, short-lived transaction environments, it is suggested that the class 1 and 3 accounting traces are only enabled less frequently and for shorter periods of time.

In summary

Hopefully this blog has helped to understand what the impact of enabling MQ statistics and accounting trace will have on your workloads. Monitoring an MQ subsystem will always have an inherent cost but it does not need to be significant and consideration should be given to the cost and benefit of enabling the statistics and accounting trace as opposed to running without monitoring.

The more expensive accounting trace does not always need to be enabled to give a good indication of how your MQ infrastructure is performing, but even "snap shots" taken over the course of the day / week / month of your workloads can help build a model of the queue manager usage patterns.

Certainly we would encourage MQ users to be enabling statistics trace at all times due to the low impact those traces have on queue manager performance.

What next?

In the next blog in this series, we will look at options for processing larger volumes of MQ SMF data.

                    

Permalink