MQ for z/OS 9.3.0 introduced the streaming queue feature on local queues, which allows the MQ queue manager to take a copy of each message put to a local queue and deliver that copy to a separate queue or topic of your choice.
Recently you may have noticed my colleague's post announcing that support for shared streaming queues is now available in MQ for z/OS as part of MQ 9.3.1.
APAR PH49686 provides support for shared streaming queues in MQ for z/OS 9.3.0.
With shared streaming queues now available via APAR PH49686, the MQ for z/OS 9.3 performance report has been updated to include the performance impact of using shared queues as part of the streaming queue feature. Since the original report was published, our performance systems have moved from IBM z15 to IBM z16, so all of the streaming queue performance data has been refreshed.
The remainder of this blog will give a brief overview of the performance and considerations to take into account when using shared queues with the streaming queue feature.
Streaming queues on shared queues
With streaming queues configured, when a putting application puts a message to the original queue, a near-identical duplicate message is delivered to the stream queue; the putting application itself is unchanged (see the sketch after this list). Streaming queues are designed to provide a convenient way of capturing a duplicate stream of messages in order to:
- Stream messages to Apache Kafka using the Kafka Connect source connector for IBM MQ.
- Perform analysis on the data going through the system.
- Store messages for recovery at a later time.
- Capture a set of messages to use in development and test systems.
- Consume IBM MQ event messages from the system event queues, sending additional copies to other queues and topics.
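To make the duplication concrete, here is a minimal sketch of a putting application in C using the MQI. The queue manager name (QM01) and queue name (APP.BASE.QUEUE) are hypothetical and the sketch is not taken from the performance report; the point is that the application opens and puts to the base queue only, and the queue manager delivers the duplicate to whatever queue is named in the base queue's STREAMQ attribute.

```c
#include <stdio.h>
#include <string.h>
#include <cmqc.h>   /* IBM MQ MQI definitions */

int main(void)
{
    MQCHAR48 qmName = "QM01";              /* hypothetical queue manager  */
    MQHCONN  hConn  = MQHC_UNUSABLE_HCONN;
    MQHOBJ   hObj   = MQHO_UNUSABLE_HOBJ;
    MQOD     od     = {MQOD_DEFAULT};
    MQMD     md     = {MQMD_DEFAULT};
    MQPMO    pmo    = {MQPMO_DEFAULT};
    MQLONG   compCode, reason;
    char     payload[] = "example message";

    MQCONN(qmName, &hConn, &compCode, &reason);
    if (compCode == MQCC_FAILED)
    {
        printf("MQCONN failed, reason %d\n", (int)reason);
        return 1;
    }

    /* Open the base queue for output - the stream queue is never named here */
    strncpy(od.ObjectName, "APP.BASE.QUEUE", MQ_Q_NAME_LENGTH);
    MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_FAIL_IF_QUIESCING, &hObj, &compCode, &reason);

    if (compCode != MQCC_FAILED)
    {
        /* Put to the base queue; the queue manager delivers a near-identical
           duplicate to the queue named in the STREAMQ attribute              */
        pmo.Options = MQPMO_NO_SYNCPOINT | MQPMO_FAIL_IF_QUIESCING;
        MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(payload), payload,
              &compCode, &reason);
        printf("MQPUT completed with reason %d\n", (int)reason);

        MQCLOSE(hConn, &hObj, MQCO_NONE, &compCode, &reason);
    }

    MQDISC(&hConn, &compCode, &reason);
    return 0;
}
```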
There are a number of shared queue specific items that are worth highlighting, including:
- First-open and last-close effects
- Put out-of-syncpoint and STRMQOS(MUSTDUP)
- Where the stream queue is on a different structure to the original queue.
These items will be discussed further down this blog as well as in the updated MQ for z/OS 9.3 performance report.
Diagram: Streaming queue overview when using shared queues
The diagram shows both the base and the streaming queue configured as shared queues. It is equally valid to have one or both as private queues, but for the purposes of the performance report and this blog we are using the same type of queue for both the base queue and stream queue.
Basic Performance - MQOPEN and MQCLOSE
On our system, a typical MQOPEN and MQCLOSE pair, where the queue is a shared queue, costs of the order of 18 CPU microseconds.
With the STREAMQ attribute also configured to use a shared queue, the cost rose to 32 CPU microseconds. This is regardless of whether the streaming queue is on the same or a different application structure, where the structures are in identically configured and located Coupling Facilities.
First-open and last-close effects, discussed later in this blog and in performance report MP16's section "Frequent opening of shared queues", can make a significant difference to the cost of MQOPEN and MQCLOSE when accessing shared queues.
Basic Performance - MQPUT and MQPUT1
The basic performance impact of enabling streaming queues on an existing shared queue is shown in the following table.
In all cases, the measurements used multiple non-persistent 1KB messages. Costs are reported in CPU microseconds and are based upon MQ accounting trace class 3 data.
Generally, the impact of using shared queues with streaming is a doubling of the MQPUT cost, as reported by class 3 accounting data.
| | STRMQOS | Baseline | StreamQ |
| --- | --- | --- | --- |
| MQPUT out-of-syncpoint | BESTEF | 7 | 15 |
| MQPUT out-of-syncpoint | MUSTDUP | 7 | 22 |
| MQPUT in-syncpoint | Either | 7 | 15 |
| MQPUT in-syncpoint, StreamQ on separate structure | Either | 7 | 14 |
| MQPUT1 in-syncpoint | Either | 10 | 32 |
Notes on table:
- MQPUT out-of-syncpoint with STRMQOS(MUSTDUP) is discussed later; essentially, the MUSTDUP option enforces a unit of work to ensure that either both MQPUTs succeed or neither does, thereby negating any benefit of the out-of-syncpoint put.
- MQPUT1 using shared streaming queues will require a further APAR to address the excess impact of shared queues, relating to the internal open and close of the queues.
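To relate the table rows to application code, the fragment below is a minimal sketch (not taken from the report) contrasting an out-of-syncpoint put with an in-syncpoint put followed by MQCMIT. It assumes hConn, hObj, md, pmo, payload, compCode and reason have already been set up, as in the earlier sketch.

```c
/* Out-of-syncpoint put: the MQPUT stands alone. Note that with
   STRMQOS(MUSTDUP) the queue manager still enforces a unit of work so
   that either both copies are delivered or neither is.               */
pmo.Options = MQPMO_NO_SYNCPOINT | MQPMO_FAIL_IF_QUIESCING;
MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(payload), payload, &compCode, &reason);

/* In-syncpoint put: both the base copy and the stream copy join the
   application's unit of work and become visible at MQCMIT.          */
pmo.Options = MQPMO_SYNCPOINT | MQPMO_FAIL_IF_QUIESCING;
MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(payload), payload, &compCode, &reason);
MQCMIT(hConn, &compCode, &reason);
```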
Basic Performance - MQPUT with increasing message size
Whilst message size does affect the cost of the MQPUT, the impact of streaming to shared queues remains broadly a doubling of the MQPUT cost as message size increases.
The following chart shows the cost of a batch application putting messages of increasing sizes to both a shared queue and to a shared queue where a streaming queue is configured. The values shown are CPU microseconds per MQPUT, based on class 3 accounting data where thousands of messages are put to the queue.
Chart: Cost of MQPUT when STREAMQ configured using shared queues
First-open and last-close effects
First-open and last-close effects, as discussed in performance report MP16's section "Frequent opening of shared queues", can make a significant difference to the cost of MQOPEN and MQCLOSE when accessing shared queues.
Avoiding CF access for the open and close of a queue configured without a streaming queue reduces the cost from 18 to 5.5 CPU microseconds.
Avoiding CF access for the open and close of a queue configured with a streaming queue reduces the cost from 32 CPU microseconds, but the scale of the reduction depends on how that CF access is avoided, as the following table shows.
| Configuration used to minimise CF access on MQOPEN and MQCLOSE | Effect on streaming queue | Cost (CPU microseconds) |
| --- | --- | --- |
| First-open and last-close effect (no mitigation) | CF access every time | 32 |
| Hold base queue open for input (wrong type) (avoiding first-open/last-close on base queue) | CF access every time | 20 |
| Hold base queue open before stream queue is configured (avoiding first-open/last-close on base queue) | CF access every time | 20 |
| Hold base queue open, with stream queue in separate structure (avoiding first-open/last-close on base queue) | CF access every time | 20 |
| Hold only stream queue open (avoiding first-open/last-close on stream queue) | No CF access | 20 |
| Hold base queue open, with stream queue in same structure (avoids first-open/last-close on both queues) | No CF access | 7.5 |
| Hold both base queue and stream queue open (regardless of whether same or different structure) (avoids first-open/last-close on both queues) | No CF access | 7.5 |
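The lower-cost rows rely on keeping a handle open so that the first-open and last-close processing is not repeated for every message. The fragment below is a minimal sketch of that pattern, reusing the hypothetical queue name, hConn and payload from the earlier sketch; whether the stream queue also avoids CF access depends on which row of the table applies (for example, the stream queue being in the same structure, or a handle also being held open on the stream queue).

```c
/* Open the base queue once at application start-up and reuse the handle,
   avoiding the per-message MQOPEN/MQCLOSE and its first-open/last-close
   CF access.                                                             */
MQOD    od = {MQOD_DEFAULT};
MQHOBJ  hObj;
MQLONG  compCode, reason;

strncpy(od.ObjectName, "APP.BASE.QUEUE", MQ_Q_NAME_LENGTH);   /* hypothetical name */
MQOPEN(hConn, &od, MQOO_OUTPUT | MQOO_FAIL_IF_QUIESCING, &hObj, &compCode, &reason);

for (int i = 0; i < 1000; i++)        /* many puts against the single open handle */
{
    MQMD  md  = {MQMD_DEFAULT};
    MQPMO pmo = {MQPMO_DEFAULT};
    pmo.Options = MQPMO_SYNCPOINT | MQPMO_FAIL_IF_QUIESCING;
    MQPUT(hConn, hObj, &md, &pmo, (MQLONG)sizeof(payload), payload, &compCode, &reason);
    MQCMIT(hConn, &compCode, &reason);
}

MQCLOSE(hConn, &hObj, MQCO_NONE, &compCode, &reason);         /* only at shutdown */
```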
Put out-of-syncpoint and STRMQOS(MUSTDUP)
The increased cost observed when performing an MQPUT out-of-syncpoint when STRMQOS(MUSTDUP) is configured can be explained when we review the MQ task records (WTAS and WQ) which offer additional CF statistics. These statistics are described in the blog "MQ for z/OS - CF Statistics".
The following table compares the CF statistics reported for three configurations and shows how they affect the cost on both z/OS and the Coupling Facility.
Costs shown are CPU microseconds.
| MQPUT out-of-syncpoint to shared queue | CF Statistics | z/OS cost per MQPUT | CF cost per MQPUT |
| --- | --- | --- | --- |
| Baseline - put to shared queue | 1 x "New" | 9 | 4 |
| Baseline + STREAMQ with STRMQOS(BESTEF) | 2 x "New" | 15 | 8 |
| Baseline + STREAMQ with STRMQOS(MUSTDUP) | 3 x "New", 1 x "Write", 1 x "MoveEnt" | 22 | 20 |
Effectively STRMQOS(MUSTDUP) with an MQPUT out-of-syncpoint is enforcing a unit of work and performing commit processing.
Stream queue on different application structure to original queue
When STREAMQ uses a queue on a different structure to the original queue, there is additional cost in both the z/OS LPAR and the Coupling Facility, due to the additional CF work performed at commit time.
The following table again uses the CF statistics discussed in blog "MQ for z/OS - CF Statistics" to illustrate the differences.
| | Queues on same structure | Queues on different structures |
| --- | --- | --- |
| MQPUT to base and streaming queue | 2 x "New" | 2 x "New" |
| MQCMIT | 1 x "New", 1 x "Write", 1 x "MoveEnt" | 2 x "New", 1 x "Write", 2 x "MoveEnt" |
| z/OS cost per MQPUT (CPU microseconds) | 14 | 15 |
| z/OS cost per MQCMIT (CPU microseconds) | 8 | 18 |
| CF cost per MQPUT (CPU microseconds) | 8 | 8 |
| CF cost per MQCMIT (CPU microseconds) | 11 | 29 |
When the queues are on different application structures, the MQ commit has to make 1 additional "New" and 1 additional "MoveEnt" request to the CF, which in our measurements added 18 CPU microseconds in the Coupling Facility. Less responsive Coupling Facilities, or those located at a distance from the z/OS LPAR, may see a larger disparity in performance when using separate structures.
A reminder...
And finally, a reminder that the mqperf GitHub repository contains performance reports and white papers on a range of MQ performance topics for both z/OS and distributed platforms.