MQ for z/OS: Message Selectors on shared queues
Recently I was involved in a discussion relating to the use of message selectors to access specific messages on shared queues.
Message selectors are discussed in the IBM Documentation section “Message selectors in JMS”.
The particular use case under discussion involved the optimizable message selector JMSCorrelationID but could equally have been the JMSMessageID selector.
What are we measuring?
For the purposes of this blog, we will discuss a limited set of scenarios:
Message selection from queue, where either:
- The queue depth is low, such that the message on the queue matches the selection criteria.
- Or the queue depth is moderate, such that there are 1001 messages on the queue, where the first 1000 do not match the selection criteria.
We will compare the impact of the queue depth on 3 configurations:
- Private queues.
- Shared queue where the data is entirely in the Coupling Facility.
- Shared queue where the data is stored in Shared Message Data Sets (SMDS) – but most of the data is held in SMDS buffers (DSBUFS). Message selection using the correlation ID can locate the message using the data held solely in the Coupling Facility (CF) but may require access to SMDS to retrieve the message payload. Message selection using non-optimized selectors will need to access the message payload on the SMDS to determine if the messages’ properties match the selection criteria.
For each of these 3 configurations we have measured the performance of an MQGET using
- JMSCorrelationID
- A generic message property
- Application uses MQ GMO match options and explicitly specifies the CORRELID in the MQMD.
For the purposes of these measurements, the getting application is written in C so that we can use all 3 of these get types.
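To make the three get types concrete, here is a minimal C sketch of how each might be coded. This is illustrative only, not the benchmark application; the function and variable names are mine, and error handling is reduced to comments.

```c
#include <string.h>
#include <cmqc.h>    /* MQI definitions: MQMD, MQGMO, MQOD, MQGET, MQOPEN */

/* Explicit CORRELID match via the GMO match options and the MQMD. */
void getByMqmdCorrelId(MQHCONN hConn, MQHOBJ hObj,
                       MQBYTE correlId[MQ_CORREL_ID_LENGTH],
                       PMQVOID buffer, MQLONG bufLen)
{
    MQMD   md  = {MQMD_DEFAULT};
    MQGMO  gmo = {MQGMO_DEFAULT};
    MQLONG dataLen, compCode, reason;

    memcpy(md.CorrelId, correlId, MQ_CORREL_ID_LENGTH);
    gmo.MatchOptions = MQMO_MATCH_CORREL_ID;   /* match on CorrelId only */
    gmo.Options      = MQGMO_WAIT | MQGMO_FAIL_IF_QUIESCING;
    gmo.WaitInterval = 60000;                  /* 60-second get-wait */

    MQGET(hConn, hObj, &md, &gmo, bufLen, buffer, &dataLen, &compCode, &reason);
    /* On a shared queue without INDXTYPE(CORRELID), this fails with
       MQRC_CORREL_ID_ERROR, as noted later in this blog. */
}

/* Selector-based gets: a selection string supplied at MQOPEN time carries
   either the optimizable JMSCorrelationID selector or a generic
   "property=value" selector. */
MQHOBJ openWithSelector(MQHCONN hConn, char *qName, char *selector)
{
    MQOD   od = {MQOD_DEFAULT};
    MQHOBJ hObj;
    MQLONG compCode, reason;

    od.Version = MQOD_VERSION_4;               /* version with SelectionString */
    strncpy(od.ObjectName, qName, MQ_Q_NAME_LENGTH);
    od.SelectionString.VSPtr    = selector;
    od.SelectionString.VSLength = (MQLONG)strlen(selector);

    MQOPEN(hConn, &od, MQOO_INPUT_SHARED | MQOO_FAIL_IF_QUIESCING,
           &hObj, &compCode, &reason);
    return hObj;                               /* caller checks compCode/reason */
}
```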
The measurements have been run on an IBM z16 z/OS 2.5 LPAR with an internal Coupling Facility.
To demonstrate how queue depth can significantly affect the performance, even when using the optimizable selectors, the measurements have been run on queues that are either INDXTYPE(CORRELID) or INDXTYPE(NONE).
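For reference, a shared queue indexed for correlation ID selection might be defined with MQSC along the following lines (the queue and structure names are illustrative only):

```
DEFINE QLOCAL(APP.SELECTION.QUEUE) +
       QSGDISP(SHARED) +
       CFSTRUCT(APPLICATION1) +
       INDXTYPE(CORRELID)
```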
Using message properties, including the optimizable selector, means that a message can be located using that property, even on a queue that is not indexed appropriately. For low depth queues, the performance is comparable to that of an indexed queue, but it does not take many messages on the queue, or many messages with properties, before the MQGET becomes less responsive.
For those measurements on shared queues using the CORRELID in the MQMD, where the queue is configured with INDXTYPE(NONE), the MQGET fails with an MQRC_CORREL_ID_ERROR.
Costs reported are as indicated by MQ’s class(3) accounting trace for the MQGET.
Terminology
In this blog, I use the term “optimized selectors”, so I should explain what I mean.
There are multiple ways to get specific messages from queues, for example:
1. Application uses the MQ GMO match option, explicitly setting the CORRELID in the MQMD.
2. Application uses a non-optimizable selector (“property=value”).
3. Application uses an optimizable selector (for example JMSCorrelationID), but the queue is not suitable, i.e. the selection is from a shared queue that has not been indexed appropriately, i.e. does not have INDXTYPE(CORRELID).
4. Application uses an optimizable selector and either is getting from a private queue or the shared queue has been indexed appropriately.
For the purposes of this blog, optimized selectors are those using methods 1 or 4 to get the message. From a performance perspective these are the preferred options, capable of the best performance as queue depth increases.
Similarly, we refer to methods 2 and 3 as non-optimized selectors as the performance can be significantly affected by depth of queue, activity on the queue and the contents of the messages on the queue.
An optimized selector will be of the form:
JMSCorrelationID='ID:414D51207061756C745639314C545320C57C1A5F25ECE602'
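As an illustration, a selector of that form can be derived from the 24-byte MQ correlation ID. The helper below is a sketch of my own (not part of the measured applications): the "ID:" prefix indicates that the value is the hexadecimal form of the binary ID.

```c
#include <stdio.h>
#include <cmqc.h>

/* Hypothetical helper: render a 24-byte correlation ID as a
   JMSCorrelationID selector string of the form shown above. */
void buildCorrelIdSelector(const MQBYTE correlId[MQ_CORREL_ID_LENGTH],
                           char selector[80])
{
    char *p = selector;
    p += sprintf(p, "JMSCorrelationID='ID:");
    for (int i = 0; i < MQ_CORREL_ID_LENGTH; i++)
        p += sprintf(p, "%02X", correlId[i]);   /* 48 hex characters */
    sprintf(p, "'");
}
```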
Private queues
Optimized selectors – JMSCorrelationID or CORRELID in MQMD
The behaviour of an application using JMSCorrelationID or specifying the CORRELID in the MQMD to identify a specific message is similar when using private queues.
When getting specific messages from appropriately indexed queues, i.e. the INDXTYPE matches the message selector, the depth of the queue is less of a factor affecting the cost of the MQGET.
For example, an MQGET using JMSCorrelationID from a queue indexed by CORRELID cost 16-20 CPU microseconds whether the queue depth was 1 or 1001.
For a queue that was non-indexed, the MQGET when the queue depth was 1 was comparable to the cost when getting from an indexed queue. However, when the depth was 1001, the MQGET cost up to 25 times that of an MQGET from an indexed queue. As the depth of the queue increases, the overhead of scanning a non-indexed queue will increase.
Non-optimized selectors
When using non-optimized selectors to identify a specific message, the INDXTYPE is not relevant, but whether all of the messages have properties is.
For example, if the queue contains 1001 messages, of which 1 contains a property, and message selection uses that property, the MQGET costs 31 microseconds. If all the messages contain properties such that every message needs to be scanned for a matching property, then we saw the cost increase by a factor of over 100 times, to 3300 microseconds.
Shared queues where message is entirely in the Coupling Facility
As mentioned previously, specifying the CORRELID in the MQMD to identify a specific message on a queue indexed by anything other than CORRELID will result in an MQRC_CORREL_ID_ERROR being returned to the application.
CORRELID in MQMD
When the shared queue is indexed with CORRELID, using the CORRELID in the MQMD means that the cost of the MQGET is relatively consistent regardless of depth, provided the message is held entirely in the Coupling Facility.
In our measurements, the MQGET cost was of the order of 21 microseconds, with an additional 9.6 microseconds observed in the Coupling Facility.
Optimized Selector - JMSCorrelationID
It is possible to use one of the optimizable selectors on shared queues even when the INDXTYPE does not match the selector.
For low depth queues, the MQGET will be as responsive as an MQGET on an indexed queue, but for deeper queues the cost of finding the specific message will be relatively high in both the z/OS LPAR and the Coupling Facility.
The following chart compares the impact of indexing a shared queue with INDXTYPE(CORRELID) with depths of 1 and 1001 messages when using JMSCorrelationID to select a specific message.
[Chart: cost of MQGET using JMSCorrelationID – INDXTYPE(CORRELID) vs INDXTYPE(NONE), queue depths 1 and 1001]
Note: Chart has log scale on y-axis.
For very low depths, the INDXTYPE makes little difference to the cost of the MQGET.
If the INDXTYPE is not CORRELID, there was an additional call to the Coupling Facility of CF request type STARTMON.
My earlier blog “MQ for z/OS – CF Statistics” discusses the CF request types including STARTMON.
For queues of only 1001 messages where the message of interest is the last message, the lack of an appropriate INDXTYPE resulted in the MQGET costing 173 times that of the same MQGET from an indexed queue.
Additionally, there was an increase in CF cost of 26 times.
An external CF or one that is duplexed may be less responsive and therefore the CF cost may be higher.
When using JMSCorrelationID to access a shared queue that is not indexed by CORRELID and the depth exceeds 64 messages, you will see 2 or potentially more READLIST requests made.
In our measurements for a single MQGET where 1001 messages needed to be scanned, we observed that there were 3 READLIST requests. These accounted for more than 94% of the CF cost.
When the multiple READLIST calls occur, you will also see an increase in the count of re-drives to the CF.
Non-optimized selectors
The use of non-optimized selectors on shared queues is a relatively expensive process when there are many messages that need to be scanned for a matching property.
Whether all messages on the queue contain message properties can also affect the cost of a successful MQGET.
Consider the following scenarios:
- Queue contains 1 message which matches the desired message property.
- Queue contains 1001 messages, 1000 of which do not contain the desired message property. Of those 1000 messages, either:
  - None contain any message properties.
  - All contain a single message property.
  - All contain 10 message properties.
The following chart demonstrates the difference in cost on both the z/OS LPAR and the Coupling Facility:
Note: Chart has log scale on y-axis.
There are optimizations in MQ’s processing such that where messages on the queue do not contain properties, the message selection path in the MQGET is able to skip those messages. As such, the cost of the MQGET when the depth is 1 and when the depth is 1001 are similar.
When there are message properties on all the messages on the queue, each message will need to be scanned – so each message must be retrieved, and each property must be parsed to determine whether it is a match.
Whilst the CF cost is similar whether the messages have 1 or multiple properties, there is additional cost in the z/OS LPAR that is incurred with many properties due to each message property having to be parsed.
Scanning 1001 messages with 1 property for a match costs 146 times that of being able to scan just 1 message. The CF cost for that scan is 16 times more expensive.
The CF cost increase is simply due to MQ needing to perform multiple READLIST requests to retrieve all the potential matches, i.e. 1001 messages require 3 READLIST requests.
The additional cost in the LPAR results from having to process each message and parse the message properties in the message to determine whether there is a match for the desired message property.
Shared queues where message is stored in Shared Message Data Sets
When using shared queues, if either the message size or the number of messages means that there is insufficient capacity in the Coupling Facility, CFLEVEL(5) allows messages to be offloaded.
Due to the way CFLEVEL(5) has been designed, there is space in the Coupling Facility for the message header and up to 122 bytes of message data to be stored. Message data over that 122 bytes will be written to the offload destination – SMDS or Db2.
For performance reasons, we would suggest using SMDS as the offload destination.
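As an illustration, an application structure using SMDS offload might be defined as follows (the structure name, data set group, and buffer count are examples, not the measured configuration):

```
DEFINE CFSTRUCT(APPLICATION1) +
       CFLEVEL(5) +
       OFFLOAD(SMDS) +
       DSGROUP('HLQ.SMDS.*') +
       DSBUFS(100)
```

DSBUFS sets the number of SMDS buffers; as mentioned earlier, most of the data in these measurements was held in those buffers.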
Optimized message selectors with appropriately indexed queues can be used to identify the desired message by accessing the Coupling Facility, but depending on the size of the message payload, it may also be necessary to read the SMDS to retrieve the full message.
Non-optimized message selectors, or optimized selectors on non-indexed queues may be impacted by needing to read multiple potential matches from the offload destination.
The performance characteristics of selecting messages are similar but somewhat less performant when the message needs to be accessed on the offload medium, so I won’t revisit those specific costs, other than to say indexed queues with optimized selectors are the preferable option for getting specific messages.
However, I do think it is worth drawing attention to the performance when using non-optimized selectors when the message has been offloaded.
Consider a queue with 1001 messages, each with 10 properties and the desired message is the last message on the queue for the 2 scenarios:
- Messages are entirely stored in the Coupling Facility
- Messages have been offloaded onto SMDS.
In this case we see similar costs for accessing the coupling facility – approximately 240 microseconds based on:
- 1 STARTMON costing 5 microseconds,
- 3 READLIST costing 230 microseconds,
- 1 MOVE costing 5 microseconds.
By contrast the cost of the MQGET increases from 7.7 milliseconds when the message is entirely stored in the CF, to 15.4 milliseconds when needing to access the message in SMDS.
This additional cost is due to MQ needing to read the SMDS for all the potential matches, which in this case equates to 1001 reads.
Whilst the measurements relying on SMDS for message storage are largely relying on the data being in cached storage, the performance will be significantly affected if the SMDS needs to be read. The tuning of Shared Message Data Set performance is discussed in “MP16: Capacity Planning and Tuning Guide for IBM MQ for z/OS” section “Shared Message Data Sets – CFLEVEL(5)”.
Low activity queues and message selection on shared queues
Consider a scenario where a shared queue is being monitored for specific messages. The queue is not typically busy, perhaps 1 message arrives every minute.
Impact of zero to non-zero depth:
The Coupling Facility has a feature that can notify MQ when the depth of the queue increases from zero to one or higher; this event posts the get tasks so that they can determine whether the message of interest is on the queue.
The use of optimized selectors makes a difference to performance even when using shared queues with low message throughput.
Consider the differences when a set of applications are running in an MQGET-wait for 1 minute before a message is put to the queue.
| Type | Optimizable / Non-optimized | Optimized |
| --- | --- | --- |
| Queue INDXTYPE | NONE | CORRELID |
| Successful MQGETs | 1 | 1 |
| Get-specific attempted | 124 | 2 |
| CPU cost of successful MQGET (microseconds) | 248 | 36 |
| CF time per successful MQGET (microseconds) | 620 | 18 |
| No. READLIST calls per successful MQGET | 124 | 4 |
Note:
‘CF time per successful MQGET’ is the time spent waiting for the request to flow from the LPAR to the CF, the CF to process that request and subsequently for the response to flow back. It is not the CF service time – that would be a subset of the reported time. However, in a configuration where the CF is external with slow CF links, the time may be higher.
'Get-specific attempted' is the number of times MQ has attempted an MQGET internally - it does not represent the MQGET returning to the application.
For a shared queue that can use the zero-to-non-zero transition, the cost of a single task that doesn’t use an optimized get whilst waiting for 1 minute for a message to arrive is 7 times more expensive than one that does make full use of the optimized get.
The difference in time waiting for the CF to respond is even more significant – in this case, the non-optimized MQGET was 34 times longer than the optimized get.
Impact of new message arriving when depth is non-zero:
When the queue already has messages on it and a new message arrives, the Coupling Facility does not have a feature to allow notification of this change. Instead, MQ uses an internal timer to poll the Coupling Facility to see if there are messages that may be of interest. This timer can run multiple times per second for each task in an MQGET-with-wait.
Consider the differences when the shared queue already contains 200 messages that do not match the selection criteria. Once more, we have a set of applications running in an MQGET-wait for 1 minute before the message of interest is put to the queue.
| Type | Optimizable / Non-optimized | Optimized |
| --- | --- | --- |
| Queue INDXTYPE | NONE | CORRELID |
| Successful MQGETs | 1 | 1 |
| Get-specific attempted | 135 | 2 |
| CPU cost of successful MQGET (microseconds) | 13,095 | 38 |
| CF time per successful MQGET (microseconds) | 2,041,200 | 28 |
| No. READLIST calls per successful MQGET | 270 | 2 |
There is some additional cost in this case due to the queue depth. Once the queue exceeds approximately 64 messages, MQ must re-drive the READLIST with a larger buffer to get more potential matches. This and subsequent re-drives will retrieve approximately 512 messages until either a match is found, or all potential matches have been checked.
For a shared queue that cannot use the zero-to-non-zero transition and must rely on MQ polling the CF, the cost of a single task that doesn’t use an optimized get whilst waiting for 1 minute for a message to arrive was 344 times more expensive than one that does make full use of the optimized get.
The difference in time waiting for the CF to respond is even more significant – in this case, the non-optimized MQGET was more than 70,000 times longer than the optimized get.
Multiple get tasks on shared queue with non-indexed queues
As the previous section revealed, the use of non-optimized selectors on shared queues can prove significantly more expensive than optimized selectors.
Consider the impact on these low-usage queues if, due to business demands, messages should be processed as quickly as possible. To cater for the arrival of multiple messages at any time, multiple get tasks are started with long get-wait times to monitor the queues.
If there are 100 get tasks waiting for a specific message, imagine multiplying the LPAR cost and the CF response time per successful MQGET from the previous section by 100.
For the non-optimized use case with the zero-to-one transition, 100 tasks waiting for 1 message over a 1-minute period could add 24.8 milliseconds to the CPU load and 62 milliseconds waiting for CF responses, with potentially 12,000 READLIST calls flowing over the CF links – all effectively achieving no benefit.
Contrast this with 100 tasks using the optimized MQGET, where the cost would be 3.6 milliseconds of CPU and 1.8 milliseconds waiting for CF responses, with 1,800 READLIST calls, therefore reducing the load on the z/OS LPAR, the Coupling Facility and the CF links.
If the queue is being used for other messages, perhaps with different properties that the current set of get tasks are not interested in, such that the queue depth is never zero, the impact is far higher.
In this scenario, with just 200 messages on the queue, the optimized MQGET tasks would cost 3.8 milliseconds of CPU in the LPAR and 2.8 milliseconds waiting for CF responses, making just 200 READLIST requests over the 1-minute period.
By contrast, the non-optimized MQGET tasks would cost 1.3 CPU seconds on the LPAR and, more significantly, 200 seconds waiting for CF responses every elapsed minute, whilst making 27,000 READLIST requests.
Using client-based selectors
As the blog has already discussed, both message selection based on non-optimized message properties and selection based on optimized message properties from queues indexed with a different value to the message property can be expensive, depending on how many messages must be scanned to identify the message of interest.
When client applications perform these message selections, connecting via SVRCONN channels to the channel initiator, the expensive MQGET will block one of the channel initiator’s adaptor tasks while the request is completed.
It is essential to ensure that there are sufficient CHIADAP (channel initiator adaptor) tasks available such that there is at least 1 unused at peak times, to avoid waiting for an adaptor. MQ’s class(4) statistics data can be used to monitor the usage of the channel initiator tasks including adaptors.
With messages building up on queues, would more getter-type applications help?
In many cases a queue increasing in depth can be managed by adding more applications to process those messages.
When a queue containing messages with properties increases in depth, the addition of more applications using inefficient message selection criteria can make the performance much worse.
Adding tasks using expensive selectors will add more CPU load on the LPAR and, if using shared queues, on the CF too. When using client-based selectors there will also be additional load on adaptor tasks, potentially blocking channel initiator workload that is not using message selectors.
When accessing shared queues using inefficient message selectors, not only will the load on the processors (LPAR and CF) increase but there may also be an increase in the serialization in the CF, leading to existing tasks’ response times increasing.
How do I know I am using an inefficient message selector?
Private queues:
The section in “MP16: Capacity Planning and Tuning Guide for IBM MQ for z/OS” labelled “How do I know if I am using a good message selector” provides a process to help review MQ’s class(3) accounting data for the task.
A high number of skipped messages and a high CPU cost would indicate that the MQGET is not optimal.
Shared queues:
MQ’s class(3) accounting data report will reveal multiple READLIST entries per MQGET, which indicates that it was necessary to access the CF multiple times to locate the specific message.
Indexing the queue with an appropriate INDXTYPE will reduce the number of READLIST calls, re-drives, and the overall CPU cost.
Checklist
This is taken from “MP16: Capacity Planning and Tuning Guide for IBM MQ for z/OS” in the “Using Message Selectors” section, but is worth highlighting in this blog:
Summary
As this blog has shown, optimizable message selectors can be used on queues that are not indexed, or queues that have been indexed using a different index type.
However, just because you can use one of the optimizable selectors on a queue with a different index type, it doesn’t mean that you should, and hopefully this blog demonstrates why that is the case.
Using message selectors, particularly on shared queues, can result in relatively expensive MQGET requests. Wherever possible, use optimized selectors with appropriately indexed queues.
Finally, there is further information on the performance of message selectors in “MP16: Capacity Planning and Tuning Guide for IBM MQ for z/OS” in the “Using Message Selectors” section of the Advice chapter. This section offers further guidance on the optimized message selectors, how to determine if the selector in use is efficient or not, general performance, and the implications of using client-based selectors.