Moving messages between IBM MQ for z/OS queue managers in a sysplex 

Tue November 12, 2019 06:25 AM

Tony Sharkey | Updated Jan 17, 2017

When looking at moving messages between MQ queue managers in a sysplex, there are a number of options for configuration. Recent innovations in the TCP stack such as SMC-R and SMC-D have led us to revisit whether shared queues can still be regarded as the "gold standard".

In summary, this blog will show that for small messages, shared queue still achieves the best throughput rate, but as message size grows and/or workload increases, alternative configurations may be preferable.

A bit more detail:

There are a number of different configurations for moving messages between queue managers in a sysplex and this blog aims to show the relative costs, transaction rates and impact on accounting for a subset of these configurations.

The following configurations have been measured:

  1. Workload using pairs of Sender-Receiver channels over a 10Gb TCP/IP performance network
  2. Workload using pairs of Sender-Receiver channels using Shared Memory Communications over RDMA (SMC-R) as a network optimization. A brief introduction to SMC-R can be found at SMC-R RoCE Frequently asked questions.
  3. Workload using pairs of Sender-Receiver channels using Shared Memory Communications - Direct Memory Access (SMC-D). SMC-D was introduced in z/OS v2r2 and uses internal shared memory (ISM) for communication between 2 SMC-capable peers located on the same central processor complex (CPC).
  4. Workload using intra-group queuing (IGQ) to move messages between queue managers.
  5. Workload using shared queues, i.e. request messages are put/got from shared queue 'A' and reply messages are put/got from shared queue 'B'. Where messages larger than 63KB are used, the messages are stored on Shared Message Data Sets (SMDS).

In each of these measurements, a queue is used for the request portion of the workload and a separate queue is used for the reply portion. These 2 queues are treated as a pair and are used by all requester and server tasks for an iteration of the workload. As the workload is increased, additional pairs of queues, with associated requester and server tasks, are added. Each queue is allocated a separate MQ channel where applicable.


Transaction Cost:

Transaction cost is based upon the cost of the entire request/reply round-trip and includes the following address spaces:

  • Requester and server queue managers
  • Requester and server channel initiators
  • Requester applications
  • Server applications
  • TCP/IP

Also included is the calculated cost to the CF, based on the RMF Coupling Facility report where we have taken the "% busy" number and converted to CPU microseconds, then divided by the transaction rate, to give a CF cost per transaction.
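The "% busy" conversion described above can be sketched as a short calculation. The 44% busy figure and the 4 dedicated CF processors appear later in this report; the transaction rate used in the example below is a hypothetical placeholder, not a measured value.

```python
# Sketch of the CF cost-per-transaction calculation described above.
# 44% busy and 4 CF processors are taken from this report; the
# transaction rate below is a hypothetical placeholder value.

def cf_cost_per_transaction(pct_busy, cf_processors, transactions_per_sec):
    """Convert an RMF '% busy' figure into CF CPU microseconds per transaction."""
    # Total CF CPU consumed per second, expressed in microseconds:
    cf_micros_per_sec = (pct_busy / 100.0) * cf_processors * 1_000_000
    return cf_micros_per_sec / transactions_per_sec

# Example: CF at 44% busy with 4 processors, assuming 50,000 transactions/sec
cost = cf_cost_per_transaction(44, 4, 50_000)
print(f"{cost:.1f} CF CPU microseconds per transaction")  # 35.2
```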


Table 1: Transaction Cost by address space - CPU microseconds per transaction for 2KB messages.

[Table values were not preserved in this copy. Columns: TCP/IP channels, channels using SMC-R, channels using SMC-D, shared queue; rows: cost per address space, plus "Total including CF" and "Total excluding CF".]


Notes on table 1:

Some customers may regard the CF cost separately or as a fixed cost, so the table reports the costs both with and without the CF cost.


Chart 1: Transaction Cost


  • Chart 1 includes the cost of any CF processing, however the transaction costs when the CF is excluded are similar for all configurations.
  • XCFAS costs represent less than 1% of the total costs in all measurements, with the shared queue measurement demonstrating the highest value of 1.4 microseconds per transaction.


When using MCA channels, a large proportion of the transaction cost is incurred by the channel initiators, which get the message from the transmission queue and interface with the network protocol (TCP/IP).


Using intra-group queuing (IGQ) has effectively moved the cost from the channel initiator to the queue manager address space. In addition to this, there is increased usage of the Coupling Facility. Furthermore there is an increase in the MQPUT cost as the message is being put to the ‘SYSTEM.QSG.TRANSMIT.QUEUE’ which is a shared queue. In this instance, the Coupling Facility was 44% busy.


When shared queues are used, the queue manager cost is decreased from the IGQ level but is still 5 times that of the measurements using MCA channels. The application cost also increases, primarily in the requester application address space, as the application is performing an MQGET of a specific message, and the queue depth is sufficiently low that there is a further cost each time the depth transitions from 0 to 1. The Coupling Facility utilization was running at 42% busy.

Transaction Rate:

The following chart shows the transaction rate achieved when running the request/reply workload between the 2 queue managers.


Chart 2: Achieved Transaction Rate in each configuration

In these measurements, the TCP/IP channel throughput is limited largely by network latency.


The use of SMC-R and SMC-D has reduced this latency and given a significant improvement in throughput at little additional cost.

Intra-group queuing offers an improved rate over the standard TCP/IP channel but the benefits will depend on the CF both in its responsiveness and location relative to the LPARs.


The shared queue configuration offers similar throughput to the SMC-R and SMC-D measurement with less administration effort but does add significant load to the Coupling Facility at these transaction volumes.

What happens when the workload is scaled to use more queues?

The measurements thus far use a single pair of queues for the request/reply workload for 2KB messages and are not intended to demonstrate performance characteristics as system limits are approached.


The measurements in this section aim to demonstrate how performance may change as the workload is increased through the use of more pairs of queues.


Chart 3: Total cost per transaction with increasing workload

As the workload increases, the costs increase as follows:

  • TCP/IP increased 10%
  • IGQ increased 17%
  • SMC-R and SMC-D increased 40%
  • Shared queue increased 54%


In the measurements using SMC-R, SMC-D and shared queue, additional processors were allocated to allow the workload rate to increase, whereas the IGQ and TCP/IP measurements hit implementation limits.


These increases are based on the total costs from the MQ queue manager and channel initiator address spaces, TCP/IP, batch application address spaces and coupling facility. The batch applications use minimal business logic and typically account for 25% of the total transaction cost in all but the shared queue configuration.


In the shared queue configuration, the application costs comprise up to 50% of the total transaction cost. This increase is due to the higher cost of putting to the Coupling Facility, a cost that grows as the CF becomes less responsive.


Chart 4: Coupling Facility usage with increasing workload


Chart 4 shows that as the workload increases, the CF becomes busier for the shared queue configuration. Utilization rises to in excess of 90% busy, and it may be beneficial to increase the number of CF processors available: as the CF becomes less responsive it converts requests from synchronous to asynchronous, and the synchronous response times themselves start to increase. In these measurements the synchronous response times increased from 4.7 to 12.8 microseconds.


For the shared queue measurement, where the CF has multiple processors and CPU utilization exceeds 60%, we would suggest that more CF processors be allocated.


With regards to the IGQ measurement, the CF remains at approximately 40% busy regardless of workload and this appears to be due to IGQ being driven to its limit.


Chart 5: Achieved Transaction Rate with increasing workload


The MCA channels show a steadily increasing transaction rate as the workload increases.


By comparison, the SMC-R and SMC-D measurements show a higher transaction rate with no indication of being constrained by implementation limits.


The IGQ measurement actually shows a decrease in transaction rate, dropping 24%, due to increasing contention on the IGQ resources.


The shared queue measurement shows good scaling performance until the Coupling Facility becomes the constraining factor.

What happens with larger messages?

The measurements thus far have been limited to messages of 2KB and those measurements using MQ channels are not driving the network particularly efficiently.


For the measurements using shared queues and IGQ, the impact of larger messages will depend on the size of the messages and on whether the offloaded portion of the message goes to Shared Message Data Sets or to DB2 BLOBs. Note that DB2 BLOB performance is not discussed in this blog. When messages are offloaded, Coupling Facility usage is significantly reduced; instead, the DASD subsystem may see an increase in workload.


To demonstrate the performance differences of larger messages, measurements are reported for 64KB and 4MB messages.

Measurements using 64KB messages

The following 2 charts compare the transaction cost and transaction rates of the 5 configurations when using 64KB messages.


With these larger messages, the messages flowing across MQ channels are processed more efficiently, i.e. the MQ implementation headers form a much smaller proportion of the total payload.


Chart 6: Total transaction cost with increasing workload using 64KB messages


In both the IGQ and Shared queue measurements the messages are stored on the shared message data sets, and the remote queue manager will read those datasets directly to access the message data. This means that the Coupling Facility usage is much less than for the small message workload.


The measurements using MQ channels have the majority of the transaction cost in the MQ channel initiator and TCP/IP address spaces.


Chart 7: Achieved Transaction Rate with increasing workload using 64KB messages

It is with these larger messages that the benefit of both SMC-R and SMC-D can be observed.


While the IGQ throughput has peaked for the workload, the shared queue measurement shows consistent improvements as the workload is increased. This is largely because each queue is defined to a separate structure which in turn has a unique shared message data set. In this measurement the performance is likely to be constrained by DASD performance.


The SMC-R measurements are able to process more than 3 times the volume of data achieved at the peak rate of the shared queue measurement.


SMC-D achieved 30% more throughput than the peak SMC-R measurement and was able to process over 21,000 transactions per second, which equates to approximately 1350 MB/second outbound and 1350 MB/second inbound to each queue manager.
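The quoted bandwidth can be sanity-checked with simple arithmetic (using MB = 2^20 bytes; the message rate and size are taken from the text):

```python
# Sanity check of the quoted bandwidth: transactions/sec x message size.
# Each transaction moves one request out and one reply in per queue manager,
# so the outbound and inbound rates are equal.

def bandwidth_mb_per_sec(transactions_per_sec, message_bytes):
    return transactions_per_sec * message_bytes / (1024 * 1024)

# 21,000 transactions/sec with 64KB messages:
print(bandwidth_mb_per_sec(21_000, 64 * 1024))  # 1312.5
```

This gives roughly 1312 MB/second at exactly 21,000 transactions/second, consistent with the "approximately 1350 MB/second" quoted for a slightly higher achieved rate.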


Measurements using 4MB messages

The following 2 charts compare the transaction cost and transaction rates of the 5 configurations when using 4MB messages.


Chart 8: Total transaction cost with increasing workload using 4MB messages


For 4MB messages, the most expensive configurations are those using MQ channels; however, as chart 9 shows, these also give the greatest throughput rate.


In these measurements the shared queue configuration was the lowest cost due to the use of shared message data sets, which meant that the data was read directly from disk rather than being transported over the network. This of course puts additional load on the disk subsystem and, as demonstrated in chart 10, does not necessarily give the best performance.


The measurements using MQ channels see approximately 80% of the total cost in the MQ channel initiator and TCP/IP address spaces, whereas the IGQ cost is predominantly in the MQ master address spaces and the shared queue cost is largely in the application address spaces.


Chart 9: Achieved Transaction Rate with increasing workload using 4MB messages


In chart 9, the SMC-D transaction rate significantly exceeds all other configurations, achieving more than 3 times the rate of TCP/IP, IGQ and Shared queue measurements.


The SMC-D measurement also demonstrates a 60% improvement over the SMC-R equivalent tests. This equates to approximately 1.6GB per second for the outbound channels and 1.6GB per second for the inbound channels.


Both the IGQ and shared queue measurements show little benefit as more queues are made available to transport data.


Chart 10: Breakdown of transaction cost by address space compared with transaction rate

Chart 10 shows transaction cost by each address space for the 5 queue pair measurement with 4MB messages.


As mentioned earlier, the largest proportion of the cost of the MQ channel measurements is in the MQ channel initiator and the TCP/IP address spaces.


From the MQ channel initiator statistics and accounting data, which has been available since MQ for z/OS version 8.0, we can see that approximately 80% of the channel initiator cost is spent in the dispatcher tasks, which are primarily the interface to the network.



The results show that for small messages, shared queue still offers the best transaction rate combined with the minimum use of z/OS CPU time. However, with the availability of SMC-R and SMC-D, equivalent performance can be achieved for a small increase in CPU cost for those customers who are constrained by CF resources.


A rule of thumb for the particular set of measurements using 2KB messages might be that:

  • IGQ costs 40% more than the MCA measurement, including CF cost, but may offer improved performance if the network latency is an inhibitor to performance.
  • Shared queue costs 30% more than the MCA measurement, including the CF cost, but may offer improved performance.

For larger messages, where the message cannot be stored entirely in the Coupling Facility, the use of SMC-R and particularly SMC-D can offer significant benefit in throughput, but at increased cost in the MQ channel initiator and TCP/IP address spaces.


Where shared queue or IGQ is not available, the use of SMC-D or SMC-R should be considered as an enhancement to the basic performance of TCP/IP.


In our shared queue measurements using 2KB messages with multiple queues, the CF CPU utilization was higher than is typically suggested for those processor types and it would be advisable in a production environment to ensure additional CF processors are available.


In a system where the Coupling Facility is already being used, the additional load of moving messages, either via IGQ or via a shared queue that is put to on LPAR 1 and got from on LPAR 2, may be sufficient to overload the CF such that responses go asynchronous and response times degrade.


Different performance characteristics may be observed with more sets of application queues. For example, as more queues are used in the MCA channel configurations, we would expect the throughput to scale well until some system limit is reached, such as network capacity, CPU or even the IBM MQ dispatcher task being busy.


In the configurations where the Coupling Facility is a significant factor, there are many areas that need to be considered, including CPU utilization, whether responses are synchronous or asynchronous (as the load increases, more responses become asynchronous, which can degrade performance), and whether the links to the CF are saturated. Performance report MP16 offers some guidance on what may need to be monitored in a shared queue environment.


Testing environment:

  • Performance SYSPLEX defined on 2964-NE1 consisting of 2 LPARs, running z/OS v2.2 FMID HBB77A0.
  • LPAR 1 defined with 10 dedicated processors
  • LPAR 2 defined with 10 dedicated processors
  • There is a dedicated 10GbE network between the 2 LPARs and the systems are configured on different subnets such that network requests go via a single switch.
  • SMC-R is available but disabled unless specified.
  • SMC-D is available but disabled unless specified.
  • Internal Coupling Facility with 4 dedicated processors with multiple ICP paths to CF from each connected LPAR, running CFCC release 21.00 service level 2.16
  • Queue managers are running IBM MQ V8 with latest service applied.
  • The 2 queue managers are part of a queue sharing group (QSG) running at CFLEVEL 5 with shared message data sets available.
  • A request/reply workload, driven by batch applications on LPAR 1 is run in each case such that:
    • There are 6 long-running batch requester tasks putting fixed size non-persistent messages to a common queue. Once the message is put, the batch requester tasks go into an MQGET-with-wait for a specific reply message.
    • There are 2 long-running batch server tasks that MQGET the next available message and MQPUT a reply message of the same size, setting a pre-agreed correlation ID. The MQGET and MQPUT are in-syncpoint.
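The requester/server flow above can be illustrated with a plain-Python simulation. This is a sketch of the message pattern only, not MQ API code: standard-library queues stand in for the MQ queues, and a per-request reply queue plays the role of the MQGET-with-wait for a specific correlation ID.

```python
# Minimal simulation of the request/reply pattern described above.
# Standard-library queues stand in for MQ queues; a real implementation
# would use MQPUT/MQGET with a pre-agreed correlation ID.
import queue
import threading
import uuid

request_q = queue.Queue()   # stands in for the common request queue
replies = {}                # correlation ID -> reply queue

def server():
    """Get the next available request and put a reply of the same size,
    tagged with the agreed correlation ID (shutdown on a None ID)."""
    while True:
        correl_id, payload = request_q.get()
        if correl_id is None:
            return
        replies[correl_id].put(payload)  # reply message of the same size

def requester(payload):
    """Put a request, then wait for the specific reply message."""
    correl_id = uuid.uuid4().hex
    replies[correl_id] = queue.Queue()
    request_q.put((correl_id, payload))
    return replies[correl_id].get(timeout=5)  # MQGET-with-wait equivalent

t = threading.Thread(target=server)
t.start()
result = requester(b"x" * 2048)        # a 2KB message, as in the workload
print(result == b"x" * 2048)           # True: reply matched to requester
request_q.put((None, None))            # stop the server task
t.join()
```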


This is not an exhaustive list of configurations; for example, it is possible to use shared transmission queues.


Measurements were run on a dedicated performance system and different behavior may be observed on a busier system.


Additional testing would be required for configurations where:

  • Network is running at capacity
  • CF response times are less optimal, possibly from CF location, specification, how busy the CF is, duplexing, insufficient paths to CF
  • Message persistence
  • Channels are specified with NPMSPEED(FAST)
  • Queue depths are not low. High queue depths in the CF may result in messages being offloaded to shared message data sets (SMDS) or, if selected, DB2 BLOBs, which may affect performance. Deep shared queues may benefit from the presence of "storage class memory" (CF Flash).

