
MQ for z/OS - Frequent opening of shared queues

By Anthony Sharkey posted Fri October 01, 2021 04:00 AM


The opening of a queue is a fundamental part of using MQ in an application, and in certain patterns of use, such as short-lived transactions, the MQOPEN verb may account for a considerable proportion of the MQ cost.


From a performance perspective, it is preferable for the application to open the queues once, perform multiple puts and gets, then close the queues, but for many reasons this is not always possible.

As such, it is worth considering what factors might affect the performance of the MQOPEN, and how you might mitigate those costs.

In this blog, we will concentrate on shared queues, as this is an area where there is the largest potential for reducing cost.

When an MQ queue is accessed, the queue manager must register the use. For private queues, that registration takes place in the local queue manager, but for shared queues, that registration has to be coordinated with the other members of the Queue Sharing Group (QSG) and this is done in the Coupling Facility (CF).


Each registration of use of the shared queue may involve multiple calls to the CF, and the response time from those calls will impact the time and cost to complete the MQOPEN and indeed MQCLOSE too, which may have to de-register the use of the queue.

Response time will be affected by many factors, including the distance to the CF,  how busy the CF and the links from z/OS to the CF are, and whether the structures in the CF are duplexed.

Some of those factors can be addressed (for example, if the CF is particularly busy, additional processors may improve response times), but others, such as the distance to the CF, cannot.

The most efficient way to reduce the impact of the registration and de-registration of the use of the queue in the Coupling Facility is to eliminate it - but without losing the benefits of the QSG and shared queues.

When I wrote earlier that with shared queues the registration has to be coordinated with the other members of the QSG, that was not strictly true in all cases...

If no other application on the queue manager already has the queue open, then the queue manager has to go to the CF to register the interest in the queue and to get information about the queue. This is known as the first-open effect. If the queue manager already has the queue open, the information is already available and the queue manager does not need to register.

When an application closes a shared queue, if no other application on that queue manager has the queue open, then the queue manager has to go to the CF to de-register interest in the queue. This is known as the last-close effect.
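Taken together, first-open and last-close behave like a per-queue-manager reference count, where the CF is only involved on the 0-to-1 and 1-to-0 transitions. The following is a minimal Python sketch of that idea (the class and counters are illustrative, not MQ internals):

```python
class SharedQueueInterest:
    """Tracks how many local applications have a shared queue open.

    Illustrative model only: CF registration is needed on the first
    open (count 0 -> 1) and CF de-registration on the last close
    (count 1 -> 0); all other opens and closes are purely local.
    """
    def __init__(self):
        self.open_count = 0
        self.cf_accesses = 0   # how often this queue manager went to the CF

    def mqopen(self):
        if self.open_count == 0:
            self.cf_accesses += 1   # first-open effect: register in the CF
        self.open_count += 1

    def mqclose(self):
        self.open_count -= 1
        if self.open_count == 0:
            self.cf_accesses += 1   # last-close effect: de-register in the CF

# Non-overlapping opens: every open/close pair hits the CF twice.
q = SharedQueueInterest()
for _ in range(3):
    q.mqopen()
    q.mqclose()
print(q.cf_accesses)  # 6
```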

There is the added complication that the updates to the CF for any queue must be serialized - so if one queue manager is in the process of registering usage of a queue in the CF, should another queue manager also attempt to register its own usage of the queue, the second registration will fail and the queue manager will have to retry its registration. How quickly the second registration can be retried will depend on CF response times and availability of CPU in the z/OS LPAR. In the worst case, you can end up with many retries, particularly if a third (fourth, fifth etc) queue manager is in the process of registering their use of the queue before the second queue manager was able to successfully retry.
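This serialized update behaves like an optimistic registration loop: per round of contention, one queue manager's attempt succeeds and the rest must retry. The following Python sketch (a simplified worst-case model, not MQ internals) shows how the number of retries grows as more queue managers register at the same time:

```python
def simulate_registrations(num_qmgrs):
    """Worst-case model of serialized CF registration.

    In each round exactly one pending queue manager wins the
    serialized update; every other pending attempt fails and must be
    retried. Returns the total number of retried attempts.
    """
    pending = list(range(num_qmgrs))
    retries = 0
    while pending:
        pending.pop(0)            # one registration succeeds this round
        retries += len(pending)   # the rest fail and retry next round
    return retries

for n in (1, 2, 3, 4):
    print(n, simulate_registrations(n))
# In this model, n concurrently-registering queue managers produce
# n*(n-1)/2 retries, so contention grows faster than linearly.
```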

For applications that hold the queue open for a long time, first-open and last-close are not a particular issue; nor are they an issue when there is a high throughput of short-lived transactions, as the queue may always be open by at least one application at any time.

However, where there are short-lived transactions and the workload is sufficiently low that there is little or no overlap, so the queue is not open all of the time, there can be a high occurrence of first-open and last-close effects.

This has previously been documented in the MP16 "Capacity Planning and Tuning Guide" report in the section "Frequent opening of shared queues".

So far, nothing new, but we have recently found that the type of open option used can also affect the first-open and last-close effect.

MQOO_INPUT_SHARED
When a queue manager attempts to register the usage, it takes a lock prior to registration, and as the number of queue managers attempting to register increases, so too can the time spent holding an MQ latch. This means that when using MQOO_INPUT_SHARED, you may see increased CPU time and increased latch time when one or more queue managers attempt to register their use of the queue.

MQOO_OUTPUT
When opening the queue for output, the registration does not result in increased latch time, but if the attempt to register fails due to other queue managers registering their use of the same queue, then the retry processing will result in additional CPU usage in the application address space.



How much does first-open / last-close cost?
In the following two examples we define three queue managers in a QSG, where each queue manager runs on a separate LPAR of a single z15. The CF is also located on the same z15.

A single application is connected to queue manager 1 and performs 1 million MQOPEN and MQCLOSE calls of a single common shared queue.

The measurement is then repeated with the same application running against queue managers 1 and 2, and then finally against queue managers 1, 2 and 3.

The costs reported are those of the average of the 1 million MQOPEN requests on queue manager 1, using data from class 3 accounting.

Measurement 1: Queues opened with MQOO_INPUT_SHARED
The following table shows the CPU and elapsed times, in microseconds, of the MQOPEN on queue manager 1, as increasing numbers of queue managers attempt to register use of the same queue.

Queue Managers             1      2      3
MQOPEN CPU                21     31     51
MQOPEN Elapsed time       27    122    520
Wait (Elapsed - CPU)       6     91    469


The CPU cost of the MQOPEN on queue manager 1 increased by 47% from 21 CPU microseconds when applications on 2 queue managers attempted to open the same queue, and increased by a further 61% when the application on the third queue manager was introduced.

As the table indicates, the "wait" time shown is calculated as the elapsed time minus the CPU time. This wait time started at 6 microseconds, increased by 85 microseconds when running the workload against 2 queue managers, and increased by a further 378 microseconds with 3 queue managers.
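The derived wait figures can be reproduced from the published table values with a trivial calculation:

```python
# CPU and elapsed times (microseconds) per MQOPEN from the
# MQOO_INPUT_SHARED measurement, keyed by number of queue managers.
cpu     = {1: 21, 2: 31,  3: 51}
elapsed = {1: 27, 2: 122, 3: 520}

# Wait time is simply elapsed minus CPU.
wait = {n: elapsed[n] - cpu[n] for n in cpu}
print(wait)              # {1: 6, 2: 91, 3: 469}

# Increase in wait time as each queue manager is added.
print(wait[2] - wait[1])  # 85
print(wait[3] - wait[2])  # 378
```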

This increase in wait time is largely due to waiting for the lock.

Additionally, we saw increased DMCSEGAL (latch 11) time; for example, in these measurements the time spent waiting for DMCSEGAL increased from 7 to 16 microseconds per MQOPEN as queue managers were added. Since the latch is held across the CF access, a less responsive CF may result in increased DMCSEGAL times.


Measurement 2: Queues opened with MQOO_OUTPUT
Once more, the following table shows the CPU and elapsed times, in microseconds, of the MQOPEN on queue manager 1, as increasing numbers of queue managers attempt to register use of the same queue.

Queue Managers             1      2      3
MQOPEN CPU                 6     16     29
MQOPEN Elapsed time        6     17     29
Wait (Elapsed - CPU)       0      1      0


When opening the queue for output, the CPU cost of the MQOPEN on queue manager 1 increased from 6 to 16 CPU microseconds when applications on 2 queue managers attempted to open the same queue, and by a further 13 CPU microseconds when the application on the third queue manager was introduced.

The "wait" time for MQOPEN for output is minimal, and this is also reflected by no increase in latch times.

Whilst opening the shared queue for output saw minimal wait time, the retried registration attempts resulted both in the increased CPU cost and in additional load on the CF. In some circumstances, particularly where many queue managers were attempting to register access to the same shared queue, the MQOPEN can see "CF retries", which are reported in the MQSMF task report (available in the performance SupportPac MP1B), for example:

Open count                     565020    SIXC01
Open avg elapsed time              17 uS SIXC01
Open avg CPU time                  16 uS SIXC01
Open CF access                 565020    SIXC01
Open no CF access                   0    SIXC01
CF time/verb                       11 uS
CF Avg Sync elapsed time/verb      11 uS
CF Sync number of request     1591903
CF Avg Sync CF response time        4 uS
CF Retries                     337039 out of 1591903 ( 21.2%)

In this example, there were 337,039 CF requests that had to be re-tried before the queue manager was able to successfully register access to the SIXC01 shared queue.
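The retry percentage in the report line is simply the retried requests divided by the total number of synchronous CF requests, which can be checked against the figures above:

```python
# Values from the MQSMF task report sample for queue SIXC01.
cf_retries = 337_039
cf_sync_requests = 1_591_903

# Percentage of CF requests that had to be retried.
pct = 100 * cf_retries / cf_sync_requests
print(f"{pct:.1f}%")  # 21.2%
```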

Note that these are CF retries and not CF re-drives. CF re-drives tend to occur when the buffer passed to the CF is not sufficiently large for the returning data, whereas the CF retry may occur when the queue manager is unable to complete its request due to another MQ queue manager performing a similar request.

How can I tell if I am seeing first-open or last-close effects?
First-open and last-close effects can be observed in the MQSMF task report (part of SupportPac MP1B), which reports whether the API request required access to the CF or whether it could be resolved with "no CF access", for example:

Open CF access                 565020 SIXC01
Open no CF access                   0 SIXC01

This indicates that 565,020 open requests were made, all of which required access to the CF.

Additionally, we can see that contention was occurring due to other queue managers attempting to access the same MQ resource, particularly when opening queues for output, as the following information was reported:

CF Retries                     337039 out of 1591903 ( 21.2%)

How can I reduce the impact from first-open and last-close?
If you see a large number of first-opens or last-closes, including from CF retries, then using an application connected to each queue manager that opens the shared queue(s) and subsequently goes into a long sleep can significantly reduce the time and cost of the frequent MQOPEN and MQCLOSE calls in your applications.

Using a long-running application means that CF access is no longer required by the applications that open and close the queue frequently, which in turn reduces the cost of those MQ verbs.

Note:
In order to minimise the first-open and last-close, the application connecting must open the shared queue(s) using the same type of open as the applications performing the frequent queue opens. That is to say that if the frequently opened queue is opened for output, then the application used to mitigate the effects should also open the queue for output.
If the application used to mitigate the first-open effect uses a different open type, there will be no benefit as the frequent opening task will still need to access the CF.

Repeating the earlier measurements with an additional application connected to each queue manager that MQOPENs the common shared queue and then sleeps, the average cost of the 1 million MQOPENs dropped to 1 CPU microsecond regardless of the number of queue managers.
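The benefit can be modelled by counting CF accesses, assuming CF access is needed only when the queue manager's local open count moves between 0 and 1 (an illustrative Python model, not MQ internals):

```python
def cf_accesses(pairs, holder_open=False):
    """Count CF registrations/de-registrations for a queue manager
    performing a series of short-lived MQOPEN/MQCLOSE pairs.

    Illustrative model: CF access happens only on the first open
    (count 0 -> 1) and the last close (count 1 -> 0). If holder_open
    is True, a long-running application holds the queue open for the
    whole run, so the count never returns to zero.
    """
    open_count = 1 if holder_open else 0   # long-running holder, if any
    accesses   = 1 if holder_open else 0   # holder's own first-open
    for _ in range(pairs):
        # MQOPEN by the short-lived application
        if open_count == 0:
            accesses += 1                  # first-open: CF registration
        open_count += 1
        # MQCLOSE by the short-lived application
        open_count -= 1
        if open_count == 0:
            accesses += 1                  # last-close: CF de-registration
    return accesses

print(cf_accesses(1_000_000))                    # 2000000
print(cf_accesses(1_000_000, holder_open=True))  # 1
```

In this model, 1 million non-overlapping open/close pairs drive 2 million CF accesses, whereas with the holder in place the only CF access is the holder's own initial registration.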

Is it always appropriate to have an application holding the queue open?
In short, not all shared queue configurations will benefit from having an application that opens the shared queue with the appropriate open option and then sleeps.

For example, if the shared queue is configured with TRIGTYPE(FIRST), then such an application can prevent the triggering mechanism from initiating the desired triggered application.

It is worth noting that TRIGTYPE(EVERY) can benefit from the long-running application; in a simple CICS environment, we measured MQOPEN costs reduced by up to 80% and MQCLOSE costs by up to 90%.


Finally, this latest information will be available in the next release of MP16. The complete list of MQ performance reports is available on GitHub: MQ Performance Documents


#MQ
#IBMMQ
#z/OS
