IBM MQ for z/OS 9.3.3 – Performance of Kafka Connectors on z/OS
As you may have seen in one of several recent announcements, IBM MQ 9.3.3 Continuous Delivery Release included Kafka Connectors as part of the IBM MQ Advanced for z/OS Value Unit Edition.
These Kafka Connectors enable data to be passed between your MQ subsystems and your Apache Kafka Clusters.
The announcement blog, “IBM MQ, IBM MQ for z/OS and IBM MQ Appliance firmware 9.3.3 Continuous Delivery releases are available”, details how you can download the Kafka Connectors.
My colleague, Amy, recently wrote a subsequent blog, “Unlock events in mission critical systems”, which discusses the use of Apache Kafka and IBM MQ.
Finally, a few years ago I also wrote an initial blog, “Kafka Connector for IBM MQ – a MQ for z/OS perspective”, that discussed what Apache Kafka is, a few things that caught me out when using Kafka Connectors with MQ for the first time, the MQ definitions we used, and an overview of the configurations used for measuring the performance.
At that time, we looked at three configurations:
- Kafka connector remote from z/OS using client connection into MQ.
- Kafka connector local to z/OS, using USS-based client connection into MQ.
- Kafka connector local to z/OS, using USS-based bindings mode connection into MQ.
These configurations can be represented thus:
What we found, as discussed in that blog, was that whilst all three configurations are capable of supporting Kafka Connectors with MQ, the bindings mode configuration achieved the best results in terms of performance: the lowest cost on z/OS and potentially the highest throughput.
Because the bindings configuration achieved the best performance in our tests, the measurements discussed in the remainder of this blog are limited to the bindings configuration.
For the purposes of this report, the following Kafka connector versions were used:
- Source: 1.3.2
- Sink: 1.5
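To give an idea of what the bindings configuration looks like in practice, the sketch below creates a minimal “MQ as a source” connector properties file on USS and starts it under a standalone Kafka Connect worker. The property names are those documented for the IBM MQ source connector, but the queue manager, queue and topic names (and the worker.properties file) are purely illustrative, so check them against your own environment and connector version.

```
# Sketch only: a minimal bindings-mode source connector configuration.
# All object names below are hypothetical examples.
cat > mq-source.properties <<EOF
name=mq-source
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
# Start with a single connector task; the effect of increasing this is discussed below.
tasks.max=1
# Bindings mode connects to a local queue manager from USS rather than via a client channel.
mq.connection.mode=bindings
mq.queue.manager=QML1
mq.queue=KAFKA.SOURCE.QUEUE
topic=from.mq.topic
mq.record.builder=com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder
EOF

# Run the connector under a standalone Kafka Connect worker (worker.properties not shown).
bin/connect-standalone.sh worker.properties mq-source.properties
```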
This report will discuss the following items for each connector type:
- Peak sustainable throughput rates including increasing number of connector tasks.
- Absolute peak rates.
- Impact of message size.
- Impact of persistent messages.
- Costs of an idle connector.
Additionally, there are further specific connector-type sections:
- Impact of low (trickle-) rate workloads on “MQ as a Source” connector.
- Impact of shared queue on “MQ as a Sink” connector.
For the purposes of this report, the configuration uses an MQ for z/OS 9.3.3 CDR queue manager configured with the MQ queues mapped to a storage class on a 16GB buffer pool and a page set of corresponding size.
The queue manager runs on an LPAR hosted on an IBM z16 with 3 dedicated CPUs and 2 zIIPs running z/OS 2.5, with dedicated links to a performance DS8950 DASD subsystem.
The Kafka broker runs on an IBM z15 running RedHat Enterprise Linux 8.8 with 6 dedicated CPUs.
The two IBM mainframes are linked by a 10Gb performance network with minimal latency.
MQ as a source
When using the “MQ as a source” connector, the messages are put by a small set of simple applications running in batch.
The connector will then get the messages from the queue and stream them to the Kafka cluster.
Peak sustainable throughput rates
For peak throughput with sustainable workloads, the tests run batch applications putting messages and connector tasks getting messages concurrently. For the purposes of these tests, sustainable means the connector can consume messages at a rate such that the queue depth does not increase significantly.
The test uses multiple instances of Kafka’s kafka-consumer-perf-test.sh shell script to consume the messages when they arrive on the Kafka topic.
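For reference, each instance of that script can be started along the lines of the sketch below; the broker address, topic name and message count are placeholders, and older Kafka releases use --broker-list rather than --bootstrap-server.

```
# Sketch: consume the messages arriving on the Kafka topic fed by the source connector.
# Broker address, topic and message count are placeholder values.
bin/kafka-consumer-perf-test.sh \
  --bootstrap-server kafkabroker.example.com:9092 \
  --topic from.mq.topic \
  --messages 1000000
```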
In our environment, the Kafka connector can get 28,000 messages per second from the queue when configured with a single connector task, i.e., tasks.max=1.
Adding a second task increased the sustainable rate by 66%. As further tasks were added, the percentage increase continued to diminish, with no additional benefit seen when moving from 4 to 5 tasks.
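For completeness, tasks.max can be changed either by editing the connector properties file and restarting a standalone worker, or, if you run Kafka Connect in distributed mode, by re-submitting the full configuration over the Connect REST API. The sketch below assumes a distributed worker listening on its default port and the hypothetical connector configuration shown earlier.

```
# Sketch: raise tasks.max to 2 for an already-deployed connector via the Connect REST API.
# Host, port, connector name and MQ object names are hypothetical.
curl -X PUT -H "Content-Type: application/json" \
  http://connectworker.example.com:8083/connectors/mq-source/config \
  -d '{
        "connector.class": "com.ibm.eventstreams.connect.mqsource.MQSourceConnector",
        "tasks.max": "2",
        "mq.connection.mode": "bindings",
        "mq.queue.manager": "QML1",
        "mq.queue": "KAFKA.SOURCE.QUEUE",
        "topic": "from.mq.topic",
        "mq.record.builder": "com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder"
      }'
```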
With regards to the cost of running the “MQ as a source” connector in the Unix System Services (USS) environment, a large proportion can be offset by using zIIPs.
For example, typically we observed a cost per message of 21-24 CPU microseconds in the Kafka Connector for the 1KB non-persistent messages, of which 93% was zIIP eligible, reducing the cost to 1.5 CPU microseconds per message.
Additionally, the MQ class(3) accounting data reports the cost of the MQ get as 5 CPU microseconds.
Note: I have not included the MQ CPU costs other than those reported by MQ’s class(3) accounting trace as this would include the cost of the local application that is putting the messages to the queue.
Absolute peak rates
For absolute peak rate throughput measurements, the tests are run such that, in the case of the “MQ as a source” connector, the queue is pre-loaded prior to the connector being started, and no further messages are put to the queue whilst the connector is active.
By pre-loading the queue (MQ as source), the workloads are designed to remove as much contention as possible and allow the connector to run at peak capacity. This does mean that the workloads are limited in size to the amount of space in the buffer pool and may mean that the Kafka connector is not fully warmed up for the entire measurement.
For the “MQ as a source” connector, we found no significant impact on the rate that the connector was able to get messages from the MQ queue when there were concurrent task(s) putting messages to the queue, provided the buffer pool was sufficiently large to contain the messages.
Impact of message size
The primary impact of message size for non-persistent workloads is the network performance (capacity and latency).
In our measurements we were able to achieve the following rates using 1 connector task with the “peak rate” configuration.
| Message Size | Messages / Second | Approximate MB / Second |
|--------------|-------------------|-------------------------|
| 1KB          | 28,776            | 28                      |
| 10KB         | 10,030            | 98                      |
| 32KB         | 3,750             | 117                     |
Impact of persistent messages
The performance of the “MQ as a source” connector with persistent messages can be significantly impacted by many factors including:
- The number of messages gotten per MQ commit.
- What other workloads are being logged by the queue manager.
- The I/O response times.
When using the “MQ as a source” connector, the messages are being gotten from IBM MQ. The amount of data logged for persistent messages is far less for MQGET than MQPUT. The performance report “MP16 Capacity Planning & Tuning Guide” has a section “How much log space does my message use?” which shows that a 1KB message that is put/committed logs 2088 bytes, whereas the same message with get/commit logs just 492 bytes.
Furthermore, multiple gets per commit (a model used by the connectors) further reduce the number of bytes logged per message; for example, 10 gets of 1KB messages followed by a single commit would log 178 bytes per message.
What this means is that when the source connector is getting persistent messages at the same time as an application is putting persistent messages, there is contention for the single MQ log task. This is partly because the putting application is likely to be logging far more data, which results in the log I/O taking longer and the source connector having to wait for those log writes to complete.
In our measurements on our relatively lightly loaded systems, with applications putting messages at the same time as the source connector was active, we saw low rates of throughput by the source connector, ranging from 2,270 to 2,510 persistent 1KB messages per second. The performance was not significantly improved by adding additional tasks, i.e. we observed only a 10% improvement moving from 1 to 5 tasks.
By enabling zHPF, the I/O times were significantly reduced, and this enabled the connector to see a 35-55% increase in throughput.
Increasing the rate at which the local application put messages to the queue being monitored by the source connector, such that the connector always had messages available to get, significantly increased the messaging get rate. For example, with a single connector task, when the messages were put using an MQPUT/commit model we saw a peak rate of 2,271 messages per second, whereas when 100 messages were put before committing, the connector was able to process 27,396 messages per second.
In the single MQPUT per commit model, the elapsed time of the commit meant that multiple messages were rarely available for the connector task to get, so the connector task used a model of MQGET(success), MQGET(fail), commit, where the elapsed time of the commit is by far the longest part of the transaction.
By ensuring there were multiple messages available on the queue, the source connector was able to use large units of work – up to 250 messages per commit, which meant that the time spent performing the relatively long commits was minimized.
In this chart, only the behaviour of the local batch application putting the messages has changed. This has a direct impact on the source connector as the number of messages per commit increases from 2-3 to 250.
Costs of an idle connector
There may be periods when your “MQ as a source” connector is not processing any messages, and it is helpful to understand what costs might be incurred in those idle periods.
Version 1.3.2 of the “MQ as a source” connector periodically issues MQGETs on the MQ queue and these occur every 30 seconds.
This has a small cost to both the queue manager and the connector.
In terms of connector cost, we measured the cost as approximately 0.6 CPU seconds per hour when tasks.max=1 and 1.9 CPU seconds per hour with tasks.max=5.
The MQ get cost was reported as between 1 and 2 CPU microseconds for the equivalent period.
Impact of low (trickle-) rate workloads on “MQ as a Source” connector
The design of the “MQ as a Source” connector attempts to be efficient: initially the connector issues an MQGET-with-wait of 30 seconds, and when a message is successfully gotten, the connector immediately attempts another MQGET, this time without a wait period specified.
When messages are being put at a low rate to the queue being serviced by an “MQ as a Source” connector such that the queue depth is consistently low, there may be additional MQ gets attempted that do not result in a successful get of a message, and these are sometimes referred to as “empty gets”.
This can lead to class(3) accounting and class(5) statistics reporting 3 MQGETs per successful get. The 2 unsuccessful gets, i.e., the initial MQGET-with-wait that had no message and the MQGET following the successful get are low cost in a bindings configuration (approximately 1-2 CPU microseconds). In a client configuration these failed MQGETs will result in additional adaptor and dispatcher calls in the MQ channel initiator address space, with a more noticeable cost.
The following diagram demonstrates what MQ’s accounting class(3) and statistics class(5) traces would show when there is a slow message arrival rate:
MQ as a sink
Kafka sink connectors only support 1 connector task per partition. Typically, you may have multiple partitions, so it seems sensible to have at least the same number of connector tasks for the “MQ as a sink” connector as the number of partitions. It may even be beneficial to have larger numbers of tasks in the event of more partitions being configured.
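A minimal “MQ as a sink” connector configuration for bindings mode might look like the sketch below. The property names are those documented for the IBM MQ sink connector; the queue manager, queue and topic names are hypothetical, and tasks.max is set to 4 on the assumption of a topic with four partitions.

```
# Sketch only: a minimal bindings-mode sink connector configuration.
# All object names below are hypothetical examples.
cat > mq-sink.properties <<EOF
name=mq-sink
connector.class=com.ibm.eventstreams.connect.mqsink.MQSinkConnector
# One task per partition; this example assumes a topic with 4 partitions.
tasks.max=4
mq.connection.mode=bindings
mq.queue.manager=QML1
mq.queue=KAFKA.SINK.QUEUE
topics=to.mq.topic
mq.message.builder=com.ibm.eventstreams.connect.mqsink.builders.DefaultMessageBuilder
EOF
```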
Peak sustainable throughput rates
When using the “MQ as a sink” connector, the messages are generated by the Kafka’s “kafka-producer-perf-test.sh” shell script. The messages are consumed on z/OS by a small number of batch applications.
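As an example, one such producer instance can be started as in the sketch below; the record size of 1024 bytes corresponds to the 1KB messages discussed here, while the broker address, topic name and record count are placeholders.

```
# Sketch: generate 1KB records onto the topic consumed by the sink connector.
# Broker address, topic and record count are placeholders; --throughput -1 means unthrottled.
bin/kafka-producer-perf-test.sh \
  --topic to.mq.topic \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=kafkabroker.example.com:9092
```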
In our environment, the Kafka connector can put 44,996 non-persistent 1KB messages per second to the MQ queue when configured with a single connector task, i.e., tasks.max=1.
Adding a second connector task, using tasks.max=2, increases the sustainable workload to 61,588 messages per second, which is a 36% increase. In our environment, further connector tasks do not provide further benefit.
With regards to the cost of running the “MQ as a sink” connector in the Unix System Services (USS) environment, again a large proportion can be offset by using zIIPs.
For example, typically we observed a cost per message of 21-26 CPU microseconds in the Kafka Connector for the 1KB non-persistent messages, of which 96% was zIIP eligible, reducing the cost to between 0.5 and 1.7 CPU microseconds per message.
Additionally, the MQ class(3) accounting data reports the cost of the MQPUT1 as 6 to 8 CPU microseconds.
Note: I have not included the MQ CPU costs other than those reported by MQ’s class(3) accounting trace as this would include the cost of the batch applications that are consuming the messages from the queue.
Absolute peak rates
For absolute peak rate throughput measurements, the tests are run such that, in the “MQ as a sink” connector configuration, there are no applications getting messages off the MQ queue whilst the messages are transported from the Kafka Broker to the MQ queue.
By allowing the MQ queue to fill, the workloads are designed to remove as much contention as possible and allow the connector to run at peak capacity. This does mean that the workloads are limited in size to the amount of space in the buffer pool.
Unlike the “MQ as a source” connector, the “MQ as a sink” connector configuration can benefit from delaying the message consumption on IBM MQ.
In our measurements, we saw up to a 44% improvement in the rate that the connector was able to put messages to the MQ queue, achieving a rate of 83,360 non-persistent 1KB messages per second when tasks.max=4.
Impact of message size
Once more, the primary impact of message size for non-persistent workloads is the network performance.
In our measurements we were able to achieve the following rates using 1 connector task with the “peak rate” configuration.
| Message Size | Messages / Second | Approximate MB / Second |
|--------------|-------------------|-------------------------|
| 1KB          | 38,355            | 38                      |
| 10KB         | 13,054            | 127                     |
| 32KB         | 4,213             | 132                     |
Impact of persistent messages
As with the source connector, the “MQ as a sink” connector performance with persistent messages will be significantly impacted by many factors including:
- What other workloads are being logged by the queue manager.
- Contention on the MQ queues.
- The I/O response times.
As the sink connector is putting messages, the amount of data written by IBM MQ’s log task will be greater than with the source connector.
With applications on z/OS running MQ gets against the same queue using an MQGET/commit model to ensure a low queue depth, and with the connector configured with tasks.max=1, the connector was able to put 5,000 persistent 1KB messages per second. Increasing the number of tasks enabled the connector to sustain 17,000 messages per second (tasks.max=4).
When the local application consuming the messages increased the number of messages gotten per commit, the messaging rate of the sink connector increased significantly. For example, with 1 connector task we saw the messaging rate increase from 5,000 to 33,767 messages put per second.
However, when the getting application used an MQGET*100 per commit model, increasing the number of tasks did not significantly improve throughput, and at 4 or more tasks we saw a reduction in throughput.
The reason for this was increased contention on the queue, which is discussed later in “Increased contention”.
When we removed the getting task(s) from the model, i.e., allowed the queue depth to build up (which in turn limited how long the workload could run), we saw further improvements in the rate at which the sink connector could put messages to the queue. By minimising the contention on the queue, we were able to scale up to 73,000 messages per second with 5 tasks.
Increased contention
When the batch applications servicing the queue used by the sink connector were configured to use a “MQGET*100 per commit” model, the rate at which the sink connector was able to put messages peaked at 35,056 messages per second with tasks.max=2 yet decreased with additional tasks.
From investigations into the measurements, we knew that:
- The local get tasks were capable of far higher messaging rates.
- The z/OS system was neither disk nor CPU constrained based on utilisation data, i.e. the 3 CPUs were on average 75% busy, but there were indications of delays due to an insufficient number of general purpose processors.
- The Linux on Z system hosting the Kafka cluster was neither disk I/O nor CPU constrained.
- Increasing the number of Kafka producer tasks did not improve throughput.
However, the class(3) accounting data did reveal that as the number of connector tasks increased, there was a significant increase in contention – as indicated by much more time spent waiting for latches, particularly:
- Latch 19 – BMXL3
- Latch 31 – SMCPVT | DPSLTCH | BMXLH
- Latch 24 – LMXL1
- Latch 12 – DMCNMSPC | XMCHASH
The following chart shows the total time spent in the top 4 highest usage latches as the number of tasks was increased.
These numbers are based on the total latch time for all tasks when running the “MQ as a Sink” workloads for 3 minutes.
The chart suggests that increasing the number of connector tasks from 4 to 5 resulted in the total time spent waiting for (in order of highest latch times):
- Latch 19 (BMXL3) increased by more than 50%, from 293 seconds to 449 seconds, for a workload running for 3 minutes!
- Latch 31 (SMCPVT|DPSLTCH|BMXLH) increased 63%
- Latch 24 (LMXL1) increased 93%
- Latch 12 (DMCNMSPC|XMCHASH) increased 160%
Do more CPUs help?
As mentioned in the previous section, the RMF report suggests that whilst the CPUs are not 100% utilized, there are delays due to insufficient CPUs being available.
By doubling the number of general-purpose CPUs from 3 to 6 and repeating the measurement where tasks.max=4, the throughput achieved increased 37% from 30,592 to 41,995 messages per second.
When the connector is putting 41,995 messages per second, the 6 CPUs are on average 47% busy, compared to the 3 CPU average of 75% at 30,592 messages per second.
In terms of what proportion of the time the workload had enough processors, based on RMF’s CPU report section “Distribution of in-ready work unit queue”, the 3 CPU configuration had sufficient CPUs 45% of the time, whereas the 6 CPU configuration had sufficient processors 65% of the time.
Additionally, we see both the total time spent waiting for latches and the average times decrease when there are additional CPUs. For example, the average latch times are 75-90% less when more CPUs are available.
The total times spent waiting for latches is shown in the following chart, based on workload running for 3 minutes:
In terms of total time spent waiting for latches, latch 19 on the 6 CPU configuration spends 10% less time but makes many more requests of that type. The remaining 3 “hot” latches, 12, 24 and 31 spend 40 to 70% less time waiting for the latch.
So, additional CPUs can help but with many sink connector tasks putting to a single queue whilst messages are being consumed from the same queue, there is a degree of contention on the queue, buffer pool etc.
Costs of an idle connector
There may be periods when your “MQ as a sink” connector is not processing any messages, and it is helpful to understand what costs might be incurred in those idle periods.
When idle, the “MQ as a sink” connector does not interact with the MQ queue manager, so the cost is entirely in the connector.
In terms of connector cost, we measured the cost as approximately 8.2 CPU seconds per hour when tasks.max=1 and 9.9 CPU seconds per hour with tasks.max=5.
Impact of shared queue on “MQ as a Sink” connector
The “MQ as a Sink” Kafka connector uses MQPUT1 to put the messages to the queue. This can be impacted by first-open / last-close effects, as discussed in the performance report MP16’s section “Frequent opening of shared queues”.
Summary
Hopefully this report gives an indication of the performance that can be achieved using the Kafka Connectors on modern z hardware with low-latency networks and responsive disk storage subsystems.
On a busier system or less performant network, it would be prudent to work with lower expectations, as well as running benchmarks for your own environment.