IBM MQ for z/OS 9.3.3 – Performance of Kafka Connectors on z/OS
As you may have seen in one of several recent announcements, IBM MQ 9.3.3 Continuous Delivery Release included Kafka Connectors as part of the IBM MQ Advanced for z/OS Value Unit Edition.
These Kafka Connectors enable data to be passed between your MQ subsystems and your Apache Kafka Clusters.
The announcement blog, “IBM MQ, IBM MQ for z/OS and IBM MQ Appliance firmware 9.3.3 Continuous Delivery releases are available”, details how you can download the Kafka Connectors.
My colleague, Amy, recently wrote a subsequent blog, “Unlock events in mission critical systems”, which discusses the use of Apache Kafka and IBM MQ.
Finally, a few years ago I also wrote an initial blog, “Kafka Connector for IBM MQ – a MQ for z/OS perspective”, that discussed what Apache Kafka is, a few things that caught me out when using Kafka Connectors with MQ for the first time, the MQ definitions we used, and an overview of the configurations used for measuring the performance.
At that time, we looked at three configurations:
- Kafka connector remote from z/OS using client connection into MQ.
- Kafka connector local to z/OS, using USS-based client connection into MQ.
- Kafka connector local to z/OS, using USS-based bindings mode connection into MQ.
These configurations can be represented thus:
What we found, as discussed in that blog, was that whilst all three configurations are capable of supporting Kafka Connectors with MQ, the bindings mode configuration achieved the best results in terms of performance: the lowest cost on z/OS and potentially the highest throughput.
Because the bindings configuration achieved the best performance in our tests, the measurements discussed in the remainder of this blog are limited to the bindings configuration.
For the purposes of this report, the following Kafka connector versions were used:
- Source: 1.3.2
- Sink: 1.5
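To give an idea of what the bindings configuration looks like in practice, the sketch below creates a minimal “MQ as a source” connector properties file on USS and starts it under a standalone Kafka Connect worker. The property names are those documented for the IBM MQ source connector, but the queue manager, queue and topic names (and the worker.properties file) are purely illustrative, so check them against your own environment and connector version.

```
# Sketch only: a minimal bindings-mode source connector configuration.
# All object names below are hypothetical examples.
cat > mq-source.properties <<EOF
name=mq-source
connector.class=com.ibm.eventstreams.connect.mqsource.MQSourceConnector
# Start with a single connector task; the effect of increasing this is discussed below.
tasks.max=1
# Bindings mode connects to a local queue manager from USS rather than via a client channel.
mq.connection.mode=bindings
mq.queue.manager=QML1
mq.queue=KAFKA.SOURCE.QUEUE
topic=from.mq.topic
mq.record.builder=com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder
EOF

# Run the connector under a standalone Kafka Connect worker (worker.properties not shown).
bin/connect-standalone.sh worker.properties mq-source.properties
```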
This report will discuss the following items for each connector type:
- Peak sustainable throughput rates including increasing number of connector tasks.
- Absolute peak rates.
- Impact of message size.
- Impact of persistent messages.
- Costs of an idle connector.
Additionally, there are further specific connector-type sections:
- Impact of low (trickle-) rate workloads on “MQ as a Source” connector.
- Impact of shared queue on “MQ as a Sink” connector.
For the purposes of this report, the configuration uses an MQ for z/OS 9.3.3 CDR queue manager configured with the MQ queues mapped to a storage class on a 16GB buffer pool and a page set of corresponding size.
The queue manager runs on an LPAR hosted on an IBM z16 with 3 dedicated CPUs and 2 zIIPs running z/OS 2.5, with dedicated links to a performance DS8950 DASD subsystem.
The Kafka broker runs on an IBM z15 running RedHat Enterprise Linux 8.8 with 6 dedicated CPUs.
The two IBM mainframes are linked by a 10Gb performance network with minimal latency.
MQ as a source
When using the “MQ as a source” connector, the messages are put by a small set of simple applications running in batch.
The connector will then get the messages from the queue and stream them to the Kafka cluster.
Peak sustainable throughput rates
For peak throughput with sustainable workloads, the tests run batch applications putting messages and connector tasks getting messages concurrently. For the purposes of these tests, sustainable means the connector can consume messages at a rate such that the queue depth does not increase significantly.
The test uses multiple instances of Kafka’s kafka-consumer-perf-test.sh shell script to consume the messages when they arrive on the Kafka topic.
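For reference, each instance of that script can be started along the lines of the sketch below; the broker address, topic name and message count are placeholders, and older Kafka releases use --broker-list rather than --bootstrap-server.

```
# Sketch: consume the messages arriving on the Kafka topic fed by the source connector.
# Broker address, topic and message count are placeholder values.
bin/kafka-consumer-perf-test.sh \
  --bootstrap-server kafkabroker.example.com:9092 \
  --topic from.mq.topic \
  --messages 1000000
```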
In our environment, the Kafka connector can get 28,000 messages per second from the queue when configured with a single connector task, i.e., tasks.max=1.
Adding a second task increased the sustainable rate by 66%. As further tasks were added, the percentage increase continued to diminish, with no additional benefit seen when moving from 4 to 5 tasks.
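For completeness, tasks.max can be changed either by editing the connector properties file and restarting a standalone worker, or, if you run Kafka Connect in distributed mode, by re-submitting the full configuration over the Connect REST API. The sketch below assumes a distributed worker listening on its default port and the hypothetical connector configuration shown earlier.

```
# Sketch: raise tasks.max to 2 for an already-deployed connector via the Connect REST API.
# Host, port, connector name and MQ object names are hypothetical.
curl -X PUT -H "Content-Type: application/json" \
  http://connectworker.example.com:8083/connectors/mq-source/config \
  -d '{
        "connector.class": "com.ibm.eventstreams.connect.mqsource.MQSourceConnector",
        "tasks.max": "2",
        "mq.connection.mode": "bindings",
        "mq.queue.manager": "QML1",
        "mq.queue": "KAFKA.SOURCE.QUEUE",
        "topic": "from.mq.topic",
        "mq.record.builder": "com.ibm.eventstreams.connect.mqsource.builders.DefaultRecordBuilder"
      }'
```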
With regards to the cost of running the “MQ as a source” connector in the Unix System Services (USS) environment, a large proportion can be offset by using zIIPs.
For example, typically we observed a cost per message of 21-24 CPU microseconds in the Kafka Connector for the 1KB non-persistent messages, of which 93% was zIIP eligible, reducing the cost to 1.5 CPU microseconds per message.
Additionally, the MQ class(3) accounting data reports the cost of the MQ get as 5 CPU microseconds.
Note: I have not included the MQ CPU costs other than those reported by MQ’s class(3) accounting trace as this would include the cost of the local application that is putting the messages to the queue.
Absolute peak rates
For absolute peak rate throughput measurements, the tests are run such that, in the case of the “MQ as a source” connector, the queue is pre-loaded prior to the connector being started, and no further messages are put to the queue whilst the connector is active.
By pre-loading the queue (MQ as source), the workloads are designed to remove as much contention as possible and allow the connector to run at peak capacity. This does mean that the workloads are limited in size to the amount of space in the buffer pool and may mean that the Kafka connector is not fully warmed up for the entire measurement.
For the “MQ as a source” connector, we found no significant impact on the rate that the connector was able to get messages from the MQ queue when there were concurrent task(s) putting messages to the queue, provided the buffer pool was sufficiently large to contain the messages.
Impact of message size
The primary impact of message size for non-persistent workloads is the network performance (capacity and latency).
In our measurements we were able to achieve the following rates using 1 connector task with the “peak rate” configuration.
| Message Size | Messages / Second | Approximate MB / Second |
|--------------|-------------------|-------------------------|
| 1KB          | 28,776            | 28                      |
| 10KB         | 10,030            | 98                      |
| 32KB         | 3,750             | 117                     |
Impact of persistent messages
The performance of the “MQ as a source” connector with persistent messages can be significantly impacted by many factors including:
- The number of messages gotten per MQ commit.
- What other workloads are being logged by the queue manager.
- The I/O response times.
When using the “MQ as a source” connector, the messages are being gotten from IBM MQ. The amount of data logged for persistent messages is far less for MQGET than MQPUT. The performance report “MP16 Capacity Planning & Tuning Guide” has a section “How much log space does my message use?” which shows that a 1KB message that is put/committed logs 2088 bytes, whereas the same message with get/commit logs just 492 bytes.
Furthermore, multiple gets per commit (a model used by the connectors) further reduce the number of bytes logged per message; for example, 10 gets of 1KB messages followed by a single commit would log 178 bytes per message.
What this means is that when the source connector is getting persistent messages at the same time as an application is putting persistent messages, there is contention for the single MQ log task. This is partly because the putting application is likely to be logging far more data, which results in the log I/O taking longer and the source connector having to wait for those log writes to complete.
In our measurements on our relatively lightly loaded systems, with applications putting messages at the same time as the source connector was active, we saw low rates of throughput by the source connector, ranging from 2,270 to 2,510 persistent 1KB messages per second. The performance was not significantly improved by adding additional tasks, i.e. we observed only a 10% improvement moving from 1 to 5 tasks.
By enabling zHPF, the I/O times were significantly reduced, and this enabled the connector to see a 35-55% increase in throughput.
Increasing the rate at which the local application put messages to the queue being monitored by the source connector, such that the connector always had messages available to get, significantly increased the messaging get rate. For example, with a single connector task, when the messages were put using an MQPUT/commit model we saw a peak rate of 2,271 messages per second, whereas when 100 messages were put before committing, the connector was able to process 27,396 messages per second.
In the single MQPUT per commit model, the elapsed time of the commit meant that multiple messages were rarely available for the connector task to get, so the connector task used a model of MQGET(success), MQGET(fail), commit, where the elapsed time of the commit is by far the longest part of the transaction.
By ensuring there were multiple messages available on the queue, the source connector was able to use large units of work – up to 250 messages per commit, which meant that the time spent performing the relatively long commits was minimized.
In this chart, only the behaviour of the local batch application putting the messages has changed. This has a direct impact on the source connector as the number of messages per commit increases from 2-3 to 250.
Costs of an idle connector
There may be periods when your “MQ as a source” connector is not processing any messages, and it is helpful to understand what costs might be incurred in those idle periods.
Version 1.3.2 of the “MQ as a source” connector periodically issues MQGETs on the MQ queue and these occur every 30 seconds.
This has a small cost to both the queue manager and the connector.
In terms of connector cost, we measured the cost as approximately 0.6 CPU seconds per hour when tasks.max=1 and 1.9 CPU seconds per hour with tasks.max=5.
The MQ get cost was reported as between 1 and 2 CPU microseconds for the equivalent period.
Impact of low (trickle-) rate workloads on “MQ as a Source” connector
The design of the “MQ as a Source” connector attempts to be efficient: initially the connector issues an MQGET-with-wait of 30 seconds, and when a message is successfully gotten, the connector immediately attempts another MQGET, this time without a wait period specified.
When messages are being put at a low rate to the queue being serviced by an “MQ as a Source” connector such that the queue depth is consistently low, there may be additional MQ gets attempted that do not result in a successful get of a message, and these are sometimes referred to as “empty gets”.
This can lead to class(3) accounting and class(5) statistics reporting 3 MQGETs per successful get. The 2 unsuccessful gets, i.e., the initial MQGET-with-wait that had no message and the MQGET following the successful get are low cost in a bindings configuration (approximately 1-2 CPU microseconds). In a client configuration these failed MQGETs will result in additional adaptor and dispatcher calls in the MQ channel initiator address space, with a more noticeable cost.
The following diagram demonstrates what MQ’s accounting class(3) and statistics class(5) traces would show when there is a slow message arrival rate:
MQ as a sink
Kafka sink connectors only support 1 connector task per partition. Typically, you may have multiple partitions, so it seems sensible to have at least the same number of connector tasks for the “MQ as a sink” connector as the number of partitions. It may even be beneficial to have larger numbers of tasks in the event of more partitions being configured.
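A minimal “MQ as a sink” connector configuration for bindings mode might look like the sketch below. The property names are those documented for the IBM MQ sink connector; the queue manager, queue and topic names are hypothetical, and tasks.max is set to 4 on the assumption of a topic with four partitions.

```
# Sketch only: a minimal bindings-mode sink connector configuration.
# All object names below are hypothetical examples.
cat > mq-sink.properties <<EOF
name=mq-sink
connector.class=com.ibm.eventstreams.connect.mqsink.MQSinkConnector
# One task per partition; this example assumes a topic with 4 partitions.
tasks.max=4
mq.connection.mode=bindings
mq.queue.manager=QML1
mq.queue=KAFKA.SINK.QUEUE
topics=to.mq.topic
mq.message.builder=com.ibm.eventstreams.connect.mqsink.builders.DefaultMessageBuilder
EOF
```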
Peak sustainable throughput rates
When using the “MQ as a sink” connector, the messages are generated by the Kafka’s “kafka-producer-perf-test.sh” shell script. The messages are consumed on z/OS by a small number of batch applications.
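As an example, one such producer instance can be started as in the sketch below; the record size of 1024 bytes corresponds to the 1KB messages discussed here, while the broker address, topic name and record count are placeholders.

```
# Sketch: generate 1KB records onto the topic consumed by the sink connector.
# Broker address, topic and record count are placeholders; --throughput -1 means unthrottled.
bin/kafka-producer-perf-test.sh \
  --topic to.mq.topic \
  --num-records 10000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=kafkabroker.example.com:9092
```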
In our environment, the Kafka connector can put 44,996 non-persistent 1KB messages per second to the MQ queue when configured with a single connector task, i.e., tasks.max=1.
Adding a second connector task, using tasks.max=2, increases the sustainable workload to 61,588 messages per second, which is a 36% increase. In our environment, further connector tasks do not provide further benefit.
With regards to the cost of running the “MQ as a sink” connector in the Unix System Services (USS) environment, again a large proportion can be offset by using zIIPs.
For example, typically we observed a cost per message of 21-26 CPU microseconds in the Kafka Connector for the 1KB non-persistent messages, of which 96% was zIIP eligible, reducing the cost to between 0.5 and 1.7 CPU microseconds per message.
Additionally, the MQ class(3) accounting data reports the cost of the MQPUT1 as 6 to 8 CPU microseconds.
Note: I have not included the MQ CPU costs other than those reported by MQ’s class(3) accounting trace as this would include the cost of the batch applications that are consuming the messages from the queue.
Absolute peak rates
For absolute peak rate throughput measurements, the tests are run such that, in the “MQ as a sink” connector configuration, there are no applications getting messages off the MQ queue whilst the messages are transported from the Kafka Broker to the MQ queue.
By allowing the MQ queue to fill, the workloads are designed to remove as much contention as possible and allow the connector to run at peak capacity. This does mean that the workloads are limited in size to the amount of space in the buffer pool.
Unlike the “MQ as a source” connector, the “MQ as a sink” connector configuration can benefit from delaying the message consumption on IBM MQ.
In our measurements, we saw up to a 44% improvement in the rate that the connector was able to put messages to the MQ queue, achieving a rate of 83,360 non-persistent 1KB messages per second when tasks.max=4.
Impact of message size
Once more, the primary impact of message size for non-persistent workloads is the network performance.
In our measurements we were able to achieve the following rates using 1 connector task with the “peak rate” configuration.
| Message Size | Messages / Second | Approximate MB / Second |
|--------------|-------------------|-------------------------|
| 1KB          | 38,355            | 38                      |
| 10KB         | 13,054            | 127                     |
| 32KB         | 4,213             | 132                     |
Impact of persistent messages
As with the source connector, the “MQ as a sink” connector performance with persistent messages will be significantly impacted by many factors including:
- What other workloads are being logged by the queue manager.
- Contention on the MQ queues.
- The I/O response times.
As the sink connector is putting messages, the amount of data written by IBM MQ’s log task will be greater than with the source connector.
With applications on z/OS running MQ gets against the same queue using an MQGET/commit model to ensure a low queue depth, and with the connector configured with tasks.max=1, the connector was able to put 5,000 persistent 1KB messages per second. Increasing the number of tasks enabled the connector to sustain 17,000 messages per second (tasks.max=4).
When the local application consuming the messages increased the number of messages gotten per commit, the messaging rate of the sink connector increased significantly. For example, with 1 connector task we saw the messaging rate increase from 5,000 to 33,767 messages put per second.
However, when the getting application used an MQGET*100 per commit model, increasing the number of tasks did not significantly improve throughput, and at 4 or more tasks we saw a reduction in throughput.
The reason for this was increased contention on the queue, which is discussed later in “Increased contention”.
When we removed the getting task(s) from the model, i.e., allowed the queue depth to build up (which in turn limited how long the workload could run), we saw further improvements in the rate at which the sink connector could put messages to the queue. By minimising the contention on the queue, we were able to scale up to 73,000 messages per second with 5 tasks.
Increased contention
When the batch applications servicing the queue used by the sink connector were configured to use a “MQGET*100 per commit” model, the rate at which the sink connector was able to put messages peaked at 35,056 messages per second with tasks.max=2 yet decreased with additional tasks.
From investigations into the measurements, we knew that:
- The local get tasks were capable of far higher messaging rates.
- The z/OS system was neither disk nor CPU constrained based on utilisation data, i.e. the 3 CPUs were on average 75% busy, but there were indications of delays due to an insufficient number of general purpose processors.
- The Linux on Z system hosting the Kafka cluster was neither disk I/O nor CPU constrained.
- Increasing the number of Kafka producer tasks did not improve throughput.
However, the class(3) accounting data did reveal that as the number of connector tasks increased, there was a significant increase in contention – as indicated by much more time spent waiting for latches, particularly:
- Latch 19 – BMXL3
- Latch 31 – SMCPVT | DPSLTCH | BMXLH
- Latch 24 – LMXL1
- Latch 12 – DMCNMSPC | XMCHASH
The following chart shows the total time spent in the top 4 highest usage latches as the number of tasks was increased.
These numbers are based on the total latch time for all tasks when running the “MQ as a Sink” workloads for 3 minutes.
The chart suggests that increasing the number of connector tasks from 4 to 5 resulted in the total time spent waiting for (in order of highest latch times):
- Latch 19 (BMXL3) increased by more than 50%, from 293 seconds to 449 seconds, for a workload running for 3 minutes!
- Latch 31 (SMCPVT|DPSLTCH|BMXLH) increased 63%
- Latch 24 (LMXL1) increased 93%
- Latch 12 (DMCNMSPC|XMCHASH) increased 160%
Do more CPUs help?
As mentioned in the previous section, the RMF report suggests that whilst the CPUs are not 100% utilized, there are delays due to insufficient CPUs being available.
By doubling the number of general-purpose CPUs from 3 to 6 and repeating the measurement where tasks.max=4, the throughput achieved increased 37% from 30,592 to 41,995 messages per second.
When the connector is putting 41,995 messages per second, the 6 CPUs are on average 47% busy, compared to the 3 CPU average of 75% at 30,592 messages per second.
In terms of what proportion of the time the workload had enough processors, based on RMF’s CPU report section “Distribution of in-ready work unit queue”, the 3 CPU configuration had sufficient CPUs 45% of the time, whereas the 6 CPU configuration had sufficient processors 65% of the time.
Additionally, we see both the total time spent waiting for latches and the average times decrease when there are additional CPUs. For example, the average latch times are 75-90% less when more CPUs are available.
The total times spent waiting for latches is shown in the following chart, based on workload running for 3 minutes:
In terms of total time spent waiting for latches, latch 19 on the 6 CPU configuration spends 10% less time but makes many more requests of that type. The remaining 3 “hot” latches, 12, 24 and 31 spend 40 to 70% less time waiting for the latch.
So, additional CPUs can help but with many sink connector tasks putting to a single queue whilst messages are being consumed from the same queue, there is a degree of contention on the queue, buffer pool etc.
Costs of an idle connector
There may be periods when your “MQ as a sink” connector is not processing any messages, and it is helpful to understand what costs might be incurred in those idle periods.
When idle, the “MQ as a sink” connector does not interact with the MQ queue manager, so the cost is entirely in the connector.
In terms of connector cost, we measured the cost as approximately 8.2 CPU seconds per hour when tasks.max=1 and 9.9 CPU seconds per hour with tasks.max=5.
Impact of shared queue on “MQ as a Sink” connector
The “MQ as a Sink” Kafka connector uses MQPUT1 to put the messages to the queue. This can be impacted by first-open / last-close effects, as discussed in the performance report MP16’s section “Frequent opening of shared queues”.
Summary
Hopefully this report gives an indication of the performance that can be achieved using the Kafka Connectors on modern z hardware with low-latency networks and responsive disk storage subsystems.
On a busier system or less performant network, it would be prudent to work with lower expectations, as well as running benchmarks for your own environment.