That's good news: nearly all of your non-persistent messaging is outside syncpoint, and nearly all of the persistent messaging is inside syncpoint.
I'd suggest the next step is to collect this sort of data while the problem is actually happening, along the following lines:
The amqsrua data should be collected by a number of concurrent instances of amqsrua running in the background. You probably want to create a little script to collect all this data (and to test it beforehand).
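Something along these lines would do. This is only a sketch: the queue manager name and sample counts are placeholders, the class/type/object flags match the publications you've already pasted, but confirm them by running amqsrua interactively on a test queue manager first.

#!/bin/sh
# Rough sketch only - adjust names and counts to your environment and test it
# on a non-production queue manager first.
QM=MY.QMGR                      # placeholder queue manager name
SAMPLES=60                      # 60 publications at 10-second intervals = 10 minutes
OUTDIR=/var/mqm/errors/ruadata
mkdir -p $OUTDIR

# Recovery log and CPU statistics, each in its own background instance
amqsrua -m $QM -c DISK -t Log           -n $SAMPLES > $OUTDIR/log.txt 2>&1 &
amqsrua -m $QM -c CPU  -t SystemSummary -n $SAMPLES > $OUTDIR/cpu.txt 2>&1 &

# PUT and GET statistics for each stream/subscriber queue of interest
# (add the other queues to this list)
for Q in RETAIL.DATA.POS.KLOG.00.EAST.STREAM ; do
  amqsrua -m $QM -c STATQ -t PUT -o $Q -n $SAMPLES > $OUTDIR/$Q.put.txt 2>&1 &
  amqsrua -m $QM -c STATQ -t GET -o $Q -n $SAMPLES > $OUTDIR/$Q.get.txt 2>&1 &
done
wait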
The throughput is small enough that the issue would probably be visible in a short MQ trace, but interpreting the trace is quite a skilled task and not something the average customer would be able to do. The fact that the (undocumented) batch size was explicitly set suggests IBM support have already been involved in the past; do you know what other conclusions and observations were made at that time (for example, was a trace taken and analyzed then)?
Original Message:
Sent: Wed March 08, 2023 03:20 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
Good memory!
QMGRSTATS[Interval0]
getCountNT[NotPersistent] 187
getCountT[Persistent] 33933
QMGRSTATS[Interval1]
getCountNT[NotPersistent] 180535501
getCountT[NotPersistent] 128
getCountNT[Persistent] 2328
getCountT[Persistent] 721459869
------------------------------
Earle Ake
Original Message:
Sent: Wed March 08, 2023 12:42 PM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
Earle,
The reported IO latency is around 1ms; that's not great, but combined with the batch size of 50 it shouldn't be too much cause for concern. You haven't reported the log write sizes, so it's impossible to even guess how much batching might be occurring.
You've only reported the GET stats on this queue; however, I asked for both PUT and GET stats because lock contention is only reported on PUT (I had to put it someplace and PUT seemed as good as anywhere!). As you haven't included that data I can't comment on it either. The GET rates look very low, and the issuance of 2033s (MQRC_NO_MSG_AVAILABLE) would typically indicate the queues were being kept empty, but that would depend upon the GETs asking for ANY message rather than a specific message.
In later MQ releases there's another way of seeing exactly how much persistent message activity is occurring outside of syncpoint (which is a common cause of performance issues). As before, I no longer have access to the source code or its history, so I can't say exactly when this was added, but I'd guess sometime early in V9. There's an MQ service utility called amqldmpa, which can be used to dump internal MQ control blocks. If you issue the right incantation of amqldmpa it will dump the block containing the highest level of MQI counters. These are the counters that amqsrua uses to report various stats. Some of these stats are actually more detailed than the stats reported by amqsrua, and the queue manager code adds various sub-counters together to produce the amalgamations reported by amqsrua. In particular, the put and get counts are collected internally for both transactional (T) and non-transactional (NT) operations but are reported as a single total.
I think (IIRC) the correct incantation is
amqldmpa -m <QMGR_NAME> -c K -d 1 -O 8 -f/var/mqm/errors/kern.txt
but as this is from memory I'd very strongly suggest you try this on a test system first.
The -f option indicates the location where the output will be APPENDED, so you need to consider when to delete this file. The location specified by the -f option must be writable by the mqm user.
The counters are collected in an array of structures, so you need to look at the correct set of totals; these will be marked (IIRC!) "Interval0" and "Interval1". For example, "getCountNT" would show messages got outside syncpoint and "getCountT" messages got inside syncpoint. You don't want to see lots of persistent messages being put and got outside syncpoint.
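If the counter names in your dump look like the ones I remember (they're from memory, so verify against your own output on a test system first), something like this will pull just the interval markers and the syncpoint split out of the appended file:

# Sketch only - labels are from memory, check them against your own dump
grep -E "Interval[01]|Count(NT|T)\[" /var/mqm/errors/kern.txt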
------------------------------
Andrew Hickson
Original Message:
Sent: Wed March 08, 2023 09:06 AM
From: Earle Ake
Subject: Stream queue tuning for best performance
I have been collecting stats every 30 minutes, and I run amqsrua for a full minute of statistics. Here is an example of what I see when it is happening:
Publication received PutDate:20230307 PutTime:17000130 Interval:3 minutes,49.300 seconds
Log - write latency 782 uSec
Publication received PutDate:20230307 PutTime:17001130 Interval:10.000 seconds
Log - write latency 905 uSec
Publication received PutDate:20230307 PutTime:17002130 Interval:10.003 seconds
Log - write latency 1192 uSec
Publication received PutDate:20230307 PutTime:17003130 Interval:10.000 seconds
Log - write latency 755 uSec
Publication received PutDate:20230307 PutTime:17004130 Interval:10.000 seconds
Log - write latency 879 uSec
Publication received PutDate:20230307 PutTime:17005130 Interval:10.000 seconds
Log - write latency 1023 uSec
Publication received PutDate:20230307 PutTime:17010130 Interval:10.000 seconds
Log - write latency 839 uSec
Publication received PutDate:20230307 PutTime:17000131 Interval:5 days,10 hours,58 minutes,13.680 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 23879536 51/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 100987938483 214187/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 762895 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 13969
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 748926 2/sec
Publication received PutDate:20230307 PutTime:17000131 Interval:0.000 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 0
Publication received PutDate:20230307 PutTime:17001131 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 987 99/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 4257026 425617/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 28 3/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 28 3/sec
Publication received PutDate:20230307 PutTime:17002131 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 552 55/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2382139 238155/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 13 1/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 13 1/sec
Publication received PutDate:20230307 PutTime:17003131 Interval:10.003 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 600 60/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2571308 257051/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 22 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 22 2/sec
Publication received PutDate:20230307 PutTime:17004132 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 501 50/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2116871 211634/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 17 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 17 2/sec
Publication received PutDate:20230307 PutTime:17005132 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 467 47/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 1948672 194825/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 13 1/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 13 1/sec
------------------------------
Earle Ake
Original Message:
Sent: Fri March 03, 2023 09:30 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
788 uSec isn't an awful response time, and coupled with the batching in amqfcxba it should be OK. As stated previously, if the published messages were then consumed (by user apps) outside syncpoint then this could cause concurrency issues that would NOT be fixed by the batching. The subset of the amqsrua data you pasted relating to MQPUT did not include anything for "lock contention"; was a value reported?
Your earlier put/get stats suggest you've got messages averaging around 4KB (3,259,822 bytes across 756 puts is roughly 4.3KB per message), and the write size reported in the logger stats (approx 9KB) suggests very limited concurrency at the time this data was collected (the message data is ALL logged on an MQPUT, but the MQGET log record is very small and contains no message data).
Collecting a series of amqsrua samples (the -n parameter) at the time the error is apparent might be helpful; the data for the specific stream queues and subscriber queues for PUT/GET, plus the recovery log stats, would likely be the most illuminating.
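For example, something like the following, each in its own terminal or in the background so they run over the same window. The names here are placeholders and the class/type names should be confirmed by running amqsrua interactively on a test system first; 360 publications at the default 10-second interval is roughly an hour of data.

# Sketch - placeholder queue manager and queue names
amqsrua -m MY.QMGR -c STATQ -t GET -o MY.STREAM.QUEUE -n 360
amqsrua -m MY.QMGR -c STATQ -t PUT -o MY.STREAM.QUEUE -n 360
amqsrua -m MY.QMGR -c DISK  -t Log -n 360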
------------------------------
Andrew Hickson
Original Message:
Sent: Fri March 03, 2023 08:39 AM
From: Earle Ake
Subject: Stream queue tuning for best performance
At 08:30, I see this as a reference:
Publication received PutDate:20230303 PutTime:13311144 Interval:10.000 seconds
Log - bytes in use 4026531840
Log - bytes max 4445962240
Log file system - bytes in use 7695695872
Log file system - bytes max 225485783040
Log - physical bytes written 86970368 8696219/sec
Log - logical bytes written 69452914 6944639/sec
Log - write latency 788 uSec
Log - write size 9204
Log - current primary space in use 8.13%
Log - workload primary space utilization 10.45%
Publication received PutDate:20230303 PutTime:13300141 Interval:4 minutes,43.164 seconds
User CPU time percentage 8.99%
System CPU time percentage 11.25%
CPU load - one minute average 4.85
CPU load - five minute average 4.55
CPU load - fifteen minute average 4.29
RAM free percentage 85.50%
RAM total bytes 32011MB
Publication received PutDate:20230303 PutTime:13300142 Interval:4 minutes,43.164 seconds
User CPU time - percentage estimate for queue manager 2.45%
System CPU time - percentage estimate for queue manager 1.93%
RAM total bytes - estimate for queue manager 3321MB
------------------------------
Earle Ake
Original Message:
Sent: Fri March 03, 2023 07:38 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
The GET rates you are quoting are tiny; you'd expect there to be a major underlying issue for MQ to be limited to these rates.
With persistent messaging the IO latency to the log is typically crucial. Did you run amqsrua to query this information? A few microseconds would be a good IO latency.
The get from the stream queue would be done by the amqfcxba process, and there are message ordering constraints that make it difficult to have multiple getting threads per stream (JMS requires that all messages from one publisher to one subscriber are delivered in order). However, amqfcxba should batch messages in the event that a backlog is building on the stream queue, thus avoiding the need to run multiple publisher threads per stream. IIRC the batching was more sophisticated in the original queued pub/sub implementation (pre V7), but the version implemented in V7 was quite basic.
I can't recall if the performance enhancement to a simple MQGET of a persistent message was in 9.2.0.6 (I suspect not, as it was one of the later things I did at IBM and I left around 9.2.2). Getting a persistent message outside syncpoint makes no sense: if the application fails between the MQGET and processing the message then the message is effectively lost, and so it didn't need to be persistent in the first place. MQ was therefore heavily optimized for persistent messaging INSIDE syncpoint. We had a number of cases where MQ was portrayed in a bad light by trivial benchmarks exploiting this "weakness", and so it was eventually tidied up. If you are consuming the published messages from their subscription queues outside of syncpoint then this could potentially cause some serialization issues, and thus performance problems, as the amqfcxba publish threads would contend with your threads for access to the subscription queues.
The more serialization issues you have, the more dependent you become on low IO latency. I'd suggest that using amqsrua to report the IO latency being observed on the recovery log should be your next step.
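For example (the queue manager name is a placeholder, and it's worth confirming the class/type names by running amqsrua interactively on a test system first), something like this should show the "Log - write latency" and "Log - write size" figures every 10 seconds:

# Sketch - recovery log statistics every 10 seconds for about 5 minutes
amqsrua -m MY.QMGR -c DISK -t Log -n 30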
------------------------------
Andrew Hickson
Original Message:
Sent: Thu March 02, 2023 04:25 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
I have been running amqsrua every 30 minutes for both GET and PUT for count=10. Earlier today, I was seeing:
MQPUT/MQPUT1 count 756 76/sec
MQPUT byte count 3259822 325970/sec
MQPUT persistent message count 756 76/sec
MQGET count 756 76/sec
MQGET byte count 3258041 325792/sec
destructive MQGET persistent message count 756 76/sec
average queue time 14382 uSec
Now when the queue is backing up, I see:
MQPUT/MQPUT1 count 1344 134/sec
MQPUT byte count 5570307 557019/sec
MQPUT persistent message count 1344 134/sec
MQGET count 506 51/sec
MQGET byte count 2263554 226351/sec
destructive MQGET persistent message count 506 51/sec
average queue time 4920807386 uSec
I am uploading the Grafana graphs of the PUT and GET counts for today from 08:00 to 16:00.
------------------------------
Earle Ake
Original Message:
Sent: Thu March 02, 2023 07:21 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
There are (or at least there used to be!) tuning parameters that allow the number of threads per amqfcxba process to be configured, but this is highly unlikely to provide any benefit.
I'm guessing that your amqfcxba process appears to be using 100% of one core, but that isn't clear from your description. This could suggest that some lock is highly contested, but nearly all of the remaining 'coarse' locks in MQ are inter-process locks rather than intra-process locks, and in that case spreading the load across multiple processes won't make any difference.
You appear to be able to identify 9 queues of 'interest'; the amqsrua sample should allow you to get some idea of how each of these queues is performing, and in particular how much lock contention is occurring for the primary lock associated with each queue.
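A sketch of the sort of invocation I mean (names are placeholders; run amqsrua interactively on a test queue manager first to confirm the class/type names available at your MQ level):

# Sketch - per-queue PUT statistics, which is where the lock contention figure is reported
amqsrua -m MY.QMGR -c STATQ -t PUT -o MY.STREAM.QUEUE -n 10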
There is also an internal IBM utility which produces a list of MQ inter process locks and the activity, including contention, on each lock. You're probably best to work with IBM support if you need to go to this level of detail as knowing what the locks are used for and thus how to interpret contention on a lock isn't always easy.
As you mention IOWAIT, I'm assuming you are dealing with persistent messages. In a well-tuned system, nearly all of the IO should be against the recovery log, and MQ does its own buffer management for this IO. While one IO to the log is pending, the buffer for the next IO is being prepared. As soon as the first IO completes, the second IO is eligible to be scheduled (either because some thread has asked for the IO to be completed, or because a threshold of outstanding IO bytes has been passed). This mechanism can make the IOWAIT times appear rather odd, and you have to be very careful about reading too much into reported IOWAIT times.
Do you have the recovery log and the queue manager data directories mounted on different file systems? The primary reason for doing this is so that you can look at the IO rates independently. Once the queues start to become deep, IO will start to spill to the queue manager data file system; this IO should be largely unforced (no IOWAIT), but once sufficient real IO is outstanding the system may start to flush these buffers to disk.
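A quick way to check is below; the paths are the MQ defaults (a multi-instance queue manager will have these on your shared storage, so adjust accordingly), and iostat needs the sysstat package installed.

# Which file systems the log and data directories actually live on
df -h /var/mqm/log /var/mqm/qmgrs
# Per-device IO rates, sizes and waits, sampled every 10 seconds
iostat -x 10 6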
The amqsrua sample can also be used to look at statistics showing how the recovery log is performing; in particular, you can see the average write latency and the average write sizes.
Have you changed anything recently in your configuration?
Was this system running OK up until recently, and if so, what changed to lead to the issue?
------------------------------
Andrew Hickson
Original Message:
Sent: Wed March 01, 2023 02:12 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
We have a RHEL 7.9 pair of servers running multi-instance MQ V9.2.0.6, with a set of 9 stream queues defined. The PUT and GET rates match until around 11 AM, when the GET rate starts to spike up and down and then drops. The PUT rate is still going up, so the queues get backed up trying to republish. I have tracked the queues down to using the same PID and different TIDs for each queue.
This is a 12 core server. It looks like the one process is CPU bound with all the attached threads.
Is there a way to distribute the threads across the cores, or a way to specify that each amqfcxba process has only 2 threads? Would that allow each new amqfcxba process to load-balance across cores? I'm looking for better performance. The current amqfcxba process has 27 threads.
Have used Dynatrace and the Wait I/O is around 11%.
------------------------------
Earle Ake
------------------------------