That's good news: nearly all of your non-persistent messaging is outside syncpoint, and nearly all of the persistent messaging is inside syncpoint.
I'd suggest the next step is to collect this sort of data while the problem is actually happening, along the following lines:
The amqsrua data should be collected by a number of concurrent instances of amqsrua running in the background. You probably want to create a little script to collect all this data (and to test it beforehand).
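Something along these lines would do. This is only a sketch: the queue manager name and sample counts are placeholders, the class/type/object flags match the publications you've already pasted, but confirm them by running amqsrua interactively on a test queue manager first.

#!/bin/sh
# Rough sketch only - adjust names and counts to your environment and test it
# on a non-production queue manager first.
QM=MY.QMGR                      # placeholder queue manager name
SAMPLES=60                      # 60 publications at 10-second intervals = 10 minutes
OUTDIR=/var/mqm/errors/ruadata
mkdir -p $OUTDIR

# Recovery log and CPU statistics, each in its own background instance
amqsrua -m $QM -c DISK -t Log           -n $SAMPLES > $OUTDIR/log.txt 2>&1 &
amqsrua -m $QM -c CPU  -t SystemSummary -n $SAMPLES > $OUTDIR/cpu.txt 2>&1 &

# PUT and GET statistics for each stream/subscriber queue of interest
# (add the other queues to this list)
for Q in RETAIL.DATA.POS.KLOG.00.EAST.STREAM ; do
  amqsrua -m $QM -c STATQ -t PUT -o $Q -n $SAMPLES > $OUTDIR/$Q.put.txt 2>&1 &
  amqsrua -m $QM -c STATQ -t GET -o $Q -n $SAMPLES > $OUTDIR/$Q.get.txt 2>&1 &
done
wait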
The throughput is small enough that the issue would probably be visible in a short MQ trace, but interpreting the trace is quite a skilled task and not something the average customer would be able to do. The fact that the (undocumented) batch size was explicitly set suggests IBM support have already been involved in the past; do you know what other conclusions and observations were made at that time (for example, was a trace taken and analyzed then)?
Original Message:
Sent: Wed March 08, 2023 03:20 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
Good memory!
QMGRSTATS[Interval0]
getCountNT[NotPersistent] 187
getCountT[Persistent] 33933
QMGRSTATS[Interval1]
getCountNT[NotPersistent] 180535501
getCountT[NotPersistent] 128
getCountNT[Persistent] 2328
getCountT[Persistent] 721459869
------------------------------
Earle Ake
Original Message:
Sent: Wed March 08, 2023 12:42 PM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
Earle,
The reported IO latency is around 1ms; that's not great, but combined with the batch size of 50 it shouldn't be too much cause for concern. You haven't reported the log write sizes, so it's impossible to even guess how much batching might be occurring.
You've only reported the GET stats on this queue; however, I asked for both PUT and GET stats because lock contention is only reported on PUT (I had to put it someplace and PUT seemed as good as anywhere!). As you haven't included that data I can't comment on it either. The GET rates look very low, and the issuance of 2033s (MQRC_NO_MSG_AVAILABLE) would typically indicate the queues were being kept empty, but that would depend upon the GETs asking for ANY message rather than a specific message.
In later MQ releases there's another way of seeing exactly how much persistent message activity is occurring outside of syncpoint (which is a common cause of performance issues). As before, I no longer have access to the source code or its history, so I can't say exactly when this was added, but I'd guess sometime early in V9. There's an MQ service utility called amqldmpa, which can be used to dump internal MQ control blocks. If you issue the right incantation of amqldmpa it will dump the block containing the highest level of MQI counters. These are the counters that amqsrua uses to report various stats. Some of these stats are actually more detailed than the stats reported by amqsrua, and the queue manager code adds various sub-counters together to produce the amalgamations reported by amqsrua. In particular, the put and get counts are collected internally for both transactional (T) and non-transactional (NT) operations but are reported as a single total.
I think (IIRC) the correct incantation is
amqldmpa -m <QMGR_NAME> -c K -d 1 -O 8 -f/var/mqm/errors/kern.txt
but as this is from memory I'd very strongly suggest you try this on a test system first.
The -f option indicates the location where the output will be APPENDED, so you need to consider when to delete this file. The location specified by the -f option must be writable by the mqm user.
The counters are collected in an array of structures, so you need to look at the correct set of totals; these will be marked (IIRC!) "Interval0" and "Interval1". For example, "getCountNT" would show messages got outside syncpoint and "getCountT" messages got inside syncpoint. You don't want to see lots of persistent messages being put and got outside syncpoint.
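If the counter names in your dump look like the ones I remember (they're from memory, so verify against your own output on a test system first), something like this will pull just the interval markers and the syncpoint split out of the appended file:

# Sketch only - labels are from memory, check them against your own dump
grep -E "Interval[01]|Count(NT|T)\[" /var/mqm/errors/kern.txt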
------------------------------
Andrew Hickson
Original Message:
Sent: Wed March 08, 2023 09:06 AM
From: Earle Ake
Subject: Stream queue tuning for best performance
I have been collecting stats every 30 minutes, and I run amqsrua for a full minute of statistics. Here is an example of what I see when it is happening:
Publication received PutDate:20230307 PutTime:17000130 Interval:3 minutes,49.300 seconds
Log - write latency 782 uSec
Publication received PutDate:20230307 PutTime:17001130 Interval:10.000 seconds
Log - write latency 905 uSec
Publication received PutDate:20230307 PutTime:17002130 Interval:10.003 seconds
Log - write latency 1192 uSec
Publication received PutDate:20230307 PutTime:17003130 Interval:10.000 seconds
Log - write latency 755 uSec
Publication received PutDate:20230307 PutTime:17004130 Interval:10.000 seconds
Log - write latency 879 uSec
Publication received PutDate:20230307 PutTime:17005130 Interval:10.000 seconds
Log - write latency 1023 uSec
Publication received PutDate:20230307 PutTime:17010130 Interval:10.000 seconds
Log - write latency 839 uSec
Publication received PutDate:20230307 PutTime:17000131 Interval:5 days,10 hours,58 minutes,13.680 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 23879536 51/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 100987938483 214187/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 762895 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 13969
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 748926 2/sec
Publication received PutDate:20230307 PutTime:17000131 Interval:0.000 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 0
Publication received PutDate:20230307 PutTime:17001131 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 987 99/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 4257026 425617/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 28 3/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 28 3/sec
Publication received PutDate:20230307 PutTime:17002131 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 552 55/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2382139 238155/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 13 1/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 13 1/sec
Publication received PutDate:20230307 PutTime:17003131 Interval:10.003 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 600 60/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2571308 257051/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 22 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 22 2/sec
Publication received PutDate:20230307 PutTime:17004132 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 501 50/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 2116871 211634/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 17 2/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 17 2/sec
Publication received PutDate:20230307 PutTime:17005132 Interval:10.002 seconds
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent message count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent message count 467 47/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET non-persistent byte count 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET persistent byte count 1948672 194825/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails 13 1/sec
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_NO_MSG_AVAILABLE 0
RETAIL.DATA.POS.KLOG.00.EAST.STREAM destructive MQGET fails with MQRC_TRUNCATED_MSG_FAILED 13 1/sec
------------------------------
Earle Ake
Original Message:
Sent: Fri March 03, 2023 09:30 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
788 uSec isn't an awful response time, and coupled with the batching in amqfcxba it should be OK. As stated previously, if the published messages were then consumed (by user apps) outside syncpoint then this could cause concurrency issues that would NOT be fixed by the batching. The subset of the amqsrua data you pasted relating to MQPUT did not include anything for "lock contention"; was a value reported?
Your earlier put/get stats suggest you've got messages averaging around 4KB (3,259,822 bytes across 756 puts is roughly 4.3KB per message), and the write size reported in the logger stats (approx 9KB) suggests very limited concurrency at the time this data was collected (the message data is ALL logged on an MQPUT, but the MQGET log record is very small and contains no message data).
Collecting a series of amqsrua samples (the -n parameter) at the time the error is apparent might be helpful; the data for the specific stream queues and subscriber queues for PUT/GET, plus the recovery log stats, would likely be the most illuminating.
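For example, something like the following, each in its own terminal or in the background so they run over the same window. The names here are placeholders and the class/type names should be confirmed by running amqsrua interactively on a test system first; 360 publications at the default 10-second interval is roughly an hour of data.

# Sketch - placeholder queue manager and queue names
amqsrua -m MY.QMGR -c STATQ -t GET -o MY.STREAM.QUEUE -n 360
amqsrua -m MY.QMGR -c STATQ -t PUT -o MY.STREAM.QUEUE -n 360
amqsrua -m MY.QMGR -c DISK  -t Log -n 360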
------------------------------
Andrew Hickson
Original Message:
Sent: Fri March 03, 2023 08:39 AM
From: Earle Ake
Subject: Stream queue tuning for best performance
At 08:30, I see this as a reference:
Publication received PutDate:20230303 PutTime:13311144 Interval:10.000 seconds
Log - bytes in use 4026531840
Log - bytes max 4445962240
Log file system - bytes in use 7695695872
Log file system - bytes max 225485783040
Log - physical bytes written 86970368 8696219/sec
Log - logical bytes written 69452914 6944639/sec
Log - write latency 788 uSec
Log - write size 9204
Log - current primary space in use 8.13%
Log - workload primary space utilization 10.45%
Publication received PutDate:20230303 PutTime:13300141 Interval:4 minutes,43.164 seconds
User CPU time percentage 8.99%
System CPU time percentage 11.25%
CPU load - one minute average 4.85
CPU load - five minute average 4.55
CPU load - fifteen minute average 4.29
RAM free percentage 85.50%
RAM total bytes 32011MB
Publication received PutDate:20230303 PutTime:13300142 Interval:4 minutes,43.164 seconds
User CPU time - percentage estimate for queue manager 2.45%
System CPU time - percentage estimate for queue manager 1.93%
RAM total bytes - estimate for queue manager 3321MB
------------------------------
Earle Ake
Original Message:
Sent: Fri March 03, 2023 07:38 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
The GET rates you are quoting are tiny; you'd expect there to be a major underlying issue for MQ to be limited to these rates.
With persistent messaging the IO latency to the log is typically crucial. Did you run amqsrua to query this information? A few microseconds would be a good IO latency.
The get from the stream queue would be done by the amqfcxba process, and there are message ordering constraints that make it difficult to have multiple getting threads per stream (JMS requires that all messages from one publisher to one subscriber are delivered in order). However, amqfcxba should batch messages in the event that a backlog is building on the stream queue, thus avoiding the need to run multiple publisher threads per stream. IIRC the batching was more sophisticated in the original queued pub/sub implementation (pre V7), but the version implemented in V7 was quite basic.
I can't recall if the performance enhancement to a simple MQGET of a persistent message was in 9.2.0.6 (I suspect not, as it was one of the later things I did at IBM and I left around 9.2.2). Getting a persistent message outside syncpoint makes no sense: if the application fails between the MQGET and processing the message then the message is effectively lost, and so it didn't need to be persistent in the first place. MQ was therefore heavily optimized for persistent messaging INSIDE syncpoint. We had a number of cases where MQ was portrayed in a bad light by trivial benchmarks exploiting this "weakness", and so it was eventually tidied up. If you are consuming the published messages from their subscription queues outside of syncpoint then this could potentially cause some serialization issues, and thus performance problems, as the amqfcxba publish threads would contend with your threads for access to the subscription queues.
The more serialization issues you have, the more dependent you become on low IO latency. I'd suggest that using amqsrua to report the IO latency being observed on the recovery log should be your next step.
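For example (the queue manager name is a placeholder, and it's worth confirming the class/type names by running amqsrua interactively on a test system first), something like this should show the "Log - write latency" and "Log - write size" figures every 10 seconds:

# Sketch - recovery log statistics every 10 seconds for about 5 minutes
amqsrua -m MY.QMGR -c DISK -t Log -n 30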
------------------------------
Andrew Hickson
Original Message:
Sent: Thu March 02, 2023 04:25 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
I have been running amqsrua every 30 minutes for both GET and PUT for count=10. Earlier today, I was seeing:
MQPUT/MQPUT1 count 756 76/sec
MQPUT byte count 3259822 325970/sec
MQPUT persistent message count 756 76/sec
MQGET count 756 76/sec
MQGET byte count 3258041 325792/sec
destructive MQGET persistent message count 756 76/sec
average queue time 14382 uSec
Now when the queue is backing up, I see:
MQPUT/MQPUT1 count 1344 134/sec
MQPUT byte count 5570307 557019/sec
MQPUT persistent message count 1344 134/sec
MQGET count 506 51/sec
MQGET byte count 2263554 226351/sec
destructive MQGET persistent message count 506 51/sec
average queue time 4920807386 uSec
I am uploading the Grafana graphs of the PUT and GET counts for today from 08:00 to 16:00.
------------------------------
Earle Ake
Original Message:
Sent: Thu March 02, 2023 07:21 AM
From: Andrew Hickson
Subject: Stream queue tuning for best performance
There are (or at least there used to be!) tuning parameters that allow the number of threads per amqfcxba process to be configured, but this is highly unlikely to provide any benefit.
I'm guessing that your amqfcxba process appears to be using 100% of one core, but that isn't clear from your description. This could suggest that some lock is highly contested, but nearly all of the remaining 'coarse' locks in MQ are inter-process locks rather than intra-process locks, and in that case spreading the load across multiple processes won't make any difference.
You appear to be able to identify 9 queues of 'interest'; the amqsrua sample should allow you to get some idea of how each of these queues is performing, and in particular how much lock contention is occurring for the primary lock associated with each queue.
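A sketch of the sort of invocation I mean (names are placeholders; run amqsrua interactively on a test queue manager first to confirm the class/type names available at your MQ level):

# Sketch - per-queue PUT statistics, which is where the lock contention figure is reported
amqsrua -m MY.QMGR -c STATQ -t PUT -o MY.STREAM.QUEUE -n 10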
There is also an internal IBM utility which produces a list of MQ inter process locks and the activity, including contention, on each lock. You're probably best to work with IBM support if you need to go to this level of detail as knowing what the locks are used for and thus how to interpret contention on a lock isn't always easy.
As you mention IOWAIT, I'm assuming you are dealing with persistent messages. In a well-tuned system, nearly all of the IO should be against the recovery log, and MQ does its own buffer management for this IO. While one IO to the log is pending, the buffer for the next IO is being prepared. As soon as the first IO completes, the second IO is eligible to be scheduled (either because some thread has asked for the IO to be completed, or because a threshold of outstanding IO bytes has been passed). This mechanism can make the IOWAIT times appear rather odd, and you have to be very careful about reading too much into reported IOWAIT times.
Do you have the recovery log and the queue manager data directories mounted on different file systems? The primary reason for doing this is so that you can look at the IO rates independently. Once the queues start to become deep, IO will start to spill to the queue manager data file system; this IO should be largely unforced (no IOWAIT), but once sufficient real IO is outstanding the system may start to flush these buffers to disk.
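A quick way to check is below; the paths are the MQ defaults (a multi-instance queue manager will have these on your shared storage, so adjust accordingly), and iostat needs the sysstat package installed.

# Which file systems the log and data directories actually live on
df -h /var/mqm/log /var/mqm/qmgrs
# Per-device IO rates, sizes and waits, sampled every 10 seconds
iostat -x 10 6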
The amqsrua sample can also be used to look at statistics showing how the recovery log is performing; in particular, you can see the average write latency and the average write sizes.
Have you changed anything recently in your configuration?
Was this system running OK up until recently, and if so, what changed to lead to the issue?
------------------------------
Andrew Hickson
Original Message:
Sent: Wed March 01, 2023 02:12 PM
From: Earle Ake
Subject: Stream queue tuning for best performance
We have a RHEL 7.9 pair of servers running multi-instance MQ V9.2.0.6, with a set of 9 stream queues defined. The PUT and GET rates match until around 11 AM, when the GET rate starts to spike up and down and then drops. The PUT rate is still going up, so the queues get backed up trying to republish. I have tracked the queues down to using the same PID and different TIDs for each queue.
This is a 12 core server. It looks like the one process is CPU bound with all the attached threads.
Is there a way to distribute the threads across the cores, or a way to specify that each amqfcxba process has only 2 threads? Would that allow each new amqfcxba process to load-balance across cores? I'm looking for better performance. The current amqfcxba process has 27 threads.
Have used Dynatrace and the Wait I/O is around 11%.
------------------------------
Earle Ake
------------------------------