This is an important and interesting topic.
When setting up new integrations with MQ as the messaging backbone, how many of you are stringent about the specifications?
Do your customers know the volumes they are sending?
Do they know the message sizes?
What about periodicity (batch or time-critical)?
If so, how do you capture and document these details?
That is important information to have for an “always on” environment, of course.
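For what it is worth, we try to capture those answers in a structured intake record when an integration is onboarded. A minimal sketch in Python follows; every field name and value is purely illustrative, not any standard:

    # Hypothetical intake record for a new MQ integration;
    # all names and numbers here are illustrative assumptions.
    integration_spec = {
        "application": "ORDERS.FEED",          # assumed application name
        "expected_msgs_per_day": 250_000,      # peak daily volume agreed with the sender
        "avg_message_size_bytes": 4_096,
        "max_message_size_bytes": 1_048_576,   # feeds into MAXMSGL sizing
        "pattern": "batch",                    # "batch" or "time-critical"
        "batch_window": "02:00-04:00 UTC",     # only meaningful for batch senders
        "consumer_outage_tolerance_hours": 8,  # how long messages may accumulate
    }

    # MAXDEPTH can then be derived from the agreed volumes plus headroom,
    # rather than inherited from SYSTEM.DEFAULT.LOCAL.QUEUE.
    required_depth = (integration_spec["expected_msgs_per_day"]
                      * integration_spec["consumer_outage_tolerance_hours"] // 24) * 2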
We have “old” settings for queue depth, inherited from long ago. These get discovered along the way when migrating to new MQ versions and so forth.
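One way to make that discovery deliberate rather than accidental is to audit every local queue's MAXDEPTH before a migration. A rough sketch that drives runmqsc from Python (the queue manager name is assumed, and the output parsing is deliberately simplistic since the exact formatting varies a little between versions):

    import subprocess

    QMGR = "QM1"  # assumed queue manager name; adjust per environment

    # DISPLAY QLOCAL(*) MAXDEPTH CURDEPTH lists the configured depth limit
    # and the current depth for every local queue on the queue manager.
    result = subprocess.run(
        ["runmqsc", QMGR],
        input="DISPLAY QLOCAL(*) MAXDEPTH CURDEPTH\n",
        capture_output=True,
        text=True,
    )

    current_queue = None
    for line in result.stdout.splitlines():
        if "QUEUE(" in line:
            current_queue = line.split("QUEUE(")[1].split(")")[0]
        if "MAXDEPTH(5000)" in line and current_queue:
            # Still sitting on the value inherited from SYSTEM.DEFAULT.LOCAL.QUEUE.
            print(f"{current_queue} still has the inherited default MAXDEPTH(5000)")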
It is a good idea, as you mention, Glen, to put “badly behaving” flows on isolated queue managers.
From: Glen Brumbaugh <wsmqfam-ws@lists.imwuc.org>
Sent: Wednesday, April 25, 2018 03:25
To: WSMQFam-ws@lists.imwuc.org
Subject: [WSMQFam-ws] - Max Queue Depth: Are You at Risk?
I witnessed yet another Production Outage this week caused by an Application reaching the Maximum Queue Depth for a critical queue. As you're probably aware, the default Maximum Queue Depth on the SYSTEM.DEFAULT.LOCAL.QUEUE is 5,000 messages. Many defined queues inherit their default depth setting from this queue's properties. Is this really the depth you want?
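To see that inheritance in action, define a queue without specifying MAXDEPTH and display what it ends up with; the unspecified attributes are taken from SYSTEM.DEFAULT.LOCAL.QUEUE. A sketch with a made-up queue name and queue manager (run it on a sandbox, not in production):

    import subprocess

    QMGR = "QM1"  # assumed sandbox queue manager name

    mqsc = "\n".join([
        # No MAXDEPTH specified, so it is taken from SYSTEM.DEFAULT.LOCAL.QUEUE.
        "DEFINE QLOCAL(DEMO.INHERIT.TEST)",
        # Shows MAXDEPTH(5000) unless the default queue has been changed.
        "DISPLAY QLOCAL(DEMO.INHERIT.TEST) MAXDEPTH",
        # Clean up the demo queue.
        "DELETE QLOCAL(DEMO.INHERIT.TEST)",
    ]) + "\n"

    print(subprocess.run(["runmqsc", QMGR], input=mqsc,
                         capture_output=True, text=True).stdout)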
For 25 years now, I have warned about this setting. Part of any asynchronous messaging capability must be the ability to buffer incoming messages when they either cannot be consumed fast enough or are not being consumed at all because the reading application is down. The system should be designed for reliability and resilience, not for the "happy path". Consider the downside of setting this value to its maximum. The available disk size could be allowed to fill up, IF REQUIRED TO BY TRAFFIC VOLUMES. The alternative is to BREAK THE BUSINESS APPLICATION DUE TO YOUR ADMINISTRATIVE SETTINGS. Yes, that's right, you broke Production. Not the Application. Not the root cause of the backup. You. The administrator.
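For reference, the largest value MQSC accepts for MAXDEPTH is 999999999, at which point the practical ceiling becomes queue file and disk space rather than an arbitrary message count. A sketch with assumed names:

    import subprocess

    QMGR = "QM1"                  # assumed queue manager name
    QUEUE = "APP.ORDERS.REQUEST"  # assumed application queue

    # 999999999 is the maximum MAXDEPTH that MQSC allows; whether you actually
    # want to go that far depends on your disk sizing and isolation strategy.
    mqsc = f"ALTER QLOCAL({QUEUE}) MAXDEPTH(999999999)\n"
    subprocess.run(["runmqsc", QMGR], input=mqsc, text=True)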
Your only defense against this disaster is monitoring. Monitoring every queue (including SYSTEM queues). And actually paying attention to alerts. Even the false ones. 7 x 24 x 365. Now for some perspective. Disk has come a long way since MQ was designed and introduced. If you are constrained by disk space, then monitor disk space. We generally have much better server monitoring than middleware monitoring anyway. If you need to isolate apps from poorly behaving "run-away" apps, then isolate them. On separate servers. Problem solved. If an app wants to use up all of the space and crash, then at least the business application (through its IT arm) was responsible.
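As an illustration of the kind of depth monitoring I mean, here is a minimal polling sketch using the pymqi client library; the connection details, queue names, and 80% threshold are all assumptions to adapt:

    import pymqi

    QMGR = "QM1"                   # assumed queue manager name
    CHANNEL = "DEV.ADMIN.SVRCONN"  # assumed client channel
    CONN_INFO = "mqhost(1414)"     # assumed host(port) for the client connection
    ALERT_PCT = 80                 # assumed alert threshold, percent of MAXDEPTH

    QUEUES = ["APP.ORDERS.REQUEST", "APP.ORDERS.REPLY"]  # assumed queues to watch

    qmgr = pymqi.connect(QMGR, CHANNEL, CONN_INFO)
    try:
        for name in QUEUES:
            # Open each queue for inquiry and read current and maximum depth.
            q = pymqi.Queue(qmgr, name, pymqi.CMQC.MQOO_INQUIRE)
            current = q.inquire(pymqi.CMQC.MQIA_CURRENT_Q_DEPTH)
            maximum = q.inquire(pymqi.CMQC.MQIA_MAX_Q_DEPTH)
            q.close()
            pct = 100.0 * current / maximum
            if pct >= ALERT_PCT:
                # Hand this off to whatever alerting channel you already use.
                print(f"ALERT {name}: {current}/{maximum} ({pct:.0f}% full)")
    finally:
        qmgr.disconnect()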
If MQ causes an application to crash and take a production outage when there was plenty of space available, how do you justify that action? There really is no good reason to take a production outage in that instance. Especially when there may have been tens to hundreds of GB available. The only real remaining reason to still be using these settings, and it's not a particularly good one, is queue-full percentage monitoring. There are far better ways to accomplish this monitoring now, but I'm sure there are many legacy monitoring configurations based upon the depth setting. If that's your case, then revisit the depth you're using.
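One of those better ways is the queue manager's own queue depth high events: QDEPTHHI sets the threshold as a percentage of MAXDEPTH, and QDPHIEV(ENABLED) makes the queue manager write an event message to SYSTEM.ADMIN.PERFM.EVENT when it is crossed (PERFMEV must also be enabled on the queue manager), so the alert no longer depends on keeping MAXDEPTH artificially low. A sketch with assumed names:

    import subprocess

    QMGR = "QM1"                  # assumed queue manager name
    QUEUE = "APP.ORDERS.REQUEST"  # assumed application queue

    mqsc = "\n".join([
        # Enable performance events at the queue manager level.
        "ALTER QMGR PERFMEV(ENABLED)",
        # Raise a queue depth high event once the queue is 80% full,
        # independent of how large MAXDEPTH actually is.
        f"ALTER QLOCAL({QUEUE}) QDEPTHHI(80) QDPHIEV(ENABLED)",
    ]) + "\n"

    subprocess.run(["runmqsc", QMGR], input=mqsc, text=True)
    # The resulting event messages arrive on SYSTEM.ADMIN.PERFM.EVENT,
    # where a monitoring tool can pick them up.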
Times have changed. Have you? Check and evaluate your settings. They may still be appropriate, they may never have been appropriate, or they may need to be changed. It's spring. Perform an MQ tune-up.
Regards,
Glen Brumbaugh
-----End Original Message-----