We recently ran into this issue, and I thought the approach described here might help other MQ administrators who run into something similar.
Background:
We have a Q1 queue that is deployed on server node1 with queue manager QM1, server node2 with QM2, and server node3 with QM3. A queue drainer runs on each server and drains the messages from Q1. Remote queue managers put messages roughly evenly across the 3 nodes for Q1. This is deployed in a cloud environment. When the queue drainer app reads a message, it also needs to write data to an NFS location. We have found that there can be a 5+ hour window in which NFS performance degrades to the point where it can take minutes for the queue drainer to process a single message. Perhaps NFS quotas set by the cloud provider are being hit, but that needs further investigation.
Solution to help alleviate the problem:
To help alleviate the problem, a Linux script (listed below) runs every 30 minutes on a server that has client connectivity to these three nodes and adjusts the CLWLPRTY of Q1 on any node where poor performance is detected. The ideal state is for Q1 to be at CLWLPRTY(1) on every node. If the oldest message on a node's Q1 is more than 10 minutes old (MSGAGE greater than 600 seconds), that instance of Q1 is demoted to CLWLPRTY(0). Any instance whose oldest message is 10 minutes old or less is promoted back to CLWLPRTY(1) if it is not already there. Because cluster workload balancing prefers the available destinations with the highest CLWLPRTY, new messages are routed to the healthy nodes while the demoted instance catches up. We recently had this queue drainer issue happen, and this approach helped limit how old the messages got on the affected Q1.
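The promote/demote rule described above can be sketched as a small bash function. This is only an illustration of the rule, not part of the script below: the names desired_clwlprty and THRESHOLD_SECS are made up for this example, and it assumes MSGAGE is reported in seconds and may come back empty when no value is available (the script guards against that case with its -z test).

```bash
#!/bin/bash
# Illustrative helper (hypothetical name, not in the original script):
# map the age of the oldest message, in seconds, to the CLWLPRTY that
# Q1 should have on that node.
THRESHOLD_SECS=600   # 10 minutes, the cutoff described in the post

desired_clwlprty() {
    local msgage="$1"
    # No usable MSGAGE value: make no decision at all.
    if [[ ! "$msgage" =~ ^[0-9]+$ ]]; then
        return 1
    fi
    # 10# forces base 10 so a value with a leading zero is not read as octal.
    if (( 10#$msgage > THRESHOLD_SECS )); then
        echo 0   # oldest message is over 10 minutes old: demote
    else
        echo 1   # 10 minutes or less: keep or restore the ideal priority
    fi
}
```

For example, desired_clwlprty 601 prints 0 and desired_clwlprty 600 prints 1, while an empty argument prints nothing and returns a non-zero status so the caller can leave the queue untouched.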
Suggestion to IBM MQ Development Team:
I know this is a targeted workaround for a specific application performance issue, but building this kind of concept (demoting and promoting poorly performing nodes) into the cluster workload routing algorithm could go a long way toward helping IBM MQ run more smoothly in a cluster environment.
Script:
#!/bin/bash
MSGAGE1=`echo "dis qs(Q1) type(queue)" | runmqsc -c QM1 | tr ' ' '\n' | grep MSGAGE | cut -d'(' -f2 | cut -d')' -f1`
MSGAGE2=`echo "dis qs(Q1) type(queue)" | runmqsc -c QM2 | tr ' ' '\n' | grep MSGAGE | cut -d'(' -f2 | cut -d')' -f1`
MSGAGE3=`echo "dis qs(Q1) type(queue)" | runmqsc -c QM3 | tr ' ' '\n' | grep MSGAGE | cut -d'(' -f2 | cut -d')' -f1`
# If a node has a msgage less than or equal to 10 minutes, make sure CLWLPRTY is set to 1
if [[ ! -z "$MSGAGE1" && "$MSGAGE1" -le 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM1 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 1 ]]; then
        echo "alter ql(Q1) clwlprty(1)" | runmqsc -c QM1
        UPDATE=Y
    fi
fi
if [[ ! -z "$MSGAGE2" && "$MSGAGE2" -le 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM2 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 1 ]]; then
        echo "alter ql(Q1) clwlprty(1)" | runmqsc -c QM2
        UPDATE=Y
    fi
fi
if [[ ! -z "$MSGAGE3" && "$MSGAGE3" -le 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM3 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 1 ]]; then
        echo "alter ql(Q1) clwlprty(1)" | runmqsc -c QM3
        UPDATE=Y
    fi
fi
# If a node has a msgage greater than 10 minutes, make sure CLWLPRTY is set to 0
if [[ ! -z "$MSGAGE1" && "$MSGAGE1" -gt 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM1 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 0 ]]; then
        echo "alter ql(Q1) clwlprty(0)" | runmqsc -c QM1
        UPDATE=Y
    fi
fi
if [[ ! -z "$MSGAGE2" && "$MSGAGE2" -gt 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM2 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 0 ]]; then
        echo "alter ql(Q1) clwlprty(0)" | runmqsc -c QM2
        UPDATE=Y
    fi
fi
if [[ ! -z "$MSGAGE3" && "$MSGAGE3" -gt 600 ]]; then
    CLWLPRTY=`echo "dis ql(Q1) CLWLPRTY" | runmqsc -c QM3 | tr ' ' '\n' | grep "CLWLPRTY(" | cut -d'(' -f2 | cut -d')' -f1`
    if [[ ! -z "$CLWLPRTY" && "$CLWLPRTY" -ne 0 ]]; then
        echo "alter ql(Q1) clwlprty(0)" | runmqsc -c QM3
        UPDATE=Y
    fi
fi
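# Send email to the team if there was a CLWLPRTY update.
# UPDATE is only set when an alter was issued, so quote it to avoid a
# test syntax error on runs where nothing changed.
if [[ "$UPDATE" == Y ]]; then
    # Placeholder: the process to send email to the MQ team goes here
    :
fi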
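The per-node blocks in the script repeat the same steps, so the logic could also be written as a loop over the queue managers. The following is only a sketch under assumptions, not a drop-in replacement: QUEUE, NODES, THRESHOLD_SECS, attr_value, adjust_node and run_all are illustrative names, and it relies on the same runmqsc -c client connectivity and the same NAME(value) output shape that the script above already parses.

```bash
#!/bin/bash
# Sketch only: every name defined here is illustrative, not from the original script.
QUEUE=Q1
NODES=(QM1 QM2 QM3)
THRESHOLD_SECS=600   # 10 minutes
UPDATE=N

# Read runmqsc output on stdin and print the value of attribute $1,
# e.g. a line containing MSGAGE(742) yields 742. A blank value yields "".
attr_value() {
    grep -o "$1([^)]*)" | head -n 1 | cut -d'(' -f2 | cut -d')' -f1 | tr -d ' '
}

# Check one queue manager and alter CLWLPRTY if it differs from the wanted
# value. Returns 0 only when an alter was issued and runmqsc reported success.
adjust_node() {
    local qmgr="$1" msgage current wanted
    msgage=$(echo "dis qs($QUEUE) type(queue)" | runmqsc -c "$qmgr" | attr_value MSGAGE)
    [[ "$msgage" =~ ^[0-9]+$ ]] || return 1      # no usable MSGAGE: skip this node
    if (( 10#$msgage > THRESHOLD_SECS )); then wanted=0; else wanted=1; fi
    current=$(echo "dis ql($QUEUE) CLWLPRTY" | runmqsc -c "$qmgr" | attr_value CLWLPRTY)
    [[ "$current" =~ ^[0-9]+$ ]] || return 1     # could not read CLWLPRTY: skip
    (( 10#$current == wanted )) && return 1      # already at the wanted value
    echo "alter ql($QUEUE) clwlprty($wanted)" | runmqsc -c "$qmgr"
}

# Visit every node and record in UPDATE whether anything was changed.
run_all() {
    local qmgr
    for qmgr in "${NODES[@]}"; do
        if adjust_node "$qmgr"; then UPDATE=Y; fi
    done
}
```

To use it, call run_all and then test "$UPDATE" exactly as the script above does before sending the email. One difference from the script above is that UPDATE is set only when the alter command itself succeeds.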