 SYSTEM.CHANNEL.INITQ filled up

Jim Creasman IBM Champion posted Fri July 11, 2025 08:59 AM

I'm researching an issue that happened yesterday with one of our MQ uniform clusters.  The cluster in question has two qmgrs, each running on separate VMs and identically configured (except for the name).  These are set up so that messages sent to a queue on one are evenly balanced across the cluster via the qmgr-to-qmgr channel.  A problem was reported by our monitoring software that it was unable to connect to one of the qmgrs in the cluster.

As I investigated I noticed two conditions were happening on the qmgr reporting a problem:

  1. The SYSTEM.CHANNEL.INITQ queue was full.  It has the default maxdepth of 1000 set.
  2. The SYSTEM.CLUSTER.TRANSMIT.QUEUE queue had over 28k messages.
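For anyone wanting to catch this condition before the queue is completely full, here is a minimal sketch in Python. It assumes you have already captured the text that runmqsc prints for a command such as `DISPLAY QLOCAL(SYSTEM.CHANNEL.INITQ) CURDEPTH MAXDEPTH`. The function names and the `SAMPLE` text are illustrative only; the sample just mirrors the full queue described above (depth 1000, maxdepth 1000).

```python
import re

# MQSC responses list attributes as NAME(value) pairs, e.g. CURDEPTH(1000).
ATTR = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_attrs(display_output):
    """Return a dict of attribute name -> value from one DISPLAY response."""
    return {name: value.strip() for name, value in ATTR.findall(display_output)}


def fill_ratio(display_output):
    """Return CURDEPTH / MAXDEPTH as a float between 0.0 and 1.0."""
    attrs = parse_attrs(display_output)
    return int(attrs["CURDEPTH"]) / int(attrs["MAXDEPTH"])


# Illustrative response text (an assumption, not copied from a real system).
SAMPLE = """
   QUEUE(SYSTEM.CHANNEL.INITQ)             TYPE(QLOCAL)
   CURDEPTH(1000)                          MAXDEPTH(1000)
"""

if fill_ratio(SAMPLE) >= 0.8:  # the 80% threshold is an arbitrary choice
    print("SYSTEM.CHANNEL.INITQ is at or above 80% of MAXDEPTH")
```

A monitoring job could call `fill_ratio` on each captured response and raise an alert well before the queue reaches its limit.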

There were also AMQ7208 messages in the log, indicating the qmgr was unable to start the cluster channel.  I did not see any issue with the file systems used by MQ.  The space utilization was slightly elevated, but I attributed this to the queue having 28k messages.

I resolved the problem by temporarily increasing the maxdepth of the SYSTEM.CHANNEL.INITQ to 2000 and then redeploying the cluster.  MQ runs in Kubernetes as stateful sets.  A redeployment does a rolling restart: first one server, then the other, is stopped and restarted.  This gives the clients the opportunity to stay connected to the qmgr that is still running.
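The maxdepth change can be sketched as a small Python helper that builds the MQSC text. This is only an illustration under stated assumptions: the helper name `raise_maxdepth_script` is made up, and how the script reaches the qmgr (for example, piped into runmqsc inside the pod) depends on your deployment.

```python
def raise_maxdepth_script(queue, new_maxdepth):
    """Build MQSC text that shows a local queue's depth, raises MAXDEPTH,
    and shows the depth again so the change can be confirmed."""
    if not isinstance(new_maxdepth, int) or new_maxdepth <= 0:
        raise ValueError("new_maxdepth must be a positive integer")
    show = f"DISPLAY QLOCAL({queue}) CURDEPTH MAXDEPTH"
    alter = f"ALTER QLOCAL({queue}) MAXDEPTH({new_maxdepth})"
    return "\n".join([show, alter, show]) + "\n"


# The values used in this post: the channel initiation queue, raised to 2000.
script = raise_maxdepth_script("SYSTEM.CHANNEL.INITQ", 2000)
print(script, end="")
```

The printed text can be fed to runmqsc on standard input. Showing the depth before and after the ALTER leaves a record of what the queue looked like at the time of the change.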

When the system came back up I inspected the queues and noticed that the SYSTEM.CHANNEL.INITQ queue now had a depth of zero and the SYSTEM.CLUSTER.TRANSMIT.QUEUE queue depth was decreasing.  Eventually, this depth reached zero also. 

My questions to the MQ community are these:

  • What would typically cause this sort of issue?  As I mentioned we have three separate MQ uniform clusters, each having two qmgrs.  All were upgraded to MQ 9.4.2 about a month ago.  In the 3-4 years we have had these clusters running in production this is the first time I have seen this problem.
  • Were my actions to resolve this the correct approach?  I'm not aware of any messages lost and the system appears to be running normally today.  I'm taking the win, but it always nags me a bit if I don't know why something broke.
  • Anyone else had similar issues?  I did a quick search through this forum, but didn't see anything matching.

Thanks,

Jim

om prakash IBM Champion

SYSTEM.CHANNEL.INITQ fills up only if there are too many connections open and MQ will not open any more threads. This looks like a network issue that prevented any more connections, with the SCTQ holding up messages as a result. Restarting the queue manager, or the redeployment you performed, would have cleared the network issue.

Increasing the queue depth was a correct action, but the root cause was on the network.

You could also have stopped the listener to see whether that resolved it.

Morag Hughson IBM Champion

The INITQ is where a message is placed to tell the CHINIT to start a channel. There appear to have been many requests asking the CHINIT to start the channel, which it could not do, and in the meantime messages were piling up on the transmission queue that this channel was supposed to be moving to another system.

You say that you had lots of AMQ7208 messages reporting that the channel was unable to start, but you do not tell us the contents of those error messages. I believe they will contain your root cause. Looking at the description of that error message, I see that it is supposed to say this:

MESSAGE:
The queue manager failed to start cluster channel '<insert one>'.

EXPLANATION:
A message was put that required the cluster sender channel '<insert one>' to be
automatically started to the cluster queue manager '<insert two>'. The queue
manager failed to start the cluster sender channel for reason <insert three>.
This may be due to a problem with the SYSTEM.CHANNEL.INITQ or a change to the
cluster queue manager that the channel represents.

ACTION:
This may be a transitory problem. Investigate the problem and if necessary
start the channel manually.

So, insert three would seem to be the interesting one here, as it gives the reason why the channel could not be started. Could you tell us what it says there?
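If there are many AMQ7208 entries to go through, a short Python sketch like the one below could tally the reason values. It assumes the log entries follow the explanation wording quoted above ("failed to start the cluster sender channel for reason ..."); the function name `amq7208_reasons` is made up for illustration, and the real log layout may differ.

```python
import re
from collections import Counter

# Phrase taken from the AMQ7208 explanation text quoted above.
REASON = re.compile(
    r"failed to start the cluster sender channel for reason (\S+)"
)


def amq7208_reasons(log_text):
    """Count the reason insert across all AMQ7208 entries in log_text."""
    counts = Counter()
    # Each chunk begins just after one AMQ7208 message ID.
    for chunk in log_text.split("AMQ7208")[1:]:
        # The log wraps long lines, so collapse all whitespace first.
        flat = " ".join(chunk.split())
        match = REASON.search(flat)
        if match:
            # Drop the full stop that ends the sentence.
            counts[match.group(1).rstrip(".")] += 1
    return counts
```

Calling `amq7208_reasons(open(path).read())` on an error log file (the path depends on your installation) returns a `Counter` keyed by the reason value, which shows quickly whether every failure had the same cause.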

Cheers,
Morag