MQ


 SYSTEM.CHANNEL.INITQ filled up

Jim Creasman IBM Champion posted Fri July 11, 2025 08:59 AM

I'm researching an issue that happened yesterday with one of our MQ uniform clusters.  The cluster in question has two qmgrs, each running on separate VMs and identically configured (except for the name).  These are set up so that messages sent to a queue on one are evenly balanced across the cluster via the qmgr-to-qmgr channel.  A problem was reported by our monitoring software that it was unable to connect to one of the qmgrs in the cluster.

As I investigated I noticed two conditions were happening on the qmgr reporting a problem:

  1. The SYSTEM.CHANNEL.INITQ queue was full.  It has the default maxdepth of 1000 set.
  2. The SYSTEM.CLUSTER.TRANSMIT.QUEUE queue had over 28k messages.

There were also AMQ7208 messages in the log, indicating the qmgr was unable to start the cluster channel.  I did not see any issue with the file systems used by MQ.  The space utilization was slightly elevated, but I attributed this to the queue having 28k messages.

I resolved the problem by temporarily increasing the maxdepth of the SYSTEM.CHANNEL.INITQ to 2000 and then redeploying the cluster.  MQ runs in Kubernetes as stateful sets.  A redeployment does a rolling restart as first one, then the other server is stopped and restarted.  This gives the clients the opportunity to stay connected to the qmgr that is running.
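For anyone wanting to reproduce the workaround, the depth increase is a single MQSC command run via runmqsc against the affected qmgr (2000 is simply the value I chose, not a recommendation):

```
ALTER QLOCAL(SYSTEM.CHANNEL.INITQ) MAXDEPTH(2000)
```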

When the system came back up I inspected the queues and noticed that the SYSTEM.CHANNEL.INITQ queue now had a depth of zero and the SYSTEM.CLUSTER.TRANSMIT.QUEUE queue depth was decreasing.  Eventually, this depth reached zero also. 
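The depth checks above can be done in runmqsc with something like:

```
DISPLAY QLOCAL(SYSTEM.CHANNEL.INITQ) CURDEPTH MAXDEPTH
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH
```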

My questions to the MQ community are these:

  • What would typically cause this sort of issue?  As I mentioned we have three separate MQ uniform clusters, each having two qmgrs.  All were upgraded to MQ 9.4.2 about a month ago.  In the 3-4 years we have had these clusters running in production this is the first time I have seen this problem.
  • Were my actions to resolve this the correct approach?  I'm not aware of any messages lost and the system appears to be running normally today.  I'm taking the win, but it always nags me a bit if I don't know why something broke.
  • Anyone else had similar issues?  I did a quick search through this forum, but didn't see anything matching.

Thanks,

Jim

om prakash IBM Champion

SYSTEM.CHANNEL.INITQ only fills up if there are too many open connections and MQ cannot start any more channel threads. This looks like a network issue preventing new connections, with the SCTQ holding up messages as a result. Restarting the queue managers via the redeploy would have cleared the underlying network condition.

Increasing the queue depth was a correct action, but the root cause was in the network.

You could also have stopped and restarted the listener to see whether that resolved it.
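As a sketch, the listener test would look like this in runmqsc (the listener name here is the default one; yours may differ):

```
STOP LISTENER(SYSTEM.DEFAULT.LISTENER.TCP)
START LISTENER(SYSTEM.DEFAULT.LISTENER.TCP)
```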

Morag Hughson IBM Champion

The INITQ is where a message is placed to tell the CHINIT to start a channel. It appears there were many attempts to ask the CHINIT to start the channel, which it could not do, and in the meantime messages piled up on the transmission queue that this channel was supposed to be moving to another system.

You say that you had lots of AMQ7208 messages reporting that the channel was unable to start, but you do not tell us the contents of those error messages. I believe they will contain your root cause. Looking at the description of that error message, it is supposed to say this:-

MESSAGE:
The queue manager failed to start cluster channel '<insert one>'.

EXPLANATION:
A message was put that required the cluster sender channel '<insert one>' to be
automatically started to the cluster queue manager '<insert two>'. The queue
manager failed to start the cluster sender channel for reason <insert three>.
This may be due to a problem with the SYSTEM.CHANNEL.INITQ or a change to the
cluster queue manager that the channel represents.

ACTION:
This may be a transitory problem. Investigate the problem and if necessary
start the channel manually.

So, insert three would seem to be the interesting one here - to understand the reason why the channel could not be started. Could you tell us what it says there?

Cheers,
Morag

Francois Brandelik IBM Champion

Seeing a backlog on the SCTQ is, I believe, the main hint.

This may or may not relate to network problems, but it is certain to occur if you have messages destined for a full queue. That full queue may have nothing to do with your application, but by the nature of the channel retry protocol it will slow channel delivery considerably: each message is retried x times before being delivered to the DLQ or backout queue, and the channel may even stop if no DLQ or backout queue is available. The channel definition (receiver) should tell you whether use of the DLQ is allowed.

This is difficult to see after your refresh in Kubernetes, but checking for a full queue is what I would try first. Hope it helps.
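A quick way to look for full queues in runmqsc is something along these lines (the WHERE filter is standard MQSC; adjust to taste), then compare CURDEPTH against MAXDEPTH for each queue listed:

```
DISPLAY QLOCAL(*) WHERE(CURDEPTH GT 0) CURDEPTH MAXDEPTH
```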

Jim Creasman IBM Champion

Thanks, all, for your input.  The AMQ7208 warning messages in the log came in two flavors.  One lists the remote qmgr (QC01) and the other has blanks for the qmgr.  In both cases MQRC_Q_FULL is given as the reason.

----- amqcccxa.c : 2671 -------------------------------------------------------
07/10/25 18:25:46 - Process(2117284.1295) User(mqm) Program(amqzlaa0)
                    Host(mqprod-prospect-core-server-msg-0) Installation(Installation1)
                    VRMF(9.4.2.0) QMgr(QC00)
                    Time(2025-07-10T18:25:46.052Z)
                    RemoteHost(127.0.0.6)
                    CommentInsert1(UC_PROSPECT_QC01)
                    CommentInsert2(QC01)
                    CommentInsert3(MQRC_Q_FULL)

AMQ7208W: The queue manager failed to start cluster channel 'UC_PROSPECT_QC01'.

EXPLANATION:
A message was put that required the cluster sender channel 'UC_PROSPECT_QC01'
to be automatically started to the cluster queue manager 'QC01'. The queue
manager failed to start the cluster sender channel for reason MQRC_Q_FULL. This
may be due to a problem with the SYSTEM.CHANNEL.INITQ or a change to the
cluster queue manager that the channel represents.
ACTION:
This may be a transitory problem. Investigate the problem and if necessary
start the channel manually.

----- amqkfnca.c : 5338 -------------------------------------------------------
07/10/25 18:25:46 - Process(2117284.1295) User(mqm) Program(amqzlaa0)
                    Host(mqprod-prospect-core-server-msg-0) Installation(Installation1)
                    VRMF(9.4.2.0) QMgr(QC00)
                    Time(2025-07-10T18:25:46.094Z)
                    RemoteHost(127.0.0.6)
                    CommentInsert1(UC_PROSPECT_QC01)
                    CommentInsert2(    )
                    CommentInsert3(MQRC_Q_FULL)

AMQ7208W: The queue manager failed to start cluster channel 'UC_PROSPECT_QC01'.

EXPLANATION:
A message was put that required the cluster sender channel 'UC_PROSPECT_QC01'
to be automatically started to the cluster queue manager '    '. The queue
manager failed to start the cluster sender channel for reason MQRC_Q_FULL. This
may be due to a problem with the SYSTEM.CHANNEL.INITQ or a change to the
cluster queue manager that the channel represents.
ACTION:
This may be a transitory problem. Investigate the problem and if necessary
start the channel manually.

I also see AMQ9213 errors in the log, interspersed among the AMQ7208 warnings.

----- amqkfnca.c : 5338 -------------------------------------------------------
07/10/25 18:25:46 - Process(322.5) User(mqm) Program(amqrmppa)
                    Host(mqprod-prospect-core-server-msg-0) Installation(Installation1)
                    VRMF(9.4.2.0) QMgr(QC00)
                    Time(2025-07-10T18:25:46.789Z)
                    RemoteHost(127.0.0.6)
                    CommentInsert1(127.0.0.6)
                    CommentInsert2(TCP/IP)

AMQ9213E: A communications error for TCP/IP occurred.

EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP call was 0 (X'0'). Record these values and tell
the systems administrator.

I believe the suggestion of a network error as the root cause is correct.  I only see these messages on one of the two qmgrs.  It seems something prevented QC00 from connecting to QC01, but not vice versa.  Redeploying the pods resolved the issue.  I could possibly have restarted the channel manually, as the AMQ7208 message suggests.
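For the record, the manual restart the message suggests would have been something like this in runmqsc (channel name taken from my logs), with a DISPLAY CHSTATUS afterwards to confirm the channel reaches RUNNING:

```
START CHANNEL(UC_PROSPECT_QC01)
DISPLAY CHSTATUS(UC_PROSPECT_QC01)
```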

Jim

Erik Houlberg

This may be a long shot; however, I experienced the following situation: I was contacted because messages no longer reached one QM instance, that is, one QM instance was fine and processed all messages whereas no messages reached the other.

The root cause turned out to be that the overall limit on channel instances (MaxChannels in qm.ini, default 100) had been reached on the QM that did not receive messages. As MQ clients had used up all the available channel instances, no QM-to-QM channel could be started.

I would look at the QC01 MQ error log to check whether that was the case, and if so, increase the maximum number of channel instances (on both QMs) and/or limit the number of channel instances used by MQ clients (the MAXINST setting on the server connection channel). Those were the actions I took; after restarting the QMs the messages started flowing as expected.

MaxChannels should be greater than the sum of MAXINST over all server connection channels plus the number of potential QM-to-QM channel instances, to avoid this situation happening again.
Note that a (cluster) receiver channel can be started by multiple senders (e.g. a cluster receiver channel by all other members of the MQ cluster).
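As a sketch (channel name and numbers are illustrative, not recommendations), the two settings look like this. In qm.ini:

```
Channels:
  MaxChannels=300
  MaxActiveChannels=300
```

and in runmqsc, capping a server connection channel:

```
ALTER CHANNEL(APP.SVRCONN) CHLTYPE(SVRCONN) MAXINST(100) MAXINSTC(25)
```

MAXINSTC (the per-client limit) is optional, but it helps stop a single misbehaving client from exhausting the pool on its own.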