Understanding JMSCS0044 and JMSCS0046 messages

6. RE: Understanding JMSCS0044 and JMSCS0046 messages

Like

Jim Creasman

IBM Champion

Posted Tue May 03, 2022 12:09 PM

Thanks, guys!

The information definitely helps to clarify the meaning and frequency. I'm still trying to work out the puzzle as to why we're seeing the messages and why they reoccur. I'm starting to develop a theory I'd like to run past you all. It will help first if you understand the configuration of our environment.

We have a uniform cluster consisting of two qmgrs, QC00 and QC01.
Clients connect via CCDTUrl. The GET to the URL returns four connection entires, always in the same order:
1. Host/port for QC00 as qmgr "QC00"
2. Host/port for QC00 as qmgr "QM_ANY"
3. Host/port for QC01 as qmgr "QC01"
4. Host/port for QC01 as qmgr "QM_ANY"
Clients specify "*QM_ANY" for the qmgr when connecting.
There are two instances of this JMS client.
Each JMS client instance creates two connections with these options: MQCNO_HANDLE_SHARE_BLOCK,MQCNO_SHARED_BINDING,MQCNO_RECONNECT

I've noticed that clients always connect to QC00 when they first come up in this configuration (no surprise since QC00 appears first in the list). Later, they get a rebalance request and move to QC01.

When I look on the MQ server I see two connections from this client on each qmgr, so that tells me these are balanced. When I look closer at the client logs I see a message for "parent hconn" and "child hconn". My guess is those are the two connections showing up on the server. Does that sound right?

I noticed something else when I looked closer at the reconnect message that specifies a qmgr (i.e., the one for rebalancing per Mark's response).

Queueing reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]' to queue manager 'QC01                                            '

Note that the qmgr name has a lot of blanks after the name. The GET to the CCDTUrl returns JSON that does not have the blanks. Are these blanks a concern, or just something that gets added when the message text is formatted?

I never see a connection request to QC01 actually work, so perhaps the rebalance is failing (though, as mentioned, I see the correct number of connections on each server). After each reconnect message to QC01, there follows a series of these messages where the qmgr name is 'null'. Are these the result of it working down the list returned by the CCDTUrl? Does the "1000,2000,4000,8000,1600,25000" delay sequence apply as it attempts to work down this list? The balance request seems to generate a mini-storm of reconnect activity when I would expect it to work immediately as our other clients do.

Still investigating...

------------------------------
Jim Creasman
------------------------------

Original Message

Original Message:
Sent: Tue May 03, 2022 07:54 AM
From: Mark Bluemel
Subject: Understanding JMSCS0044 and JMSCS0046 messages

As Matt has said, the "rescheduling" message indicates that an attempt to reconnect has failed and will be retried after a period of time. The reconnection times are used to schedule the subsequent attempts - in your example, the second attempt will be at least 1093 milliseconds after the first (there's only one thread doing the reconnections, so I'm not sure we can give exact timings!), the third 2205 milliseconds after the second and so forth. As Matt says, there's a random factor added in to try to avoid multiple clients scheduling their reconnections simultaneously.

A "starting reconnection" message with a non-null queue manager is given when the Uniform Cluster has requested a client move to another queue manager as part of the balancing activity.

The fact that the "starting reconnection" message in your example gives a queue manager name of "null" indicates that this is an attempt to reconnect due to a connection failure - if there was a previous message for the same hconn with a non-null queue manager, it would suggest that the attempt to move for rebalancing failed, so an attempt to reconnect anywhere was attempted; if no such message appeared, then there's been a connection broken event and the hconn is simply reconnecting to recover from that.

The repeated connection attempts suggest that there's been some, presumably transient, issue which prevented the selected queue manager (in this context, we'd simply select one from the CCDT according to usual principles) from accepting the connect attempt. Perhaps we should consider logging the name of the queue manager to which we attempted to connect, in the JMSCS0046 message, as it would help give an insight into what queue manager logs you may wish to review to understand this behaviour.

I'll look at adding more explanation to the messages, I think.

I could also look at allowing them to be suppressed, asI think they provide a valuable insight into Uniform Cluster (and more general reconnection) activity, I'm not sure I'd recommend that.

I hope this is helpful!

------------------------------
Mark Bluemel

Original Message:
Sent: Mon May 02, 2022 02:33 PM
From: Jim Creasman
Subject: Understanding JMSCS0044 and JMSCS0046 messages

Hey Matt,

In this configuration we have two instances of this JMS consumer client, each receiving and processing messages from a queue. I haven't been able to discern a pattern in the messages. They don't always show up, and when they do, sometimes it's only one of the two instances that is reporting the error. I'm still trying to figure out if there's a pattern.

As for your question about frequency, when the messages do start happening these appear every few seconds on average. After several sets they stop. I don't see any messages on the server side with corresponding timestamps.

I see this message at startup of the app. I'm sure this is a permissions error due to the user not having write access to the JVM working directory. I've asked the developer to fix this. Not sure it's related, but this error might be preventing some error data capture from happening that could help.

Could not initialize log file, java.nio.file.AccessDeniedException: mqjms.log.0.lck

I also see this message as soon as the application is started. Are these a pseudo-random sequence of wait times between reconnection attempts? If so, that could explain the variability I'm seeing between occurrences of the reconnect messages.

Reconnection times (milliseconds) : '[1093, 2205, 4268, 8108, 17075, 25368]'

Thanks

------------------------------
Jim Creasman

Original Message:
Sent: Mon May 02, 2022 12:48 PM
From: MATTHEW LEMING
Subject: Understanding JMSCS0044 and JMSCS0046 messages

Hi Jim,

The second message means a reconnect failed and that it will be rescheduled for some time in the future. I wouldn't expect to see any, or at least many, of the second messages if there is an available queue manager to connect to.

How many of these are you getting?

Regards, Matt.

------------------------------
MATTHEW LEMING

Original Message:
Sent: Mon May 02, 2022 12:03 PM
From: Jim Creasman
Subject: Understanding JMSCS0044 and JMSCS0046 messages

We have a JMS MQ client that is consuming messages from a queue. It seems to be working OK. However, these two messages are repeated in the log over and over:

--------------------------------------------------------------------April 29, 2022 5:29:52 PM UTC[JMSCCThreadPoolWorker-3] com.ibm.mq.jmqi.remote.impl.RemoteReconnectThreadStarting reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]' to queue manager 'null'EXPLANATION:nullACTION:null--------------------------------------------------------------------April 29, 2022 5:29:52 PM UTC[JMSCCThreadPoolWorker-3] com.ibm.mq.jmqi.remote.impl.RemoteReconnectThreadRescheduling reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]'EXPLANATION:nullACTION:null

I believe these correspond to JMSCS0044 and JMSCS0046 messages as documented here. Unfortunately, neither the docs nor message give an explanation or action.

Are these messages a concern? The client is connecting into a uniform cluster using a CCDT. I know it's normal for clients connected like this to get moved as part of rebalancing, but this seems excessive. If the messages are just informational and represent expected behavior, then I'd like to suppress them so they don't fill up the log.

Thanks,
Jim

------------------------------
Jim Creasman
------------------------------

7. RE: Understanding JMSCS0044 and JMSCS0046 messages

Like

Mark Bluemel

Posted Wed May 04, 2022 04:11 AM

Hi Jim,

Some points for clarification, then I'll try and discuss what you're seeing.

The "parent hconn" and "child hconn" concept was needed for handling reconnection with JMS. In JMS you have Connection objects, which each use an hconn, and related to each Connection, one or more Session objects, which also each use an hconn. (If you're using JMS 2.0, a Context actually wraps a Connection and a Session - though if you create a Context from another Context, they share the Connection used).

Naturally a Connection and all its Sessions must be connected to the same queue manager, so the hconns cannot be balanced independently. So we group them together - the hconn relating to a Connection is the "parent hconn" and is aware of all the hconns relating to the corresponding Sessions, termed "child hconns". In your case, as you see each instance of your application create 2 connections on the corresponding queue manager, I can deduce that your application only uses one Connection and one Session (or in JMS 2.0, one Context wrapping one of each).

Any event which instigates reconnection - a balancing request from the uniform cluster, or an actual connection break - is handled by the parent hconn, which reconnects itself, then reconnects all its children. So you'll only ever see a parent hconn queued to reconnect, and subsequently when a reconnection of the parent succeeds, you'll see the child hconns also reconnected.

The trailing spaces in the queue manager name logged are unimportant - there are some "interesting" interactions between Java Strings, which are counted strings and hence variable-length, and the fixed-length names used in MQ.

Your description of a parent hconn being requested to reconnect to "QC01", then being rescheduled and attempting to connect to any queue manager (the behaviour when the logged queue manager is null), does indeed suggest that your reconnections are failing for some reason, and as you say the retries working through the CCDT till they find a queue manager to which they can connect, with the retry times coming into play.

There isn't the level of detail in the log that would say why the reconnection failed - perhaps there should be, but perhaps it would be regarded as additional "noise". When implementing the uniform cluster support in JMS we wanted to provide some indication of what the client was doing but not too much, so indicating start, successful end, rescheduling and aborting of reconnections was decided upon.

You may find some information of value in the QC01 error logs - perhaps some indication of it rejecting connections for some reason.

Beyond that, to get much more information about what's happening, I'd need to look at client trace from the application(s) - but GDPR doesn't allow me to look at your data, I'm afraid.

If you need more help, you could perhaps raise a service ticket - if you do so, please mention that I've been in dialog with you and would be happy to help the service team.

------------------------------
Mark Bluemel
------------------------------

Original Message

Original Message:
Sent: Tue May 03, 2022 12:08 PM
From: Jim Creasman
Subject: Understanding JMSCS0044 and JMSCS0046 messages

Thanks, guys!

The information definitely helps to clarify the meaning and frequency. I'm still trying to work out the puzzle as to why we're seeing the messages and why they reoccur. I'm starting to develop a theory I'd like to run past you all. It will help first if you understand the configuration of our environment.

We have a uniform cluster consisting of two qmgrs, QC00 and QC01.
Clients connect via CCDTUrl. The GET to the URL returns four connection entires, always in the same order:
1. Host/port for QC00 as qmgr "QC00"
2. Host/port for QC00 as qmgr "QM_ANY"
3. Host/port for QC01 as qmgr "QC01"
4. Host/port for QC01 as qmgr "QM_ANY"
Clients specify "*QM_ANY" for the qmgr when connecting.
There are two instances of this JMS client.
Each JMS client instance creates two connections with these options: MQCNO_HANDLE_SHARE_BLOCK,MQCNO_SHARED_BINDING,MQCNO_RECONNECT

I've noticed that clients always connect to QC00 when they first come up in this configuration (no surprise since QC00 appears first in the list). Later, they get a rebalance request and move to QC01.

When I look on the MQ server I see two connections from this client on each qmgr, so that tells me these are balanced. When I look closer at the client logs I see a message for "parent hconn" and "child hconn". My guess is those are the two connections showing up on the server. Does that sound right?

I noticed something else when I looked closer at the reconnect message that specifies a qmgr (i.e., the one for rebalancing per Mark's response).

Queueing reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]' to queue manager 'QC01                                            '

Note that the qmgr name has a lot of blanks after the name. The GET to the CCDTUrl returns JSON that does not have the blanks. Are these blanks a concern, or just something that gets added when the message text is formatted?

I never see a connection request to QC01 actually work, so perhaps the rebalance is failing (though, as mentioned, I see the correct number of connections on each server). After each reconnect message to QC01, there follows a series of these messages where the qmgr name is 'null'. Are these the result of it working down the list returned by the CCDTUrl? Does the "1000,2000,4000,8000,1600,25000" delay sequence apply as it attempts to work down this list? The balance request seems to generate a mini-storm of reconnect activity when I would expect it to work immediately as our other clients do.

Still investigating...

------------------------------
Jim Creasman

Original Message:
Sent: Tue May 03, 2022 07:54 AM
From: Mark Bluemel
Subject: Understanding JMSCS0044 and JMSCS0046 messages

As Matt has said, the "rescheduling" message indicates that an attempt to reconnect has failed and will be retried after a period of time. The reconnection times are used to schedule the subsequent attempts - in your example, the second attempt will be at least 1093 milliseconds after the first (there's only one thread doing the reconnections, so I'm not sure we can give exact timings!), the third 2205 milliseconds after the second and so forth. As Matt says, there's a random factor added in to try to avoid multiple clients scheduling their reconnections simultaneously.

A "starting reconnection" message with a non-null queue manager is given when the Uniform Cluster has requested a client move to another queue manager as part of the balancing activity.

The fact that the "starting reconnection" message in your example gives a queue manager name of "null" indicates that this is an attempt to reconnect due to a connection failure - if there was a previous message for the same hconn with a non-null queue manager, it would suggest that the attempt to move for rebalancing failed, so an attempt to reconnect anywhere was attempted; if no such message appeared, then there's been a connection broken event and the hconn is simply reconnecting to recover from that.

The repeated connection attempts suggest that there's been some, presumably transient, issue which prevented the selected queue manager (in this context, we'd simply select one from the CCDT according to usual principles) from accepting the connect attempt. Perhaps we should consider logging the name of the queue manager to which we attempted to connect, in the JMSCS0046 message, as it would help give an insight into what queue manager logs you may wish to review to understand this behaviour.

I'll look at adding more explanation to the messages, I think.

I could also look at allowing them to be suppressed, asI think they provide a valuable insight into Uniform Cluster (and more general reconnection) activity, I'm not sure I'd recommend that.

I hope this is helpful!

------------------------------
Mark Bluemel

Original Message:
Sent: Mon May 02, 2022 02:33 PM
From: Jim Creasman
Subject: Understanding JMSCS0044 and JMSCS0046 messages

Hey Matt,

In this configuration we have two instances of this JMS consumer client, each receiving and processing messages from a queue. I haven't been able to discern a pattern in the messages. They don't always show up, and when they do, sometimes it's only one of the two instances that is reporting the error. I'm still trying to figure out if there's a pattern.

As for your question about frequency, when the messages do start happening these appear every few seconds on average. After several sets they stop. I don't see any messages on the server side with corresponding timestamps.

I see this message at startup of the app. I'm sure this is a permissions error due to the user not having write access to the JVM working directory. I've asked the developer to fix this. Not sure it's related, but this error might be preventing some error data capture from happening that could help.

Could not initialize log file, java.nio.file.AccessDeniedException: mqjms.log.0.lck

I also see this message as soon as the application is started. Are these a pseudo-random sequence of wait times between reconnection attempts? If so, that could explain the variability I'm seeing between occurrences of the reconnect messages.

Reconnection times (milliseconds) : '[1093, 2205, 4268, 8108, 17075, 25368]'

Thanks

------------------------------
Jim Creasman

Original Message:
Sent: Mon May 02, 2022 12:48 PM
From: MATTHEW LEMING
Subject: Understanding JMSCS0044 and JMSCS0046 messages

Hi Jim,

The second message means a reconnect failed and that it will be rescheduled for some time in the future. I wouldn't expect to see any, or at least many, of the second messages if there is an available queue manager to connect to.

How many of these are you getting?

Regards, Matt.

------------------------------
MATTHEW LEMING

Original Message:
Sent: Mon May 02, 2022 12:03 PM
From: Jim Creasman
Subject: Understanding JMSCS0044 and JMSCS0046 messages

We have a JMS MQ client that is consuming messages from a queue. It seems to be working OK. However, these two messages are repeated in the log over and over:

--------------------------------------------------------------------April 29, 2022 5:29:52 PM UTC[JMSCCThreadPoolWorker-3] com.ibm.mq.jmqi.remote.impl.RemoteReconnectThreadStarting reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]' to queue manager 'null'EXPLANATION:nullACTION:null--------------------------------------------------------------------April 29, 2022 5:29:52 PM UTC[JMSCCThreadPoolWorker-3] com.ibm.mq.jmqi.remote.impl.RemoteReconnectThreadRescheduling reconnection of parent hconn 'com.ibm.mq.jmqi.remote.impl.RemoteSession[:/66db9cd2][connectionId=414D514351433030202020202020202008B2666202FDA920,traceIdentifier=1,remoteTraceIdentifier=-1]'EXPLANATION:nullACTION:null

I believe these correspond to JMSCS0044 and JMSCS0046 messages as documented here. Unfortunately, neither the docs nor message give an explanation or action.

Are these messages a concern? The client is connecting into a uniform cluster using a CCDT. I know it's normal for clients connected like this to get moved as part of rebalancing, but this seems excessive. If the messages are just informational and represent expected behavior, then I'd like to suppress them so they don't fill up the log.

Thanks,
Jim

------------------------------
Jim Creasman
------------------------------

8. RE: Understanding JMSCS0044 and JMSCS0046 messages

Like

Jim Creasman

IBM Champion

Posted Fri May 06, 2022 10:19 AM

Thanks for the detailed info. I always learn so much when I post a question here!

I'm still not sure why the reconnections would be failing one time and working the next. Whenever I've checked the clients seem to be balanced across the cluster as I expect. To be fair, though, I haven't had time to dig into this myself and try to recreate. I'm going off of the logs from client code I didn't write. Now that I better understand how this works, I'll continue to follow and post any additional questions (or resolution) here.

Jim

------------------------------
Jim Creasman
------------------------------

Original Message

Original Message:
Sent: Wed May 04, 2022 04:10 AM
From: Mark Bluemel
Subject: Understanding JMSCS0044 and JMSCS0046 messages