Regards .. Mayur
Original Message:
Sent: Mon March 18, 2024 04:11 AM
From: Morag Hughson
Subject: Possible use of BATCHHB for in-doubt channels
I am so happy to hear it went well.
Remember that INDOUBT is not a bad thing. It happens on EVERY SINGLE batch. Normally though it is a fleeting state because sender and receiver are connected and the flow to ask "are you well" and the response to say "yes all is good" happen in a very small timeframe. It is thus normally very hard to see INDOUBT state, but it is there at the end of every batch.
If you happen to lose connectivity and are in RETRY in between those two flows, then you might see INDOUBT state, but again, this is normal and will be happily resolved by the channels once connectivity is restored.
INDOUBT only becomes a problem when connectivity cannot be restored, say because your IP address is pointing to the wrong place, perhaps because your shared listeners and channel connames are not correctly configured.
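To see whether a sender channel is in-doubt at a given moment, its status can be displayed. This is only a minimal MQSC sketch. The channel name QSG1.QSG2 is the one used elsewhere in this thread, and lines starting with an asterisk are comments in an MQSC input script.

```mqsc
* Default output includes STATUS; INDOUBT(YES) is normally visible
* only for a moment at the end of a batch
DISPLAY CHSTATUS('QSG1.QSG2') INDOUBT CURLUWID
```

If INDOUBT(YES) persists while the channel is in RETRY, that fits the lost-connectivity case described above and should clear once the channel reconnects to the same partner.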
All the best,
Cheers,
Morag
------------------------------
Morag Hughson
MQ Technical Education Specialist
MQGem Software Limited
Website: https://www.mqgem.com
Original Message:
Sent: Mon March 18, 2024 03:07 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
Hi @Morag Hughson and @Mayur RAJA ,
here is a final update from us on this subject.
Last weekend we had a full "ripple" IPL for the first time after implementing all suggestions by you both.
This means datacenter 1 IPLs the odd machines (LPA1/3/5/7) and afterwards datacenter 2 IPLs the even ones (LPA2/4/6/8).
All worked very well; only once did a short INDOUBT occur for one sender channel, and it cleared when the first retry succeeded.
So in total it was a full success !
As a side effect we now use different ports for qmgr-to-qmgr connections (and our admin tools) and clients, really useful.
Thanks again and best regards !
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
Original Message:
Sent: Tue January 23, 2024 03:01 AM
From: Mayur RAJA
Subject: Possible use of BATCHHB for in-doubt channels
Hi Norbert,
You are welcome. Please let us know if we can be of further help.
Regards ..
------------------------------
Mayur RAJA
Original Message:
Sent: Tue January 23, 2024 01:29 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
Hi Mayur,
thank you for these very detailed answers !
That will lead us to xmitq definitions that are suitable for our new topology.
Again, thank you so much !
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
Original Message:
Sent: Fri January 19, 2024 11:06 AM
From: Mayur RAJA
Subject: Possible use of BATCHHB for in-doubt channels
Hi Norbert,
As there are a lot of attributes to do with sharing, it is first worth considering the purpose of each attribute:
- SHARE or NOSHARE indicates whether a queue can be opened for input (MQGET) by multiple applications or not. SHARE means it can and NOSHARE means it cannot.
Note: A queue can always be opened for output (MQPUT/MQPUT1) by multiple applications (assuming they are authorized to open the queue for output).
In general, transmission queues are only accessed by one application (i.e. the Message Channel Agent (MCA)) at a time. This is true for MCAs for both PRIVATE and SHARED channels. So, it is best to define shared transmission queues with NOSHARE.
Note: The SYSTEM.QSG.TRANSMIT.QUEUE used by Intra Group Queuing (IGQ) is different. It is defined with SHARE because IGQ Agents that run on each Queue Sharing Group (QSG) Queue Manager need to open the queue for input (MQGET). But as you are not looking at IGQ you don't need to be concerned about this.
- The DEFSOPT attribute defines the default SHARE option. It can be set to SHARED or EXCL (exclusive). When an application opens a queue for input, it can specify one of the following open options:
a. MQOO_INPUT_AS_Q_DEF – indicates open the queue based on the value of DEFSOPT.
b. MQOO_INPUT_EXCLUSIVE – indicates open the queue for exclusive use if it is not already open (for shared or exclusive use) by this or another application.
c. MQOO_INPUT_SHARED – indicates open the queue for shared use if it is not already open for exclusive use, though it may already be open for shared use.
But, it is important to note that SHARE and NOSHARE also have a bearing on this. That is, if a queue is defined with NOSHARE, an attempt to open the queue for shared use defaults to opening the queue for exclusive use (assuming it is not already open for exclusive use). Similarly, if a queue is defined with SHARE, an attempt to open the queue for exclusive use will work (and the queue will be opened for exclusive use) if the queue is not already open for shared or exclusive use by this or another application.
A sender channel is shared (and not private) if it serves a shared transmission queue (i.e. a queue defined in a CFSTRUCTURE in the Coupling Facility (CF)).
While a shared sender channel is running, it is managed by a single Channel Initiator in the QSG. That means there is only one instance of the shared sender channel running. Since the SHARED channel definition is defined as a GROUP object, it resides in Db2 so that all Queue Managers in the QSG can get to it. This means that the shared sender channel is eligible to run on any Queue Manager in the QSG. This is especially useful for ensuring high availability of the channel in the event of a failure of a QSG Queue Manager, since another Queue Manager in the QSG can take over running the channel.
Given that there will only be one instance of a shared sender channel accessing the shared transmission queue at any point in time, it is still OK to set DEFSOPT(EXCL).
- The QSGDISP (e.g. SHARED or QMGR) attribute on the transmission queue defines whether the transmission queue is SHARED or PRIVATE. For a transmission queue defined with QSGDISP(SHARED), messages on the queue are stored in the Coupling Facility (CF) and hence are available to be retrieved by a shared sender channel that runs on any Queue Manager in the QSG.
- Regarding Note 1. that your colleague found, messages in a CF structure are limited to 64K but, as MQ reserves 1K of that for headers, etc., application message data is limited to 63K. However, this is clearly too small for a lot of application messages so, larger messages put to a CF Structure can be stored on Shared Message Data Sets (SMDS), which are essentially VSAM linear datasets (you do of course need to configure the SMDS environment), or in Db2 blobs. So, this limitation can be overcome. See 'Shared message dataset performance and capacity considerations' : https://www.ibm.com/docs/en/ibm-mq/9.3?topic=pycfose-planning-your-shared-message-data-set-smds-environment#q006020___SharedMessageDataSetCharacteristic
Note: Storing in Db2 could allow for much more data in total but storing in SMDS is more performant (about 10 times faster) than storing in Db2. So, we generally recommend that customers use SMDS.
The CFLevel and OFFLOAD rules defined for a CFSTRUCTURE are also factors to consider, see: https://www.ibm.com/docs/en/ibm-mq/9.3?topic=formats-change-copy-create-cf-structure-zos .
- Regarding Note 2. that your colleague found, I discussed this with one of my colleagues and the indexing is intrinsic to how messages are stored on shared queues. It is a requirement that you set INDXTYPE(MSGID) for shared transmission queues; however, setting it does not use more storage or impact the performance of non-selective gets. The only implication is that you cannot perform a get by CORRELID without also matching the MSGID. But, for messages retrieved from transmission queues, this is not an issue.
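Putting the points above together, a shared transmission queue definition might look like the sketch below. It is not a definitive definition. The queue name, CF structure name, MAXMSGL and trigger settings are copied from Norbert's DEFINE, and only INDXTYPE, NOSHARE and DEFSOPT are changed to match the advice above. All other attributes are left to their defaults.

```mqsc
* Sketch only: shared xmit queue with the index type and
* share options discussed above
DEFINE QLOCAL('QSG2') +
       QSGDISP(SHARED) +
       CFSTRUCT('XMIT') +
       USAGE(XMITQ) +
       INDXTYPE(MSGID) +
       NOSHARE +
       DEFSOPT(EXCL) +
       MAXMSGL(4194304) +
       TRIGGER +
       INITQ('SYSTEM.CHANNEL.INITQ') +
       TRIGTYPE(FIRST) +
       TRIGDATA('QSG1.QSG2') +
       REPLACE
```

With MAXMSGL(4194304), messages larger than 63K would need the CFSTRUCT to offload to SMDS or Db2, as described above.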
Regards
------------------------------
Mayur RAJA
Original Message:
Sent: Fri January 19, 2024 04:50 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
At the moment we are "in full swing" to change all our topology.
We have different tasks to do and try to roll them out from stage to stage:
Lab -> Dev -> QA -> Preproduction -> Production
- Established new listeners to isolate qmgr-to-qmgr (and qsg) from client connections
- Therefore changed the port for sender/cluster channels
- created cf structures for the new xmit queues
Now we wanted to create the new xmit queues, e.g.:
DEFINE QLOCAL('QSG2') +
QSGDISP(SHARED) +
DESCR(' ') +
CFSTRUCT('XMIT') +
CLUSTER(' ') +
CLUSNL(' ') +
CLWLRANK(0) +
CLWLPRTY(0) +
CLWLUSEQ(QMGR) +
USAGE(XMITQ) +
CLCHNAME(' ') +
STREAMQ(' ') +
STRMQOS(BESTEF) +
INDXTYPE(NONE) +
STGCLASS('XMIT') +
MAXDEPTH(999999999) +
MAXMSGL(4194304) +
DEFPRTY(0) +
DEFPSIST(NO) +
DEFPRESP(SYNC) +
DEFREADA(NO) +
DEFBIND(OPEN) +
MSGDLVSQ(PRIORITY) +
PUT(ENABLED) +
GET(ENABLED) +
NOHARDENBO +
BOTHRESH(0) +
BOQNAME(' ') +
SHARE +
DEFSOPT(SHARED) +
RETINTVL(999999999) +
PROPCTL(COMPAT) +
CUSTOM(' ') +
TRIGGER +
INITQ('SYSTEM.CHANNEL.INITQ') +
PROCESS(' ') +
TRIGTYPE(FIRST) +
TRIGMPRI(0) +
TRIGDPTH(1) +
TRIGDATA('QSG1.QSG2') +
QDEPTHHI(80) +
QDEPTHLO(40) +
QDPMAXEV(ENABLED) +
QDPHIEV(DISABLED) +
QDPLOEV(DISABLED) +
QSVCINT(999999999) +
QSVCIEV(NONE) +
STATQ(QMGR) +
ACCTQ(QMGR) +
MONQ(QMGR) +
REPLACE
That brings us to the following considerations:
- The attributes SHARE and DEFSOPT(SHARED) should be a logical conclusion, since the queues are now shared between the members of the QSG (instead of NOSHARE and DEFSOPT(EXCL) for a private xmit queue)
- We normally have INDXTYPE(NONE) for xmit queues but my colleague stumbled over the following entry in Documentation Center
Migrating non-shared queues to shared queues
Note:
- Messages on shared queues are subject to certain restrictions on the maximum message size, message persistence, and queue index type, so you might not be able to move some non-shared queues to a shared queue.
- You must use the correct index type for shared queues. If you migrate a transmission queue to be a shared queue, the index type must be MSGID.
Does changing the INDXTYPE have any further consequences ?
Best regards,
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
Original Message:
Sent: Thu April 27, 2023 04:45 AM
From: Morag Hughson
Subject: Possible use of BATCHHB for in-doubt channels
Hi Norbert,
Yes, sounds like you have understood fully and your changes will make your channels run much more smoothly. I am glad you also found the appropriate sections in IBM Docs to explain it once you knew what you were looking for.
Sounds like the extra listener was the perfect solution for you, not too disruptive and yes, separating clients and QMgr-QMgr channels in a QSG environment is definitely a good thing to do.
If you're not already full to the brim with information on this subject, this blog post might also be a good read:-
SVRCONNs and INDISP(SHARED) listeners
Glad I was able to help.
Cheers,
Morag
------------------------------
Morag Hughson
MQ Technical Education Specialist
MQGem Software Limited
Website: https://www.mqgem.com
Original Message:
Sent: Thu April 27, 2023 02:20 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
Damn, i think i got it now !
All of our QSG's have definitions like START LISTENER TRPTYPE(TCP) PORT(1414) IPADDR(QSG2.Nuremberg.DE) INDISP(QMGR)
There is this page Shared channels in the docs (very useful and clarifying !).
And i found some notes in our team recordings(2012) regarding how to manage the listeners:
Make listeners INDISP(QMGR) for client connections, INDISP(GROUP) is not useful for them !
So we switched to this topology.
Instead of just changing the INDISP, we should have added a new listener for QSG inter-communication:
START LISTENER TRPTYPE(TCP) PORT(nnnn) IPADDR(QSG2.Nuremberg.DE) INDISP(GROUP)
and adjusted all CONNAME attributes that reference QSG2.Nuremberg.DE for channels of type SDR/SVR/CLUSSDR to the newly established port.
This is described in section "Configuring SVRCONN channels for a queue sharing group" of Shared channels:
The optimal configuration for SVRCONN channels in a queue sharing group is to set up private listeners in each CHINIT which use a different port number from the point to point channels.
Fortunately this looks like a smooth transition, since those point-to-point channel connections between qmgrs are our responsibility and don't affect clients.
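The change described above could be sketched in MQSC roughly as follows. Port 1415 is an assumed example value standing in for nnnn, and the channel name is the QSG1.QSG2 sender used in this thread. The START LISTENER would be issued on the QSG2 members and the ALTER CHANNEL on the QSG1 side.

```mqsc
* On the QSG2 members: extra group listener for qmgr-to-qmgr traffic
* (1415 is an assumed port, not taken from the thread)
START LISTENER TRPTYPE(TCP) PORT(1415) IPADDR(QSG2.Nuremberg.DE) INDISP(GROUP)

* On the QSG1 side: point the sender channel at the new port
ALTER CHANNEL('QSG1.QSG2') CHLTYPE(SDR) QSGDISP(GROUP) +
      CONNAME('QSG2.Nuremberg.DE(1415)')

* Check which listeners the channel initiator has started
DISPLAY CHINIT
```

A changed CONNAME is picked up the next time the channel starts, so the sender would need to be stopped and restarted. The existing INDISP(QMGR) listener on port 1414 can stay in place for client connections.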
Thank you very much, @Morag Hughson , as always your tips and hints are very useful.
Hopefully i did understand it :-)
Regards, Norbert
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
Original Message:
Sent: Wed April 26, 2023 10:01 AM
From: Morag Hughson
Subject: Possible use of BATCHHB for in-doubt channels
I agree that you likely have an architectural problem. You appear to be using an IP Address in your SDR channel CONNAME that is a DVIPA / Sysplex Distributor or some other type of address that represents all the members of QSG2, and thus each time the SDR reconnects it is routed to one of the members. However, the targeted port number would appear to be the QSG2 INDISP(QMGR) port number rather than the INDISP(GROUP) port number, as evidenced by the contents of your SYNCQ. The RCVR channel is providing its partner name as the QMgr name and not the QSG name.
This means the SDR has an in doubt batch with a partner QMgr and if it ends indoubt, retries, and connects to a different member of QSG2 it cannot continue because it is indoubt with someone else. This is what your CSQX507E messages are all about.
If you are going to use such an IP Address it must target an INDISP(GROUP) listener port.
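One way to check which kind of listener a sender has reached is to look at the partner name in the channel status. This is a minimal sketch, assuming the channel name QSG1.QSG2 from this thread and that the channel is currently running or retrying.

```mqsc
* The default output includes CONNAME, RQMNAME and STATUS.
* RQMNAME is the name the partner reported: an individual queue
* manager name (e.g. MQ12) rather than the QSG name (QSG2) points
* to an INDISP(QMGR) listener port.
DISPLAY CHSTATUS('QSG1.QSG2') CURRENT
```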
Cheers,
Morag
------------------------------
Morag Hughson
MQ Technical Education Specialist
MQGem Software Limited
Website: https://www.mqgem.com
Original Message:
Sent: Wed April 26, 2023 08:08 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
Hi @Morag Hughson ,
regarding the in-doubt situation:
Different channels isn't the case here; there is only this one channel, QSG1.QSG2, using xmitq QSG2 (proved by MO71 :-) ).
SYSTEM.CHANNEL.SYNCQ is QSGDISP(QMGR), so also private.
But i admit that we possibly have an architectural problem because we had this incident occasionally:
When one of LPA1 to LPA8 was re-ipl'ed for maintenance we had some minutes until the receiving qmgr was available again.
This happened at weekends and only for some minutes so in-doubt channels vanished afterwards (for sure, my colleague and me observed this once).
But this time, last Friday, LPA1 was down for 22 hours.
Some more infos about our configuration:
QSG1 has members MQ11 to MQ81 over all 8 LPAR's.
QSG2 has members MQ12 to MQ82 over all 8 LPAR's.
Both channels and xmit queues are private (i suppressed superfluous attributes), here MQ61 for example:
DEFINE CHANNEL('QSG1.QSG2') +
CHLTYPE(SDR) +
QSGDISP(GROUP) +
CONNAME('QSG2.Nuremberg.de') +
XMITQ('QSG2') +
MAXMSGL(4194304) +
HBINT(300) +
KAINT(AUTO) +
DISCINT(6000) +
SEQWRAP(999999999) +
REPLACE
DEFINE QLOCAL('QSG2') +
QSGDISP(GROUP) +
USAGE(XMITQ) +
INDXTYPE(NONE) +
STGCLASS('XMIT') +
MAXDEPTH(999999999) +
MAXMSGL(4194304) +
DEFPRTY(0) +
DEFPSIST(NO) +
DEFPRESP(SYNC) +
DEFREADA(NO) +
DEFBIND(OPEN) +
MSGDLVSQ(PRIORITY) +
PUT(ENABLED) +
GET(ENABLED) +
NOHARDENBO +
BOTHRESH(0) +
NOSHARE +
DEFSOPT(EXCL) +
RETINTVL(999999999) +
PROPCTL(COMPAT) +
TRIGGER +
INITQ('SYSTEM.CHANNEL.INITQ') +
TRIGTYPE(FIRST) +
TRIGMPRI(0) +
TRIGDPTH(1) +
TRIGDATA('QSG1.QSG2') +
REPLACE
Shortly before z/OS maintenance LPA1 was shutdown with the receiving partner MQ12 for sender MQ61 at that time.
Here is the joblog of MQ61CHIN:
16:08:35.900 CSQX599E M61P CSQXRCTL Channel P08P.P01P ended abnormally
16:08:35.900 CSQX206E M61P CSQXRCTL Error sending data,
channel QSG1.QSG2
connection qsg2 (1.2.3.4)
(queue manager MQ12)
TRPTYPE=TCP RC=0000008C reason=76697242
16:08:46.410 CSQX599E M61P CSQXRCTL Channel QSG1.QSG2 ended abnormally
16:08:46.410 LPA6 M61PCHIN CSQX507E CSQX507E MQ61 CSQXRCTL Channel QSG1.QSG2 is in-doubt,
connection MQ12
(queue manager MQ22)
16:09:47.310 CSQX599E MQ61 CSQXRCTL Channel QSG1.QSG2 ended abnormally
16:09:47.310 LPA6 MQ61CHIN CSQX507E CSQX507E MQ61 CSQXRCTL Channel QSG1.QSG2 is in-doubt,
connection MQ12
(queue manager MQ72)
16:10:48.210 LPA6 MQ61CHIN CSQX507E CSQX507E MQ61 CSQXRCTL Channel QSG1.QSG2 is in-doubt,
connection MQ12
(queue manager MQ82)
16:10:48.220 CSQX599E MQ61 CSQXRCTL Channel QSG1.QSG2 ended abnormally
16:11:49.190 CSQX599E MQ61 CSQXRCTL Channel QSG1.QSG2 ended abnormally
16:11:49.190 LPA6 MQ61CHIN CSQX507E CSQX507E MQ61 CSQXRCTL Channel QSG1.QSG2 is in-doubt,
connection MQ12
(queue manager MQ42)
16:12:52.200 CSQX599E MQ61 CSQXRCTL Channel QSG1.QSG2 ended abnormally
16:12:52.200 LPA6 MQ61CHIN CSQX507E CSQX507E MQ61 CSQXRCTL Channel QSG1.QSG2 is in-doubt,
connection MQ12
(queue manager MQ72)
I had a look at the messages on SYSTEM.CHANNEL.SYNCQ. There are as many entries for QSG1.QSG2 as there are members of QSG2 that MQ61 has ever connected to.
That is understandable, as MQ61 has to save channel status information such as MSGSEQNO.
MQ61 tries all other QSG members round-robin (as expected) but always mentions the original qmgr MQ12 in CSQX206E .
That is confusing me...
Best regards, Norbert
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
Original Message:
Sent: Tue April 25, 2023 05:57 PM
From: Morag Hughson
Subject: Possible use of BATCHHB for in-doubt channels
Hi Norbert,
The CSQX507E message, an example shown in full below as a reminder to other readers, is indicating that when a channel tried to start it discovered that another batch of messages for a different channel from this same transmission queue was already in-doubt.
+CSQX507E cpf CSQXRCTL Channel CSQ1.TO.CSQ2.T01 is in-doubt, connection CSQ2 (queue manager ????)
+CSQX599E cpf CSQXRCTL Channel CSQ1.TO.CSQ2.T02 ended abnormally
I would ask why you have more than one channel using the same transmission queue. This is an unusual situation to be in.
You also ask about Batch Heartbeat Interval. This is a helpful additional flow added to the channel protocol where the sender channel will check that the partner is still there before marking the batch in-doubt and proceeding with the end of batch processing. This is helpful in situations where you have an unstable network. It is mainly helpful in clustering situations where, if the messages had not become part of an in-doubt batch, they would have been reassigned to another channel to send somewhere else. For non-cluster channels, the messages will be moved by the same channel once the network comes back so there is little benefit to be gained.
For more reading on this, try page 17 of Keeping MQ Channels Up and Running
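For reference, the batch heartbeat is set on the sending end of the channel. The sketch below assumes the QSG1.QSG2 sender channel from this thread. The value 1000 is purely illustrative and not a recommendation.

```mqsc
* BATCHHB is specified in milliseconds, range 0 through 999999.
* 0 means batch heartbeating is not used.
* 1000 (one second) is an illustrative value only.
ALTER CHANNEL('QSG1.QSG2') CHLTYPE(SDR) QSGDISP(GROUP) BATCHHB(1000)
```

As noted above, for a non-cluster sender channel this is unlikely to change much, because the same channel moves the messages once the network is back.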
I would investigate why you have two different channels using the same transmission queue. You mention that this is in a QSG. Are the transmission queues shared or private? Is the running disposition of the channels shared or private (i.e. is the SyncQ in use a shared or private queue)?
Cheers,
Morag
------------------------------
Morag Hughson
MQ Technical Education Specialist
MQGem Software Limited
Website: https://www.mqgem.com
Original Message:
Sent: Mon April 24, 2023 04:08 AM
From: Norbert Pfister
Subject: Possible use of BATCHHB for in-doubt channels
Hi folks,
we recently had some CSQX507E events (again) in our Queue Sharing Groups on z/OS.
Our topology:
- production LPAR's LPA1-LPA8
- MQ level V9.2
- two QSGs, call them QSG1 and QSG2, each with 8 members spread over the LPARs
- Sender/Receiver group channels QSG1.QSG2 and group xmit queues QSG2
During a z/OS change the CSQX507E happened.
Some of our batch schedules had problems with this situation since their messages got stuck in the xmit queues.
Finally we had to manually resolve the situation .
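For other readers, a manual resolution of an in-doubt sender channel generally follows the pattern sketched below. This is a generic outline rather than a record of the exact commands we issued. The channel name is ours, and the commands are run on the queue managers that hold the saved status for each end.

```mqsc
* On the sending side: logical unit of work ID of the in-doubt batch
DISPLAY CHSTATUS('QSG1.QSG2') SAVED CURLUWID

* On the receiving side: ID of the last batch that was committed
DISPLAY CHSTATUS('QSG1.QSG2') SAVED LSTLUWID

* If the two IDs match, the receiver already has the batch:
RESOLVE CHANNEL('QSG1.QSG2') ACTION(COMMIT)

* If they differ, the batch never arrived; use this instead:
* RESOLVE CHANNEL('QSG1.QSG2') ACTION(BACKOUT)
```

Only one of the two RESOLVE commands applies. Choosing the wrong one can lose or duplicate messages, so the LUWID comparison should always come first.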
In this KC topic the BATCHHB (Batch Heartbeat Interval) is mentioned.
We are considering using it but have no experience with this attribute, so we do not know which value would be appropriate.
Regards, Norbert
------------------------------
Norbert Pfister
system engineer
Nuremberg
Germany
------------------------------