New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

  • 1.  New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Mon September 25, 2023 09:59 AM

    Hello MQ group users,

    Any advice on the following will be much appreciated.

    We have an existing MSCS MQ cluster (two-node Active/Passive setup: w000011880-w000012880) where two existing QMGRs (one active and running, one test and ended) are already deployed on the MQ cluster instance named "A000011880 - IBM MQ", utilizing a common shared drive ("D: Disk_D") with low space usage.

    A new QMGR was created recently (a few days ago) for use with MSCS and added to the above MQ cluster instance, moving its data & logs to the MSCS shared drive ("D") via the hamvmqm command. The new QMGR (NBGCYT24PRD) started correctly and was listed in the DSPMQ command output.

    According to the customer, all steps listed at https://www.ibm.com/docs/en/ibm-mq/9.2?topic=configurations-supporting-microsoft-cluster-service-mscs were followed, such as creating a QM for use with MSCS, moving the QM to MSCS storage, etc.:

    https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-creating-queue-manager-use

    https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-moving-queue-manager-storage

    However, after the patching scheduled for the first cluster node (w000011880) last weekend and the failover performed from the first (active) node, w000011880, to the second (passive) node, W000012880, the new QMGR is not listed in the DSPMQ command output and cannot be started, as it doesn't exist...

    FYI, all the QMGR configuration still exists on the cluster resource (shared drive "D") in the following path: D:\IBM\MQ\data\qmgrs\NBGCYT24PRD

    and the recovery logs are at D:\IBM\MQ\log\NBGCYT24PRD

    Could anybody advise what might be missing? Precise steps (if applicable) will be much appreciated.

    Thanks in advance,

    Cheers, Nick.



    ------------------------------
    Nick Dakoronias
    ------------------------------


  • 2.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 01:58 AM

    Hi Nick,

    I haven't done MQ in MSCS for quite a while, but my recollection is that after moving the files to the shared disk, you have to run a command to actually make the queue manager clustered.

    In addition to moving the queue manager to cluster storage, there is a step not mentioned in your list of pages:

    https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-putting-queue-manager-under-control

    This documents the steps required in MSCS to make the queue manager managed by the cluster.

    Although the queue manager was running on the original server, perhaps it was not actually running under MSCS control.

    I would check the cluster configuration and make sure that the cluster does know about the queue manager. Follow the instructions in the manual if it does not.

    If it is already known to the cluster then it should also be known to the cluster member, so it sounds like something must have gone wrong in that case, which might call for a case to be raised with IBM.
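
    A minimal sketch of how the symptom could be confirmed during a failover test: capture the dspmq output on the first node before the move and on the second node after it, then compare the two. The parsing assumes the usual QMNAME(...) STATUS(...) line format; the helper names and the sample outputs (other than NBGCYT24PRD) are hypothetical. It only compares text you have already captured and does not query MSCS itself.

```python
import re

# Assumed dspmq line format: QMNAME(NAME)   STATUS(Running)
_DSPMQ_LINE = re.compile(r"QMNAME\(([^)]+)\)\s+STATUS\(([^)]+)\)")


def parse_dspmq(output):
    """Return {queue manager name: status} parsed from captured dspmq text."""
    return {m.group(1): m.group(2) for m in _DSPMQ_LINE.finditer(output)}


def missing_on_node(before_failover, after_failover):
    """Queue managers listed before the failover but absent afterwards."""
    before = parse_dspmq(before_failover)
    after = parse_dspmq(after_failover)
    return sorted(set(before) - set(after))


# Hypothetical captured outputs; QM_PROD and QM_TEST are made-up names.
node1 = """QMNAME(QM_PROD)      STATUS(Running)
QMNAME(QM_TEST)      STATUS(Ended normally)
QMNAME(NBGCYT24PRD)  STATUS(Running)"""
node2 = """QMNAME(QM_PROD)      STATUS(Running)
QMNAME(QM_TEST)      STATUS(Ended normally)"""

print(missing_on_node(node1, node2))  # prints ['NBGCYT24PRD']
```

    Any name reported here would be a queue manager that did not follow the role to the other node, which is worth checking against the resources defined in the cluster role.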

    Regards



    ------------------------------
    Neil Casey
    Senior Consultant
    Syntegrity Solutions
    Melbourne, Victoria
    IBM Champion (Cloud) 2019-22
    ------------------------------



  • 3.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 08:25 AM

    Hi Neil,

    First of all, many thanks for your time and response.

    You are correct in your assumption that the MSCS cluster configuration should be cross-checked to verify that the new QMGR is running under MSCS control and that it is known to the cluster, so that it can be started and fail over if needed.

    According to customer feedback, problems occurred during the step of putting the QMGR under MSCS control, as described in the article at https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-putting-queue-manager-under-control.

    Based on the input received, they were unable to properly configure the properties of the MQSeries MSCS resource, specifically its name & dependencies. This caused the MQSeries MSCS resource to go offline and impacted production message flows, given that an older existing QMGR currently supports several production business services.

    Considering the above, I assume there are two options:

    1. Raise a case with IBM to investigate the cause of the symptom and find a resolution (without impacting the other production flows).
    2. Assign a new DEDICATED shared drive NOT used by any other QMGR, add it as a cluster resource, and repeat all the steps from the beginning, without fear of impacting other production flows.

    Thanks & Rgds, 

    Nick



    ------------------------------
    Nick Dakoronias
    ------------------------------



  • 4.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 11:36 AM

    Hi Nick,

    Adding a queue manager to the role should not have impacted the existing cluster, unless the qmgr was not defined properly to the cluster and hamvmqm was not pointing to the correct storage! I would expect the first failure to result from the second...



    ------------------------------
    Francois Brandelik
    ------------------------------



  • 5.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Wed September 27, 2023 03:33 AM

    Hi Francois,

    None of the failures you are assuming actually happened.

    The new QMGR has been created, defined and started properly, as you can see in the attached screenshots:

    Furthermore, the hamvmqm command also executed correctly, pointing to the right cluster shared volume (D) for the QM's data & logs, as per the attached snapshot:

    The problem has been pinpointed correctly by Neil Casey: although the MSCS cluster should know about the new QMGR, it seems that it actually doesn't, and neither does the second cluster member (after failover).

    Rgds, Nick.



    ------------------------------
    Nick Dakoronias
    ------------------------------



  • 6.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Wed October 04, 2023 02:49 AM

    Dear all,

    This is to let you know that the problem was fixed after assigning a new, separate (dedicated) SAN-mounted LUN (cluster disk), where the new QMGR(s) have been deployed and brought under MSCS control smoothly. Failover tests were successful.

    Thanks for your time & tips.

    Rgds, Nick.   



    ------------------------------
    Nick Dakoronias
    ------------------------------



  • 7.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 04:06 AM

    Hi Nick,

    Are you sure you are not confusing an MSCS queue manager with a multi-instance qmgr?

    In the MSCS setup, the shared disk is used for the cluster heartbeat.

    Cluster storage is used for the qmgr data and logs. Usually the cluster storage is only on the active member of the cluster.
    The startup of the queue managers is set to manual and controlled by the cluster resource.

    Hope that helps clarify some of the setup.



    ------------------------------
    Francois Brandelik
    ------------------------------



  • 8.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 07:50 AM

    Hi Francois,

    First of all, thanks for your time.

    There is no confusion at all. It is not about having instances of the same queue manager on different nodes or systems. It is just a new QMGR, recently created and added under MSCS control, with its data & logs moved to the MSCS shared disk.

    Yes, it is a shared disk, since this is the terminology used by IBM for hamvmqm and in other statements in the article at https://www.ibm.com/docs/en/ibm-mq/9.2?topic=mscs-moving-queue-manager-storage. It actually reflects the MQ cluster resource with its own IP (MQ Service IP), which client apps target.

    Furthermore, in our case the QMGR keeps the same IP address (MQ Service IP) during failover (from active to passive), which is the desired behavior, instead of changing IP address as happens during failover in a multi-instance deployment.

    Also, all QMGRs under MSCS control have been configured with the manual startup type.

    BTW, when you refer to the shared disk for the heartbeat, you mean the quorum disk, which is used exclusively for checking the health of both cluster nodes via the private cluster connection (LAN).

    Rgds, Nick.

     

       



    ------------------------------
    Nick Dakoronias
    ------------------------------



  • 9.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Tue September 26, 2023 11:32 AM
    Edited by Francois Brandelik Tue September 26, 2023 11:47 AM

    What I am trying to say is that there is no shared drive in MQ => MSCS.

    There are cluster controlled drives that are defined as cluster role resources and they are dependencies for the qmgr resource on the role, just like the VIP is a cluster role resource and a dependency.
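
    To illustrate the dependency idea only: if the role is modelled as a small dependency graph, the disk and the IP have to come online before the queue manager resource that depends on them. The sketch below uses Python's graphlib with hypothetical resource names; it is not how MSCS is configured, just a picture of the ordering.

```python
from graphlib import TopologicalSorter

# Hypothetical resource names for one cluster role. Each key maps to the
# set of resources it depends on (its prerequisites).
dependencies = {
    "Cluster Disk": set(),
    "MQ Service IP": set(),
    "QMGR resource": {"Cluster Disk", "MQ Service IP"},
}


def online_order(deps):
    """Return an order in which every resource follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = online_order(dependencies)
print(order)  # the QMGR resource always comes last
```

    If the disk or the VIP is missing from the queue manager resource's dependencies, nothing forces it to wait for them, which is one way a resource can fail to come online on the other node.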

    A shared drive means it is available on both sides at the same time. That is only the case for the MSCS cluster heartbeat.

    Point 5 says clearly:

    Test the shared disk by using the MSCS Cluster Administrator to move it from one cluster node to the other and back again. 

    Calling it a shared disk is a misnomer compared with other mounts because, clearly, if you have to check the availability of the shared disk, it is not shared.

    The point here is the Shared Disk is shared TO the cluster but NOT WITHIN the cluster. It is a cluster resource that is assigned to the active node in the role... and should not be confused with the heartbeat storage that is on a drive shared within the cluster.

    Thanks

    ------------------------------
    Francois Brandelik
    ------------------------------



  • 10.  RE: New QMGR added to existing MQ Cluster resource / Installation Instance doesn't failover

    Posted Wed September 27, 2023 09:04 AM

    With your last post, it seems that there are different perceptions of, or approaches to, the standard terminology imposed by vendors.

    First, according to Microsoft, cluster resources/resource types are physical or logical entities managed by the MSCS/WSFC cluster service, such as a disk, an IP address, or a file share. Microsoft, which owns the MSCS/WSFC technology, uses specific terminology for the hardware requirements & storage options used in Failover Clustering. At the hardware level, Microsoft states that physical disks (LUNs) on shared storage (SAN), accessible for read/write by all cluster nodes and provisioned with NTFS, can be used and added as Shared Data Disks to an existing MSCS/WSFC cluster for failover purposes.

    In the MSCS/WSFC Failover Cluster Manager, the above Shared Data Disks appear as Cluster Disks (Volume Manager Disk Groups).

    So, according to Microsoft, a Shared Drive means a shared storage data disk (LUN) that is read/write enabled for all nodes upon their ad-hoc request (required for the failover cluster validation step), rather than one available on both sides AT THE SAME TIME. The latter is obviously a precondition for the quorum (heartbeat / health state check) and also applies to Active/Active cluster deployments, but it is not a precondition for an Active/Passive cluster, as in our case.

    Now, regarding bullet point #5, "Test the shared disk by using the MSCS Cluster Administrator to move it from one cluster node to the other and back again": this is listed in order to cross-check and validate that the shared disk resource (used to store the new QM's data and logs) can support failover operations and ensure the QM's availability for inbound client connections.

    If you refer specifically to the term "Shared Disk", then indeed it is not the most appropriate name, given that it is actually a disk resource (a LUN on the SAN) assigned as a clustered (MQ) MSCS resource for exclusive use by the QMGR.

    The problem in question occurred right at the configuration of that resource's properties.

    Hopefully this is much clearer now.

    Rgds, Nick.



    ------------------------------
    Nick Dakoronias
    ------------------------------