MQ

RDQM DR : DR status Partitioned

1. RDQM DR : DR status Partitioned

Bharat Puri
Posted Sun June 09, 2024 10:24 PM

Hi All,

Hope you are all doing well.

I require assistance with setting up RDQM DR, where I'm conducting a test in a POC environment to eventually implement it in the live environment.

I'm utilizing MQ 9.2.06 on an RHEL 8.7 machine hosted on AWS and have ensured all traffic between the servers is enabled.

During controlled switchovers from Node A to Node B by halting the QM and switching primary to secondary and vice versa, I encounter no issues.

However, during aggressive shutdown tests on the primary server to simulate actual DR scenarios, I can successfully promote the secondary to primary and start the Queue Manager.

The problem arises when attempting to bring the stopped Node back up and designate it as secondary. At this point, I observe the replication status as 'Partitioned', and the status of the secondary server on the other Node becomes unavailable.

I recognize this as a split brain issue and have followed the steps outlined in the provided link, yet I haven't found a resolution. Any assistance would be appreciated.

https://www.ibm.com/docs/en/ibm-mq/9.1?topic=oidre-resolving-partitioned-split-brain-problem-in-dr-rdqm

NODE A: 172.31.42.66
NODE B: 172.31.34.7

After shutting down 172.31.42.66 (NODE_A), below is the status on 172.31.34.7 (NODE_B) after making it primary:

[root@ip-172-31-34-7 ~]# rdqmstatus -m RDQM_DR
Node:
ip-172-31-34-7.ap-south-1.compute.internal
Queue manager status: Running
CPU: 0.01%
Memory: 104MB
Queue manager file system: 49MB used, 2.9GB allocated [2%]
DR role: Primary
DR status: Partitioned
DR type: Synchronous
DR port: 7000
DR local IP address: 172.31.34.7
DR remote IP address: 172.31.42.66
DR out of sync data: 696KB
DR last in sync: 2024-06-09 11:56:38
[root@ip-172-31-34-7 ~]# rdqmstatus
Node:
ip-172-31-34-7.ap-south-1.compute.internal
OS kernel version: 4.18.0-425.19.2
DRBD OS kernel version: 4.18.0-425.10.1
DRBD version: 9.1.12
DRBD kernel module status: Loaded

Queue manager name: RDQM_DR
Queue manager status: Running
DR role: Primary
DR status: Partitioned

After starting the failed NODE_A (172.31.42.66), the status on NODE_B is unchanged:

[root@ip-172-31-34-7 ~]# rdqmstatus -m RDQM_DR
Node:
ip-172-31-34-7.ap-south-1.compute.internal
Queue manager status: Running
CPU: 0.01%
Memory: 104MB
Queue manager file system: 49MB used, 2.9GB allocated [2%]
DR role: Primary
DR status: Partitioned
DR type: Synchronous
DR port: 7000
DR local IP address: 172.31.34.7
DR remote IP address: 172.31.42.66
DR out of sync data: 696KB
DR last in sync: 2024-06-09 11:56:38
[root@ip-172-31-34-7 ~]# rdqmstatus
Node:
ip-172-31-34-7.ap-south-1.compute.internal
OS kernel version: 4.18.0-425.19.2
DRBD OS kernel version: 4.18.0-425.10.1
DRBD version: 9.1.12
DRBD kernel module status: Loaded

Queue manager name: RDQM_DR
Queue manager status: Running
DR role: Primary
DR status: Partitioned

After changing the state of NODE_A (172.31.42.66) to secondary:

[root@ip-172-31-42-66 ~]# rdqmstatus -m RDQM_DR
Node:
ip-172-31-42-66.ap-south-1.compute.internal
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable
DR type: Synchronous
DR port: 7000
DR local IP address: 172.31.42.66
DR remote IP address: 172.31.34.7
DR out of sync data: 28672KB
DR last in sync: 2024-06-09 11:58:06
[root@ip-172-31-42-66 ~]# rdqmstatus
Node:
ip-172-31-42-66.ap-south-1.compute.internal
OS kernel version: 4.18.0-425.19.2
DRBD OS kernel version: 4.18.0-425.10.1
DRBD version: 9.1.12
DRBD kernel module status: Loaded

Queue manager name: RDQM_DR
Queue manager status: Ended immediately
DR role: Secondary
DR status: Remote unavailable

------------------------------
Regards,
Bharat Puri
Infrastructure Architect(IBM/Kyndryl)
------------------------------
2. RE: RDQM DR : DR status Partitioned

Morag Hughson

IBM Champion
Posted Sun June 09, 2024 11:19 PM

Hi Bharat,

Since you say, "[here] is status on .. NODE_B after making it primary" I am assuming that you have decided to keep the data on NODE_B.

You show rdqmstatus output which shows that the queue manager is running. You also say that you have followed the instructions in the linked webpage.

You don't mention anything about the synchronisation, nor do you show any rdqmstatus output during the synchronisation.

So to be clear, are you saying that you have followed this set of steps:

Ensure both queue manager instances are stopped.

Specify that the queue manager on NODE_A is the secondary:

rdqmdr -m RDQM_DR -s

Specify that the queue manager on NODE_B is the primary:

rdqmdr -m RDQM_DR -p

Synchronization begins, with the data from the queue manager on the main node being copied to the recovery node.

Check the status of the synchronization:

rdqmstatus -m RDQM_DR

When the synchronization is complete, start the queue manager on the main node:

strmqm RDQM_DR

Can you tell us what happened while the data was being synchronised?

Can you confirm that you followed ALL of the above steps?

Cheers,
Morag

------------------------------
Morag Hughson
MQ Technical Education Specialist
MQGem Software Limited
Website: https://www.mqgem.com
------------------------------
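
The "check the status of the synchronization" step in the reply above can be scripted instead of re-running rdqmstatus by hand. The following is only a rough sketch: dr_field and wait_for_sync are hypothetical helper names, the 10-second poll interval and retry count are arbitrary, and treating a "DR status" of "Normal" as "synchronization complete" is an assumption that should be checked against the rdqmstatus documentation for your MQ level.

```shell
#!/bin/sh
# dr_field NAME
# Reads rdqmstatus-style "Key: value" lines on stdin and prints the value
# for the first line whose key is exactly NAME (hypothetical helper).
dr_field() {
    awk -v key="$1" '
    {
        sub(/^[ \t]+/, "")                  # drop leading indentation
        prefix = key ":"
        if (index($0, prefix) == 1) {       # line starts with "NAME:"
            val = substr($0, length(prefix) + 1)
            sub(/^[ \t]+/, "", val)         # drop spaces after the colon
            print val
            exit
        }
    }'
}

# wait_for_sync QMGR [TRIES]
# Polls "rdqmstatus -m QMGR" until DR status reads "Normal".
# ASSUMPTION: "Normal" is the value reported once both nodes are in sync.
wait_for_sync() {
    qmgr=$1
    tries=${2:-60}
    while [ "$tries" -gt 0 ]; do
        status=$(rdqmstatus -m "$qmgr" | dr_field "DR status")
        echo "DR status: ${status:-unknown}"
        if [ "$status" = "Normal" ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 10
    done
    return 1                                # gave up; still not in sync
}
```

With both queue manager instances stopped, and after running rdqmdr -m RDQM_DR -s on NODE_A and rdqmdr -m RDQM_DR -p on NODE_B, something like `wait_for_sync RDQM_DR && strmqm RDQM_DR` on NODE_B would start the queue manager only once the status settles. If the status stays at "Partitioned", the loop returns non-zero and the queue manager is not started.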
