One thing IBM MQ cannot be accused of is a shortage of features that provide or facilitate high availability (HA) and disaster recovery (DR) solutions.
IBM MQ 9.4.2 was released on 27th February 2025 and introduced our latest feature, Native HA Cross-region replication. We have extended Native HA capabilities to permit asynchronous replication of log data, supporting both unplanned failover and planned switchover. Of course, you didn't need me to tell you that; you already knew it from my colleague Jean de Garrigues' post and from watching the video that explains and demonstrates this great new feature.
Cross-region replication is an important evolution of the Native HA feature, providing integrated HA and DR capabilities with simple storage requirements and without any third-party dependencies. Much like the Native HA group that accepts application workload, a Cross-region replication recovery group is formed of three replicated instances, so it is itself highly available. Deploying a Native HA group in either a Live or a Recovery role couldn't be simpler, whether you use the MQ Operator or the sample Helm charts.
So why might you want to use Cross-region replication? It's just for "doing DR", right? Wrong.
Whilst the most obvious use of Cross-region replication is indeed to provide an out-of-region DR solution, the feature isn't only useful in a "disaster" or between regions. Imagine you need to migrate a queue manager you've already deployed in the cloud to a new cluster or a different cloud provider. How would you move the queue manager, retaining its existing message data, with minimal impact on service availability? Cross-region replication can simplify that migration to a few simple steps:
- Deploy a new queue manager group with a Recovery role.
- Reconfigure the existing queue manager to replicate to the newly created group, whilst continuing to process new application work.
- Swap round the Live and Recovery roles of both groups to perform a planned switchover.
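To make the sequence concrete, here is a minimal Python model of the three steps. It is a toy for illustration, not MQ code: the `NativeHAGroup` class, the `replicate_to` and `planned_switchover` helpers, and the in-memory list standing in for the recovery log are all hypothetical. The model assumes that a planned switchover stops new work on the old Live group and waits for the Recovery group to catch up before the roles are swapped, which is what lets the move complete without losing messages in this sketch.

```python
from collections import deque


class NativeHAGroup:
    """Toy model of a Native HA group (hypothetical; not the IBM MQ implementation)."""

    def __init__(self, name: str, role: str) -> None:
        self.name = name
        self.role = role                      # "Live" or "Recovery"
        self.log: list[str] = []              # records applied to this group's log
        self.outbound: deque[str] = deque()   # records not yet shipped to the peer
        self.peer = None                      # Recovery group we replicate to, if any

    def put(self, message: str) -> None:
        # Only the Live group accepts application work.
        if self.role != "Live":
            raise RuntimeError(f"{self.name} is not Live")
        self.log.append(message)
        if self.peer is not None:
            self.outbound.append(message)     # shipped later, i.e. asynchronously

    def replicate(self, max_records: int = 1) -> None:
        # Ship up to max_records pending records to the Recovery peer.
        for _ in range(min(max_records, len(self.outbound))):
            self.peer.log.append(self.outbound.popleft())


def replicate_to(live: NativeHAGroup, recovery: NativeHAGroup) -> None:
    # Step 2: replicate existing log data, then keep shipping new records.
    live.peer = recovery
    live.outbound.extend(live.log[len(recovery.log):])


def planned_switchover(live: NativeHAGroup, recovery: NativeHAGroup) -> None:
    # Step 3: stop taking work, drain the backlog, then swap the roles.
    live.role = "Recovery"
    live.replicate(max_records=len(live.outbound))
    assert recovery.log == live.log, "recovery group has not caught up"
    recovery.role = "Live"
    live.peer, recovery.peer = None, live


old = NativeHAGroup("old-cluster", "Live")
old.put("msg-1")                                  # existing message data
new = NativeHAGroup("new-cluster", "Recovery")    # Step 1
replicate_to(old, new)                            # Step 2
old.put("msg-2")                                  # applications keep working
old.replicate()                                   # replication may lag behind
planned_switchover(old, new)                      # Step 3
new.put("msg-3")                                  # work continues on the new group
print(new.role, old.role, new.log)
```

Because replication is asynchronous, the Recovery group can briefly lag behind the Live group (here it holds only `msg-1` until the switchover drains `msg-2`). The planned switchover is the point at which that lag is closed before the roles change.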
MQ availability is often measured in terms of message availability and service availability. I want to reflect on choosing the "right" availability solution for MQ, because the relative importance of message and service availability is subjective and likely to be heavily influenced by application requirements.
It sounds obvious, but a messaging service is only considered available if applications can connect to a message broker (i.e. a queue manager) and perform messaging. If a message has been put to a queue and the consuming application is able to connect and retrieve that message, the message is considered available. If a critical resource (disk, network, broker, etc.) fails or is taken down for maintenance, access to queues and messages could be lost. For message availability, it is critical that the messaging system can restore an access path through which the consuming applications can reach that message; in this context, it matters less that an alternative service is available.
Applications that demand very high rates of non-persistent message throughput, particularly where those messages only have transactional value to the business for a few seconds, are likely to benefit more from enhanced service availability than from enhanced message availability. Two (or more) message brokers providing a messaging service in an active-active architecture can increase service availability and provide horizontally scalable capacity, often with a network load balancer routing the client application traffic. Another popular way to implement an active-active architecture is to use IBM MQ Uniform clusters, which can rebalance client applications dynamically according to service demand and resource availability in a way that a network load balancer cannot.
Applications that handle persistent messages are likely to benefit from increased message availability. The general principle IBM MQ applies to improve persistent message availability is to detect and recover from failure as quickly as possible, using capabilities such as automatic client reconnect, multi-instance queue managers, RDQM and, of course, Native HA.
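Automatic client reconnect is enabled through MQ client configuration rather than written by hand, but the underlying idea of detecting a broken connection and quickly retrying across a list of candidate endpoints can be sketched in a few lines. This is a generic illustration, not the MQ client's actual algorithm: `connect_with_retry`, the endpoint names and the backoff values are hypothetical, and `connect` stands in for whatever function really opens a connection.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def connect_with_retry(
    endpoints: Sequence[str],
    connect: Callable[[str], T],
    max_attempts: int = 5,
    initial_delay: float = 0.1,
    max_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try each endpoint in turn, backing off exponentially between rounds.

    Hypothetical helper for illustration only; not an IBM MQ API.
    """
    delay = initial_delay
    last_error = None
    for attempt in range(max_attempts):
        for endpoint in endpoints:
            try:
                return connect(endpoint)
            except ConnectionError as err:
                last_error = err          # remember the failure, try the next endpoint
        if attempt < max_attempts - 1:
            sleep(delay)                  # wait before the next round of attempts
            delay = min(delay * 2, max_delay)
    raise ConnectionError(
        f"no endpoint reachable after {max_attempts} rounds"
    ) from last_error


# Usage: a simulated outage where the first three connection attempts fail.
attempts = []


def flaky_connect(endpoint: str) -> str:
    attempts.append(endpoint)
    if len(attempts) <= 3:
        raise ConnectionError(f"{endpoint} unavailable")
    return f"connected to {endpoint}"


delays = []
result = connect_with_retry(["qm-a(1414)", "qm-b(1414)"], flaky_connect, sleep=delays.append)
print(result)
```

Injecting `sleep` keeps the sketch testable without real waits; the short initial delay reflects the principle of recovering as quickly as possible, while the cap stops a long outage from pushing retries too far apart.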
In conclusion, there is no one-size-fits-all "best" HA solution. Message and service availability hinge on individual application requirements, so the right availability solution for one application might not be the right one for another.
For ultimate message and service availability, consider combining the benefits of automatic client reconnect to a Uniform cluster of Native HA queue managers with Cross-region replication. With IBM MQ's broad range of capabilities, you don't have to choose between message and service availability; you can have your cake and eat it.