App Connect


Unlocking new DR scenarios for IBM App Connect powered by IBM MQ Native HA Cross Region Replication

By Matt Roberts posted Thu March 27, 2025 01:20 PM

  

The exciting new IBM MQ Native HA Cross Region Replication (CRR) feature, released as part of IBM MQ v9.4.2, provides built-in replication of MQ data to a second region or data center, enabling a range of powerful disaster recovery and migration scenarios.

Alongside the benefits this brings to your own applications connecting to IBM MQ, Cross Region Replication unlocks new styles of disaster recovery configuration for scenarios involving MQ and IBM App Connect, which are discussed further below!

New Disaster Recovery scenarios for ACE with MQ

IBM App Connect Enterprise (ACE) flows are typically stateless, but any meaningful business application depends on interaction with external state. This interaction is often implemented as event-driven flows, where IBM MQ messages containing the business state initiate processing in ACE, or where the result of an ACE flow is output as an IBM MQ message for processing by further downstream applications.

Until now, scenarios involving MQ in containers (commonly used in conjunction with ACE) have been restricted in the scope of their possible Disaster Recovery implementations. This is because an IBM MQ queue manager exists within a single OpenShift or Kubernetes cluster, and a Kubernetes or OpenShift cluster is in turn scoped to a single region (or group of closely connected data centers). As a result, in the event of a total cluster or region outage, the message data stored in IBM MQ queues is unavailable until that region can be recovered. This may interrupt in-flight transactions for hours or days – and in extreme cases, if the entire region is irrevocably lost, then so is the message data! Technologies such as IBM MQ Replicated Data Queue Managers (RDQM) address this problem for VM-based deployments but are not suitable for container-based scenarios.

However, with IBM MQ Native HA Cross Region Replication we now have the ability to configure asynchronous replication of MQ state data to a deployment in a different region so that the queue manager data can be recovered in the event of an unplanned or planned outage.

How does it work?

Looking first at the scenario for high availability within a single region, an IBM MQ Native HA queue manager consists of three replicas to provide resilience to failures within a cluster, such as a worker node outage. One of the MQ replicas is the active queue manager, and it is this instance to which App Connect Enterprise flows (and other types of application) connect to send and receive messages.

High availability of the App Connect Enterprise flows is also achieved by configuring multiple replicas of the ACE deployment, and each of those replicas connects to the active queue manager instance as shown in the diagram below:

Figure 1: High availability topology within a single region (before cross region replication)

An IBM MQ Native HA Cross Region Replication topology extends this single-region scenario by enabling a pair of IBM MQ Native HA instances to join together – one “Live group” running in the first region which serves the active application traffic, and which also asynchronously replicates data to a “Recovery group” running in the second region.

One Native HA replica in the Recovery group is designated the “Recovery Leader” – and is responsible for receiving the transmitted state data from the active replica in the Live group. Applications such as ACE cannot connect to the Recovery group – it is only enabled to receive the replicated state data.

What does this mean for App Connect Enterprise?

In the event of a planned or unplanned outage, an administrative action must be carried out to instruct the MQ Recovery group to become active. One effect of this is that applications such as ACE can then connect to the MQ instance in the second region (which was not possible while it was the Recovery instance).

To provide disaster recovery for our ACE flows, we need an equivalent set of replicas for those flows configured in the second region, so that they can connect to the newly live queue manager instance and business processing can continue. However, since applications cannot connect to the MQ Recovery group queue manager until the failover takes place, we need to configure ACE so that the flows do not start up until after the failover completes. Otherwise, the ACE flows in the second region would keep restarting themselves indefinitely while waiting for the queue manager to become available.
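The constraint described above can be summarized as an invariant: ACE flows should only be running in the region whose MQ group currently has the Live role. The following minimal Python sketch checks that invariant over a hypothetical per-region summary. RegionState and check_topology are illustrative names invented for this example and are not part of the ACE or MQ operators. Gathering the real state from each cluster is left out.

```python
from dataclasses import dataclass


@dataclass
class RegionState:
    """Hypothetical summary of one region's MQ and ACE deployments."""
    name: str
    mq_role: str       # "Live" or "Recovery" (nativeHAGroups.local.role)
    ace_running: bool  # True if the ACE flow replicas are started


def check_topology(regions: list[RegionState]) -> list[str]:
    """Return a list of problems; an empty list means the layout is consistent."""
    problems = []
    live = [r for r in regions if r.mq_role == "Live"]
    if len(live) != 1:
        problems.append(f"expected exactly one Live MQ group, found {len(live)}")
    for r in regions:
        if r.ace_running and r.mq_role != "Live":
            # Applications cannot connect to a Recovery group, so these flows
            # would keep restarting while waiting for the queue manager.
            problems.append(f"{r.name}: ACE running against a {r.mq_role} queue manager")
    return problems


before_failover = [
    RegionState("region-1", mq_role="Live", ace_running=True),
    RegionState("region-2", mq_role="Recovery", ace_running=False),
]
print(check_topology(before_failover))  # []
```

This check only describes the steady state. Partway through a failover, a transient violation is expected.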

There are two ways to configure ACE so that the integration server does not try to start up until the queue manager failover has taken place, both of which are actions that can be included as part of the steps the administrator takes to trigger the failover of the active MQ instance to the second region:

  1. [preferred] Set “spec.desiredRunState: stopped” initially in the custom resource (CR) of the ACE flow deployment:
    • This is a new option introduced as part of the ACE operator v12.9.0 (and integration runtime v13.0.2.2-r1), so it is only available if you are using that version or later
    • The effect of this option is to scale the number of flow replicas down to 0 so that no CPU or memory resources are being consumed by the deployment, and it is not trying to connect to the queue manager
    • When the queue manager failover to the second region takes place you administratively update the CR to “spec.desiredRunState: running” to cause the pods to start up
    • The advantages of this approach are that the details of the required number of replicas are always retained in the custom resource (unlike option 2 below), and that there is a more natural user action in the App Connect Dashboard user interface to stop/start the deployment if you choose to use the UI
  2. Set “spec.replicas: 0” initially in the custom resource (CR) of the ACE flow deployment:
    • This option works for all in-support versions of ACE containers, so it can be used even if you are not using the latest v12.9.0 operator version
    • The effect of this option is the same as option 1: the number of flow replicas is scaled down to 0 so that no CPU or memory resources are being consumed by the deployment, and it is not trying to connect to the queue manager
    • The drawback of this option is that you have overwritten the desired number of replicas to 0 to prevent the runtime from starting up, so you need to know what the correct number of replicas for this particular flow is at the point when you come to enable the deployment (e.g. by setting “spec.replicas: 3” or whatever the appropriate number is for your specific workload requirement), as opposed to just setting it to “running” in the first option
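The following sketch makes the trade-off between the two options concrete. It treats the ACE flow deployment's custom resource as a plain Python dictionary and applies each option to it. This is an illustration only, and it assumes only the “spec.replicas” and “spec.desiredRunState” fields matter here. The helper functions are hypothetical. A real change would be applied to the cluster (for example through the App Connect Dashboard) rather than to an in-memory dictionary.

```python
import copy


def stop_with_desired_run_state(cr: dict) -> dict:
    """Option 1: mark the deployment stopped; spec.replicas is left untouched."""
    new = copy.deepcopy(cr)
    new["spec"]["desiredRunState"] = "stopped"
    return new


def start_with_desired_run_state(cr: dict) -> dict:
    """Option 1: mark it running again; the retained replica count still applies."""
    new = copy.deepcopy(cr)
    new["spec"]["desiredRunState"] = "running"
    return new


def stop_with_zero_replicas(cr: dict) -> dict:
    """Option 2: overwrite the replica count, losing the original value."""
    new = copy.deepcopy(cr)
    new["spec"]["replicas"] = 0
    return new


def start_with_replicas(cr: dict, replicas: int) -> dict:
    """Option 2: the caller must supply the correct replica count again."""
    new = copy.deepcopy(cr)
    new["spec"]["replicas"] = replicas
    return new


# Hypothetical starting point: a flow deployment that normally runs 3 replicas.
original = {"spec": {"replicas": 3}}

stopped_1 = stop_with_desired_run_state(original)
print(stopped_1["spec"])  # {'replicas': 3, 'desiredRunState': 'stopped'}

stopped_2 = stop_with_zero_replicas(original)
print(stopped_2["spec"])  # {'replicas': 0}
```

With option 1, the replica count survives the stop/start cycle. With option 2, the value 3 has to be remembered and supplied again from outside the custom resource.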

The following diagram illustrates the cross-region replication topology before the failover to the second region:

Figure 2: Cross region replication topology - before failover to second region

Triggering the regional failover

The switchover of the active MQ queue manager from the first to the second region is achieved through administrative actions, which differ depending on whether this is a planned or an unplanned switchover.

Planned switchover

In a planned switchover, where the first region is still running and contactable:

  1. In the second region, update the MQ custom resource configuration from “nativeHAGroups.local.role: Recovery” to “nativeHAGroups.local.role: Live”
  2. In the first region, update the MQ custom resource configuration from “nativeHAGroups.local.role: Live” to “nativeHAGroups.local.role: Recovery”
    • Once these two steps are complete, and the data is in sync between the instances, the original recovery MQ instance will become the live instance.
  3. Maintain efficient use of resources in the first region by setting the ACE configuration to “spec.desiredRunState: stopped”.
    • This will prevent the ACE flows from continually trying to connect to the queue manager in the first region, which is no longer the live instance and so will not accept application connections
  4. Finally, in the second region, update the ACE configuration to set “spec.desiredRunState: running”, which will cause the ACE flows to start up and connect to the newly live queue manager
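Because the ordering of these steps matters, it can help to capture them as a runbook. The minimal sketch below lists the planned switchover as ordered (region, resource, field, value) steps, using the field paths exactly as written above. Step and planned_switchover are hypothetical names. The sketch does not show where “nativeHAGroups” sits within the full MQ QueueManager custom resource, and it leaves out actually applying each change to the relevant cluster.

```python
from typing import NamedTuple


class Step(NamedTuple):
    region: str    # which cluster to act on
    resource: str  # "MQ" (queue manager CR) or "ACE" (flow deployment CR)
    field: str     # field path as written in the post
    value: object  # new value for that field


def planned_switchover(old_live: str, new_live: str) -> list[Step]:
    """Ordered steps for moving the Live MQ group while both regions are reachable."""
    return [
        # 1. Promote the Recovery group in the target region.
        Step(new_live, "MQ", "nativeHAGroups.local.role", "Live"),
        # 2. Demote the original Live group.
        Step(old_live, "MQ", "nativeHAGroups.local.role", "Recovery"),
        # 3. Stop ACE in the old region so it stops retrying connections.
        Step(old_live, "ACE", "spec.desiredRunState", "stopped"),
        # 4. Start ACE in the new region against the newly live queue manager.
        Step(new_live, "ACE", "spec.desiredRunState", "running"),
    ]


for step in planned_switchover("region-1", "region-2"):
    print(step)
```

The role change from steps 1 and 2 only completes once the data is in sync between the instances. A real runbook would therefore likely confirm that the second region's queue manager is live before starting ACE there.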

Unplanned switchover

In an unplanned switchover where the first region is offline – for example due to an unexpected outage – the steps are slightly different, to account for the first region not being available:

  1. [same] In the second region, update the MQ custom resource configuration from “nativeHAGroups.local.role: Recovery” to “nativeHAGroups.local.role: Live”
  2. [new] Also in the second region, set “nativeHAGroups.remotes.enabled: false”
    • This will allow the recovery instance to become live without trying to coordinate with the first region (which is unavailable)
  3. [same] Finally, in the second region, update the ACE configuration to set “spec.desiredRunState: running”, which will cause the ACE flows to start up and connect to the newly live queue manager
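The unplanned case can be sketched in the same hypothetical (region, resource, field, value) form. The key difference shows up directly in the data: every step targets the surviving region, and nothing is sent to the failed one. The “remotes” field path is copied from the post as written. The exact shape of that part of the MQ custom resource (for example, whether remote groups are held as a list) is an assumption to check against the MQ operator documentation.

```python
from typing import NamedTuple


class Step(NamedTuple):
    region: str    # which cluster to act on
    resource: str  # "MQ" (queue manager CR) or "ACE" (flow deployment CR)
    field: str     # field path as written in the post (assumed shape)
    value: object  # new value for that field


def unplanned_switchover(surviving: str) -> list[Step]:
    """Ordered steps when the original Live region is unreachable.

    Every step targets the surviving region; the failed region is not touched.
    """
    return [
        Step(surviving, "MQ", "nativeHAGroups.local.role", "Live"),
        # Stop trying to coordinate with the unreachable remote group.
        Step(surviving, "MQ", "nativeHAGroups.remotes.enabled", False),
        Step(surviving, "ACE", "spec.desiredRunState", "running"),
    ]


for step in unplanned_switchover("region-2"):
    print(step)
```

Suppose the ACE deployment in the second region had been stopped with “spec.replicas: 0” (option 2 above). In that case, the last step would set “spec.replicas” to the required replica count instead of “spec.desiredRunState: running”.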

Note that in this case we don’t make any updates to the ACE or MQ configuration in the first region because it is offline.

The diagram below illustrates this unplanned scenario:

Figure 3: Cross region replication topology - triggering failover to second region, following an unplanned failure

Once these administrative actions have been carried out, both IBM MQ and ACE are running in a resilient fashion in the second region. They have access to the MQ message state that was present in the first region prior to the failover, subject to any data not yet replicated at the moment of an unplanned failure, since replication is asynchronous. Business processing can then continue as normal.

Summary

In this post I’ve described how the exciting new IBM MQ Native HA Cross Region Replication (CRR) feature unlocks new Disaster Recovery scenarios for IBM App Connect when running with IBM MQ, enabling you to build more resilient and effective integration solutions that enable your business continuity in the event of even a full regional outage.

Note that as a user of IBM App Connect Enterprise your ACE license has always included a restricted entitlement which permits you to use IBM MQ under your ACE license for scenarios where the only interaction with IBM MQ is through ACE flows. For scenarios where you want your own applications to interact with IBM MQ you must purchase direct entitlement for MQ through either IBM MQ or Cloud Pak for Integration licenses.

Thank you for reading, and I look forward to hearing your comments below!

Matt Roberts
Distinguished Engineer and CTO, IBM Integration
