View Only

Cloud native high availability with IBM MQ

By David Ware posted Thu March 25, 2021 05:45 AM


Obviously, it's all about the messages with IBM MQ, if you can't access them when you need them you have a problem. So these are very exciting times for MQ, for the first time we're building data resiliency and high availability right into the very heart of the MQ runtime, the queue manager. This fully maximises the availability of each and every critical message. This is something we've traditionally depended on external mechanisms to help with, even when we've provided them with MQ itself.

This is another mayor step along the path of providing a truly cloud native messaging service, on top of its active/active Uniform Cluster topologies and the certified container. Letting you benefit from MQ's unique abilities in a way you'd expect in the cloud.
We're calling this new approach Native HA and it is designed around efficient and secure replication of data for redundancy, integrated with quorum controlled fail over for safe and fast recovery from failures. Critically, this provides the same data integrity and consistency that you'd expect from MQ, with full protection for every last recoverable operation and message, ensuring none are duplicated or lost. And this is all without needing any changes to your applications or restrictions in their behaviour. 

We’ve just released IBM MQ 9.2.2 CD (March 2021) and this contains an early release of the feature when you're running MQ's Certified Container with IBM Cloud Pak for Integration on OpenShift. So something to get a feel for, even if you can't run your critical systems on it just yet.

UPDATE: Since writing this for that early release in 9.2.2 we've gone on to productise Native HA, both since 9.2.3 for OpenShift and since 9.2.4, for other Kubernetes environments.

So what is Native HA?

Simply put, a Native HA queue manager consists of three instances, each with their own persistent storage that MQ will keep in sync (that’s the critical bit!). The queue manager automatically elects one of those instances to be where the applications connects to send and receive messages, this is the active instance. Any recoverable operation, like putting a persistent message onto a queue, is synchronously replicated from there, across the other instances, and from there into the multiple sets of storage for safe keeping in the event of a failure.

As well as the data replication, the really important thing is that the instances of the Native HA queue manager are working in quorum to automatically detect a problem with the active instance and safely elect another instance to take over, within a few seconds. This means there's no need for anything external to be in place to either detect a problem or instigate the fail over, keeping the solution very easy to deploy and manage.

Keeping storage simple

Running a highly available queue manager, even in a container is nothing new, you can do it today. In fact the MQ Operator makes it very easy to deploy a multi-instance queue manager into OpenShift. However, that needs to have access to the right type of file system, one that provides redundancy and availability in the environment you're running and in a way that fits with MQ's requirements, read write many (RWX) with lease locking for example.

Native HA is a great fit for cloud environments, especially containers, where storage may not meet those exact needs, especially across multiple availability zones, which is critical for the very highest availability.

With MQ taking on the replication and availability there is no need to track down the perfect storage option that provides the levels of resiliency that you require and the precise behaviour that MQ depends on to achieve integrity and high availability using a multi-instance queue manager today.

With Native HA you can choose simple and efficient read write once (RWO) block storage. If anything goes wrong you're safe in the knowledge that there are now three exact copies of the messages and that any one of those three instances can quickly pick up where the active one stopped. And we're not talking about approximately where it stopped, we're meaning exactly where it stopped, down to a specific message or transactional operation due to the replication being synchronised across replicas in a consistent way. So no risk of message loss or duplication.

You now have a choice, continue reading, or first break out to see a demo of it in action here.

So how does it work?

You don't really need to know how the internals work, it's all being done for you, but if you're reading this you're probably a little interested...

We could have chosen the easy route on paper, and coupled MQ with external, replicated persistence, or at least a separate runtime to manage the monitoring and fail over, ensuring consistency and availability. But we haven't done that. Instead, as I mentioned, we're driving this right into the core of MQ's logic.

We haven't chosen to do it this way for fun, or the technical challenge, we've done it for two very good reasons. First, it's the only real way to build a truly cloud native solution that can match our user's exacting performance and availability requirements in these environments. And secondly, because we realised that it fits very nicely into the way MQ works. In fact, due to MQ's original core design principle, embedding it became the obvious choice.

That core design principle is the recovery log at the heart of every MQ queue manager. The concept that has led MQ to be trusted across the globe to look after billions of the the most critical of messages every second of every day.


Write-ahead recovery log


Hands up if you're familiar with MQ's write-ahead recovery log? MQ ensures no recoverable data is lost following a failure based on this log. Everything that needs to be there following a restart/failure goes through that log first, even configuration changes. So whenever you put a persistent message to a queue or a subscription it is first appended to this log. And whenever you consume those messages, that state change is also appended to the log. When you need complete assurance that nothing can be lost, like committing a transaction, MQ ensures that those log updates are forced out to the storage layer. Once we know we have that safely persisted we can lazily update that same state elsewhere, such as in the queue files. And we know that if anything fails in between we can go back to that recovery log to replay those last few updates to get back to exactly where we were.

It sounds simple, and it is conceptually, but it also allows MQ to be highly optimised and efficient, benefiting from funneling all the state changes across multiple queues and topics for a queue manager through a single set of logs. This removes the need to make individual writes or forces of that data for individual queues and topics, which would be much less efficient. This is how MQ can achieve such high throughput whilst not compromising on the assurances around delivery or protection from dropped or duplicated messages, right down to individual messaging operations. In fact the more work you push through a queue manager in parallel, the more efficient it typically becomes thanks to the sophistication of MQ's recovery log.

So as you can see, it's the recovery log held on disk that is actually the heart of a queue manager. So if you can protect that, your queue manager and all the messages flowing through it are safe too.



Now hands up who’s heard of consensus algorithms? In fact, one in particular, Raft?

This is a way to ensure state can be replicated across a set of nodes while making sure that they all hold the same view of the world. This removes the problem of having just a single copy of your critical state, which is too easy to lose in the event of a failure. It also works on the principal that one of those nodes is made the leader, that’s where the updates are made, with the others being the followers.

To ensure consistency, that leader needs to achieve consensus across more than half of the nodes before it can be sure that any state change is agreed and different parties don't start to believe different things. This ensures that if the leader fails, one of the followers can take over, in the knowledge that the new leader has exactly the same view of the world as the failed leader.

Raft even tackles the problem of ensuring that there is at most one leader at any one time. This prevents the dreaded split-brain scenario where multiple nodes think they're in charge and make conflicting updates simultaneously.

That might all be interesting (or not), but how does all this relate to Native HA? Well, at the heart of Raft is the principal of sequential logs that you drive state changes through. The leader writes each state change to the log and replicates this out to the other instances, efficiently obtaining consensus across them in the process. So you end up with multiple logs on different systems, all in agreement with each other.

As I've just described, at the center of MQ is the write-ahead recovery log where we drive all state changes through....we have a perfect match!

And to make this even more suitable, the authors of Raft set out with the objective to make it understandable and easy to implement. And that should not be underestimated when you’re integrating this into some of the most critical messaging systems in the world! This is why we've been able to build the Raft logic right into the heart of the MQ recovery log processing without disrupting what's there today.

As we append an update to the recovery log on the active instance of a queue manager (the leader) we replicate that update out to multiple replica instances of that queue manager (the followers). Only once consensus is reached the leader can acknowledge the update back to the application, say on a transaction commit, just as it would once the file system confirms the update has been forced out to storage. Meanwhile, as the followers also now have the updates to the recovery log they can apply those updates to their version of the queue manager state, such as the queue files. All in the knowledge that these will match exactly to what the leader has done, so they'll be ready to  take over in the event of a failure.

And with the health checking logic that we're putting in place, and the speed with which a replica instance of a queue manager can be fully operational, the queue manager is only momentarily out of action. Typically only a few seconds. 

What does this mean from an application's point of view? Not a lot. Just like the HA options that already exist, the only difference is that if the leader fails, the queue manager may be restarted from a different location. And in an environment like Kubernetes even that change in location is hidden from them. So critically it just looks exactly the same as if it had stopped and started in place, with exactly the same configuration and the same set of persistent messages.

So there's no list of caveats of what does and doesn't get replicated, or what will or won't work. And importantly, these Native HA queue managers can just as easily integrate with other queue managers, wherever they're located. Whether that's a loose federation of queue managers for integrating systems across different platforms and locations or a set of queue managers working together to form an active/active Uniform Cluster for the ultimate always on messaging solution.


So how do I turn this on?

With the delivery of IBM MQ 9.2.2 CD and Cloud Pak for Integration 2021.1.1 we have the first early release of Native HA out in the wild. As I said earlier, it's not quite ready for production yet,  but the user experience is just about ready. Using the MQ Operator with Cloud Pak for Integration, you simply select Native HA as the availability option when you deploy a queue manage. MQ takes care of deploying everything that is needed. That's really it, a single line of YAML, what could be simpler?


Next steps

In summary, high availability has never been so critical, and it's never been this simple to achieve with MQ, in such a portable and cloud native form. So if you've been holding back from exploring MQ in cloud or in containers, now is the time to really take a look.

But finally a thank you to all the IBM MQ teams, many of whom have been working tirelessly to make this happen, and continue to do so! And also a shout out to the authors of Raft, we couldn't have done it without you!


You're welcome to jump straight to the documentation on how to deploy a Native HA queue manager in Cloud Pak for Integration, but I’ve mentioned quite a few different things during this so here are a few useful links if you’re interested.