Rebalancing Request Reply applications in a Uniform Cluster (new for 9.2.4)

By Anthony Beardsmore posted Mon November 22, 2021 09:21 AM


*** Note - see The IBM-Messaging GitHub repository for example scripts and code for some of the topics discussed in this post ***

If you've been following the new features rolled out in the MQ 9.1 CD releases and MQ 9.2 LTS, you'll already be aware of Uniform Clusters, a new feature in MQ that makes scaling and rebalancing client applications across a group of identical queue managers simpler and more powerful.

You'll also be aware that there are some restrictions on the types of application which can take advantage of this client balancing capability: when an application builds 'affinity' with a particular queue manager because of persistent state held there, it is not suitable for rebalancing in this manner. Such applications can still connect to a Uniform Cluster queue manager, but must use MQCNO_RECONNECT_Q_MGR or no reconnect option at all to ensure that affinity is not lost.

In MQ 9.2.4 we've introduced new concepts which allow you to broaden the types of application which can take advantage of this rebalancing. Vas gives a good summary of the basic new concepts and changes in his post here, and a good entry point to the formal documentation is here. One particularly useful application type which this now supports is applications (or at least a subset of them) using the very common 'Request Reply' pattern. As I know this is something a lot of our users have been eagerly awaiting, I wanted to provide a deeper dive into this particular pattern, and some of the nuances of using it, in this article.

What is a 'Request Reply' application in this context?

A very common use case for MQ is to send a message containing a 'request' of some kind to a particular - often remote - queue. The request might, for example, trigger an update in a backing data store or query some information, and it provides the address of a 'replyToQueue'. The application then waits for a response to arrive on that queue - an acknowledgement that the action has been performed, or the data queried. MQ provides various fields and functionality which can assist with this (MD mechanisms for the reply destination, message types, correlation identifiers), but these are all optional and may be used in differing combinations or indeed not at all. It is perfectly possible to build a 'request reply' application without MQ being aware that that is what you are doing - using pure datagram messages and managing all reply routing and correlation manually in the application code.
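
To make the shape of the pattern concrete, here is a minimal sketch in Python that uses in-memory queues as stand-ins for MQ queues. Nothing here is MQ API: the `Message` class, the `responder` and `request` helpers, and the field names `msg_id`, `correl_id` and `reply_to` are all hypothetical, chosen only to mirror the ideas of a reply destination and a correlation identifier described above.

```python
import queue
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Message:
    """Hypothetical message; the fields loosely mirror the ideas in the text."""
    body: str
    msg_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    correl_id: Optional[str] = None          # set on replies only
    reply_to: Optional[queue.Queue] = None   # where the reply should be sent


def responder(request_q):
    """Serve one request and send the reply to the queue named in the request."""
    req = request_q.get_nowait()
    reply = Message(body=req.body.upper(), correl_id=req.msg_id)
    req.reply_to.put(reply)


def request(request_q, reply_q, body):
    """Send one request, then wait for the reply that correlates with it."""
    req = Message(body=body, reply_to=reply_q)
    request_q.put(req)
    responder(request_q)  # stands in for a separate, possibly remote, application
    reply = reply_q.get(timeout=1)
    if reply.correl_id != req.msg_id:
        raise RuntimeError("reply does not correlate with the request")
    return reply.body


requests_q = queue.Queue()
replies_q = queue.Queue()
result = request(requests_q, replies_q, "query balance")
```

Between the put and the get, the requester holds state that is tied to wherever its reply will arrive, which is why moving it at that moment is a problem.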

For this reason, Uniform Clusters make no assumption regarding the request reply pattern unless explicitly told that it is being used via the new Application Pattern options.  However, when you do tell MQ that you are using this pattern, it can be much smarter about when and how applications are rebalanced, allowing you to extend application rebalancing to applications which would not previously have been suitable.

What will MQ do with this information?

The aim of the new capabilities is to avoid moving a requesting application until it has received any outstanding responses. Note that this will still not be suitable for all request-response application styles, but we hope it will cover many of the more common use cases and allow you to make such clients reconnectable in a Uniform Cluster. So what exactly changes when you 'flag' an application as following this pattern? In very basic terms, the queue manager assumes that for every message PUT by the client, exactly one message in response should be received on a corresponding GET. Applications will not be rebalanced while waiting for this response, unless a configurable timeout (10 seconds by default) is exceeded.
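
The basic rule can be pictured with a small toy model. This is only a sketch of the behaviour as described in this post, not how the queue manager is implemented: the `ReqRepTracker` class and its methods are hypothetical, and it assumes the timeout is measured from the moment a move is first requested.

```python
DEFAULT_TIMEOUT = 10.0  # seconds, the default quoted in the text


class ReqRepTracker:
    """Toy model of the balancing rule for one application instance."""

    def __init__(self, timeout=DEFAULT_TIMEOUT):
        self.timeout = timeout
        self.outstanding = 0           # PUTs not yet matched by a GET
        self.move_requested_at = None  # when rebalancing was first requested

    def put(self):
        # Every PUT is assumed to be a request that expects one response.
        self.outstanding += 1

    def get(self):
        # A GET with nothing outstanding does not count towards balancing.
        if self.outstanding > 0:
            self.outstanding -= 1

    def request_move(self, now):
        # Assumption: the timeout clock starts at the first move request.
        if self.move_requested_at is None:
            self.move_requested_at = now

    def can_move(self, now):
        if self.move_requested_at is None:
            return False  # nobody has asked this instance to move
        if self.outstanding == 0:
            return True   # a convenient point: no replies are pending
        return now - self.move_requested_at >= self.timeout


tracker = ReqRepTracker()
tracker.put()                        # request sent
tracker.request_move(now=0.0)        # rebalancing would like to move us
waiting = tracker.can_move(now=1.0)  # reply still outstanding
tracker.get()                        # reply received
settled = tracker.can_move(now=2.0)  # nothing outstanding any more
```

In this model `waiting` is false and `settled` is true; an instance that never receives its reply only becomes movable once the timeout has elapsed.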

While that sounds simple enough on the surface, there are actually many nuances to this worth thinking through before enabling an application for balancing using this pattern.  The details are covered in the MQ Docs, but I thought it would be useful to put together a handy 'cheat sheet' here of key things you may want to consider:

  • Application complexity / PUT and GET counts

    Requests and responses across the entire Application Instance are considered, so a relatively complex application with multiple connections and/or threads will function 'correctly' in the sense that it will not reconnect for balancing purposes until all PUTs are met with a corresponding GET.  However, note that this means if there are never 'pauses' in the application design - points where it is likely that no requester is currently waiting for a response - then the application will never reach a convenient point to move and will always end up staying connected to the same queue manager for as long as it can (and then being 'rudely' reconnected when the timeout is reached).  If such pauses in processing are expected, but only at particular intervals, you may wish to extend the default timeout to allow for this.
  • 'Non request/response' messages

    Any messages received when there are no outstanding 'PUT' requests are ignored for rebalancing purposes.  So for example, you could 'GET' several messages when the application starts to provide 'configuration' information to the application, without affecting the rebalancing.  However, ALL sent messages are assumed to be requests, expecting a response.  If the application intermittently sends messages which are not truly requests, it may not be suitable for this pattern (though it may be possible to work around this using message expiry, see below).
  • Message Persistence

    If a request or response message is 'dropped' somewhere along the round trip (by either MQ or a processing application), then the application will not move until the timeout is reached, because it would otherwise wait 'forever' for a matching response. If the application design, or use of non-persistent messages, means that such 'dropped' requests are likely, then consider using message expiry to 'free' the client to move after a reasonable period.
  • Message Expiry

    However many requests have been sent, if responses are not being received the rebalancing logic will only wait until the most recent request message has expired before 'giving up' and considering the client eligible to move again.  Use of expiry in request reply applications is very common to avoid requester applications hanging indefinitely for a response, and where expiry has been used appropriately this is likely to allow for much more responsive and even application spread across the uniform cluster.  Note that in these situations, it is probably appropriate to tune the 'timeout' interval for the application to be at least as long as the 'expiry' set by the application, so that rebalancing does not unduly truncate the expiry period.
    If an application does not quite fit the prescribed pattern of 'single request to single response' and does have to send messages which are not truly requests, then sending those messages with a relatively short expiry interval may still allow the application to be considered for rebalancing in a Uniform Cluster.
  • Application deployment and routing

    This may seem like stating the obvious, as it is important when considering any application for any Cluster deployment, but it perhaps becomes even more so when the request reply pattern in a Uniform Cluster is enabled. By enabling a requester application to move around the cluster in this manner, in most cases it is likely to be vital that replyToQueue and replyToQueueManager information is being configured correctly to enable routing of the responses. It may also increase the importance of having 'enough' responder applications in the cluster - for example if responses are to be processed on the local queue manager, each queue manager must have such an instance. A usual pattern in a Uniform Cluster deployment in any case would be to have at least as many responder applications as queue managers in the cluster.
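
The interaction between message expiry and the balancing timeout described in the list above can be sketched as a single decision function. Again this is a toy model under stated assumptions rather than MQ's actual logic: the `eligible_to_move` function and all of its parameters are hypothetical, times are plain seconds, and the timeout is assumed to run from the moment a move is requested.

```python
def eligible_to_move(outstanding, last_request_expiry, move_requested_at,
                     timeout, now):
    """Return True if this toy model would consider the client movable.

    outstanding         -- requests sent that have not yet had a response
    last_request_expiry -- time at which the most recent request expires,
                           or None if no expiry was set
    move_requested_at   -- time at which rebalancing was requested
    timeout             -- balancing timeout in seconds
    """
    if outstanding == 0:
        return True  # nothing pending, so this is a convenient point
    if last_request_expiry is not None and now >= last_request_expiry:
        return True  # most recent request has expired: stop waiting
    return now - move_requested_at >= timeout  # otherwise wait for timeout


# Three requests pending, the most recent expiring at t=5, timeout 10s.
before_expiry = eligible_to_move(3, 5.0, 0.0, 10.0, now=4.0)
after_expiry = eligible_to_move(3, 5.0, 0.0, 10.0, now=5.0)

# No expiry set: only the timeout frees the client.
no_expiry_early = eligible_to_move(1, None, 0.0, 10.0, now=9.0)
no_expiry_late = eligible_to_move(1, None, 0.0, 10.0, now=10.0)

# Expiry (30s) longer than the timeout (10s): the timeout wins at t=10.
cut_short = eligible_to_move(1, 30.0, 0.0, 10.0, now=10.0)
```

The last case shows why it is worth keeping the timeout at least as long as the expiry interval: in this model the client becomes movable at 10 seconds even though its request would have remained valid for another 20.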

Who can use this?

As discussed in Vas' blog, 'opting in' to the new application pattern support is configured entirely on the client side (either programmatically or in deployment configuration), but using the new request response pattern does require both client and server to be running MQ 9.2.4 or higher. Support is currently limited to MQI and .NET clients at these levels.

Summary - Further Thoughts

With some consideration given to the areas above, we are hoping that the new features in 9.2.4 will allow many more applications to take advantage of rebalancing and moving towards an 'Active-Active' MQ topology for availability and scaling.  We are looking forward to hearing from you as to which applications you are and aren't able to move in this direction, and thoughts on additional patterns, capabilities, or client environments which would be valuable to you.  As usual please feel free to get in touch either here through the Community (comments on the blog, forums), the MQ RFE process, or any other route you prefer.