Special considerations when using JMS Request-Response applications in a Uniform Cluster

By Simone Jain posted Wed January 24, 2024 04:53 AM


You may have noticed that IBM MQ 9.3.4 extended the control over rebalancing available to application developers and administrators using the JMS classes for MQ. I wanted to highlight a particular scenario developers should be aware of when using the request-response controls now available to JMS applications.

Before we get on to that, it would be helpful to know about the request-response messaging pattern and uniform clustering.

By default, a request-response application connected to a uniform cluster will not be available for reconnection until it has executed an equal number of PUT and GET operations. This is an effective rule and usually ensures that the requestor receives responses as expected. However, there is an edge case developers need to be aware of, which we’ll explore now alongside ways to mitigate this behaviour.

The Edge Case

Why might a request-response application be moved before the conversation is completed? Let’s consider a typical scenario where a request-response pair are exchanging multiple messages: 

1. A requesting application creates a temporary dynamic queue on the queue manager it is currently connected to. 

2. The requestor creates a request message and populates the ‘reply-to’ property with the address of the temporary destination created in step 1. 

3. The requestor then puts the request on to a queue for the responding application to consume. 

4. The responding application processes the request message and puts responses to the temporary queue referenced in the request message.
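In the JMS classes for MQ, the requestor’s side of the steps above might look like the following minimal sketch. The queue name `REQUEST.QUEUE`, the message payloads, and the 30-second wait are illustrative assumptions, not part of the original article, and running this requires a connection factory pointing at a real queue manager:

```java
import javax.jms.*;

public class Requestor {
    public static void sendRequest(ConnectionFactory cf) throws JMSException {
        try (JMSContext ctx = cf.createContext()) {
            // Step 1: create a temporary dynamic queue on the queue manager
            // we are currently connected to
            TemporaryQueue replyQueue = ctx.createTemporaryQueue();

            // Step 2: create the request and embed the reply-to destination
            // as a message property
            TextMessage request = ctx.createTextMessage("order-details");
            request.setJMSReplyTo(replyQueue);

            // Step 3: put the request on a queue for the responder to consume
            ctx.createProducer().send(ctx.createQueue("REQUEST.QUEUE"), request);

            // Step 4 happens in the responder; here we wait for the reply
            // with a bounded timeout rather than blocking forever
            JMSConsumer consumer = ctx.createConsumer(replyQueue);
            Message reply = consumer.receive(30_000);
            consumer.close();
        }
    }
}
```

Note that `setJMSReplyTo` is exactly where the destination becomes baked into the message, which is what makes the edge case below possible.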

During this exchange, there is a small window in which the requesting application prepares a message but is reconnected to another queue manager before it can send it. This matters because JMS messages embed the reply-to destination as a message property (unlike MQMD fields, which are populated by the queue manager as the message is queued).

1. Since the message was created before the reconnection, the ‘reply-to’ address it contains refers to the temporary queue on the first queue manager.

2. When the requestor is reconnected to the second queue manager, it creates a new ‘reply-to’ queue.

3. It then puts its message on to a queue for the responder to consume. 

4. The responder puts its reply on to the ‘reply-to’ queue specified in the message, which is the one on the old queue manager. 

5. The requestor is left waiting indefinitely on the new ‘reply-to’ queue it created on the second queue manager, where the response will never arrive.

This animation illustrates the described situation:

The Solution

There are potential improvements to the MQ JMS Client which would mean the application does not need to be aware of this pitfall, and we hope to address these in a future release.  In the meantime, how can we minimise the risk and impact of this edge case?

The first recommendation, where possible, is to design applications such that ‘missed’ replies do not cause a permanent block in application processing. A typical way to achieve this is to include an expiry time on request messages. After the specified expiry time, messages are deleted automatically by the queue manager. This is immediately useful because it prevents queues from filling up if they are inundated with requests. Furthermore, it is established best practice for the responder to copy the remaining expiry time from the request message to the response message. The effect of this is that if the responder does send its response to the wrong temporary queue, it too will be deleted once the expiry time has elapsed. Requestor applications which do not see a timely response to their messages can therefore safely issue a new request, knowing that any previous response will not be processed twice.
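The ‘remaining expiry’ the responder copies across can be computed from the standard `JMSExpiration` header, which holds the absolute expiry time in epoch milliseconds, with 0 meaning the message never expires. A minimal sketch of that arithmetic, independent of any MQ connection (the class and method names are illustrative):

```java
public class ExpiryHelper {
    /**
     * Time-to-live to copy onto a response, given the request's
     * JMSExpiration header (absolute epoch millis; 0 = never expires).
     * A return value of 0 means the response should also never expire.
     */
    public static long remainingExpiry(long jmsExpiration, long nowMillis) {
        return jmsExpiration == 0 ? 0 : Math.max(0, jmsExpiration - nowMillis);
    }
}
```

On the responder, this value would then be passed to `JMSProducer.setTimeToLive()` before sending the reply, so a misdirected response is cleaned up on the same schedule as the request that caused it.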

Another recommendation is to always configure a timeout after which an application’s outstanding request/reply state is ignored for balancing purposes. In the JMS classes, this is configured via WMQConstants.WMQ_BALANCING_TIMEOUT and defaults to ten seconds. The benefit in this scenario is that ‘misdirected’ responses will not tie the application to a single queue manager indefinitely, and the uniform cluster is again able to reconnect it to maintain balance across the cluster as required.
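As a configuration sketch, the balancing timeout can be set on the connection factory alongside the usual connection properties. The value passed here (in seconds) is an assumption for illustration; consult the MQ 9.3.4 documentation for the exact accepted values and any special constants:

```java
import javax.jms.JMSException;
import com.ibm.msg.client.jms.JmsConnectionFactory;
import com.ibm.msg.client.jms.JmsFactoryFactory;
import com.ibm.msg.client.wmq.WMQConstants;

public class BalancingConfig {
    public static JmsConnectionFactory create() throws JMSException {
        JmsFactoryFactory ff =
                JmsFactoryFactory.getInstance(WMQConstants.WMQ_PROVIDER);
        JmsConnectionFactory cf = ff.createConnectionFactory();
        // Override the default ten-second balancing timeout, after which
        // outstanding request/reply state no longer blocks rebalancing
        cf.setIntProperty(WMQConstants.WMQ_BALANCING_TIMEOUT, 5);
        return cf;
    }
}
```

Combined with message expiry, this bounds both how long a reply can linger on the wrong queue manager and how long the cluster must wait before it can rebalance the application.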
