MQ


Request Reply model message flow, how to capture put timestamp and correlation ID

  • 1.  Request Reply model message flow, how to capture put timestamp and correlation ID

    Posted Fri January 12, 2024 01:40 PM

    I have an internal application that PUTs a request to a REMOTEQ queue "QA" which MQ delivers to an external partner's MQ qmgr.

    The external partner builds a response and sends it back to the originating qmgr in my environment, to a LOCALQ "QB".

    We are seeing latency in the network between the qmgrs.

    How can I capture the PUT timestamp of the application's message on the transmission queue on my qmgr, along with the correlation ID in the message header?

    I also want to capture the PUT timestamp and correlation ID on "QB" on my qmgr as well, so I can match up the correlation IDs and show the end-to-end message flow time.
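
    The matching step described above could be sketched as follows. This is only an illustration under stated assumptions: the record layout (`correl_id`, `put_date`, `put_time`) and the sample values are hypothetical, and how the records get captured from each queue is left open. The only MQ-specific detail relied on is the MQMD format of PutDate (YYYYMMDD) and PutTime (HHMMSSTH, in GMT). If the two timestamps are stamped by different queue managers, any clock difference between them would skew the result.

```python
from datetime import datetime, timezone


def mqmd_put_timestamp(put_date: str, put_time: str) -> datetime:
    """Combine MQMD PutDate (YYYYMMDD) and PutTime (HHMMSSTH, GMT) into a datetime."""
    base = datetime.strptime(put_date + put_time[:6], "%Y%m%d%H%M%S")
    hundredths = int(put_time[6:8])  # "TH" = tenths and hundredths of a second
    return base.replace(microsecond=hundredths * 10_000, tzinfo=timezone.utc)


def round_trip_seconds(requests, replies):
    """Pair request and reply records by correlation ID; return {correl_id: seconds}."""
    sent = {
        r["correl_id"]: mqmd_put_timestamp(r["put_date"], r["put_time"])
        for r in requests
    }
    elapsed = {}
    for r in replies:
        start = sent.get(r["correl_id"])
        if start is not None:  # ignore replies with no captured request
            end = mqmd_put_timestamp(r["put_date"], r["put_time"])
            elapsed[r["correl_id"]] = (end - start).total_seconds()
    return elapsed


# Hypothetical captured records (illustrative values only).
requests = [
    {"correl_id": "414d5101", "put_date": "20240112", "put_time": "13400025"},
    {"correl_id": "414d5102", "put_date": "20240112", "put_time": "13400150"},
]
replies = [
    {"correl_id": "414d5101", "put_date": "20240112", "put_time": "13400075"},
    {"correl_id": "414d5102", "put_date": "20240112", "put_time": "13401400"},
]

timings = round_trip_seconds(requests, replies)
# Keep only the round trips that exceeded an assumed 10-second threshold.
slow = {cid: secs for cid, secs in timings.items() if secs > 10.0}
```

    Keying the lookup on the correlation ID keeps the pairing independent of arrival order, which matters when replies can overtake one another.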

    I need this to prove MQ is not the source of the latency and convince network engineers to start collecting packet captures along the network path in order to find the culprit device that is causing the intermittent latency being experienced.

    I am reading through strmqtrc but do not see clearly which options might capture the data I need.

    I am willing to write an exit, but I am not sure which exit I should write.



    ------------------------------
    David Pearson
    ------------------------------


  • 2.  RE: Request Reply model message flow, how to capture put timestamp and correlation ID

    Posted Fri January 12, 2024 10:13 PM

    Hi David,

    There are various parts of IBM MQ that already capture the data and do the maths you mention on your behalf. Hopefully they will provide you with the necessary information, so they are certainly worth a look before you launch into writing something yourself.

    Channel Status Network Time

    The NETTIME attribute shown on DISPLAY CHSTATUS shows the time spent only in the network - i.e. it takes timestamps at the boundaries between MQ code and the TCP calls and calculates only the time spent in the network.

    To collect this attribute, you must turn on MONCHL on the channels (and restart them to pick up the change). I suggest you monitor the times on the slow network and on some others that work well, and compare the differences. Or if, as you suggest, the slowness is occasional, get used to what the "normal" number is, and spot the differences when things slow down. Channel Statistics also captures this information (with a Min/Max/Average over each collected interval), so you might find that a more useful way to get a longer-term view and then graph the numbers to see the slowdowns - a picture often helps you to understand data when otherwise there are too many numbers to get your head around.

    Trace Route Messages

    You can use the IBM MQ feature, Trace Route Messages (which I recently wrote about here). The raw data gives you the various timestamps you might be seeking - an example of the raw data is shown about halfway down the referenced blog post in the section entitled "Report Options". Take a look.

    However, rather than writing your own application to do the maths, you might be interested to know that there are already applications that do this for you. Here's a quick screen grab from MO71, showing the results of said calculations (for the channel portion and for each hop).

    As described in the referenced blog post, there are other tools that manipulate the output of trace route messages as well, so if you are not interested in using MO71, then of course there are others. I am not familiar enough with them to know whether they do these types of calculations, but they would at least get you the raw numbers.

    Hope this helps.

    Cheers,
    Morag



    ------------------------------
    Morag Hughson
    MQ Technical Education Specialist
    MQGem Software Limited
    Website: https://www.mqgem.com
    ------------------------------



  • 3.  RE: Request Reply model message flow, how to capture put timestamp and correlation ID

    Posted Tue January 16, 2024 02:00 PM

    Thank you Morag, for your responses and guidance.

    Yes, I am trying to avoid writing something that may well already be written and that I could just reuse.

    Channel Status Network Time does not provide the detail I need since it is "collected over an interval of time".  This message flow is HIGH volume, so a 1-2% occurrence of long response times averages out over the interval and gives no detail that would ultimately point to a root cause.
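
    To make the averaging concern concrete, here is a small arithmetic sketch. All numbers are made up for illustration (980 round trips at 0.2 seconds and 20 at 12 seconds); they are not measurements from this system or output from any MQ command.

```python
from statistics import mean

# Hypothetical round-trip times in seconds over one collection interval.
normal = [0.2] * 980   # assumed typical responses
slow = [12.0] * 20     # assumed 2% of responses that miss a 10-second timeout
samples = normal + slow

interval_average = mean(samples)
over_timeout = [s for s in samples if s > 10.0]

print(f"average over interval: {interval_average:.3f} s")
print(f"responses over 10 s:   {len(over_timeout)} of {len(samples)}")
```

    With these assumed figures the interval average stays well under half a second even though 20 individual responses each took 12 seconds, which is why per-message timings are needed to see the individual slow events.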

    I need to capture individual events where a response comes back after more than 10 seconds, which is how long the polling application polls for a response with the corresponding correlation ID.

    I believe tracing would provide the individual detail of each message flow, but also provide lots of excess data I would need to sift through.

    I asked IBM which options I could use, and they pointed me to the documentation, which describes the trace parameters but is VERY VAGUE about what each one captures.  This leaves me using my test environment, taking traces and learning hands-on what each option captures until I identify one that collects what I need.  Tedious, but doable.  But since this is a production outage issue, my employer needs a root cause identified sooner than it would take me to learn the options of MQ trace.

    I will study the MO71 in more depth.  Thank you.



    ------------------------------
    David Pearson
    ------------------------------