MQ

  • 1.  Increasing Throughput on a CLUSSDR Channel

    Posted Thu November 02, 2023 04:11 PM

    I have a scenario where I am seeing S.C.T.Q. backups between QM1 -> QM2. The messages are in a cluster and flow over one instance of a CLUSSDR channel (e.g. CLUS.QM2). QM2 has only one CLUSRCVR channel for the cluster, per our clustering standards. If I were to define another CLUSRCVR channel on QM2 (e.g. CLUS2.QM2, as sketched after the questions below), I would assume QM1 could then run 2 CLUSSDR channels (CLUS.QM2 and CLUS2.QM2) to QM2. However, I have some questions.

    1) Is it a bad idea to have two CLUSRCVR channels for the same cluster on a queue manager? My research in the manuals only turned up a concern that channel switching can hurt performance at lower message volumes. My concern is the opposite: handling a high volume of messages and moving them through the S.C.T.Q. at a faster rate.

    2) Has anyone else used this approach and found it helpful?
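
    For reference, here is a minimal MQSC sketch of the second cluster-receiver I have in mind (the cluster name, host, and port below are just placeholders for illustration):

        * On QM2: a second CLUSRCVR for the same cluster (names are placeholders)
        DEFINE CHANNEL(CLUS2.QM2) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
               CONNAME('qm2.example.com(1415)') CLUSTER(DEMOCLUS)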



    ------------------------------
    Tim Zielke
    ------------------------------


  • 2.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Fri November 03, 2023 03:45 AM
    An alternative is to make your existing channel do more work.

    Make sure your MQ batch size is large (50+) for small messages (under 50KB), and
    monitor the achieved batch size against the specified batch size - they should be similar.
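
    As a sketch (channel name taken from your post; for cluster channels the attribute belongs on the cluster-receiver definition, since the auto-defined cluster-senders inherit from it, and XBATCHSZ is only recorded when channel monitoring, MONCHL, is enabled):

        * On QM2: raise the batch size on the cluster-receiver definition
        ALTER CHANNEL(CLUS.QM2) CHLTYPE(CLUSRCVR) BATCHSZ(100)

        * While the channel runs, compare negotiated vs. achieved batch size
        * (XBATCHSZ reports a short-term and a long-term average)
        DISPLAY CHSTATUS(CLUS.QM2) BATCHSZ XBATCHSZ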

    The TCP buffer sizes make a huge difference.
    https://www.ibm.com/docs/en/ibm-mq/9.2?topic=information-tcp-stanza-qmini-file says that MQ now specifies 0 by default, which means take the OS default.

    On Linux you can use the ss command (or the older netstat command).

    There are 4 TCP buffer attributes:
    SndBuffSize, RcvBuffSize, RcvSndBuffSize, and RcvRcvBuffSize.
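
    In qm.ini they live in the TCP stanza; something like the following (0 means take the OS default, per the page linked above):

        TCP:
           SndBuffSize=0
           RcvBuffSize=0
           RcvSndBuffSize=0
           RcvRcvBuffSize=0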

    I do not have the exact figures to hand, but I know the receive buffers should be over 64KB.

    If the receive buffer size is at least 64KB then you should get dynamic right sizing, and TCP will automatically increase the receive buffer sizes to the optimum value.
    I've seen receive buffer sizes of 256000 or so.

    I would monitor the socket to see how big the buffers are, the round-trip time, etc., and check whether there are any long delays.
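
    As a sketch on Linux (assuming the default listener port 1414; adjust to yours):

        # Show TCP internals for MQ sockets: in the skmem field, rb/tb are
        # the receive/send buffer allocations; rtt is the round-trip time (ms)
        ss -tinm '( sport = :1414 or dport = :1414 )'

        # Confirm receive-buffer auto-tuning (dynamic right sizing) is enabled
        # and see the min/default/max sizes TCP will auto-tune between
        sysctl net.ipv4.tcp_moderate_rcvbuf net.ipv4.tcp_rmem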

    Colin







  • 3.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Mon November 06, 2023 04:44 AM

    Tim,

    You mention the S.C.T.Q. - are you using the actual (singular) SYSTEM.CLUSTER.TRANSMIT.QUEUE, or have you enabled multiple cluster transmit queues?

    If you are still using the singular S.C.T.Q. then I'd probably start by removing this unnecessary serialization point. In addition to removing the performance constraints of the single queue, it also makes it a little easier to monitor what's happening on this specific channel/queue. The throughput of a single channel should be pretty high, and higher still if you enable pipelining (with an effect similar to running multiple channels). What does sampling the channel substate during the issue suggest might be the limiting factor (at both the sending and receiving ends)? Are the values for NETTIME within the expected boundaries? Is the I/O latency on the recovery log (at each end of the channel) reasonable (assuming persistent messages)? ...
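
    As a sketch of those pieces (channel name taken from the original post; NETTIME and XQTIME are only populated when channel monitoring, MONCHL, is enabled):

        * Move to one cluster transmit queue per cluster-sender channel
        ALTER QMGR DEFCLXQ(CHANNEL)

        * Sample the channel status while the backlog is happening
        DISPLAY CHSTATUS(CLUS.QM2) SUBSTATE NETTIME XQTIME EXITTIME

    and, to enable pipelining, a CHANNELS stanza in qm.ini at both ends of the channel:

        CHANNELS:
           PipeLineLength=2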

    Similarly, there seems to be an assumption that the issue is at the sending end of the channel, while it's equally likely that the putting (receiving) end of the channel is not able to deliver the messages fast enough. Are all or most of these messages going to the same destination queue, or are they spread across a set of queues? Have you checked for any performance issues on the queues to which the messages are being delivered?
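
    As a sketch of that check (APP.QUEUE is a hypothetical destination queue name; MSGAGE and QTIME need queue monitoring, MONQ, enabled):

        * Look for depth building, missing getters, or rising queue times
        DISPLAY QSTATUS('APP.QUEUE') CURDEPTH IPPROCS OPPROCS MSGAGE QTIME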

    The absence of any actual numbers and details of the environment in which the issue is being observed makes it difficult to offer properly targeted advice.

    Regards

    Andy.



    ------------------------------
    Andrew Hickson
    ------------------------------



  • 4.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Mon November 06, 2023 09:38 AM

    Hi, Tim.  One of my clients had a misbehaving NIC at one end or the other that inflicted random and unpredictable slowdowns.  The NIC never actually failed completely, but we could watch the transmission queue time (XQTIME) extend.  A misbehaving switch, router, or cable in the middle could produce similar results.



    ------------------------------
    bruce2359
    Consultant/trainer
    CTTECH - Computer & Telecommunications Technology
    West Coast, almost
    000-000-0000
    ------------------------------



  • 5.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Tue November 07, 2023 10:08 AM

    Are QM1 and QM2 on similarly provisioned hardware platforms?  If QM1 is on z hardware, for example, and QM2 is on more modestly provisioned midrange hardware, QM2's (cluster) receiver channel is likely overwhelmed.



    ------------------------------
    bruce2359
    Consultant/trainer
    CTTECH - Computer & Telecommunications Technology
    West Coast, almost
    000-000-0000
    ------------------------------



  • 6.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Tue November 07, 2023 12:42 PM

    Thank you all for the helpful tips! The SCTQ build-up has not happened for a while, but this will help in better diagnosing the issue if/when it happens again.



    ------------------------------
    Tim Zielke
    ------------------------------



  • 7.  RE: Increasing Throughput on a CLUSSDR Channel

    Posted Wed November 08, 2023 08:13 AM

    Hi Tim,

    We saw the same thing happen on a non-clustered channel. Whenever there was high volume, the transmit queue would show some backlog (and it happened only with high volume).

    We set up a Wireshark capture on the MQ server and realized that we were dealing with dropped packets. It took a little while to narrow the dropped packets down to a particular router/switch. Once that was removed from the path, no more backlogs... Hope it helps.
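
    If anyone wants to run the same check, a sketch (interface, host, and port are placeholders; open the capture in Wireshark and look for retransmissions and duplicate ACKs):

        # Capture the channel's traffic for offline analysis in Wireshark
        tcpdump -i eth0 -w mq_channel.pcap 'host qm2.example.com and port 1414'

        # Quick look at cumulative TCP retransmission counters on the box
        netstat -s | grep -i retrans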



    ------------------------------
    Francois Brandelik
    ------------------------------