MQ

  • 1.  RDQM DR/HA -- Performance Impact

    Posted Thu January 11, 2024 05:40 PM

    Hello,

    Please help me find a remedy for a slow-responding queue manager running in an RDQM DR/HA environment. The messages log shows the following errors:

    Jan 11 09:51:59 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.200000
    Jan 11 09:52:29 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.380000
    Jan 11 09:52:59 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.310000
    Jan 11 09:53:13 txulmqprd2 su[72087]: (to mqm) root on pts/0
    Jan 11 09:53:29 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.410000
    Jan 11 09:53:59 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.180000
    Jan 11 09:54:05 txulmqprd2 kernel: drbd qm_mqp1_uv.dr _remote: [drbd_s_qm_mqp1_/10522] sending time expired, ko = 6
    Jan 11 09:54:29 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.250000
    Jan 11 09:54:59 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.270000
    Jan 11 09:55:01 txulmqprd2 systemd[1]: Configuration file /usr/lib/forescout/daemon/SecureConnector.service is marked executable. Please remove executable permission bits. Proceeding anyway.
    Jan 11 09:55:02 txulmqprd2 kernel: drbd qm_mqp1_uv.dr _remote: [drbd_s_qm_mqp1_/10522] sending time expired, ko = 6

    It is badly impacting the application that connects to the queue manager. The I/O wait time is too high. I would be grateful if anybody could advise me.

    Thank you, 

    Rajesh



    ------------------------------
    RAJESH VERMA
    ------------------------------


  • 2.  RE: RDQM DR/HA -- Performance Impact
    Best Answer

    Posted Thu January 11, 2024 05:55 PM

    You should open a case with IBM. This looks like a storage write issue.



    ------------------------------
    om prakash
    ------------------------------



  • 3.  RE: RDQM DR/HA -- Performance Impact

    Posted Fri January 12, 2024 05:06 AM

    You have to be very careful with what you read into IOWAIT in an MQ environment. What follows is mostly generic MQ advice rather than RDQM-specific advice.

    In most MQ environments nearly all of the forced IO should be to the MQ recovery log. The way the recovery log works is that all the active hConns essentially append to the log buffer. Each time an hConn requires its IO to be guaranteed, as much as can be efficiently written from the log buffer is written in a single forced write. When that write completes, the logger checks whether any other hConn has requested further IO to be forced and, if so, immediately schedules another write (again the biggest write that can be efficiently scheduled, based upon what data other tasks have appended to the log buffer). The overall effect is a batching effect in which a small number of large writes are issued, rather than a large number of small writes. The algorithm works well with a wide variety of IO latencies, as might be expected given MQ's long history and therefore its exposure to different IO technologies.

    In an HA/DR environment there tends to be more IO latency (as the IO has to be replicated to a remote node), and thus the tendency is towards a smaller number of larger writes (assuming sufficient concurrency in the application workload to keep appending to the log buffer). In such a situation very high IOWAIT times would be expected.
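
    The batching effect described above can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not MQ's actual logger: the function name simulate_log_writes and its parameters are made up for illustration, every request is treated as a forced write of one record, and there is no upper limit on the size of a single write.

```python
def simulate_log_writes(arrivals, write_latency):
    """Toy model of a single logger that batches forced writes.

    arrivals:      sorted times at which a connection appends a record to
                   the log buffer and asks for it to be forced (hypothetical).
    write_latency: time one forced write takes, however large it is
                   (a simplification; real writes have size limits).
    Returns the number of records covered by each forced write.
    """
    batches = []
    now = 0.0  # time at which the logger is next free
    i = 0
    while i < len(arrivals):
        # The logger starts a write as soon as it is free and work exists.
        start = max(now, arrivals[i])
        # Everything appended up to that moment goes out in one write.
        j = i
        while j < len(arrivals) and arrivals[j] <= start:
            j += 1
        batches.append(j - i)
        now = start + write_latency
        i = j
    return batches


# Twelve requests arriving one time unit apart.
requests = list(range(12))
print(simulate_log_writes(requests, 0.5))  # low latency: twelve writes of one record
print(simulate_log_writes(requests, 4))    # high latency: [1, 4, 4, 3]
```

    With the higher latency the same twelve requests are covered by four writes instead of twelve, and each connection spends longer waiting for its write to complete, which is the pattern that shows up as IOWAIT.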

    Have you run amqsrua to look at the LOG statistics, in particular the write sizes and the IO latency?

    Regarding the high load average, have you looked at the high level MQI statistics to compare the number of MQI calls of different types? If you compare the number of successful MQPUTs with the total number of MQI calls in any interval, you'll get some idea of the efficiency of your applications. For example, an application that does MQCONN; MQOPEN(request); MQOPEN(reply); MQPUT(request); MQGET(reply); MQCLOSE(request); MQCLOSE(reply); MQDISC will use MUCH more CPU time than one which does:

        MQCONN; MQOPEN(request); MQOPEN(reply)
        while(X)
            MQPUT(request)
            MQGET(reply)
        end-while
        MQCLOSE(request); MQCLOSE(reply)
        MQDISC

    Looking at the high level MQI stats would be a good first step in looking into unexpectedly high CPU usage.
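
    As a rough illustration of the comparison suggested above, the sketch below counts MQI calls for the two call patterns and works out the share of calls that are MQPUTs. The function names are hypothetical, and the call count is only a proxy: it treats every MQI call as equal, whereas in practice different verbs cost different amounts of CPU.

```python
def calls_connect_per_request(requests):
    """Pattern 1: connect, open both queues, put, get, close both, disconnect,
    repeated for every request. That is 8 MQI calls per request."""
    return 8 * requests


def calls_reuse_connection(requests):
    """Pattern 2: MQCONN + 2 x MQOPEN once, then MQPUT + MQGET per request,
    then 2 x MQCLOSE + MQDISC once."""
    return 3 + 2 * requests + 3


def put_ratio(successful_puts, total_mqi_calls):
    """Share of all MQI calls in an interval that were successful MQPUTs."""
    return successful_puts / total_mqi_calls


n = 1000  # hypothetical number of request/reply exchanges in the interval
for name, total in (("connect per request", calls_connect_per_request(n)),
                    ("reuse connection", calls_reuse_connection(n))):
    print(f"{name}: {total} MQI calls, MQPUT share {put_ratio(n, total):.3f}")
```

    A low MQPUT share in the real statistics would point towards applications that connect, open and close far more often than they need to.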



    ------------------------------
    Andrew Hickson
    ------------------------------



  • 4.  RE: RDQM DR/HA -- Performance Impact

    Posted Fri January 12, 2024 03:34 PM

    Thank you very much Andrew, 

    Problem determination is in progress. It seems something was changed in the network between the HQ and DR sites, which increased the iowait and is consuming CPU resources. I have stopped the replication between the HQ and DR sites for now, which helped bring business processing back to normal. I had opened a case with IBM before posting the question here, as Om Prakash also advised. Thank you, Om Prakash.

    I would also like to know whether I can change the configuration so that data is replicated between the two sites from one of the passive nodes at HQ, so that there is less overhead on the active node in the RDQM HA group at HQ. I am just thinking about how to avoid a similar issue in future.

    Thank you,
    Regards,

    Rajesh



    ------------------------------
    RAJESH VERMA
    ------------------------------



  • 5.  RE: RDQM DR/HA -- Performance Impact

    Posted Fri January 12, 2024 01:43 PM

    Hi,

    Looking at the frequency of the high CPU messages recorded in the logs, it would appear that the system load is high. So the first step would be to review the currently running processes and identify those that are consuming the most CPU and memory. The processes consuming high CPU could be MQ or non-MQ processes. Depending on the process, the next steps can then be decided in order to understand why it is consuming so much CPU. You can also check whether there are any delays in reads from or writes to the file system.
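
    To put the logged load figures in context, it can help to pull them out of the messages log and relate them to the number of CPU cores. The sketch below is only an illustration: the helper names are made up, the core count of 4 is an assumption (the thread does not say how many cores the node has), and the rule used to decide what Pacemaker reports as "high" is not modelled here.

```python
import re

# Matches the Pacemaker notice lines quoted in the original post.
LOAD_RE = re.compile(r"High CPU load detected: (\d+(?:\.\d+)?)")


def extract_loads(log_lines):
    """Return the load values from 'High CPU load detected' lines."""
    loads = []
    for line in log_lines:
        match = LOAD_RE.search(line)
        if match:
            loads.append(float(match.group(1)))
    return loads


def load_per_core(load, cores):
    """Normalise a load average by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load / cores


sample = [
    "Jan 11 09:51:59 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.200000",
    "Jan 11 09:52:29 txulmqprd2 pacemaker-controld[1817]: notice: High CPU load detected: 1.380000",
    "Jan 11 09:53:13 txulmqprd2 su[72087]: (to mqm) root on pts/0",
]

ASSUMED_CORES = 4  # assumption; on the node itself os.cpu_count() reports this
for load in extract_loads(sample):
    print(f"load {load:.2f} -> {load_per_core(load, ASSUMED_CORES):.2f} per core")
```

    On Linux the load average also counts tasks waiting on IO, so a high figure here does not by itself mean the CPUs are busy.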

    Regards,

    Girish



    ------------------------------
    Girish D V
    ------------------------------



  • 6.  RE: RDQM DR/HA -- Performance Impact

    Posted Fri January 12, 2024 03:26 PM

    Thank you Girish.

    I have stopped one of the main non-MQ processes, which seemed to be coming out on top in the process list. But the incident was impacting production, so I stopped the replication to DR, which has given some relief from the high iowait for now. Meanwhile I am working with the network team to find the network congestion issue, which is what is taking up the whole CPU.

    Thank you,
    Regards,



    ------------------------------
    RAJESH VERMA
    ------------------------------