HDR server - heavy CPU load

  • 1.  HDR server - heavy CPU load

    Posted Thu March 17, 2022 04:05 AM

    Hello

     

    In a dev environment we have two IDS servers in a primary/HDR configuration.

    This environment has no load at all and I just noticed by chance that the HDR server uses a lot of resources.

     

    The IDS version is 14.10.FC4W1.


    I first stopped and restarted Informix, but that did not help.
    What is the next step?

    Thanks


    ------------------------------
    Samuel
    ------------------------------



  • 2.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Thu March 17, 2022 08:45 AM
    The next step is to use onstat -g glo to find out which VPs correspond to pids 27642, 27680, 27679, and 27640.

    That will tell you what the server is trying to do. Another thing you can do is to run onstat -g act and onstat -g top thread cpu to see what threads are running for the same reason.
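
    The PID-to-VP lookup described above can be sketched as a small shell function. This is only an illustration: the name vp_for_pids is made up, and it assumes the "Individual virtual processors" rows of onstat -g glo have the layout vp, pid, class, ... as in the output posted later in this thread. It reads saved output on stdin, so it can be tried without a running server.

```shell
# Hypothetical helper: print "pid vp class" for each PID given as an argument.
# Reads `onstat -g glo` output on stdin; assumes VP rows of the form
#   vp pid class usercpu syscpu total ...
vp_for_pids() {
    pids=" $* "
    awk -v pids="$pids" '
        # A VP row starts with two integers followed by a class name.
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[a-z]+$/ {
            if (index(pids, " " $2 " ") > 0)
                print $2, $1, $3
        }'
}

# Example with two rows copied from this thread:
printf '%s\n' \
    ' 8 7934 cpu 862.82 0.24 863.06 863.06 100%' \
    ' 16 7944 shm 0.57 1.98 2.55 NA NA' |
    vp_for_pids 7934 7944
```

    The example prints "7934 8 cpu" and "7944 16 shm". On a live server the same function would be fed directly, for instance onstat -g glo | vp_for_pids 27642 27680 27679 27640.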

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 3.  RE: HDR server - heavy CPU load

    Posted Thu March 17, 2022 09:36 AM
    Hi,

    I first tried with this
    https://www.ibm.com/support/pages/how-isolate-high-cpu-usage-informix

    Following its suggestion ("You can do onstat -u to get session ids"),
    I get more than 40 rows, so it is not clear to me which session id I should pass to
    onstat -g ses <sessionid>

    onstat -g act gives this kind of result:
    IBM Informix Dynamic Server Version 14.10.FC4W1 -- Read-Only (Sec) -- Up 01:04:08 -- 1892024 Kbytes

    Running threads:
    tid tcb rstcb prty status vp-class name
    8 b2883050 0 1 running 16shm* sm_poll
    9 b29956b8 0 1 running 17shm* sm_poll
    10 b29b28b0 0 1 running 18shm* sm_poll
    11 b29d9aa8 0 1 running 19shm* sm_poll
    12 b2a00ca0 0 1 running 20shm* sm_poll
    13 b2a33028 0 1 running 21shm* sm_poll
    14 b2a5a0d0 0 1 running 22shm* sm_poll
    15 b2a812c8 0 1 running 23shm* sm_poll
    16 b2aa74c0 0 1 running 24shm* sm_poll
    17 b2ac46b8 0 1 running 25shm* sm_poll
    18 b2aeb8b0 0 1 running 26shm* sm_poll
    19 b2b12aa8 0 1 running 27shm* sm_poll
    20 b2b39ca0 0 1 running 28soc* soctcppoll

    And I don't see the same thread repeatedly, so I cannot run
    onstat -u | grep value_of_rstcb
    and
    onstat -g ses session_id


    After a restart of the server,
    the pids at 100% CPU in top are 7934 through 7940.

    Those pids in onstat -g glo:
    8 7934 cpu 862.82 0.24 863.06 863.06 100%
    9 7935 cpu 610.99 0.16 611.15 611.15 100%
    10 7936 cpu 835.24 0.15 835.39 835.39 100%
    11 7937 cpu 3471.19 0.36 3471.55 3471.55 100%
    12 7938 cpu 3055.12 0.18 3055.30 3055.30 100%
    13 7939 cpu 2611.39 0.27 2611.66 2611.66 100%
    14 7940 cpu 2518.43 0.32 2518.75 2518.75 100%

    So these are VPs 8 to 14.
    Now, how will this tell me what the server is trying to do?

    Thanks

    ------------------------------
    Samuel To
    ------------------------------



  • 4.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Thu March 17, 2022 10:33 AM
    So, there are a large number of poll threads (twelve sm_poll plus one soctcppoll) and they are running in VPs 16 through 28. If those are the VPs that are spinning CPU cycles (i.e., the pids identified in the "top" listing), then what the server is doing is looking for shared memory connections by constantly polling the shared memory segments. I can't tell whether that is the case because you didn't post the entire onstat -g glo output, only CPU VPs 8 through 14, so ...

    If it is the sm_poll threads then I would look at the NETTYPE onipcshm setting. It looks like you are running those threads in NET VPs which have no choice but to poll constantly since you cannot block on a shared memory read. Better to run those listeners in the CPU VPs which have other work to do and sleep from time-to-time.

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 5.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 03:36 AM
    Hello,

    The network configuration parameters are
    (copied from IDS 12 to 14):

    NETTYPE ipcshm,12,128,CPU
    NETTYPE soctcp,1,50,NET
    LISTEN_TIMEOUT 60
    MAX_INCOMPLETE_CONNECTIONS 1024
    FASTPOLL 1
    NUMFDSERVERS 4
    NS_CACHE host=900,service=900,user=900,group=900,sqlhosts=900
    NET_IO_TIMEOUT_ALARM 4
    DRDA_COMMBUFFSIZE 32

    And the complete "onstat -g glo" output:

    IBM Informix Dynamic Server Version 14.10.FC4W1 -- Read-Only (Sec) -- Up 3 days 18:56:11 -- 1900216 Kbytes

    MT global info:
    sessions  threads  vps  lngspins  time
    2         105      32   81        327371

              sched calls  thread switches  yield 0     yield n  yield forever
    total:    1199692070   15655953         1184163115  9123296  1303341
    per sec:  85           0                85          0        0

    Virtual processor summary:
    class vps usercpu syscpu total
    cpu 8 1017276.33 128.97 1017405.30
    aio 5 3.07 13.39 16.46
    shm 12 6.95 23.78 30.73
    lio 1 0.54 1.94 2.48
    pio 1 0.58 1.92 2.50
    adm 1 6.08 12.15 18.23
    soc 1 25.43 85.24 110.67
    msc 1 0.00 0.00 0.00
    jvp 1 809.72 2156.09 2965.81
    fifo 1 0.52 1.95 2.47
    total 32 1018129.22 2425.43 1020554.65

    Individual virtual processors:
    vp pid class usercpu syscpu total Thread Eff
    1 7907 cpu 12775.87 16.52 12792.39 12792.39 100%
    2 7914 adm 6.08 12.15 18.23 0.00 0%
    3 7915 lio 0.54 1.94 2.48 2.48 100%
    4 7919 pio 0.58 1.92 2.50 2.50 100%
    5 7922 aio 0.93 5.35 6.28 6.28 100%
    6 7926 msc 0.00 0.00 0.00 0.04 0%
    7 7930 fifo 0.52 1.95 2.47 2.47 100%
    8 7934 cpu 73882.85 15.23 73898.08 73898.08 100%
    9 7935 cpu 95050.77 14.86 95065.63 95065.63 100%
    10 7936 cpu 186624.47 18.33 186642.80 186642.80 100%
    11 7937 cpu 154446.57 15.21 154461.78 154461.78 100%
    12 7938 cpu 131182.14 14.73 131196.87 131196.87 100%
    13 7939 cpu 184970.47 17.21 184987.68 184987.68 100%
    14 7940 cpu 178343.19 16.88 178360.07 178360.07 100%
    15 7943 jvp 809.72 2156.09 2965.81 0.00 0%
    16 7944 shm 0.57 1.98 2.55 NA NA
    17 7946 shm 0.58 1.94 2.52 NA NA
    18 7947 shm 0.57 1.96 2.53 NA NA
    19 7948 shm 0.55 2.05 2.60 NA NA
    20 7949 shm 0.58 2.01 2.59 NA NA
    21 7950 shm 0.57 2.00 2.57 NA NA
    22 7951 shm 0.59 1.99 2.58 NA NA
    23 7952 shm 0.61 1.96 2.57 NA NA
    24 7955 shm 0.59 2.03 2.62 NA NA
    25 7956 shm 0.55 2.01 2.56 NA NA
    26 7957 shm 0.61 1.90 2.51 NA NA
    27 7958 shm 0.58 1.95 2.53 NA NA
    28 7960 soc 25.43 85.24 110.67 NA NA
    29 7964 aio 0.55 2.05 2.60 2.60 100%
    30 7967 aio 0.56 1.94 2.50 2.50 100%
    31 7968 aio 0.49 2.04 2.53 2.53 100%
    32 7971 aio 0.54 2.01 2.55 2.55 100%
                  tot 1018129.22 2425.43 1020554.65

    The network configuration was done by someone who understood what he was doing (unlike me),
    and I guess everything was working fine before the switch to IDS 14.


    Thanks.


    ------------------------------
    Samuel To
    ------------------------------



  • 6.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Mon March 21, 2022 07:09 AM
    Samuel:

    You are correct. The network settings look fine. Now we have a full onstat -g glo output. However, the server was restarted 3 days ago, so the process ids of the oninits no longer match the "top" output from before the restart. Can you post the top output showing the spinning oninit processes as they stand now?

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 7.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 07:21 AM
    Hello,

    Of course, it looks roughly like this:

    top - 13:13:47 up 3 days, 23:19, 3 users, load average: 3.14, 3.24, 3.20
    Tasks: 314 total, 5 running, 309 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 40.9 us, 0.1 sy, 0.0 ni, 59.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 16253608 total, 13611648 free, 622844 used, 2019116 buff/cache
    KiB Swap: 8257532 total, 8257532 free, 0 used. 14438816 avail Mem

    PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
    7934 informix 10 -10 2090896 258884 253944 R 100.0 1.6 1396:59 oninit
    7937 informix 10 -10 2090896 204032 199100 R 100.0 1.3 2639:50 oninit
    7938 informix 10 -10 2090896 199036 194104 R 100.0 1.2 2285:07 oninit
    7935 informix 10 -10 2090896 218004 213068 R 23.6 1.3 1664:20 oninit
    7936 informix 10 -10 2090896 206920 201984 S 1.0 1.3 3202:20 oninit
    7939 informix 10 -10 2090896 198640 193704 S 1.0 1.2 3209:14 oninit
    7943 informix 10 -10 2088708 5624 708 S 1.0 0.0 51:45.97 oninit
    7907 informix 10 -10 2093708 291332 286372 S 0.7 1.8 248:36.29 oninit
    7940 informix 10 -10 2090896 189896 184964 S 0.7 1.2 3088:51 oninit
    4249 root 20 0 1548360 87820 5960 S 0.3 0.5 27:30.38 ragent
    32751 dbown 20 0 162336 2504 1564 S 0.3 0.0 16:13.90 top
    1 root 20 0 194704 7684 4216 S 0.0 0.0 24:59.88 systemd
    2 root 20 0 0 0 0 S 0.0 0.0 0:00.48 kthreadd
    4 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
    6 root 20 0 0 0 0 S 0.0 0.0 0:25.44 ksoftirqd/0
    7 root rt 0 0 0 0 S 0.0 0.0 0:02.57 migration/0
    8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
    9 root 20 0 0 0 0 S 0.0 0.0 1:57.06 rcu_sched
    10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain
    11 root rt 0 0 0 0 S 0.0 0.0 0:00.73 watchdog/0
    12 root rt 0 0 0 0 S 0.0 0.0 0:00.64 watchdog/1
    13 root rt 0 0 0 0 S 0.0 0.0 0:01.86 migration/1

    I want to compare the whole onconfig (IDS12 and 14), maybe a parameter was not copied correctly.
    If I find something, I'll let you know.
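
    One way to do such a comparison is sketched below. The function names onconfig_settings and onconfig_diff are made up for this illustration, and the sketch assumes each active setting sits on one line of the form PARAMETER value, with # starting a comment line. It ignores comments, blank lines, spacing, and ordering, so only real differences remain.

```shell
# Hypothetical helper: print the active settings of one onconfig file,
# with comments and blank lines removed, whitespace normalised, and sorted.
onconfig_settings() {
    grep -v -e '^[[:space:]]*#' -e '^[[:space:]]*$' "$1" |
        sed -e 's/[[:space:]][[:space:]]*/ /g' -e 's/^ //' -e 's/ $//' |
        sort
}

# Hypothetical helper: show settings that differ between two onconfig files.
# Lines only in the first file start at column 0; lines only in the second
# file are indented by a tab (standard `comm -3` output).
onconfig_diff() {   # usage: onconfig_diff OLD_ONCONFIG NEW_ONCONFIG
    old=$(mktemp) && new=$(mktemp) || return 1
    onconfig_settings "$1" > "$old"
    onconfig_settings "$2" > "$new"
    comm -3 "$old" "$new"
    rm -f "$old" "$new"
}
```

    With the IDS 12 and IDS 14 files at hand, this would be called as onconfig_diff followed by the two file paths.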

    Thanks

    ------------------------------
    Samuel To
    ------------------------------



  • 8.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Mon March 21, 2022 09:05 AM
    Samuel:

    OK, two things:
    • The spinning oninit processes are not the SHM VPs but several of the CPU VPs (pids 7934, 7937, and 7938 in this snapshot), probably those running recovery threads. I don't know why they are spinning at 100% of CPU capacity, but I would point out that the recovery code which supports fast recovery, archive restores, and secondary server recovery processing was rewritten in v14.10, resulting in 5X faster recovery and reduced secondary latency, especially for RSS secondaries.
    • You have only 8 CPU VPs but you configured 12 shared memory listeners in your NETTYPE. Although you configured them to run in the CPU VPs, there are not enough CPU VPs to run 12 listeners, so the engine started up 12 SHM VPs instead. I suggest changing the NETTYPE from:
      NETTYPE ipcshm,12,128,CPU
       -- to --
      NETTYPE ipcshm,8,196,CPU
      That will give you at least as many connection slots as before but eliminate the need for the SHM VPs. That will reduce overall system overhead significantly. Whether it will eliminate the spinning CPU VPs, I don't know, but it's a start.
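
    The point about keeping at least as many connection slots can be checked with simple arithmetic, assuming the NETTYPE fields are protocol, poll threads, connections per poll thread, and VP class. The helper name nettype_slots is made up for this sketch; it is not an Informix tool.

```shell
# Hypothetical helper: total connection slots for one NETTYPE line,
# i.e. poll threads multiplied by connections per poll thread.
nettype_slots() {
    printf '%s\n' "$1" | awk '{ split($2, f, ","); print f[2] * f[3] }'
}

nettype_slots 'NETTYPE ipcshm,12,128,CPU'   # current setting
nettype_slots 'NETTYPE ipcshm,8,196,CPU'    # suggested setting
```

    This prints 1536 for the current setting and 1568 for the suggested one, so the change does not reduce the number of slots.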


    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 9.  RE: HDR server - heavy CPU load

    Posted Fri March 18, 2022 07:00 AM
    Hi Samuel,

    I don't see any 14.10 replication threads in your "onstat -g act" output such as "bld_logrecs", "mreplay" or "wreplay_X". Is this because the server does not get many updates?

    But I would second what Art is saying: judging by our own systems, the sm_poll threads should not be this active. How many of these do you have and are they running on CPU VPs? What is "NETTYPE ipcshm" set to in your onconfig?

    Ben.

    ------------------------------
    Benjamin Thompson
    ------------------------------



  • 10.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 03:43 AM
    Edited by System Fri January 20, 2023 04:42 PM
    Hello,

    Indeed, this server doesn't get a lot of updates...

    As for the onconfig settings, I just posted them in my reply above.


    Thanks


    ------------------------------
    Samuel To
    ------------------------------



  • 11.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 09:22 AM
    In that case what is OFF_RECVRY_THREADS set to? The manual needs updating as it recommends unsuitably high values:

    • If you have enough shared memory, set the number of threads to the number of tables or fragments that are frequently updated. Balance the number of threads with the amount of shared memory.
    • On a single-CPU computer, set the number of threads to 10 - 30 or 40. The cost of too many threads can outweigh the advantages of parallel operations.

    On 14.10, 30 threads is far too many for any system and will result in w_replayX threads causing high CPU usage and generally getting in each other's way. In 12.10 and earlier this parameter controlled the number of xchg threads.

    Maybe try something like '7' on your system.

    Ben.

    ------------------------------
    Benjamin Thompson
    ------------------------------



  • 12.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Fri March 18, 2022 03:17 AM

    Hi Samuel,

    I see two approaches.

    First, experiment with the HDR onconfig parameters. For example, set:

    DRINTERVAL 10
    HDR_TXN_SCOPE ASYNC
    DRTIMEOUT 15

    SMX_NUMPIPES 4
    SMX_PING_INTERVAL 10

    CLUSTER_TXN_SCOPE SERVER
    SEC_NONBLOCKING_CKPT 0

    If that looks fine, the second step is to watch your session activity:

    onstat -g ses 0 > my_sessions      # check the running sessions ... repeat and check these long-running queries again

    onstat -g glo  ;sleep 10; onstat -g glo    # calculate the total cpu usage to check if the informix cpu usage is in relation to the system resources
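
    The before/after comparison of two onstat -g glo runs can be sketched as follows. The function name glo_delta is made up, and the sketch assumes VP rows of the form vp, pid, class, usercpu, syscpu, total, ... as in the output posted in this thread. It works on two saved snapshots rather than calling onstat itself.

```shell
# Hypothetical helper: compare two saved `onstat -g glo` snapshots taken
# SECS seconds apart and print, per VP, the share of one core it used.
# Assumes VP rows of the form: vp pid class usercpu syscpu total ...
#
# Typical use on a live server:
#   onstat -g glo > glo.1; sleep 10; onstat -g glo > glo.2
#   glo_delta glo.1 glo.2 10
glo_delta() {   # usage: glo_delta BEFORE_FILE AFTER_FILE SECS
    awk -v secs="$3" '
        $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 ~ /^[a-z]+$/ {
            # While reading the first file, remember each VP total.
            if (FNR == NR) { before[$1] = $6; next }
            # In the second file, report the growth as a percentage.
            if ($1 in before)
                printf "%s %s %.0f%%\n", $1, $3, ($6 - before[$1]) * 100 / secs
        }' "$1" "$2"
}
```

    A VP whose total grew by 10 CPU seconds over a 10 second interval is reported at 100%, which is what a spinning CPU VP would look like; idle VPs stay near 0%.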

    Yes, sounds like Informix consulting - it's our business - we can do this for you :-)

    Good Luck

    Henri

    ------------------------------
    Henri Cujass
    leolo IT, CTO
    Germany
    IBM Champion 2021
    ------------------------------



  • 13.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 05:53 AM
    Hello,

    The params are
    DRINTERVAL 0
    HDR_TXN_SCOPE NEAR_SYNC
    DRTIMEOUT 30

    SMX_NUMPIPES 1
    SMX_PING_INTERVAL 10
    CLUSTER_TXN_SCOPE SERVER

    The SEC_NONBLOCKING_CKPT parameter is not defined.
    SEC_NONBLOCKING_CKPT and the other parameters are defined the same way on the production HDR server, which behaves fine.

    As for the rest, I am not able to interpret the output.

    Thanks




    ------------------------------
    Samuel To
    ------------------------------



  • 14.  RE: HDR server - heavy CPU load

    Posted Fri March 18, 2022 04:23 AM
    When you find an oninit process using 100% CPU,

    you can run the following to find which session or which SQL statement is using the CPU:

    onstat -g ath | grep running | grep sqlexec | awk '{print "onstat -u | grep "$3}' | sh | awk '{print "onstat -g ses "$3}' | sh

    Look for "Current Statement" in the output to find the SQL.

    This command works on x86_64; on other platforms you may need to adjust it.

    ------------------------------
    ZhiWei Cui
    GBASE
    ------------------------------



  • 15.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Tue March 22, 2022 02:06 PM


    Hi,

    If it is the CPU VPs, then run

    onstat -g cpu

    and see which threads are taking up the time.

    Then tie that back to a session (onstat -u) and run onstat -g ses <session id> to see what that session is doing and which app process is driving that session.



    ------------------------------
    David Williams
    ------------------------------