HDR server - heavy CPU load

  • 1.  HDR server - heavy CPU load

    Posted Thu March 17, 2022 04:05 AM

    Hello

    In a dev environment we have two IDS servers in a primary/HDR configuration.

    This environment has no load at all, and I just noticed by chance that the HDR secondary server uses a lot of CPU.

    The IDS version is 14.10.FC4W1.


    I first stopped and restarted Informix, but it did not help.
    What is the next step?

    Thanks


    ------------------------------
    Samuel
    ------------------------------


  • 2.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Thu March 17, 2022 08:45 AM
    Next step: use onstat -g glo to find out which VPs correspond to PIDs 27642, 27680, 27679, and 27640.

    That will tell you what the server is trying to do. Another thing you can do is run onstat -g act and onstat -g top thread cpu to see which threads are running, for the same reason.
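    As a rough illustration of the PID-to-VP lookup suggested above, here is a minimal Python sketch. It assumes the "Individual virtual processors" section of onstat -g glo has been saved as plain text (vp, pid, class, ... on each row); map_pids_to_vps is a hypothetical helper, not part of Informix, and the sample rows are copied from the full onstat -g glo output posted later in this thread.

```python
"""Map oninit PIDs (as seen in top) to Informix virtual processors.

Sketch only: assumes the "Individual virtual processors" rows of
onstat -g glo were saved as text, one row per VP:
    vp pid class usercpu syscpu total [Thread Eff]
"""

GLO_SAMPLE = """\
vp pid class usercpu syscpu total Thread Eff
1 7907 cpu 12775.87 16.52 12792.39 12792.39 100%
8 7934 cpu 73882.85 15.23 73898.08 73898.08 100%
16 7944 shm 0.57 1.98 2.55 NA NA
28 7960 soc 25.43 85.24 110.67 NA NA
tot 1018129.22 2425.43 1020554.65
"""


def map_pids_to_vps(glo_text: str) -> dict:
    """Return {pid: (vp_number, vp_class)} for every VP row in the text."""
    mapping = {}
    for line in glo_text.splitlines():
        fields = line.split()
        # A VP row starts with two integers (VP number, PID), then the class;
        # the header and the "tot" line fail this check and are skipped.
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            mapping[int(fields[1])] = (int(fields[0]), fields[2])
    return mapping


vps = map_pids_to_vps(GLO_SAMPLE)
for pid in (7934, 7944):
    vp, cls = vps[pid]
    print(f"pid {pid} -> VP {vp} ({cls})")
```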

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 3.  RE: HDR server - heavy CPU load

    Posted Thu March 17, 2022 09:36 AM
    Hi,

    I first tried the procedure on this IBM support page:
    https://www.ibm.com/support/pages/how-isolate-high-cpu-usage-informix

    If I follow the step
    "You can do onstat -u to get session ids"
    I get more than 40 rows, so it is not clear to me which session id I should use for
    onstat -g ses <sessionid>

    onstat -g act gives this kind of result:
    IBM Informix Dynamic Server Version 14.10.FC4W1 -- Read-Only (Sec) -- Up 01:04:08 -- 1892024 Kbytes

    Running threads:
    tid  tcb       rstcb  prty  status   vp-class  name
    8    b2883050  0      1     running  16shm*    sm_poll
    9    b29956b8  0      1     running  17shm*    sm_poll
    10   b29b28b0  0      1     running  18shm*    sm_poll
    11   b29d9aa8  0      1     running  19shm*    sm_poll
    12   b2a00ca0  0      1     running  20shm*    sm_poll
    13   b2a33028  0      1     running  21shm*    sm_poll
    14   b2a5a0d0  0      1     running  22shm*    sm_poll
    15   b2a812c8  0      1     running  23shm*    sm_poll
    16   b2aa74c0  0      1     running  24shm*    sm_poll
    17   b2ac46b8  0      1     running  25shm*    sm_poll
    18   b2aeb8b0  0      1     running  26shm*    sm_poll
    19   b2b12aa8  0      1     running  27shm*    sm_poll
    20   b2b39ca0  0      1     running  28soc*    soctcppoll

    And I don't see the same thread repeatedly, so I cannot run
    onstat -u | grep value_of_rstcb
    and then
    onstat -g ses session_id
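    In the onstat -g act output above, every running thread has an rstcb of 0, which may be why the onstat -u | grep step leads nowhere. The Python sketch below filters such output for rows with a non-zero rstcb. It assumes, as the support-page procedure implies, that only a non-zero rstcb can be matched against onstat -u; traceable_threads is a hypothetical helper name.

```python
"""Find onstat -g act rows that can be traced to a user session.

Assumption: the support-page technique greps onstat -u for a thread's
rstcb address; a row whose rstcb is 0 has no user-thread address to
match, so it cannot lead to a session id.
"""

ACT_SAMPLE = """\
tid tcb rstcb prty status vp-class name
8 b2883050 0 1 running 16shm* sm_poll
9 b29956b8 0 1 running 17shm* sm_poll
20 b2b39ca0 0 1 running 28soc* soctcppoll
"""


def traceable_threads(act_text: str) -> list:
    """Return (tid, rstcb, name) for running threads with a non-zero rstcb."""
    result = []
    for line in act_text.splitlines():
        fields = line.split()
        # Expected columns: tid tcb rstcb prty status vp-class name
        if len(fields) != 7 or not fields[0].isdigit():
            continue
        tid, _tcb, rstcb, _prty, _status, _vp, name = fields
        if int(rstcb, 16) != 0:
            result.append((int(tid), rstcb, name))
    return result


print(traceable_threads(ACT_SAMPLE))  # → [] : every rstcb here is 0
```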


    After a server reboot, the PIDs at 100% CPU in top are 7934 to 7940.

    Those PIDs in onstat -g glo:
    vp  pid   class  usercpu  syscpu  total    Thread   Eff
    8   7934  cpu    862.82   0.24    863.06   863.06   100%
    9   7935  cpu    610.99   0.16    611.15   611.15   100%
    10  7936  cpu    835.24   0.15    835.39   835.39   100%
    11  7937  cpu    3471.19  0.36    3471.55  3471.55  100%
    12  7938  cpu    3055.12  0.18    3055.30  3055.30  100%
    13  7939  cpu    2611.39  0.27    2611.66  2611.66  100%
    14  7940  cpu    2518.43  0.32    2518.75  2518.75  100%

    These are VPs 8 to 14.
    Now, how will this tell me what the server is trying to do?

    Thanks

    ------------------------------
    Samuel To
    ------------------------------



  • 4.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Thu March 17, 2022 10:33 AM
    So, there is a large number of sm_poll threads running in VPs 16 through 27 (plus the soctcppoll thread in VP 28). If those are the VPs that are spinning CPU cycles (i.e., the PIDs identified in the "top" listing), then what the server is doing is looking for shared memory connections by constantly polling the shared memory segments. I can't tell that, because you didn't post the entire onstat -g glo output, only CPU VPs 8 through 14, so ...

    If it is the sm_poll threads, then I would look at the NETTYPE onipcshm setting. It looks like you are running those threads in NET VPs, which have no choice but to poll constantly since you cannot block on a shared memory read. Better to run those listeners in the CPU VPs, which have other work to do and sleep from time to time.
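    To check which VP class the ipcshm listeners are configured for, the NETTYPE value can be split into its four comma-separated fields. A minimal Python sketch, assuming the form protocol,poll threads,connections per poll thread,VP class; the helper names are hypothetical, and the two sample values are the ones posted in the next reply. It only reports what is configured, not which VPs the engine actually started.

```python
"""Split an onconfig NETTYPE value into its four fields.

Sketch under the assumption that the value has the form
protocol,poll_threads,connections_per_thread,vp_class (e.g. ipcshm,12,128,CPU).
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class NetType:
    protocol: str          # e.g. ipcshm (shared memory) or soctcp (sockets)
    poll_threads: int      # number of listener/poll threads
    conns_per_thread: int  # connections each poll thread handles
    vp_class: str          # CPU or NET


def parse_nettype(value: str) -> NetType:
    protocol, threads, conns, vp_class = (p.strip() for p in value.split(","))
    return NetType(protocol, int(threads), int(conns), vp_class.upper())


def polls_in_net_vps(nt: NetType) -> bool:
    """True when the poll threads are configured to run in their own NET VPs."""
    return nt.vp_class == "NET"


for value in ("ipcshm,12,128,CPU", "soctcp,1,50,NET"):
    nt = parse_nettype(value)
    where = "NET VPs" if polls_in_net_vps(nt) else "CPU VPs"
    print(f"{nt.protocol}: {nt.poll_threads} poll thread(s) requested in {where}")
```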

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 5.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 03:36 AM
    Hello,

    The network configuration parameters are as follows
    (copied from IDS 12 to 14):

    NETTYPE ipcshm,12,128,CPU
    NETTYPE soctcp,1,50,NET
    LISTEN_TIMEOUT 60
    MAX_INCOMPLETE_CONNECTIONS 1024
    FASTPOLL 1
    NUMFDSERVERS 4
    NS_CACHE host=900,service=900,user=900,group=900,sqlhosts=900
    NET_IO_TIMEOUT_ALARM 4
    DRDA_COMMBUFFSIZE 32

    And the complete onstat -g glo output:

    IBM Informix Dynamic Server Version 14.10.FC4W1 -- Read-Only (Sec) -- Up 3 days 18:56:11 -- 1900216 Kbytes

    MT global info:
    sessions  threads  vps  lngspins  time
    2         105      32   81        327371

              sched calls  thread switches  yield 0     yield n  yield forever
    total:    1199692070   15655953         1184163115  9123296  1303341
    per sec:  85           0                85          0        0

    Virtual processor summary:
    class  vps  usercpu     syscpu   total
    cpu    8    1017276.33  128.97   1017405.30
    aio    5    3.07        13.39    16.46
    shm    12   6.95        23.78    30.73
    lio    1    0.54        1.94     2.48
    pio    1    0.58        1.92     2.50
    adm    1    6.08        12.15    18.23
    soc    1    25.43       85.24    110.67
    msc    1    0.00        0.00     0.00
    jvp    1    809.72      2156.09  2965.81
    fifo   1    0.52        1.95     2.47
    total  32   1018129.22  2425.43  1020554.65

    Individual virtual processors:
    vp  pid   class  usercpu    syscpu   total      Thread     Eff
    1   7907  cpu    12775.87   16.52    12792.39   12792.39   100%
    2   7914  adm    6.08       12.15    18.23      0.00       0%
    3   7915  lio    0.54       1.94     2.48       2.48       100%
    4   7919  pio    0.58       1.92     2.50       2.50       100%
    5   7922  aio    0.93       5.35     6.28       6.28       100%
    6   7926  msc    0.00       0.00     0.00       0.04       0%
    7   7930  fifo   0.52       1.95     2.47       2.47       100%
    8   7934  cpu    73882.85   15.23    73898.08   73898.08   100%
    9   7935  cpu    95050.77   14.86    95065.63   95065.63   100%
    10  7936  cpu    186624.47  18.33    186642.80  186642.80  100%
    11  7937  cpu    154446.57  15.21    154461.78  154461.78  100%
    12  7938  cpu    131182.14  14.73    131196.87  131196.87  100%
    13  7939  cpu    184970.47  17.21    184987.68  184987.68  100%
    14  7940  cpu    178343.19  16.88    178360.07  178360.07  100%
    15  7943  jvp    809.72     2156.09  2965.81    0.00       0%
    16  7944  shm    0.57       1.98     2.55       NA         NA
    17  7946  shm    0.58       1.94     2.52       NA         NA
    18  7947  shm    0.57       1.96     2.53       NA         NA
    19  7948  shm    0.55       2.05     2.60       NA         NA
    20  7949  shm    0.58       2.01     2.59       NA         NA
    21  7950  shm    0.57       2.00     2.57       NA         NA
    22  7951  shm    0.59       1.99     2.58       NA         NA
    23  7952  shm    0.61       1.96     2.57       NA         NA
    24  7955  shm    0.59       2.03     2.62       NA         NA
    25  7956  shm    0.55       2.01     2.56       NA         NA
    26  7957  shm    0.61       1.90     2.51       NA         NA
    27  7958  shm    0.58       1.95     2.53       NA         NA
    28  7960  soc    25.43      85.24    110.67     NA         NA
    29  7964  aio    0.55       2.05     2.60       2.60       100%
    30  7967  aio    0.56       1.94     2.50       2.50       100%
    31  7968  aio    0.49       2.04     2.53       2.53       100%
    32  7971  aio    0.54       2.01     2.55       2.55       100%
              tot    1018129.22 2425.43  1020554.65
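    To see where the CPU time in this output goes, the per-class totals from the "Virtual processor summary" can be turned into percentages. In this data the cpu class accounts for about 99.7% of all VP CPU seconds and the 12 shm VPs for about 0.003%. A minimal Python sketch over the pasted summary rows; cpu_share_by_class is a hypothetical helper.

```python
"""Share of total CPU seconds per VP class, from the onstat -g glo summary.

Sketch only: the summary rows are pasted in as text from the output above.
"""

SUMMARY = """\
class vps usercpu syscpu total
cpu 8 1017276.33 128.97 1017405.30
aio 5 3.07 13.39 16.46
shm 12 6.95 23.78 30.73
lio 1 0.54 1.94 2.48
pio 1 0.58 1.92 2.50
adm 1 6.08 12.15 18.23
soc 1 25.43 85.24 110.67
msc 1 0.00 0.00 0.00
jvp 1 809.72 2156.09 2965.81
fifo 1 0.52 1.95 2.47
total 32 1018129.22 2425.43 1020554.65
"""


def cpu_share_by_class(summary_text: str) -> dict:
    """Return {vp_class: fraction of all VP CPU seconds}."""
    totals = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # Data rows: class vps usercpu syscpu total; skip the header and total row.
        if len(fields) == 5 and fields[1].isdigit() and fields[0] != "total":
            totals[fields[0]] = float(fields[4])
    grand = sum(totals.values())
    return {cls: t / grand for cls, t in totals.items()}


for cls, share in sorted(cpu_share_by_class(SUMMARY).items(), key=lambda kv: -kv[1]):
    print(f"{cls:5} {share:8.3%}")
```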

    The network configuration was done by someone who understood what he was doing (unlike me),
    and I guess everything was working fine before the switch to IDS 14.


    Thanks.


    ------------------------------
    Samuel To
    ------------------------------



  • 6.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Mon March 21, 2022 07:09 AM
    Samuel:

    You are correct. The network settings look fine. Now we have the full onstat -g glo output; however, the server was restarted 3 days ago, so the process IDs of the oninits no longer match the "top" output from before the restart. So can you post the current top output showing the spinning oninit processes?

    Art

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 7.  RE: HDR server - heavy CPU load

    Posted Mon March 21, 2022 07:21 AM
    Hello,

    Of course. It looks roughly like this:

    top - 13:13:47 up 3 days, 23:19, 3 users, load average: 3.14, 3.24, 3.20
    Tasks: 314 total, 5 running, 309 sleeping, 0 stopped, 0 zombie
    %Cpu(s): 40.9 us, 0.1 sy, 0.0 ni, 59.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
    KiB Mem : 16253608 total, 13611648 free, 622844 used, 2019116 buff/cache
    KiB Swap: 8257532 total, 8257532 free, 0 used. 14438816 avail Mem

    PID   USER     PR  NI  VIRT     RES     SHR     S %CPU   %MEM  TIME+     COMMAND
    7934  informix 10  -10 2090896  258884  253944  R 100.0  1.6   1396:59   oninit
    7937  informix 10  -10 2090896  204032  199100  R 100.0  1.3   2639:50   oninit
    7938  informix 10  -10 2090896  199036  194104  R 100.0  1.2   2285:07   oninit
    7935  informix 10  -10 2090896  218004  213068  R 23.6   1.3   1664:20   oninit
    7936  informix 10  -10 2090896  206920  201984  S 1.0    1.3   3202:20   oninit
    7939  informix 10  -10 2090896  198640  193704  S 1.0    1.2   3209:14   oninit
    7943  informix 10  -10 2088708  5624    708     S 1.0    0.0   51:45.97  oninit
    7907  informix 10  -10 2093708  291332  286372  S 0.7    1.8   248:36.29 oninit
    7940  informix 10  -10 2090896  189896  184964  S 0.7    1.2   3088:51   oninit
    4249  root     20  0   1548360  87820   5960    S 0.3    0.5   27:30.38  ragent
    32751 dbown    20  0   162336   2504    1564    S 0.3    0.0   16:13.90  top
    1     root     20  0   194704   7684    4216    S 0.0    0.0   24:59.88  systemd
    2     root     20  0   0        0       0       S 0.0    0.0   0:00.48   kthreadd
    4     root     0   -20 0        0       0       S 0.0    0.0   0:00.00   kworker/0:0H
    6     root     20  0   0        0       0       S 0.0    0.0   0:25.44   ksoftirqd/0
    7     root     rt  0   0        0       0       S 0.0    0.0   0:02.57   migration/0
    8     root     20  0   0        0       0       S 0.0    0.0   0:00.00   rcu_bh
    9     root     20  0   0        0       0       S 0.0    0.0   1:57.06   rcu_sched
    10    root     0   -20 0        0       0       S 0.0    0.0   0:00.00   lru-add-drain
    11    root     rt  0   0        0       0       S 0.0    0.0   0:00.73   watchdog/0
    12    root     rt  0   0        0       0       S 0.0    0.0   0:00.64   watchdog/1
    13    root     rt  0   0        0       0       S 0.0    0.0   0:01.86   migration/1
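    To confirm which VPs the busiest processes in this top listing are, the PIDs can be joined with the onstat -g glo table from the previous post. A Python sketch under stated assumptions: the PID-to-VP table is hand-copied from that output, only a few top rows are included, and the 90% threshold is an arbitrary choice. With this data the three processes at 100% map to CPU VPs 8, 11 and 12, not to SHM VPs.

```python
"""Cross-reference spinning oninit processes in top with their VP class.

Sketch only: the PID -> (VP, class) table below is hand-copied from the
onstat -g glo output in this thread, and the 90% threshold is a
hypothetical cut-off.
"""

# Hand-copied from "Individual virtual processors" (onstat -g glo).
PID_TO_VP = {
    7907: (1, "cpu"), 7934: (8, "cpu"), 7935: (9, "cpu"), 7936: (10, "cpu"),
    7937: (11, "cpu"), 7938: (12, "cpu"), 7939: (13, "cpu"), 7940: (14, "cpu"),
    7943: (15, "jvp"), 7944: (16, "shm"), 7960: (28, "soc"),
}

TOP_SAMPLE = """\
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
7934 informix 10 -10 2090896 258884 253944 R 100.0 1.6 1396:59 oninit
7937 informix 10 -10 2090896 204032 199100 R 100.0 1.3 2639:50 oninit
7938 informix 10 -10 2090896 199036 194104 R 100.0 1.2 2285:07 oninit
7935 informix 10 -10 2090896 218004 213068 R 23.6 1.3 1664:20 oninit
7943 informix 10 -10 2088708 5624 708 S 1.0 0.0 51:45.97 oninit
"""


def spinning_vps(top_text, pid_to_vp, threshold=90.0):
    """Return (pid, vp, class, %CPU) for oninit processes at or above threshold."""
    hits = []
    for line in top_text.splitlines():
        fields = line.split()
        # Process rows: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
        if len(fields) != 12 or not fields[0].isdigit() or fields[11] != "oninit":
            continue
        pid, pct = int(fields[0]), float(fields[8])
        if pct >= threshold and pid in pid_to_vp:
            vp, cls = pid_to_vp[pid]
            hits.append((pid, vp, cls, pct))
    return hits


for pid, vp, cls, pct in spinning_vps(TOP_SAMPLE, PID_TO_VP):
    print(f"pid {pid}: VP {vp} ({cls}) at {pct}% CPU")
```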

    I want to compare the whole onconfig files (IDS 12 and 14); maybe a parameter was not copied correctly.
    If I find something, I'll let you know.

    Thanks

    ------------------------------
    Samuel To
    ------------------------------



  • 8.  RE: HDR server - heavy CPU load

    IBM Champion
    Posted Mon March 21, 2022 09:05 AM
    Samuel:

    OK, two things:
    • The spinning oninit processes are not the SHM VPs but two of the CPU VPs, probably those running recovery threads. I don't know why they are spinning at 100% of CPU capacity, but I would point out that the recovery code, which supports fast recovery, archive restores, and secondary server recovery processing, was rewritten in v14.10, resulting in 5X faster recovery and reduced secondary latency, especially for RSS secondaries.
    • You have only 8 CPU VPs, but you configured 12 shared memory listeners in your NETTYPE. Although you configured them to run in the CPU VPs, there are not enough CPU VPs to run 12 listeners, so the engine started up 12 SHM VPs instead. I suggest changing the NETTYPE from:
      NETTYPE ipcshm,12,128,CPU
       -- to --
      NETTYPE ipcshm,8,196,CPU
      That will give you at least as many connection slots as before but eliminate the need for the SHM VPs. That will reduce overall system overhead significantly. Whether it will eliminate the spinning CPU VPs, I don't know, but it's a start.
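    The arithmetic behind this suggestion is small enough to check by hand: 12 x 128 = 1536 connection slots before, 8 x 196 = 1568 after, and 8 listeners no longer exceed the 8 CPU VPs. The Python sketch below encodes that check; the rule that each CPU-class listener needs its own CPU VP is taken from the explanation above, not from Informix documentation, and the helper names are hypothetical.

```python
"""Arithmetic behind the suggested NETTYPE change (sketch).

Assumption, taken from the explanation above: each CPU-class listener needs
a CPU VP to run in, so more listeners than CPU VPs forces extra shm VPs.
"""

CPU_VPS = 8  # from the "cpu 8" line of the onstat -g glo summary


def connection_slots(poll_threads: int, conns_per_thread: int) -> int:
    """Total connections a NETTYPE entry can accept."""
    return poll_threads * conns_per_thread


def listeners_fit_in_cpu_vps(poll_threads: int, cpu_vps: int) -> bool:
    """True when every listener can be hosted by its own CPU VP."""
    return poll_threads <= cpu_vps


old = connection_slots(12, 128)  # NETTYPE ipcshm,12,128,CPU
new = connection_slots(8, 196)   # NETTYPE ipcshm,8,196,CPU
print(f"old: {old} slots, new: {new} slots")  # → old: 1536 slots, new: 1568 slots
print("old fits:", listeners_fit_in_cpu_vps(12, CPU_VPS))  # → old fits: False
print("new fits:", listeners_fit_in_cpu_vps(8, CPU_VPS))   # → new fits: True
```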


    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------