Informix

 View Only
  • 1.  kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    IBM Champion
    Posted Tue March 08, 2022 04:15 AM
    Edited by Hrvoje Zokovic Tue March 08, 2022 04:16 AM
    Anyone has experience/solution with this:
    Feb 14 14:28:46 dbpri kernel: INFO: task oninit:1720 blocked for more than 120 seconds.
    Feb 14 14:28:46 dbpri kernel:   Not tainted 4.18.0-348.12.2.el8_5.x86_64 #1
    Feb 14 14:28:46 dbpri kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
    Feb 14 14:28:46 dbpri kernel: task:oninit     state:D stack:  0 pid: 1720 ppid: 1714 flags:0x00000084
    Feb 14 14:28:46 dbpri kernel: Call Trace:
    Feb 14 14:28:46 dbpri kernel: __schedule+0x2bd/0x760
    Feb 14 14:28:46 dbpri kernel: ? blk_flush_plug_list+0xd7/0x100
    Feb 14 14:28:46 dbpri kernel: schedule+0x37/0xa0
    Feb 14 14:28:46 dbpri kernel: ext4_unwritten_wait+0x83/0xb0 [ext4]
    Feb 14 14:28:46 dbpri kernel: ? finish_wait+0x80/0x80
    Feb 14 14:28:46 dbpri kernel: ext4_file_write_iter+0x392/0x3e0 [ext4]
    Feb 14 14:28:46 dbpri kernel: aio_write+0xf6/0x1c0
    Feb 14 14:28:46 dbpri kernel: ? blk_mq_sched_insert_requests+0x71/0xf0
    Feb 14 14:28:46 dbpri kernel: ? io_submit_one+0x7e/0x3c0
    Feb 14 14:28:46 dbpri kernel: ? kmem_cache_alloc+0x12e/0x270
    Feb 14 14:28:46 dbpri kernel: io_submit_one+0x131/0x3c0
    Feb 14 14:28:46 dbpri kernel: __x64_sys_io_submit+0xa2/0x180
    Feb 14 14:28:46 dbpri kernel: ? syscall_trace_enter+0x1d3/0x2c0
    Feb 14 14:28:46 dbpri kernel: do_syscall_64+0x5b/0x1a0
    Feb 14 14:28:46 dbpri kernel: entry_SYSCALL_64_after_hwframe+0x65/0xca
    Feb 14 14:28:46 dbpri kernel: RIP: 0033:0x7fbb9d46652d
    Feb 14 14:28:46 dbpri kernel: Code: Unable to access opcode bytes at RIP 0x7fbb9d466503.
    Feb 14 14:28:46 dbpri kernel: RSP: 002b:00000000500d7da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000d1
    Feb 14 14:28:46 dbpri kernel: RAX: ffffffffffffffda RBX: 00007fbb9ecf7280 RCX: 00007fbb9d46652d
    Feb 14 14:28:46 dbpri kernel: RDX: 00000000500d7de8 RSI: 0000000000000001 RDI: 00007fbb98207000
    Feb 14 14:28:46 dbpri kernel: RBP: 00007fbb98207000 R08: 0000000002229800 R09: 00000000538288d0
    Feb 14 14:28:46 dbpri kernel: R10: ffffffffffffffe0 R11: 0000000000000246 R12: 0000000000000001
    Feb 14 14:28:46 dbpri kernel: R13: 0000000000000000 R14: 00000000500d7de8 R15: 0000000053828850

    It is Red Hat Enterprise Linux 8.5 (Ootpa)
    Linux dbpri 4.18.0-348.12.2.el8_5.x86_64 #1 SMP Mon Jan 17 07:06:06 EST 2022 x86_64 x86_64 x86_64 GNU/Linux

    I opened support case but no answer yet
    Ofc I do not want to disable this message :)

    Hrvoje

    ------------------------------
    Hrvoje Zokovic
    ------------------------------


  • 2.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    Posted Fri October 28, 2022 09:03 AM
    Hello Hrvoje Zokovic,

    I am also facing the same issue. Our Informix instance was in hung state, twice in last 24 Hrs and same message in OS log file.

    Current Informix version is IBM Informix Dynamic Server Version 14.10.FC8WE and 
    Os Version
    Linux version 4.18.0-372.26.1.el8_6.x86_64 (mockbuild@x86-vm-08.build.eng.bos.redhat.com) (gcc version 8.5.0 20210514 (Red Hat 8.5.0-10) (GCC)) #1 SMP Sat Aug 27 02:44:20 EDT 2022


    Did you find any solution of this error?


    regards
    Tarun Batra

    ------------------------------
    Tarun Batra
    ------------------------------



  • 3.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    InnerCircle
    Posted Fri October 28, 2022 09:14 AM
    I'd say Informix submitting a kaio request to the OS, and this submission library call hung in the kernel - probably a problem on OS level.
    Would this ever recover by itself, or is it hung forever and requires an Informix or even OS reboot?

    ------------------------------
    Andreas Legner
    ------------------------------



  • 4.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    Posted Fri October 28, 2022 09:55 AM
    It was hung forever and we rebooted the machine

    Tarun Batra

    ------------------------------
    Tarun Batra
    ------------------------------



  • 5.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    IBM Champion
    Posted Fri October 28, 2022 09:32 AM
    Hi
    we turned off DIRECT_IO and since then no problem.
    This was support conclusion:

    I might be a good idea to try with some other DIRECT_IO setting.

    Especially when you see issue more often.

    But also would be interesting to get an information from the OS support if they can find out what is blocking oninit.

    From an Informix side we can check stacks of the cpu VP with the running KAIO. With pstack/procstack.

    But very likely only confirming that is hanging in an IO request.


    HTH
    Hrvoje

    ------------------------------
    Hrvoje Zokovic
    ------------------------------



  • 6.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    IBM Champion
    Posted Fri October 28, 2022 09:34 AM
    Hmm, maybe the company that owns RedHat should look into this problem with KAIO? Hmm, who is that?

    Oh, wait, it's IBM!

    ------------------------------
    Art S. Kagel, President and Principal Consultant
    ASK Database Management Corp.
    www.askdbmgt.com
    ------------------------------



  • 7.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    IBM Champion
    Posted Fri October 28, 2022 09:38 AM
    Yes Art, at that "get an information from the OS support" moment I surrendered :)
    Hrvoje


    ------------------------------
    Hrvoje Zokovic
    ------------------------------



  • 8.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    Posted 25 days ago

    Hi,

    io_submit can block for many reasons - https://stackoverflow.com/questions/34572559/asynchronous-io-io-submit-latency-in-ubuntu-linux

    In this case it appears to be an unaligned write e.g. writing to the 2nd 2K Informix page on a 4K ext4 block.

    https://elixir.bootlin.com/linux/v4.18/source/fs/ext4/file.c#L247

    :* Unaligned direct AIO must be serialized among each other as zeroing
    * of partial blocks of two competing unaligned AIOs can result in data
    * corruption."

    https://elixir.bootlin.com/linux/v4.18/source/fs/ext4/file.c#L106  ext4_unwritten_wait

    This the calls schedule which then invokes the block unplugging mechanism
    https://lwn.net/Articles/438256/

    i.e. have the process  action all pending i/o's!

    Looks like the device took 120s to action all the pending i/os.

    From the comments it is unclear if it needs to submit or action the i/os!

    You could have been hitting the limits in https://stackoverflow.com/questions/34572559/asynchronous-io-io-submit-latency-in-ubuntu-linux item 4

    /sys/block/[disk]/queue/nr_request
    /sys/block/[disk]/device/queue_depth

    or something else.

    It is something for however supports your OS to look into.

    You can also try the stuff in
    https://tanelpoder.com/posts/11m-iops-with-10-ssds-on-amd-threadripper-pro-workstation/

    and use
    - sudo psn -G syscall,wchan,kstack | sed 's/->/\n/g'
    - 0x.tools / run_xcpu.sh runs perf record  / sudo perf report -i

    to get thread stack and see exactly where iin the kernel the time is being spent!

    David.

    ------------------------------
    David Williams
    ------------------------------



  • 9.  RE: kernel: INFO: task oninit:1720 blocked for more than 120 seconds.

    IBM Champion
    Posted 25 days ago

    Thanks David,
    customer did not notice any performance degradation after turning DIRECT_IO off - so we'll leave it turned off.
    I'm leaving this discussion to Informix and Red Hat owners (or owner) to deal or not to deal wit it.
    But I'll have this remembered for future use :)

    Ty
    Hrvoje



    ------------------------------
    Hrvoje Zokovic
    ------------------------------