AIX Open Source

rsyslog 8.19.50.1 stops sending messages locally and remotely

    Posted Tue November 01, 2022 03:11 PM
    Hello, I was redirected to this community from CSP case TS010060253.

    The issue we are observing in a customer environment is that rsyslog stops sending messages from time to time.

    The customer reports that this now happens almost every day.

    To get it working again, the customer restarts the service.

    According to the help I got in the CSP case, the configuration all seems to be OK.

    Here is some information collected while rsyslog was not sending messages:

    1) State of the process:

    root@Boerse_Cl2[/home/pfe] ps -ef | grep rsys

    root 40894750 4915828 56 Oct 10 - 1852:57 /usr/sbin/rsyslogd

    root@Boerse_Cl2[/home/pfe] procstack 40894750

    40894750: /usr/sbin/rsyslogd

    ---------- tid# 50660731 (pthread ID: 1) ----------

    0xd027a00c __fd_select(??, ??, ??, ??, ??) + 0xcc

    0x10000ed4 select(0xe, 0x2ff20b78, 0x0, 0x0, 0x2ff20a70) + 0x34

    0x10003928 wait_timeout() + 0xe8

    0x10003b1c mainloop() + 0x1c

    0x10003e60 main(0x1, 0x2ff22c80) + 0x240

    0x10000190 __start() + 0x68

    ---------- tid# 23922609 (pthread ID: 258) ----------

    0xd0279dd4 __fd_poll(??, ??, ??) + 0xb4

    0xdc2bf484 poll(0x20095018, 0x1, 0xffffffff) + 0x24

    0xdc2c0cd4 runInput(0x200f86a8) + 0x134

    0x10092484 thrdStarter(0x200f86a8) + 0x84

    0xd0577fc4 _pthread_body(??) + 0xe4

    ---------- tid# 119604695 (pthread ID: 515) ----------

    0xd02385b0 send(??, ??, ??, ??) + 0xb0

    0xdc2f2100 Send(0x2199ab98, 0x20344194, 0x2031c688) + 0x40

    0xdc2ea208 Send(0x21991768, 0x20344194, 0x2031c688) + 0x48

    0x10086f14 TCPSendBufUncompressed(0x20344138, 0x20344194, 0x38) + 0xb4

    0x100878c8 TCPSendBuf(0x20344138, 0x20344194, 0x38, 0x1000001) + 0x68

    0x10088d1c commitTransaction(0x20344138, 0x203629b8, 0x1) + 0x11c

    0x10058de0 actionCallCommitTransaction(0x20095028, 0x20086868, 0x203629b8, 0x1) + 0xc0

    0x100591cc doTransaction(0x20095028, 0x20086868, 0x203629b8, 0x1) + 0x8c

    0x100593c0 actionTryCommit(0x20095028, 0x20086868, 0x203629b8, 0x1) + 0x80

    0x10059bbc actionCommit(0x20095028, 0x20086868) + 0x11c

    0x1005bfc0 actionCommitAllDirect(0x20086868) + 0xc0

    0x1002c350 processBatch(0x20086888, 0x20086868) + 0x1b0

    0x10005398 msgConsumer(0x0, 0x20086888, 0x20086868) + 0x58

    0x10066964 ConsumerReg(0x20086268, 0x20086868) + 0x164

    0x10069700 wtiWorker(0x20086868) + 0x160

    0x10068bdc wtpWorker(0x20086868) + 0xfc

    0xd0577fc4 _pthread_body(??) + 0xe4

    2) Here are some traces taken while testing it:

    root@Boerse_Cl2[/home/pfe] startsrc -s iptrace -a "-a -p 514 /tmp/iptrace2.raw"

    0513-059 The iptrace Subsystem has been started. Subsystem PID is 17891784.

    root@Boerse_Cl2[/home/pfe] trace -an -L500M -T50M -o /tmp/kerneltrace.raw

    root@Boerse_Cl2[/home/pfe] echo "this is a test" | logger -p user.info

    root@Boerse_Cl2[/home/pfe] trcstop

    root@Boerse_Cl2[/home/pfe] stopsrc -s iptrace

    root@Boerse_Cl2[/home/pfe] ls -rtl /tmp/*.raw

    -rw-r--r-- 1 root system 11 Oct 14 08:26 iptrace2.raw

    -rw-rw-rw- 1 root enhrem 527686440 Oct 14 08:27 kerneltrace.raw

    root@Boerse_Cl2[/tmp] date

    Fri Oct 14 08:29:03 CUT 2022

    root@Boerse_Cl2[/tmp] ls -rtl /var/log/rsyslog/127.0.0.1/

    total 32600

    -rw-r--r-- 1 root system 1620 Aug 06 09:55 kern_info.log

    -rw-r--r-- 1 root system 574949 Aug 09 15:01 local2_debug.4.log.gz

    -rw-r--r-- 1 root system 575670 Aug 15 17:56 local2_debug.3.log.gz

    -rw-r--r-- 1 root system 1476 Sep 24 11:10 local1_info.log

    -rw-r--r-- 1 root system 576328 Oct 04 10:26 local2_debug.2.log.gz

    -rw-r--r-- 1 root system 575676 Oct 08 16:31 local2_debug.1.log.gz

    -rw-r--r-- 1 root system 4992 Oct 10 13:38 daemon_info.log

    -rw-r--r-- 1 root system 58704 Oct 13 06:21 local0_info.log

    -rw-r--r-- 1 root system 34219 Oct 13 06:21 local3_info.log

    -rw-r--r-- 1 root system 433858 Oct 13 06:21 local4_info.log

    -rw-r--r-- 1 root system 2430923 Oct 13 06:21 auth_info.log

    -rw-r--r-- 1 root system 486217 Oct 13 06:22 syslog_info.log

    -rw-r--r-- 1 root system 8802240 Oct 13 06:56 local2_debug.log

    -rw-r--r-- 1 root system 2088772 Oct 13 06:56 user_info.log

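    It seems it got stuck on the 13th of October: the newest log files were last written on Oct 13 at 06:56, while the test message above was sent on Oct 14.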
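
    The check above (newest file in the log directory versus the current date) could be scripted roughly as follows. This is only a minimal sketch assuming a POSIX shell; `newest_log` is a hypothetical helper name, not part of rsyslog or AIX, and the directory is the one from the listing above.

```shell
# Hypothetical helper: print the name of the most recently modified
# entry in the directory given as $1.
newest_log() {
    # ls -t sorts by modification time, newest first
    ls -t "$1" | head -n 1
}

# Show the newest log file next to the current date, so a gap between
# the two is easy to spot. Skipped if the directory does not exist.
dir=/var/log/rsyslog/127.0.0.1
if [ -d "$dir" ]; then
    ls -l "$dir/$(newest_log "$dir")"
    date
fi
```

    If the newest file is much older than the current date while messages are still being generated, rsyslog has most likely stopped writing.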

    The kernel and IP traces are on ecurep: https://ecurep.mainz.de.ibm.com/ae5/#id=TS010816596&path=TS010816596%2Fsupport_files%2F

    I'm not sure everybody here has access to ecurep, but if you need anything else please let me know.

    Best regards!

    ------------------------------
    JOSUE CAZAREZ AGUILAR
    ------------------------------