AIX

Not able to kill commands even on kill -9

  • 1.  Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 04:51 AM

    Originally posted by: SystemAdmin


    Hi,

    I ran some commands against the file /proc/335942/fd/3. Besides consuming a lot of CPU, these commands seem to have become immortal: I am unable to kill them even with SIGKILL. Please have a look at the output below:
    
    bash-2.05b# ps -ef | grep cksum
    root 815282 790642  22 22:22:59  pts/8 17:13 cksum /proc/335942/fd/3
    root 938084      1  22 22:22:10      - 17:29 cksum /proc/335942/fd/3
    bash-2.05b# kill -9 815282
    bash-2.05b# ps -ef | grep cksum
    root 815282 790642  27 22:22:59  pts/8 17:16 cksum /proc/335942/fd/3
    root 938084      1  29 22:22:10      - 17:31 cksum /proc/335942/fd/3
    bash-2.05b# kill -9 815282
    bash-2.05b# kill -9 815282
    bash-2.05b# kill -9 815282
    bash-2.05b# ps -ef | grep cksum
    root 758000 770138   0 23:38:30 pts/10  0:00 grep cksum
    root 815282 790642  20 22:22:59  pts/8 17:18 cksum /proc/335942/fd/3
    root 938084      1  20 22:22:10      - 17:33 cksum /proc/335942/fd/3
    


    On further analysis the process owning the file-descriptor turned out to be db2wdog.

    
    bash-2.05b# ps -ef | grep 33592
    root 745584 770138   0 23:40:44 pts/10  0:00 grep 33592
    bash-2.05b# ps -ef | grep 335942
    ldapdb2 307386 335942   0   Feb 07      -  0:00 db2sysc 0
    root 335942      1   0   Feb 07      -  0:00 db2wdog 0
    root 815282 790642  34 22:22:59  pts/8 17:45 cksum /proc/335942/fd/3
    root 933968 774358  37 23:18:29  pts/9  4:19 cat /proc/335942/fd/3
    root 938084      1  33 22:22:10      - 18:00 cksum /proc/335942/fd/3
    root 950410 831560  33 20:40:54  pts/4 65:37 file /proc/335942/fd/3
    root 954572 983058  35 20:39:58  pts/3 66:36 file /proc/335942/fd/3
    


    Now I can understand some of these commands not returning. But why I can't SIGKILL them I can't understand. Could someone kindly shed some light on this?
    thanks & regards,
    #AIX-Forum
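
The kill-then-check sequence in the post above can be scripted. The following is only a sketch: `kill_and_wait` is a hypothetical helper name, not an AIX tool, and it assumes the caller is root or owns the target process (otherwise `kill -0` fails for permission reasons and the process would wrongly be reported as gone).

```shell
# kill_and_wait PID [TIMEOUT_SECONDS]
# Sends SIGKILL, then polls with "kill -0" (which only tests for existence)
# until the process is gone or the timeout expires.
# Returns 0 if the process no longer exists, 1 if it outlived the timeout.
kill_and_wait() {
    pid=$1
    timeout=${2:-5}
    kill -KILL "$pid" 2>/dev/null
    elapsed=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 0
}

# Example call, left commented out so that sourcing this file kills nothing:
# kill_and_wait 815282 10 || echo "815282 did not react to SIGKILL"
```

A non-zero return only says that the process was still present after the timeout; it does not say why.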


  • 2.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 04:53 AM

    Originally posted by: SystemAdmin


    For the record, I am using AIX 5.3.
    #AIX-Forum


  • 3.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 05:23 AM

    Originally posted by: SystemAdmin


    Further, ls -l on the file gives the following output:

    
    -rw-r-----    0 root     dbsysadm    5242684 Feb 07 13:43 3
    


    Thus there are 0 hard links to the file, yet it is 5242684 bytes in size.
    Could somebody kindly explain how that is possible?
    thanks & regards,
    #AIX-Forum


  • 4.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 06:24 AM

    Originally posted by: Holgervk


    >But why I can't SIGKILL them I can't understand.
    Signals, SIGKILL included, are only delivered when the process runs in user mode.
    As long as it runs or hangs in kernel mode, signals don't have any effect.
    You can use kdb to see where the process hangs.

    About your question regarding file size: when a file is opened by a process and then deleted before being closed, it still stays on disk. It cannot be reopened by name, however. But the process that had it open before the file was deleted is still able to read it.
    #AIX-Forum
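
The point about deleted-but-open files can be tried out on an ordinary temporary file, well away from /proc. This is a minimal sketch under some assumptions: `read_after_unlink` is a made-up function name, `mktemp` is assumed to be installed, and descriptor 3 is assumed to be free in the calling shell.

```shell
# read_after_unlink: show that a deleted file stays readable through a
# descriptor that was opened before the deletion.
read_after_unlink() {
    tmp=$(mktemp) || return 1
    printf 'still here\n' > "$tmp"
    exec 3< "$tmp"            # open descriptor 3 for reading
    rm -f "$tmp"              # remove the only directory entry
    if [ -e "$tmp" ]; then    # the name must be gone by now
        exec 3<&-
        return 1
    fi
    cat <&3                   # the data is still reachable via descriptor 3
    exec 3<&-                 # close the descriptor again
}
```

Calling it inside a command substitution, for example `out=$(read_after_unlink)`, keeps the descriptor changes confined to a subshell.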


  • 5.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 06:38 AM

    Originally posted by: SystemAdmin


    Holgervk,

    Since the command is (presumably) hung in kernel mode, I presume it is stuck in either the open() or the read() system call, because commands like cat do nothing but dump the contents to the screen (I will verify this if I can). Can we deduce that this is some issue with the kernel?

    Also, regarding the file being deleted before being closed:
    Will the ls command still be able to show that file? I am not too sure about that, because as per my understanding the file's entry is removed from its directory as soon as we delete the file. However, the data can still be accessed by a process through its inode number. Please correct me if I am wrong.

    Also note that the file I am talking about is not a normal file but the file at /proc/335942/fd/3, and thus a file descriptor.

    I really appreciate your input, thanks.
    #AIX-Forum


  • 6.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 07:01 AM

    Originally posted by: Holgervk


    Sometimes commands hang in the kernel doing read() or some other system call,
    often due to a device not responding.
    I would not deduce a kernel issue from this.

    When a file is deleted, it can only be accessed by a process that already has a file descriptor for it.
    Further open() calls will fail.

    Yes, /proc/335942/fd/3 is a special file pointing to a file descriptor. IMHO a cat on it cats the referenced file (maybe even if it's already deleted).
    I don't have any idea why your cat command hangs.
    Does
    truss -p 815282
    give any output?
    #AIX-Forum


  • 7.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 07:15 AM

    Originally posted by: SystemAdmin


    The reason I suspect a kernel issue is the high CPU usage. AFAIK, a device not responding may block the system call but should not cause high CPU usage. (Sorry, I am more of a Linux guy; maybe things are different in AIX!)

    
    Name            PID  CPU%  PgSp Owner
    cksum        938084  15.6   0.1 root
    file         954572  15.3   0.2 root
    cksum        815282  15.2   0.1 root
    cat          933968  15.0   0.1 root
    file         950410  14.6   0.2 root
    cat          839898  14.1   0.1 root
    ibmslapd     290880   1.0  13.7 ldap
    db2fmcd      450596   0.1   1.0 root
    


    The command truss -p 815282 doesn't give any output other than:
    
    Pstatus: process is not stopped
    truss: 0915-023 Cannot control process #815282.
    

    and that only after I press Ctrl-C. I executed truss as root, so permissions should not be a problem.

    Further, I tried attaching to the same process with gdb, and now gdb is hung and unresponsive to SIGKILL (though gdb itself shows no high CPU usage).
    #AIX-Forum


  • 8.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 07:27 AM

    Originally posted by: Holgervk


    strange thing...

    if possible, restart db2wdog to get the fd closed
    #AIX-Forum


  • 9.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 07:31 AM

    Originally posted by: Holgervk


    I did a cat on /proc/pid/fd/xx for an opened but deleted file and now have the same issue...
    cat uses 100% of one CPU and does not respond to kill or truss.
    Obviously one should "cat" in /proc with care...
    #AIX-Forum


  • 10.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 07:53 AM

    Originally posted by: Holgervk


    Now, after restarting the process that had the open file descriptor, cat returned/exited.
    So a restart of db2wdog should solve your problem.
    #AIX-Forum


  • 11.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 08:01 AM

    Originally posted by: SystemAdmin


    I killed the process (had to use kill -9) and all but one of the processes went away :).

    The one that was left was the one I had tried attaching gdb to. I am not able to kill even gdb, and that command continues to use a lot of CPU.

    The strange thing is that when I do a ps -ef, the process that owns the file descriptor is not listed, and yet ls -l still shows its entry in the /proc file hierarchy.
    #AIX-Forum


  • 12.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 08:06 AM

    Originally posted by: SystemAdmin


    Using kill -9, I killed the process that was not showing up in the ps -ef output, and now even the leftover cat command and gdb have disappeared. These are some strange things!

    However, the problem I am facing is that I have to run certain commands on each file of the file system, and I can't skip the /proc file system either. Right now the best bet seems to be skipping files that have zero hard links but still have some data in them.

    Anyway, thanks Holgervk for your help. Appreciated!
    #AIX-Forum
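
The "skip files with zero hard links" idea from the post above could look roughly like this. It is a sketch only: `link_count` and `should_scan` are hypothetical helper names, and it assumes that the second field of `ls -ldL` output is the link count, as in the ls -l output shown earlier in the thread. It has not been verified against /proc on AIX 5.3.

```shell
# link_count FILE
# Prints the hard-link count of FILE, or an empty line if ls fails.
link_count() {
    # Split the "ls -ldL" line into words; word 2 is the link count.
    # -L follows symbolic links, -d keeps directories from being listed.
    set -- $(ls -ldL -- "$1" 2>/dev/null)
    echo "${2:-}"
}

# should_scan FILE
# Succeeds only if FILE can be listed and has at least one hard link.
should_scan() {
    n=$(link_count "$1")
    [ -n "$n" ] && [ "$n" -gt 0 ]
}

# Example use in a scan loop, left commented out on purpose:
# for f in /proc/335942/fd/*; do
#     if should_scan "$f"; then cksum "$f"; fi
# done
```

Filtering on the link count alone also skips empty deleted files, which is a slightly wider net than "zero hard links but some data" and seems the safer default here.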


  • 13.  Re: Not able to kill commands even on kill -9

    Posted Thu October 15, 2009 09:24 AM

    Originally posted by: Montecarlo


    The AIX 5L Performance Tools Handbook (SG246039) states:
    "The /proc file system ... is a pseudo file system that maps processes and kernel data
    structures to corresponding files and contains state information about processes
    and threads in the system."

    See the procflags manpage for a list of valid commands that can be run against the /proc filesystem.

    Running other commands against /proc is a good way to break your system.

    Regards, Simon
    #AIX-Forum


  • 14.  Re: Not able to kill commands even on kill -9

    Posted Fri October 16, 2009 01:45 AM

    Originally posted by: SystemAdmin


    Montecarlo,

    I understand what you are trying to say. But the whole point is that if kernel data structures are represented as files, then the responsibility lies with the kernel to ensure that commands which are "not valid" fail. For example, the current scenario is a security risk, as an unprivileged user can run commands that consume excessive CPU and cannot be killed by root, even with kill -9. IMHO, this is certainly an issue with the kernel.

    Otherwise I agree that "other commands" should not be run against /proc. However, open() or read() should fail rather than cause the process to hang.

    regards,
    #AIX-Forum


  • 15.  Re: Not able to kill commands even on kill -9

    Posted Fri October 16, 2009 03:51 AM

    Originally posted by: Montecarlo


    Ah, I didn't realize you were running commands as an unprivileged user. That does sound like a bug. Maybe you should log a call with IBM.
    Regards, Simon
    #AIX-Forum


  • 16.  Re: Not able to kill commands even on kill -9

    Posted Fri October 16, 2009 04:30 AM

    Originally posted by: SystemAdmin


    Actually, the file descriptors are accessible only if the unprivileged user is the owner of the process. That does not mitigate the problem, however. Here is something that I observed (all as an unprivileged user):

    1) I did a cat > /tmp/file and let the command run (i.e. don't input EOF character).
    2) Next I did rm /tmp/file
    3) Next I did the following command cat >> /proc/553196/fd/1 (where 553196 is PID of cat in step 1). And cat gets hung with high CPU usage.

    At this point the ps -ef command lists only one cat. bash is now shown as using high CPU (in topas), and when I try killing that process, even as root, nothing happens. However, if the original cat (from step 1) is killed, everything returns to "normal."
    #AIX-Forum


  • 17.  Re: Not able to kill commands even on kill -9

    Posted Fri October 16, 2009 03:32 PM

    Originally posted by: shargus


    > 3) Next I did the following command cat >> /proc/553196/fd/1 (where 553196 is PID of cat in step 1). And cat gets hung with high CPU usage.

    Does that mean your "cat" is trying to write to the stdout of another process?
    #AIX-Forum


  • 18.  Re: Not able to kill commands even on kill -9

    Posted Tue October 20, 2009 10:51 AM

    Originally posted by: SystemAdmin


    Yes that is true.
    #AIX-Forum