AIX

  • 1.  ksh commands are mysteriously killed

    Posted Sun May 08, 2011 11:31 AM

    Originally posted by: loani


    Hi,

    We have a really strange phenomenon on our AIX 5.3.12 box.
    We have a ksh script containing simple commands like 'ls', 'mv', etc., really nothing complicated.
    When this script is running in the background, sometimes (not always) a command is mysteriously killed. What we get in the log file is a message like:
    test.sh 20: 123456 Killed (where line 20 is: 'ls abc.txt')
    When it runs in the foreground, this never happens.
    We cannot figure out why the command is killed or who is sending the signal.
    We suspected that a lack of resources was the cause, but the machine has plenty of everything (CPU, memory, paging, etc.).
    This box is one of 5 LPARs on a machine, but it has dedicated resources.
    We tried the same thing on a separate machine (no LPARs) and it always runs OK.
    Out of desperation, we changed the user's stack_hard limit from unlimited (-1) to the default value, and the kills became much less frequent (they still happen from time to time).
    Does anyone have a clue?
    #AIX-Forum


  • 2.  Re: ksh commands are mysteriously killed

    Posted Tue May 10, 2011 08:45 AM

    Originally posted by: SystemAdmin


    Hi, loani,

    This could be a short-lived peak in resource consumption. I lean toward that because the behaviour changed after you lowered the limits. It seems to me that the process that eats all the memory now dies earlier.

    Try gathering statistics with vmstat or nmon.
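
One way to collect those statistics is to run a sampler alongside the job, so that any short peak lands in a log that can be compared against the time of the kill. The following is only a rough sketch: the LOG, INTERVAL and COUNT names and the log path are made up for illustration, and the `ls` line is a stand-in for the real script.

```shell
#!/bin/sh
# Hypothetical helper: sample vmstat in the background while a job runs.
# LOG, INTERVAL and COUNT are illustrative names, not taken from the thread.
LOG="${LOG:-/tmp/vmstat_sample.$$.log}"
INTERVAL=1   # seconds between samples
COUNT=2      # number of samples to take

sample_vmstat() {
    # $1 = log file to append to
    if command -v vmstat >/dev/null 2>&1; then
        date >> "$1"
        vmstat "$INTERVAL" "$COUNT" >> "$1" 2>&1
    else
        # Keep the sketch harmless on hosts without vmstat
        echo "vmstat not found on this host" >> "$1"
    fi
}

# Start the sampler in the background and remember its PID
sample_vmstat "$LOG" &
sampler_pid=$!

# Stand-in for the real job (for example, test.sh from the original post)
ls / >/dev/null

# Wait for the sampler to finish before reading the log
wait "$sampler_pid"
echo "samples written to $LOG"
```

For a long-running job, a larger COUNT and a coarser INTERVAL would be more sensible; the small values here just keep the demonstration short.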

    Regards,
    AM
    #AIX-Forum


  • 3.  Re: ksh commands are mysteriously killed

    Posted Tue May 10, 2011 04:38 PM

    Originally posted by: Casey_B


    The only type of resource consumption that would cause a kill signal to be sent to a process is lack of paging space.

    This would also cause more severe effects on the system besides your script, including a PGSP_KILL in the error log.
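
A quick way to rule the paging-space case in or out might look like the sketch below. It assumes the standard AIX tools `errpt` (with `-J` to select entries by error label) and `lsps -a` are on the PATH; the function name check_paging_kills is just an illustrative choice, and the guards only exist so the snippet does nothing harmful on a non-AIX host.

```shell
#!/bin/sh
# Look for paging-space kills in the AIX error log and show paging space usage.
# check_paging_kills is a hypothetical name used for this sketch only.
check_paging_kills() {
    if command -v errpt >/dev/null 2>&1; then
        # List error log entries carrying the PGSP_KILL label
        errpt -J PGSP_KILL
    else
        echo "errpt not available (not an AIX host?)"
    fi
    if command -v lsps >/dev/null 2>&1; then
        # Show size and utilisation of every paging space
        lsps -a
    else
        echo "lsps not available (not an AIX host?)"
    fi
}

report=$(check_paging_kills 2>&1)
echo "$report"
```

If no PGSP_KILL entries show up around the time of the failures, paging space is probably not the cause.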

    I was going to suggest nohup, but I don't think a HUP signal would produce the log message you are seeing. A SIGKILL had to have been sent to the command that ksh was running.

    Look for something performing a kill, or umount -f on that LPAR.
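
To confirm which signal is involved before hunting for the sender, the commands in the script could be wrapped so that the exit status is decoded and logged. This is a minimal sketch, not a finished tool: run_logged is a made-up wrapper name, and it relies on the shell convention that a status above 128 means the command was terminated by a signal, with `kill -l` translating that status to a signal name.

```shell
#!/bin/sh
# Run a command and, if it was terminated by a signal, log which one.
# run_logged is a hypothetical wrapper name, not something from the thread.
run_logged() {
    "$@"
    rc=$?
    if [ "$rc" -gt 128 ]; then
        # kill -l maps an exit status to the name of the signal behind it
        sig=$(kill -l "$rc" 2>/dev/null)
        echo "$(date) '$*' terminated by signal ${sig:-unknown} (status $rc)" >&2
    fi
    return "$rc"
}

# Demonstration: a child shell that sends SIGKILL to itself
demo_rc=0
run_logged sh -c 'kill -9 $$' || demo_rc=$?
demo_sig=$(kill -l "$demo_rc")
echo "status=$demo_rc signal=$demo_sig"
```

In the real script, a line such as `ls abc.txt` would become `run_logged ls abc.txt`, and the timestamp in the log can then be matched against whatever else was running on the LPAR at that moment.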

    Hope this helps,
    Casey
    #AIX-Forum