AIX

  • 1.  AIX 5.3 lockup - any ideas on possible causes

    Posted Sun April 12, 2009 12:40 AM

    Originally posted by: SystemAdmin


    Hi,

    We are running multiple AIX 5.3.7 and AS/400 instances on a P6 570 system. On one AIX LPAR, we recently upgraded Informix 9.4 to version 11.50, and two days later the whole system locked up. Users were unable to continue working or to log on, but jobs continued to run from crontabs, and it was still possible to run commands from other LPARs using ssh. Tests indicated that CPU usage was at 100%, paging was zero, and memory utilisation was unchanged. Initially the major CPU consumers were GIL and the database, but after a few minutes these were joined by runaway Korn shells. Using ssh we were able to shut down the databases and reboot the system, but two days later the problem reappeared, so we have since rolled back the Informix upgrade.

    Can anyone suggest why or how a problem would lock up users and prevent logins, but still permit processes to run from remote systems or from crontabs?

    Second question: how does one tell whether a system has run out of ports or ptys?

    There was no indication of any problem in the error or system logs. The DBA could not find errors output by the database, and the application itself did not record any errors.

    Any suggestions,
    Spook


  • 2.  Re: AIX 5.3 lockup - any ideas on possible causes

    Posted Sun April 12, 2009 09:42 AM

    Originally posted by: cd3lgado


    Hi

    It's not so difficult. Let's say your processes consume all OS resources. New processes, like those created when you log in to the system, then cannot be created, and a system error appears with a message similar to the following:

    0403-030 The fork function failed. Too many processes already exist

    Processes that are already running will keep working and can even finish without problems, but you won't be able to start new ones.

    I think this problem has to do with the maxuproc parameter. Could you please post the output of:

    lsattr -El sys0

    If the maxuproc parameter is set to its default value (128), it could be the root cause of your system's problem.
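
    To check this idea against real numbers, a rough sketch like the one below could compare each user's process count with the per-user limit. The function names and the 80% warning threshold are illustrative assumptions, not part of any AIX tool. The commented wiring at the end assumes that `lsattr -F value` and `ps -o user=` behave on your system as they normally do on AIX.

```shell
#!/bin/sh
# Hypothetical helper: reads one user name per line on stdin and
# prints "count user" pairs, busiest user first.
count_procs_by_user() {
    sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Hypothetical helper: users_near_limit LIMIT [PERCENT]
# Reads "count user" pairs on stdin and prints every user whose
# process count is at or above PERCENT (default 80) of LIMIT.
users_near_limit() {
    limit=$1
    pct=${2:-80}
    awk -v limit="$limit" -v pct="$pct" \
        '$1 * 100 >= limit * pct { print $2, $1 "/" limit }'
}

# Assumed wiring on an AIX host (shown only as a comment):
#   limit=$(lsattr -E -l sys0 -a maxuproc -F value)
#   ps -e -o user= | count_procs_by_user | users_near_limit "$limit"
```

    If a user such as the database owner shows up in that output, the per-user process limit is a plausible suspect. If nothing is listed, the cause is more likely elsewhere.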

    Hope this helps


  • 3.  Re: AIX 5.3 lockup - any ideas on possible causes

    Posted Thu April 16, 2009 01:02 AM

    Originally posted by: SystemAdmin


    Hi,

    Thanks for the assistance. You are right that maxuproc is set to 128 on our systems, and for many databases this is insufficient. However, ps listings taken during the incident do not indicate that we were even close to running out of processes.

    After all user sessions locked up, we observed that new ssh connections generated the error message:

    Apr 9 14:48:40 hostname auth|security:err|error sshd241388: error: /dev/pts/69: Out of STREAMS resources
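
    To see how often this error occurred and which pseudo-terminal devices were involved, a small sketch like the following could tally the matching syslog lines. The function name is hypothetical, and it assumes the log lines have the same layout as the one above, with the device path as a separate field ending in a colon.

```shell
#!/bin/sh
# Hypothetical helper: reads syslog lines on stdin and prints
# "count device" for every device named in an
# "Out of STREAMS resources" error, most frequent first.
streams_errors() {
    awk '/Out of STREAMS resources/ {
        for (i = 1; i <= NF; i++)
            if ($i ~ /^\/dev\//) {
                dev = $i
                sub(/:$/, "", dev)   # drop the trailing colon
                n[dev]++
            }
    }
    END { for (d in n) print n[d], d }' | sort -rn
}

# Assumed usage; the log path is a placeholder that depends on
# the local syslog configuration:
#   streams_errors < /path/to/syslog/file
```

    The configured pseudo-terminal limits should be visible with `lsattr -El pty0`, assuming the pty device on the system is named pty0.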

    IBM Support has suggested that a memory leak in AIX could have caused the observed incident. We intend to upgrade to AIX 5.3 TL9 SP2 and see whether we can once again move to Informix 11.50. Fingers crossed.

    Thanks again for your assistance.


  • 4.  Re: AIX 5.3 lockup - any ideas on possible causes

    Posted Thu April 16, 2009 04:04 AM

    Originally posted by: SystemAdmin


    Hi Spook,

    If upgrading the Informix software changed the AIX behavior, it is possible that the new Informix release has performance issues. I suggest reading the release notes for the new Informix release.
    AIX performance should also be reviewed. Look at Jaqui's AIX Performance web page, which covers the AIX 5.3 tunables.
    Finally, I agree with you about applying maintenance levels to AIX. Software upgrades often need to be accompanied by an operating system upgrade.

    Regards.
    Silvia