AIX


AIX kernel crash due to user space process

  • 1.  AIX kernel crash due to user space process

    Posted Sat November 03, 2012 05:48 AM

    Originally posted by: SystemAdmin


    Hello,

    Our AIX 6.1 server crashed with the stack trace below:

    =====================================================================
    -bash-3.2# kdb vmcore.22
    vmcore.22 mapped from @ 700000000000000 to @ 700000058c3836a
    START END <name>
    0000000000001000 0000000004090000 start+000FD8
    F00000002FF47600 F00000002FFDF9C0 __ublock+000000
    000000002FF22FF4 000000002FF22FF8 environ+000000
    000000002FF22FF8 000000002FF22FFC errno+000000
    F1000F0A00000000 F1000F0A10000000 pvproc+000000
    F1000F0A10000000 F1000F0A18000000 pvthread+000000
    Dump analysis on CHRP_SMP_PCI POWER_PC POWER_5 machine with 2 available CPU(s) (64-bit registers)
    Processing symbol table...
    .......................done
    read vscsi_scsi_ptrs OK, ptr = 0x0
    (1)> stat
    SYSTEM_CONFIGURATION:
    CHRP_SMP_PCI POWER_PC POWER_5 machine with 2 available CPU(s) (64-bit registers)

    SYSTEM STATUS:
    sysname... AIX
    nodename.. aix112
    release... 1
    version... 6
    build date Sep 29 2011
    build time 17:43:32
    label..... 1139A_61Q
    machine... 00CD159C4C00
    nid....... CD159C4C
    time of crash: Thu Nov 1 23:28:45 2012
    age of system: 1 day, 11 hr., 19 min., 13 sec.
    xmalloc debug: enabled
    FRRs active... 0
    FRRs started.. 0

    CRASH INFORMATION:
    CPU 1 CSA F000000030AC3600 at time of crash, error code for LEDs: 70000000
    pvthread+042E00 STACK:
    0001BF20abend_trap+000000 ()
    000DEC60thread_terminate+000860 ()
    000DE038thread_terminate_unlock+000018 (??)
    00003850ovlya_addr_sc_flih_main+000130 ()
    kdb_get_virtual_memory no real storage @ 11196E6E0
    900000000687D1C0900000000687D1C ()
    kdb_read_mem no real storage @ FFFFFFFFFFF6680

    (1)> status
    CPU TID TSLOT PID PSLOT PROC_NAME
    0 20005 2 20004 2 wait
    1 42E00AF 1070 1170094 279 s2

    =====================================================================

    If we examine thread slot 1070 of the s2 process, we get the stack below:

    (1)> sw 1070
    Switch to initial thread: <pvthread+042E00>

    (1)> f
    pvthread+042E00 STACK:
    0001BF20abend_trap+000000 ()
    000DEC60thread_terminate+000860 ()
    000DE038thread_terminate_unlock+000018 (??)
    00003850ovlya_addr_sc_flih_main+000130 ()
    kdb_get_virtual_memory no real storage @ 11196E6E0
    900000000687D1C0900000000687D1C ()
    kdb_read_mem no real storage @ FFFFFFFFFFF9860

    (1)>

    We cannot work out how a user-space process, "s2", triggers a kernel
    crash. Note that the s2 process has around 110 threads.
    Searching the internet turned up the link below, which shows a similar stack trace:
    http://www-01.ibm.com/support/docview.wss?uid=isg1IZ89428
    However, it does not state the root cause of the issue or what the fix changed.

    Any suggestions on how to move forward?

    Thanks and Regards,
    Chintea
    #AIX-Forum


  • 2.  Re: AIX kernel crash due to user space process

    Posted Sun November 04, 2012 07:29 PM

    Originally posted by: dukessd


    Um, have you got that APAR installed?

    IZ89428 is an APAR number; it may be a different number on your system, depending on your AIX version and release.

    The fix is in the fileset near the bottom of the page: devices.vtdev.scsi.rte.

    instfix can help you find the installed APARs, and lslpp can help you find the installed filesets and their levels.
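
    As a rough sketch of that check, something like the following could be run on the AIX host. The function name check_apar is made up for illustration (it is not an AIX command), and the match on the "All filesets for ... were found." message assumes the usual instfix -ik output, so treat it as a starting point rather than a definitive test.

```shell
#!/bin/sh
# Sketch only: report whether an APAR and a fileset are present on an AIX host.
# check_apar is a hypothetical helper name, not part of AIX.
# Usage: check_apar IZ89428 devices.vtdev.scsi.rte
check_apar() {
    apar="$1"
    fileset="$2"

    # instfix and lslpp ship with AIX; stop early if they are missing.
    if ! command -v instfix >/dev/null 2>&1 || ! command -v lslpp >/dev/null 2>&1; then
        echo "instfix/lslpp not found (not an AIX host?)" >&2
        return 2
    fi

    # Assumption: instfix -ik prints "All filesets for <APAR> were found."
    # when every fileset belonging to the APAR is installed.
    if instfix -ik "$apar" 2>&1 | grep -q "All filesets for"; then
        echo "APAR $apar: installed"
    else
        echo "APAR $apar: not installed (or known under a different number)"
    fi

    # lslpp -L prints the installed level and fails if the fileset is absent.
    if ! lslpp -L "$fileset" 2>/dev/null; then
        echo "fileset $fileset: not installed"
    fi
}
```

    Calling check_apar IZ89428 devices.vtdev.scsi.rte prints one line for the APAR and either the lslpp listing or a "not installed" line for the fileset. If the APAR carries a different number on your release, pass that number instead.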

    HTH.
    #AIX-Forum


  • 3.  Re: AIX kernel crash due to user space process

    Posted Mon November 05, 2012 09:28 AM

    Originally posted by: SystemAdmin


    Hello dukessd,

    Thank you for the suggestion. APAR IZ89428 says that the fix is in the
    "vio_daemon" code, which is not installed on our AIX server at all.
    Also, according to the crash dump, the "s2" process is causing the crash.

    We suspect the issue needs to be fixed in the "s2" process.

    Thanks and Regards,
    chintea.
    #AIX-Forum