AIX

  • 1.  Kernel Panic in get_from_list_excp + 000014()

    Posted Mon July 06, 2009 04:15 AM

    Originally posted by: SystemAdmin


    Hi,
    I have a kernel driver that panics under stress with the stack trace shown below. After analyzing the crash dump, I cannot work out why it crashed. Can anybody give me some pointers on how I should analyze this crash?

    Thanks,
    Amit


    Stack Trace

    0004DCF4 get_from_list_excp+000014 ()
    003D9AC8 xmalloc_fastpath+00026C (??, ??, ??, ??)
    0454E124 _ddk_qmem_alloc+000048 (??, ??)
    0454E370 ddk_qmem_alloc_tagged+000020 (??, ??)
    0454FB78 _ddk_qstr_dup_multi_sz_41_1+000044 (??, ??, ??, ??)
    04558504 ddk_get_param_multi_sz_volatile_alloc+00011C (??, ??, ??)
    0458159C ddk_ctl_check_file_pass_db+00004C (??, ??)
    045809B0 ddk_ctl_need_file_pass+000020 (??)
    045831DC ddk_ctl_process_file_chng+0001D0 (??, ??, ??, ??, ??, ??, ??, ??)


    kdb stat output


    SYSTEM_CONFIGURATION:
    CHRP_SMP_PCI POWER_PC POWER_5 machine with 8 available CPU(s) (64-bit registers)

    SYSTEM STATUS:
    sysname... AIX
    nodename.. lparaix1
    release... 3
    version... 5
    build date May 12 2008
    build time 23:42:11
    label..... 0820A_53N
    machine... 00023B6ED700
    nid....... 023B6ED7
    time of crash: Tue Jun 16 02:44:43 2009
    age of system: 10 hr., 30 min., 0 sec.
    xmalloc debug: enabled
    Debug kernel error message: No debug cause was specified.

    CRASH INFORMATION:
    CPU 4 CSA F00000002FF47600 at time of crash, error code for LEDs: 70000000


    Output of errpt -a

    LABEL: PROGRAM_INT
    IDENTIFIER: DD11B4AF

    Date/Time: Tue Jun 16 02:47:43 IST 2009
    Sequence Number: 1261
    Machine Id: 00023B6ED700
    Node Id: lparaix1
    Class: S
    Type: PERM
    Resource Name: SYSPROC

    Description
    PROGRAM INTERRUPT

    Probable Causes
    SOFTWARE PROGRAM

    Failure Causes
    SOFTWARE PROGRAM

    Recommended Actions
    IF PROBLEM PERSISTS THEN DO THE FOLLOWING
    CONTACT APPROPRIATE SERVICE REPRESENTATIVE

    Detail Data
    SEGMENT REGISTER, SEGREG
    0000 7FFF FFFF D000
    MACHINE STATUS SAVE/RESTORE REGISTER 0
    0000 0000 0004 D41C
    MACHINE STATUS SAVE/RESTORE REGISTER 1
    0000 0000 0002 0000
    MACHINE STATE REGISTER, MSR
    8000 0000 0002 90B2
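
    The detail data above can be cross-checked against the kind of program interrupt that was taken. The sketch below assumes that "MACHINE STATUS SAVE/RESTORE REGISTER 1" is SRR1 and that the reason bits follow the Power ISA layout for program interrupts (bits 43 to 46, counting bit 0 as the most significant of 64). The helper names are hypothetical and exist only for this illustration.

```python
# Program-interrupt reason bits in SRR1, assuming the Power ISA layout
# (bit 0 is the most significant bit of the 64-bit register).
PROGRAM_INTERRUPT_BITS = {
    43: "floating-point enabled exception",
    44: "illegal instruction",
    45: "privileged instruction",
    46: "trap instruction",
}


def parse_errpt_hex(text):
    """Turn errpt detail words such as '0000 0000 0002 0000' into an int."""
    return int(text.replace(" ", ""), 16)


def decode_program_interrupt(srr1):
    """Return the names of the reason bits that are set in SRR1."""
    return [
        name
        for bit, name in PROGRAM_INTERRUPT_BITS.items()
        if srr1 & (1 << (63 - bit))
    ]


# Value copied from the errpt detail data in this post.
srr1 = parse_errpt_hex("0000 0000 0002 0000")
print(decode_program_interrupt(srr1))
```

    Under those assumptions, the only reason bit set is the one for a trap instruction, which would be consistent with a panic raised by a trap rather than by an illegal or privileged instruction.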


    Operating system Information
    AIX 5.3 64Bit
    Output of oslevel -qr

    5300-10
    5300-09
    5300-08
    5300-07
    5300-06
    5300-05
    5300-04
    5300-03
    5300-02
    5300-01
    5300-00


    Output of oslevel -g

    oslevel -g
    Fileset Actual Level Maintenance Level

    bos.rte 5.3.10.0 5.3.0.0


    Output of oslevel -s

    5300-10-01-0921


    Output of dc get_from_list_excp

    (1)> dc get_from_list_excp 10
    get_from_list_excp+000000 cmpdi cr0,r8,0
    get_from_list_excp+000004 beq cr0.eq,<get_from_list_excp+000010>
    get_from_list_excp+000008 cmpd cr0,r9,r5
    get_from_list_excp+00000C bne cr0.eq,<get_from_list_excp+000018>
    get_from_list_excp+000010 addi r8,r8,1
    get_from_list_excp+000014 tweqi r8,8
    get_from_list_excp+000018 mr r9,r5
    get_from_list_excp+00001C mr r3,r7
    get_from_list_excp+000020 mfsprg r12,SPRG1


  • 2.  Re: Kernel Panic in get_from_list_excp + 000014()

    Posted Wed July 08, 2009 09:00 AM

    Originally posted by: l2abe


    Notice that we add one to r8 and trap (assert) if the result equals 8. In other words, the system failed to load a page from memory eight times, after which it is instructed to crash.
    ...
    get_from_list_excp+000010 addi r8,r8,1
    get_from_list_excp+000014 tweqi r8,8
    ...
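
    To make the counter logic easier to follow, here is a rough Python model of the first instructions of the disassembly posted in the question (offsets +00 to +18). It only mirrors the control flow shown there. What r5, r8 and r9 actually hold is not documented in this thread, so treating r9 as "the previous value" and r5 as "the current value" is an assumption, as are the function names and the initial register values.

```python
class TrapError(Exception):
    """Stands in for the tweqi trap, i.e. the point where the kernel panics."""


def get_from_list_excp_prologue(r8, r9, r5):
    """Model offsets +00..+18 of the posted disassembly; returns new (r8, r9)."""
    # +00/+04: if r8 == 0, branch straight to the increment at +10.
    # +08/+0C: otherwise, if r9 != r5, branch to +18 and skip the increment.
    if r8 == 0 or r9 == r5:
        r8 += 1  # +10: addi r8,r8,1
        if r8 == 8:  # +14: tweqi r8,8
            raise TrapError("r8 reached 8")
    r9 = r5  # +18: mr r9,r5
    return r8, r9


def calls_until_trap(values):
    """Feed successive r5 values; return the 1-based call that traps, or None."""
    r8, r9 = 0, 0  # assumed initial state
    for n, r5 in enumerate(values, start=1):
        try:
            r8, r9 = get_from_list_excp_prologue(r8, r9, r5)
        except TrapError:
            return n
    return None


print(calls_until_trap([0x1000] * 10))    # same value every time
print(calls_until_trap(list(range(1, 11))))  # a different value every time
```

    In this model the trap fires only after eight entries in which r5 matches the value saved from the previous entry (or r8 was still zero), which matches the "eight times" reading above. Entries with a different r5 leave the counter untouched in the instructions shown.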

    This is memory corruption, so you need to enable MODS on your system to find out who freed or manipulated the failing memory page.
    To enable MODS:
    bosdebug -M
    bosboot -ad /dev/ipldevice
    shutdown -Fr

    A reboot of the server is necessary to fully enable MODS.

    If you can reproduce the problem and do not know how to debug the resulting dump, I strongly recommend that you involve IBM AIX support for your country, provided you are entitled to it.


  • 3.  Re: Kernel Panic in get_from_list_excp + 000014()

    Posted Wed July 08, 2009 09:44 AM

    Originally posted by: SystemAdmin


    Thanks for your help, Alberto.

    MODS was already enabled when the system crashed. In the stat output, the line
    "xmalloc debug: enabled" shows that MODS was enabled.

    Running xmalloc (in kdb) does not provide any useful information:

    (1)> xmalloc
    Debug kernel error message: No debug cause was specified.
    No default address could be determined.


  • 4.  Re: Kernel Panic in get_from_list_excp + 000014()

    Posted Thu June 09, 2011 10:56 AM

    Originally posted by: Dhilk


    Is there any solution or fix for this problem?

    We have the same error on one of our AIX 6100-05 LPARs.