AIX

disk media for jfs is bad, open/read/write ops suspend for ever?

  • 1.  disk media for jfs is bad, open/read/write ops suspend for ever?

    Posted Wed December 07, 2005 01:40 AM

    Originally posted by: SystemAdmin


    More than 100 processes were running dd to read and write files on a disk. When disk I/O failed (in this case, ddstrategy returned ENODEV for each request), I found that most of the processes were hung and could not be killed, even with "kill -9".

    I dumped the kernel threads' stacks and found four types of stack:
    case 1:
    -begin-----------------------------------------------
    (0)> sw 243;f
    Switch to thread: <pvthread+007980>
    pvthread+007980 STACK:
    001A94D8complex_lock_sleep_ppc+0000D8 (0000380E, 00000002, 000005AC, 2FF3B7E4 ??)
    001AA604lock_write_ppc+000150 (??)
    003D0258jfs_rdwr+0000B4 (??, ??, ??, ??, ??, ??, ??, ??)
    0033C6D0vnop_rdwr+000098 (??, ??, ??, ??, ??, ??, ??, ??)
    00317FB8rwuio+0000D0 (??, ??, ??, ??, ??)
    003181ACrdwr+000134 (??, ??, ??, ??, ??)
    00317960kwrite+0000E0 (00000004, 30000000, 00008000)
    00003A50.sys_call+000000 ()
    D01E1C18write+000198 (??, ??, ??)
    10001F68wcbuf+000038 (??, ??)
    10002F30do_child+000128 ()
    10002910prep_mbuf+000368 ()
    10000EBCmain+000998 (??, ??)
    10000188__start+000088 ()
    -end-for-case-1--------------------------------------

    case 2:
    -begin-----------------------------------------------
    (0)> sw 253;f
    Switch to thread: <pvthread+007E80>
    pvthread+007E80 STACK:
    00031FF8.backt+000000 (00006C78, 2FF3A868 ??)
    00032D88.vcs_movep_excp+000020 (??, ??)
    001229FCvmpcopy+0003F4 (??, ??, ??, ??, ??, ??, ??)
    0012382Cvmfcopyin+000570 (??, ??, ??, ??, ??, ??)
    00122FCCvm_uiomove+000360 (??, ??, ??, ??)
    003CFD6Cwritei+000108 (??, ??, ??, ??, ??)
    003D0694jfs_rdwr+0004F0 (??, ??, ??, ??, ??, ??, ??, ??)
    0033C6D0vnop_rdwr+000098 (??, ??, ??, ??, ??, ??, ??, ??)
    00317FB8rwuio+0000D0 (??, ??, ??, ??, ??)
    003181ACrdwr+000134 (??, ??, ??, ??, ??)
    00317960kwrite+0000E0 (00000004, 40000000, 00008000)
    00003A50.sys_call+000000 ()
    D01E1C18write+000198 (??, ??, ??)
    10001F68wcbuf+000038 (??, ??)
    10002F30do_child+000128 ()
    10002910prep_mbuf+000368 ()
    10000EBCmain+000998 (??, ??)
    10000188__start+000088 ()
    -end-for-case-2--------------------------------------

    case 3:
    -begin-----------------------------------------------
    (0)> sw 255;f
    Switch to thread: <pvthread+007F80>
    pvthread+007F80 STACK:
    001A94D8complex_lock_sleep_ppc+0000D8 (0000380E, 00000002, 00000396, 2FF3B7E4 ??)
    001AA604lock_write_ppc+000150 (??)
    003D27C4jfs_lookup+00016C (??, ??, ??, ??, ??, ??)
    0033CAA4vnop_lookup+000018 (??, ??, ??, ??, ??, ??)
    00316EB4lookuppn+000474 (??, ??, ??, ??, ??, ??)
    0038D2E0openpnp+0000CC (??, ??, ??, ??, ??)
    0038D880openpath+000104 (??, ??, ??, ??, ??, ??, ??)
    0038DA8Ccopen+000178 (??, ??, ??, ??, ??)
    0038D1C0open+00007C (2FF22CDE, 04000000, 00000000)
    00003A50.sys_call+000000 ()
    D01DE108open64+00003C (2FF22CDE, 00000000, 00000000, 00000000, 0000000A, 60000000, 60002349, 7F7F7F7F)
    100009D8main+0004B4 (??, ??)
    10000188__start+000088 ()
    -end-for-case-3--------------------------------------

    case 4:
    -begin-----------------------------------------------
    (0)> sw 350;f
    Switch to thread: <pvthread+00AF00>
    pvthread+00AF00 STACK:
    00038CCCe_block_thread+000280 ()
    0003935Ce_sleep_thread+000054 (??, ??, ??)
    003DF0ECfifo_read+000090 (??, ??, ??, ??)
    003DF6B8fifo_rdwr+000060 (??, ??, ??, ??, ??, ??, ??, ??)
    0033C6D0vnop_rdwr+000098 (??, ??, ??, ??, ??, ??, ??, ??)
    00317FB8rwuio+0000D0 (??, ??, ??, ??, ??)
    003181ACrdwr+000134 (??, ??, ??, ??, ??)
    00317E90kread+0000E0 (00000000, 20002C32, 00000A16)
    00003A50.sys_call+000000 ()
    D01E32C4read+000198 (??, ??, ??)
    100003ECbread+0000E4 (??, ??)
    10001450umatch+000164 (??, ??, ??, ??, ??, ??)
    10002A44fastexecute+000194 (??)
    10000E68main+000820 (??, ??)
    10000188__start+000088 ()
    -end-for-case-4--------------------------------------

    OS info:
    oslevel -r
    5200-01
    uname -a
    AIX ibm-rs6000 2 5 000FF1DF4C00
    bootinfo -K
    32

    Can anyone tell me why this happens and how to solve it? Thanks.
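
With more than 100 hung threads, it can help to group the dumps by the frames at the top of each stack instead of reading them one by one. The following is a minimal sketch, assuming every frame line has the shape seen above (8 hex digits of address, the symbol, `+`, 6 hex digits of offset). The names `FRAME`, `top_frames`, `group_by_wait_site` and `STACKS` are hypothetical helpers, not part of kdb, and the traces are abbreviated copies of the ones in this post.

```python
import re
from collections import defaultdict

# One kdb stack frame as printed above: address, symbol, "+", offset.
# Assumption: addresses are always 8 upper-case hex digits.
FRAME = re.compile(r"^\s*([0-9A-F]{8})(\.?\w+)\+([0-9A-F]{6})")


def top_frames(stack_text, depth=2):
    """Return the first `depth` symbol names found in one stack listing."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME.match(line)
        if match:  # header and wrapped-argument lines do not match
            names.append(match.group(2))
    return tuple(names[:depth])


def group_by_wait_site(stacks, depth=2):
    """Map each distinct tuple of top frames to the thread slots showing it."""
    groups = defaultdict(list)
    for slot, text in stacks.items():
        groups[top_frames(text, depth)].append(slot)
    return dict(groups)


# Thread slot -> abbreviated stack text, copied from the cases in this post.
STACKS = {
    243: """pvthread+007980 STACK:
        001A94D8complex_lock_sleep_ppc+0000D8 (0000380E, 00000002, 000005AC, 2FF3B7E4 ??)
        001AA604lock_write_ppc+000150 (??)
        003D0258jfs_rdwr+0000B4 (??, ??, ??, ??, ??, ??, ??, ??)""",
    253: """pvthread+007E80 STACK:
        00031FF8.backt+000000 (00006C78, 2FF3A868 ??)
        00032D88.vcs_movep_excp+000020 (??, ??)
        001229FCvmpcopy+0003F4 (??, ??, ??, ??, ??, ??, ??)""",
    255: """pvthread+007F80 STACK:
        001A94D8complex_lock_sleep_ppc+0000D8 (0000380E, 00000002, 00000396, 2FF3B7E4 ??)
        001AA604lock_write_ppc+000150 (??)
        003D27C4jfs_lookup+00016C (??, ??, ??, ??, ??, ??)""",
    350: """pvthread+00AF00 STACK:
        00038CCCe_block_thread+000280 ()
        0003935Ce_sleep_thread+000054 (??, ??, ??)
        003DF0ECfifo_read+000090 (??, ??, ??, ??)""",
}

if __name__ == "__main__":
    for frames, slots in group_by_wait_site(STACKS).items():
        print(" <- ".join(frames), slots)
```

Grouped this way, threads 243 and 255 fall into the same bucket: both are asleep in `complex_lock_sleep_ppc` called from `lock_write_ppc`, so they are waiting for a lock rather than for the disk directly. Thread 253 is in the page-copy path under `writei`, and thread 350 is blocked in a FIFO read.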


  • 2.  Re: disk media for jfs is bad, open/read/write ops suspend for ever?

    Posted Wed December 07, 2005 10:08 AM

    Originally posted by: MarkTaylor


    > More than 100 processes were running dd to read and
    > write files on a disk. When disk I/O failed (in this
    > case, ddstrategy returned ENODEV for each request), I
    > found that most of the processes were hung and could
    > not be killed, even with "kill -9".
    Am I right in reading your post that the disk went bad while you were running hundreds of dd processes against it? If so, then each dd is waiting for a write completion from the disk that will never arrive because the disk has an error. The threads are stuck in protected kernel space and cannot be killed; you will have to reboot your system.

    Rgds
    Mark Taylor


  • 3.  Re: disk media for jfs is bad, open/read/write ops suspend for ever?

    Posted Thu December 08, 2005 04:35 AM

    Originally posted by: SystemAdmin


    So JFS does not forward the error to the caller when the underlying disk's ddstrategy returns an error? In that (or any) case, will JFS wait for I/O completion forever?
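
One way to picture the question above is a toy user-space model. It is not AIX kernel code. It assumes the usual block-driver contract, in which a strategy routine reports failure through the request buffer (an error flag plus a completion call) and the caller ignores the routine's return value. Whether JFS on AIX 5.2 behaves exactly this way is not established in this thread. `Buf`, `B_ERROR`, `iodone`, `submit_and_wait` and the two strategy functions are hypothetical names for this sketch only.

```python
import errno
import threading

B_ERROR = 0x1  # hypothetical flag value, used only inside this model


class Buf:
    """Toy stand-in for a block I/O request; not the AIX struct buf."""

    def __init__(self):
        self.flags = 0
        self.error = 0
        self.done = threading.Event()


def iodone(buf):
    """Mark the request complete and wake whoever is waiting on it."""
    buf.done.set()


def strategy_reports_error(buf):
    """Driver that fails the request through the buf and then completes it."""
    buf.flags |= B_ERROR
    buf.error = errno.ENODEV
    iodone(buf)
    return errno.ENODEV


def strategy_drops_request(buf):
    """Driver that only returns an error code and never completes the buf."""
    return errno.ENODEV


def submit_and_wait(strategy, timeout=0.2):
    """Submit a request, ignore the return value, and sleep until completion.

    Returns the error recorded in the buf (0 on success), or None if the
    completion never arrived. The timeout exists only so the model ends;
    a kernel waiter in this position would sleep with no timeout.
    """
    buf = Buf()
    strategy(buf)  # return value deliberately ignored
    if not buf.done.wait(timeout):
        return None
    return buf.error if buf.flags & B_ERROR else 0


if __name__ == "__main__":
    print("reports error :", submit_and_wait(strategy_reports_error))
    print("drops request :", submit_and_wait(strategy_drops_request))
```

In this model the first driver lets the caller see ENODEV, while the second leaves the waiter asleep until the artificial timeout expires. If the failing driver here returns ENODEV without ever completing the request, a waiter that relies only on the completion would never wake up, which would match the unkillable threads described in the first post.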