AIX


#Power

Tracing SC_DISK_ERR4

  • 1.  Tracing SC_DISK_ERR4

    Posted Mon April 29, 2013 04:43 PM

    Originally posted by: matthewpmattson


    Hi all,

    I have some AIX hosts connected via SAN to a 3PAR storage array, and while running I/O on the disks I am seeing some TEMP errors in my errpt. See below for full details:

    From the errpt -c command:

    DCB47997   0429144313 T H hdisk1         DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk11        DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk11        DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk20        DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk4         DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk15        DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk15        DISK OPERATION ERROR
    DCB47997   0429144313 T H hdisk5         DISK OPERATION ERROR
    DCB47997   0429144413 T H hdisk2         DISK OPERATION ERROR
    DCB47997   0429144413 T H hdisk2         DISK OPERATION ERROR
    DCB47997   0429144413 T H hdisk13        DISK OPERATION ERROR
    DCB47997   0429144413 T H hdisk13        DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk17        DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk17        DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk16        DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk5         DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk5         DISK OPERATION ERROR
    DCB47997   0429144513 T H hdisk2         DISK OPERATION ERROR
    DCB47997   0429144613 T H hdisk1         DISK OPERATION ERROR
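
    To see whether the errors cluster on particular disks or in particular minutes, the summary lines can be tallied. This is a minimal Python sketch. It assumes the errpt summary format shown above: an IDENTIFIER, a TIMESTAMP in mmddHHMMyy form, then type, class, resource name, and description. Only a handful of the lines above are copied in, and the name parse_summary is just for illustration.

```python
from collections import Counter
from datetime import datetime

# A few summary lines copied from the errpt output above.
SUMMARY = """\
DCB47997   0429144313 T H hdisk1         DISK OPERATION ERROR
DCB47997   0429144313 T H hdisk11        DISK OPERATION ERROR
DCB47997   0429144313 T H hdisk11        DISK OPERATION ERROR
DCB47997   0429144313 T H hdisk5         DISK OPERATION ERROR
DCB47997   0429144513 T H hdisk5         DISK OPERATION ERROR
DCB47997   0429144613 T H hdisk1         DISK OPERATION ERROR
"""


def parse_summary(text):
    """Yield (identifier, timestamp, resource) for each errpt summary line."""
    for line in text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 5 or fields[0] == "IDENTIFIER":
            continue  # skip blank lines and the column header
        ident, stamp, _etype, _eclass, resource = fields[:5]
        # errpt prints the summary timestamp as mmddHHMMyy
        yield ident, datetime.strptime(stamp, "%m%d%H%M%y"), resource


entries = list(parse_summary(SUMMARY))
per_disk = Counter(resource for _, _, resource in entries)
per_minute = Counter(when for _, when, _ in entries)

for disk, count in per_disk.most_common():
    print(f"{disk:8} {count}")
for when, count in sorted(per_minute.items()):
    print(when.strftime("%Y-%m-%d %H:%M"), count)
```

    Fed the full list above, the same tally shows hdisk2 and hdisk5 with three entries each, and all 19 errors falling between 14:43 and 14:46.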

    Detail of one of the errors:

    ---------------------------------------------------------------------------
    LABEL:          SC_DISK_ERR4
    IDENTIFIER:     DCB47997

    Date/Time:       Sun Apr 28 22:14:51 CDT 2013
    Sequence Number: 33081
    Machine Id:      00F7A89E4C00
    Node Id:         blue7
    Class:           H
    Type:            TEMP
    WPAR:            Global
    Resource Name:   hdisk18
    Resource Class:  disk
    Resource Type:   3PAR_VV_MPIO
    Location:        U8231.E1C.06A89ER-V1-C35-T1-W20210002AC0185E0-L12000000000000

    VPD:
            Manufacturer................3PARdata
            Machine Type and Model......VV
            Serial Number...............C000009900000000

    Description
    DISK OPERATION ERROR

    Probable Causes
    MEDIA
    DASD DEVICE

    User Causes
    MEDIA DEFECTIVE

            Recommended Actions
            FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
            PERFORM PROBLEM DETERMINATION PROCEDURES

    Failure Causes
    MEDIA
    DISK DRIVE

            Recommended Actions
            FOR REMOVABLE MEDIA, CHANGE MEDIA AND RETRY
            PERFORM PROBLEM DETERMINATION PROCEDURES

    Detail Data
    PATH ID
               0
    SENSE DATA
    0A00 2800 0116 7E20 0000 4004 0000 0000 0000 0000 0000 0000 0200 0300 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 C336 0002 5080 0000 0000 0000 0000 0000 0000 0000 0083 0000
    0000 0035 001D

    -------------------------------------------------------------------------------------------------------------

    Can anyone get any hints from the sense data as to why these are appearing?

    I have searched the net for this specific error, but there doesn't seem to be any one cause for it, so I am looking for some ideas on where to start debugging.

    Thanks,

    Matt


    #AIX-Forum


  • 2.  Re: Tracing SC_DISK_ERR4

    Posted Mon April 29, 2013 07:40 PM

    Originally posted by: dukessd


    Hi Matt,

    The sense data shows a SCSI command timeout.

    The IBM "Fibre Channel Planning and Integration" guide explains how to decode these errors:

    http://publib.boulder.ibm.com/systems/hardware_docs/pdf/234329.pdf

    Page 89 onwards covers decoding the SC_DISK_ERR events.


    Sense Data Layout

    LL00 CCCC CCCC CCCC CCCC CCCC CCCC CCCC CCCC RRRR RRRR RRRR VVSS AARR DDDD KKDD
    0A00 2800 0116 7E20 0000 4004 0000 0000 0000 0000 0000 0000 0200 0300 0000 0000


    You have VV = 02 and AA = 03, which means:


    "Command Timeout. This indicates that the SCSI command did not
    complete within the allowed time. This usually indicates a hardware
    problem related to the SCSI transport layer."
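
    The field split above can be done mechanically. This is a minimal Python sketch. It assumes the byte layout in the "Sense Data Layout" line, with two hex digits per letter, and treats VV = 02 with AA = 03 as the command-timeout case quoted above. Reading LL as the CDB length is an assumption. It fits this entry because 0x0A = 10, and opcode 0x28 is the 10-byte SCSI READ(10) command. The helper names are hypothetical.

```python
# First 16 words of SENSE DATA from the SC_DISK_ERR4 entry above.
SENSE = "0A00 2800 0116 7E20 0000 4004 0000 0000 0000 0000 0000 0000 0200 0300 0000 0000"


def split_sense(hex_words):
    """Split sense data per the layout LL00 CCCCx8 RRRRx3 VVSS AARR DDDD KKDD."""
    raw = bytes.fromhex("".join(hex_words.split()))
    if len(raw) < 32:
        raise ValueError("expected at least 16 words (32 bytes) of sense data")
    return {
        "LL": raw[0],        # assumed: CDB length in bytes
        "CDB": raw[2:18],    # CCCC x8: command block, zero padded
        "VV": raw[24],
        "SS": raw[25],
        "AA": raw[26],
        "KK": raw[30],
    }


def is_command_timeout(fields):
    """VV = 02 with AA = 03 is the 'Command Timeout' case quoted above."""
    return fields["VV"] == 0x02 and fields["AA"] == 0x03


def describe_read10(cdb):
    """Decode READ(10): opcode 0x28, LBA in bytes 2-5, block count in bytes 7-8."""
    if cdb[0] != 0x28:
        return None
    return {
        "lba": int.from_bytes(cdb[2:6], "big"),
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


fields = split_sense(SENSE)
cdb = fields["CDB"][: fields["LL"]]
print("command timeout:", is_command_timeout(fields))
print("CDB:", cdb.hex(" ").upper())
print("READ(10):", describe_read10(cdb))
```

    For this entry, the command that timed out appears to be an ordinary READ(10) of 64 blocks at LBA 18251296 (0x01167E20).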

    Check whether they all have the same path ID. If so, look at what that path has in common: adapter, switch, storage subsystem port.
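
    With many entries, it helps to collect the path IDs for each disk by scanning the detailed output for Resource Name and PATH ID. This is a minimal Python sketch. It assumes entries in the errpt -a layout shown earlier, separated by dashed lines (for example, the output of errpt -a -j DCB47997). The excerpt below is hypothetical and trimmed to the lines the parser reads.

```python
import re
from collections import defaultdict

# Hypothetical, trimmed errpt -a excerpt: two SC_DISK_ERR4 entries for one disk.
ERRPT_A = """\
---------------------------------------------------------------------------
LABEL:          SC_DISK_ERR4
Resource Name:   hdisk18
Detail Data
PATH ID
           0
---------------------------------------------------------------------------
LABEL:          SC_DISK_ERR4
Resource Name:   hdisk18
Detail Data
PATH ID
           1
"""


def path_ids(text):
    """Return (resource, path_id) for every entry that carries a PATH ID."""
    results = []
    for entry in re.split(r"^-{20,}\s*$", text, flags=re.M):
        res = re.search(r"^Resource Name:\s+(\S+)", entry, re.M)
        pid = re.search(r"^PATH ID\s+(\d+)", entry, re.M)
        if res and pid:
            results.append((res.group(1), int(pid.group(1))))
    return results


paths_by_disk = defaultdict(set)
for disk, pid in path_ids(ERRPT_A):
    paths_by_disk[disk].add(pid)

for disk, pids in sorted(paths_by_disk.items()):
    hint = "single path" if len(pids) == 1 else "multiple paths"
    print(disk, sorted(pids), hint)
```

    A disk that only ever reports one path ID points at that path's adapter, switch, and storage port, as described above.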

    If there are no other errors, that would suggest a problem on the storage subsystem: for some reason that controller, port, or LUN is not responding in a timely manner.

    As you are using NPIV, you should also check for adapter and interface errors on the associated VIOS.

    HTH




  • 3.  Re: Tracing SC_DISK_ERR4

    Posted Tue April 30, 2013 01:06 PM

    Originally posted by: matthewpmattson


    Hi,

    The path ID alternates between 0 and 1, which I guess means commands are timing out down both paths? These are the only errors I am seeing in the AIX errpt. I checked the VIOS error log as you suggested, and I am seeing a few "Misbehaved Virtual FC Client" errors logged. I'm not sure whether this is related or something different, but the entry reads:

    ---------------------------------------------------------------------------
    LABEL:          VFC_CLIENT_FAILURE
    IDENTIFIER:     88E96781

    Date/Time:       Tue Apr 30 09:37:11 PDT 2013
    Sequence Number: 79
    Machine Id:      00F7A89E4C00
    Node Id:         blue246
    Class:           S
    Type:            TEMP
    WPAR:            Global
    Resource Name:   vfchost1

    Description
    Misbehaved Virtual FC Client

    Probable Causes
    Bad IU, or Protocol Violation

    Failure Causes
    Bad IU, or Protocol Violation

            Recommended Actions
            Remove Virtual FC Client, then Configure the same instance

    Detail Data
    ADDITIONAL INFORMATION
            module: trans_event     rc: 00000000FFFFFFD8    location: 00000514
            data:  1 1 0 0 0
    ---------------------------------------------------------------------------
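
    The rc in the detail data is printed as a 64-bit hex value. Assume that only the low 32 bits are significant; the thread does not say how vfchost encodes it. Under that assumption, the rc reads as a small negative number. Below is a small Python sketch of the conversion. The function name is hypothetical, and what the resulting value means to the VIOS driver is not established here.

```python
def low32_signed(hex_rc):
    """Interpret the low 32 bits of a hex return code as a two's-complement int."""
    value = int(hex_rc, 16) & 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


print(low32_signed("00000000FFFFFFD8"))  # -40
```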

     

    Matt

