AIX

  • 1.  Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Fri May 09, 2008 10:24 AM

    Originally posted by: SystemAdmin


    We have an AIX 5300-07-01-0748 partition connected to NetApp storage that logs the following errors every 5 minutes. Is anyone else experiencing these errors? We do not see them on AIX 5300-05-04-0000.

    # errpt
    IDENTIFIER TIMESTAMP T C RESOURCE_NAME DESCRIPTION
    D0EAC662 0509091008 T H fcs1 MICROCODE PROGRAM ERROR
    B6DB68E0 0509082808 I O SYSJ2 FILE SYSTEM RECOVERY REQUIRED
    E86653C3 0509082708 P H LVDD I/O ERROR DETECTED BY LVM
    B6267342 0509082708 P H hdisk0 DISK OPERATION ERROR

    # errpt -a | more
    ---------------------------------------------------------------------------
    LABEL: FCS_ERR6
    IDENTIFIER: D0EAC662

    Date/Time: Fri May 9 09:10:36 2008
    Sequence Number: 2287
    Machine Id: 00CECB7E4C00
    Node Id: erpdbsa01g
    Class: H
    Type: TEMP
    Resource Name: fcs1
    Resource Class: adapter
    Resource Type: df1000fd
    Location: U7879.001.DQDTXZX-P1-C3-T2
    VPD:
    Part Number.................03N5029
    EC Level....................A
    Serial Number...............1B7120427E
    Manufacturer................001B
    Device Specific.(CC)........5759
    FRU Number.................. 03N5029
    Device Specific.(ZM)........3
    Network Address.............10000000C962DCD7
    ROS Level and ID............02C82138
    Device Specific.(Z0)........1036406D
    Device Specific.(Z1)........00000000
    Device Specific.(Z2)........00000000
    Device Specific.(Z3)........03000909
    Device Specific.(Z4)........FFC01159
    Device Specific.(Z5)........02C82138
    Device Specific.(Z6)........06C12138
    Device Specific.(Z7)........07C12138
    Device Specific.(Z8)........20000000C962DCD7
    Device Specific.(Z9)........BS2.10X8
    Device Specific.(ZA)........B1F2.10X8
    Device Specific.(ZB)........B2F2.10X8
    Device Specific.(ZC)........00000000

    Description
    MICROCODE PROGRAM ERROR

    Probable Causes
    ADAPTER MICROCODE

    Failure Causes
    ADAPTER MICROCODE

    Recommended Actions
    IF PROBLEM PERSISTS THEN DO THE FOLLOWING
    CONTACT APPROPRIATE SERVICE REPRESENTATIVE

    Detail Data
    SENSE DATA
    0000 0000 0000 0021 0202 003E 0000 0000 0002 0100 0000 C4BF 0000 007C 0000 012C
    0000 0000 0000 0007 0000 0000 0000 0000 0000 0004 0000 0000 0000 0000 0000 0000
    0000 0000 0610 0000 0400 0010 0000 0000 0000 0000 0000 2710 0000 07D0 0000 076C
    0000 0064 0000 000F 2400 0000 0040 8700 0000 0000 0000 0000 0F00 0000 FEFF FF00
    003D BA00 368B 01F0 5100 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000
    ---------------------------------------------------------------------------
    LABEL: LVM_IO_FAIL
    IDENTIFIER: E86653C3

    Date/Time: Fri May 9 08:27:49 2008
    Sequence Number: 2267
    Machine Id: 00CECB7E4C00
    Node Id: erpdbsa01g
    Class: H
    Type: PERM
    Resource Name: LVDD
    Resource Class: NONE
    Resource Type: NONE
    Location:

    Description
    I/O ERROR DETECTED BY LVM

    Probable Causes
    POWER, DRIVE, ADAPTER, OR CABLE FAILURE

    Recommended Actions
    RUN DIAGNOSTICS AGAINST THE FAILING DEVICE

    Detail Data
    PHYSICAL VOLUME DEVICE MAJOR/MINOR
    8000 0012 0000 0006
    ERROR CODE AS DEFINED IN sys/errno.h
    16
    BLOCK NUMBER
    11899816
    LOGICAL VOLUME DEVICE MAJOR/MINOR
    8000 000A 0000 0009
    PHYSICAL BUFFER TRANSACTION TIME
    0
    RESIDUAL COUNT
    4096
    NUMBER OF BLOCKS
    4096
    I/O TYPE
    USER DATA
    SENSE DATA
    0000 0000 0000 5AC9 00CE CB7E 0000 4C00 0000 0119 CA77 B50A 00CE CB7E 76C6 679C
    0000 0000 0000 0000
    ---------------------------------------------------------------------------
    LABEL: SC_DISK_ERR2
    IDENTIFIER: B6267342

    Date/Time: Fri May 9 08:27:49 2008
    Sequence Number: 2266
    Machine Id: 00CECB7E4C00
    Node Id: erpdbsa01g
    Class: H
    Type: PERM
    Resource Name: hdisk0
    Resource Class: disk
    Resource Type: NetAppMPIO
    Location: U7879.001.DQDTXZX-P1-C3-T2-W500A0984974967D7-L0
    VPD:
    Manufacturer................NETAPP
    Machine Type and Model......LUN
    ROS Level and ID............0.2
    Serial Number...............HnXQNZHE-IqR
    Device Specific.(Z0)........FAS3070

    Description
    DISK OPERATION ERROR

    Probable Causes
    DASD DEVICE

    Failure Causes
    DISK DRIVE
    DISK DRIVE ELECTRONICS

    Recommended Actions
    PERFORM PROBLEM DETERMINATION PROCEDURES

    Detail Data
    PATH ID
    4
    SENSE DATA
    0A00 2A00 00B5 93A8 0000 0804 0000 0000 0000 0000 0000 0000 0118 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 01B4 000E B900 0000 0005 0000 0000 0000 0000 0000 0083 0000
    0000 003D 0017
    #AIX-Forum


  • 2.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Fri May 09, 2008 08:02 PM

    Originally posted by: dukessd


    The SC_DISK_ERR2 on hdisk0 is the cause of the problem.
    The sense data shows there is a disk reservation problem.
    Something else holds the reserve on the disk, so host erpdbsa01g cannot get the IOCB.
    Do you run any SAN disk monitoring software? BMC is famous for taking a reserve and denying LVM access to the disks.
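    To see where the reservation conflict shows up, the leading bytes of the SC_DISK_ERR2 sense data can be decoded with a small POSIX sh/awk helper. This is a sketch: the function name `decode_sc_disk_sense` is hypothetical, and the byte offsets it uses (CDB from byte 2, SCSI status at byte 25) are an assumption inferred from this one record, not taken from IBM's documented layout. Read that way, the record holds a SCSI WRITE(10) (opcode 0x2A) to LBA 0x00B593A8 for 8 blocks, and the status byte is 0x18, which is the SCSI RESERVATION CONFLICT status.

```shell
#!/bin/sh
# decode_sc_disk_sense: decode the leading bytes of an SC_DISK_ERR2 SENSE DATA
# dump as printed by "errpt -a" (groups of four hex digits).
# ASSUMPTION: the byte offsets below (CDB starting at byte 2, SCSI status at
# byte 25) are inferred from the single record in this thread, not taken
# from an IBM structure definition. Verify them against your own records.
decode_sc_disk_sense() {
    # Join the arguments and strip spaces/newlines into one hex string.
    { printf '%s' "$*" | tr -d ' \n'; echo; } | awk '
    function hex(s,    i, v) {                # hex string -> number
        v = 0
        for (i = 1; i <= length(s); i++)
            v = v * 16 + index("0123456789ABCDEF", toupper(substr(s, i, 1))) - 1
        return v
    }
    function byte(n) { return substr($0, 2 * n + 1, 2) }   # 0-based byte n
    {
        st = byte(25)
        msg = (st == "18") ? " (RESERVATION CONFLICT)" : ""
        print "opcode=" byte(2)                              # 2A = WRITE(10)
        print "lba=" hex(byte(4) byte(5) byte(6) byte(7))    # 32-bit LBA
        print "blocks=" hex(byte(9) byte(10))                # transfer length
        print "scsi_status=0x" st msg
    }'
}
```

    Passing the first line of the hdisk0 sense data prints opcode=2A, lba=11899816, blocks=8 and scsi_status=0x18 (RESERVATION CONFLICT). The LBA equals the BLOCK NUMBER in the LVM_IO_FAIL entry, and 8 blocks of 512 bytes (an assumed block size) equal its 4096-byte count.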


  • 3.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Mon May 12, 2008 09:34 AM

    Originally posted by: SystemAdmin


    The Storage Admin is verifying the zoning at the NetApp level. The "reserve_policy" for hdisk0 inadvertently got set to "no_reserve", but when I try to change it back to "single_path" I get the following error:

    # chdev -l hdisk0 -a reserve_policy=single_path -P
    Method error (/usr/lib/methods/chgdisk):
    0514-018 The values specified for the following attributes
    are not valid:

    root@erpdbsa01g:/

    I thought there might be a special setting for the NetApp driver, but I cannot find a command in the installed NetApp driver software to change its settings. I have previously changed "reserve_policy" on a rootvg disk using the IBM MPIO driver, but I had to specify the "-P" flag, which changes only the ODM; the change takes effect at the next reboot.


  • 4.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Mon May 12, 2008 03:47 PM

    Originally posted by: jvk


    Two different issues:
    • FCS_ERR6: fcs1 rejected a command due to an illegal frame
    • SC_DISK_ERR2: reservation conflict on hdisk0 (and then LVM_IO_FAIL due to a "device busy" condition)
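    The link between the reservation conflict and the LVM error can be checked from the LVM_IO_FAIL detail data: ERROR CODE 16 is EBUSY in sys/errno.h (the "device busy" condition), and BLOCK NUMBER 11899816 is 0x00B593A8, the LBA that appears as 00B5 93A8 in the SC_DISK_ERR2 sense data. A minimal sketch that pulls both values out of `errpt -a` text follows; `lvm_fail_summary` is a hypothetical name, and the parser assumes each Detail Data label is followed by its value on the next line, as in the record above.

```shell
#!/bin/sh
# lvm_fail_summary: read the Detail Data of one LVM_IO_FAIL record
# ("errpt -a" text) on stdin and print its errno and block number.
# ASSUMPTION: each Detail Data label sits on its own line and its value
# follows on the next line, as in the errpt -a output in this thread.
lvm_fail_summary() {
    awk '
    want != "" { val[want] = $1; want = ""; next }         # value line
    /ERROR CODE AS DEFINED IN sys\/errno\.h/ { want = "errno"; next }
    /^[ \t]*BLOCK NUMBER[ \t]*$/             { want = "block"; next }
    END {
        # Only errno 16 is named here; look others up in sys/errno.h.
        name = (val["errno"] + 0 == 16) ? "EBUSY" : "see sys/errno.h"
        print "errno=" val["errno"] " (" name ")"
        print "block=" val["block"]
    }'
}
```

    On AIX this would be fed one record, for example from `errpt -a -j E86653C3`; for the record in this thread it prints errno=16 (EBUSY) and block=11899816.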



  • 5.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Mon May 12, 2008 04:22 PM

    Originally posted by: SystemAdmin


    I don't think the following will work on rootvg to return the reserve_policy to "single_path":

    unmount FS
    varyoffvg VG
    chdev -l hdiskN -a algorithm=fail_over
    chdev -l hdiskN -a reserve_policy=single_path

    but,

    the "-P" option should change the ODM for the next reboot.

    I get the error message referenced above when I use this chdev:

    chdev -l hdisk0 -a reserve_policy=single_path -P
    Method error (/usr/lib/methods/chgdisk):
    0514-018 The values specified for the following attributes
    are not valid:
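    The two routes discussed in this post can be written as a dry-run helper that only prints commands: for a non-rootvg disk, unmount the file systems, vary off the VG, change the disk, then bring everything back; for rootvg, change only the ODM with -P and reboot. This is a sketch, not a tested procedure: `plan_reserve_change` is a hypothetical name, the caller supplies the disk, VG and mount points, and it sets both attributes in a single chdev call, the form that later worked in this thread.

```shell
#!/bin/sh
# plan_reserve_change: print, without running, the commands that switch a
# disk to reserve_policy=single_path with algorithm=fail_over.
# Arguments: DISK VG [MOUNTPOINT...]. All names are supplied by the caller.
# ASSUMPTION: the mount points are not nested, so one order works for both
# umount and mount. Review the printed plan before running any of it.
plan_reserve_change() {
    disk=$1
    vg=$2
    shift 2
    attrs="-a algorithm=fail_over -a reserve_policy=single_path"
    if [ "$vg" = "rootvg" ]; then
        # rootvg disks stay open while the system runs: ODM only, then reboot.
        echo "chdev -l $disk $attrs -P"
        echo "shutdown -Fr"
        return 0
    fi
    # Other VGs: close the disk, change it live, then reopen.
    for fs in "$@"; do echo "umount $fs"; done
    echo "varyoffvg $vg"
    echo "chdev -l $disk $attrs"
    echo "varyonvg $vg"
    for fs in "$@"; do echo "mount $fs"; done
}
```

    For example, `plan_reserve_change hdisk3 datavg /data` (hypothetical names) prints umount, varyoffvg, chdev, varyonvg and mount lines, while `plan_reserve_change hdisk0 rootvg` prints only the chdev ... -P line followed by shutdown -Fr.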


  • 6.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Tue May 13, 2008 07:49 AM

    Originally posted by: jvk


    Did you try the cmd like this:
    # chdev -l hdisk0 -a algorithm=fail_over -a reserve_policy=single_path -P
    # shutdown -Fr



  • 7.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Tue May 13, 2008 09:02 AM

    Originally posted by: SystemAdmin


    The below command worked, and it makes sense; if the reserve policy is going to be single_path, then the algorithm must be fail_over.

    chdev -l hdisk0 -a algorithm=fail_over -a reserve_policy=single_path -P

    Unfortunately, it did not resolve my original errors and the SAN Admin says that zoning at the switches is correct.
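    The rule stated here (single_path requires algorithm=fail_over) can be checked by reading `lsattr -El hdiskN` output, for example to confirm what is stored in the ODM after a chdev -P. The sketch assumes the usual four-column lsattr layout (attribute, value, description, user-settable) and checks only that one pairing; `check_reserve_combo` is a hypothetical helper, not part of AIX or the NetApp kit.

```shell
#!/bin/sh
# check_reserve_combo: read "lsattr -El hdiskN" output on stdin and report
# whether algorithm and reserve_policy follow the pairing from this thread.
# ASSUMPTION: column 1 is the attribute name and column 2 its value.
# Only the single_path/fail_over rule is checked; other values pass.
check_reserve_combo() {
    awk '
    $1 == "algorithm"      { alg = $2 }
    $1 == "reserve_policy" { rp = $2 }
    END {
        if (rp == "single_path" && alg != "fail_over") {
            print "INVALID: reserve_policy=single_path needs algorithm=fail_over (have " alg ")"
            exit 1
        }
        print "OK: algorithm=" alg " reserve_policy=" rp
    }'
}
```

    On AIX it would be run as `lsattr -El hdisk0 | check_reserve_combo`; a nonzero exit status flags the combination that chdev rejected with 0514-018 earlier in the thread.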


  • 8.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Tue May 13, 2008 10:00 AM

    Originally posted by: jvk


    Nobody said that the chdev would resolve your errors, and I have no idea why you wanted to change the reserve policy and algorithm...

    A reservation conflict is something you need to investigate yourself. Check when it occurs and after what action or task. See whether it occurs every day at the same time; if so, find out what is running at that time (not only on this AIX node but also on the SAN or on any other AIX node that sees the same disks). If the LUN is used by this node only, make sure nobody else can see it (use correct LUN assignment and zoning). Check SAN monitoring software, backup software, etc.
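    One way to follow the "same time every day" advice is to tally the errpt summary by identifier and hour of day. A minimal sketch, assuming the summary layout shown at the top of this thread (TIMESTAMP as mmddHHMMyy in column 2); `errpt_by_hour` is a hypothetical helper name.

```shell
#!/bin/sh
# errpt_by_hour: read "errpt" summary output on stdin and count entries per
# error identifier and hour of day.
# ASSUMPTION: column 2 is TIMESTAMP in mmddHHMMyy form, so characters 5-6
# hold the hour. The header and any line without a 10-digit timestamp in
# column 2 are skipped.
errpt_by_hour() {
    awk '
    length($2) == 10 && $2 ~ /^[0-9]+$/ { n[$1 " " substr($2, 5, 2)]++ }
    END { for (k in n) print k, n[k] }' | LC_ALL=C sort
}
```

    On AIX this would be run as `errpt | errpt_by_hour`. Each output line is identifier, hour and count, so an identifier that piles up in one hour, or appears in every hour as a 5-minute cycle would, stands out and can be matched against what runs at that time.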

    The other error suggests a dirty link on the SAN, and this needs further investigation. From a single error, one can't tell much...

    The above are enough hints for what you can do yourself. IBM support can do much more; if you have a contract, open a PMR for this.


  • 9.  Re: Errors in errpt using NetApp Network Appliance SAN Toolkit and MPIO PCM Kit

    Posted Tue May 13, 2008 10:05 AM

    Originally posted by: SystemAdmin


    Actually, we have found what is causing the errors: we use Veritas CommandCentral Storage (VCC), and the errors start being logged on the AIX hosts as soon as its agent's VCC processes start. We have an incident open with Symantec.

    Thank you for your comments and hints.