AIX

  • 1.  USER DATA I/O ERROR

    Posted Wed May 30, 2007 06:34 AM

    Originally posted by: SystemAdmin


    Hello,
    I have a single LPAR on a p570 running 5300-05-06. Today I noticed lots of messages in the errpt log that keep coming every minute:


    LABEL: J2_USERDATA_EIO
    IDENTIFIER: EA88F829

    Date/Time: Wed May 30 13:25:13 2007
    Sequence Number: 1868
    Machine Id: 00C398DE4C00
    Node Id: miranda
    Class: O
    Type: INFO
    Resource Name: SYSJ2

    Description
    USER DATA I/O ERROR

    Probable Causes
    ADAPTER HARDWARE OR MICROCODE
    DISK DRIVE HARDWARE OR MICROCODE
    SOFTWARE DEVICE DRIVER
    STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED

    Recommended Actions
    CHECK CABLES AND THEIR CONNECTIONS
    INSTALL LATEST ADAPTER AND DRIVE MICROCODE
    INSTALL LATEST STORAGE DEVICE DRIVERS
    IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

    Detail Data
    JFS2 MAJOR/MINOR DEVICE NUMBER
    000A 0006
    FILE SYSTEM DEVICE AND MOUNT POINT
    /dev/hd9var, /var
    The OS is working normally; it hasn't crashed or anything, it's just those messages. Nothing in the config of the LPAR or the OS has been changed in the last week or so. The only change to that box was a firmware upgrade to:
    Platform Firmware level: SF240_299
    Firmware Version: IBM,SF240_299

    This was done without shutting down the LPAR and activating it again. lsvg rootvg looks OK, no stale PVs or anything. No PTFs or APARs have been applied to the OS since SP6 of TL5. Does anyone have any idea what this could be? I searched Google without much luck. Thanks in advance.
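
When the same entry repeats every minute, it can help to pull the key fields out of the `errpt -a` text programmatically. Below is a minimal sketch in Python, assuming the entry layout shown in this post; the function name `parse_errpt_entry` and the dictionary keys `device` and `mount_point` are made up for illustration and are not part of any AIX tool.

```python
import re

# Sample copied from the errpt entry quoted in this post (shortened).
ENTRY = """\
LABEL: J2_USERDATA_EIO
IDENTIFIER: EA88F829

Date/Time: Wed May 30 13:25:13 2007
Resource Name: SYSJ2

Detail Data
JFS2 MAJOR/MINOR DEVICE NUMBER
000A 0006
FILE SYSTEM DEVICE AND MOUNT POINT
/dev/hd9var, /var
"""

# "Name: value" header lines such as "LABEL: J2_USERDATA_EIO".
HEADER = re.compile(r"^([A-Za-z/ ]+?):\s+(.*)$")


def parse_errpt_entry(text):
    """Return header fields plus the JFS2 device and mount point, if present."""
    fields = {}
    lines = [ln.strip() for ln in text.splitlines()]
    for i, ln in enumerate(lines):
        m = HEADER.match(ln)
        if m:
            fields[m.group(1)] = m.group(2).strip()
        # The device and mount point sit on the line after this caption.
        if ln == "FILE SYSTEM DEVICE AND MOUNT POINT" and i + 1 < len(lines):
            device, _, mount = lines[i + 1].partition(",")
            fields["device"] = device.strip()
            fields["mount_point"] = mount.strip()
    return fields


entry = parse_errpt_entry(ENTRY)
print(entry["LABEL"], entry["device"], entry["mount_point"])
```

Here the parsed entry points at /dev/hd9var mounted on /var, which is the file system to look at first.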
    #AIX-Forum


  • 2.  Re: USER DATA I/O ERROR

    Posted Thu May 31, 2007 12:10 AM

    Originally posted by: SystemAdmin


    OK, so here is what happened. It looks like the /var logical volume got corrupted: you could touch a file, but you couldn't write any data into it. I ran fsck on the mounted FS as a blind shot, since I had no idea what this could be (call me an AIX noob if you will). I know that's not a good idea; the man page says fsck only does a read-only check on a mounted FS. I got:

    Block allocation map is corrupt (NOT FIXED)
    Block allocation map is corrupt

    I guess that was the problem. Booting into maintenance mode and running fsck again fixed it, and now I don't have any of the previous errors. Hope this is helpful to someone :).


  • 3.  Re: USER DATA I/O ERROR

    Posted Mon October 19, 2009 10:53 AM

    Originally posted by: KrisB


    I know this is an old post, but it came up in a search for me this weekend. I had a similar problem and didn't find much either. My Oracle database stopped being able to write to the archive log directory and reported the following errors.

    Sun Oct 18 05:47:36 2009
    Errors in file /u02/oracle/admin/prod/bdump/prod1/prod1_arc0_807124.trc:
    ORA-19502: write error on file "/archprod1/arch/prod1/prod_redo1_79236_622224263.arc", blockno 4097 (blocksize=512)
    ORA-27063: number of bytes read/written is incorrect
    IBM AIX RISC System/6000 Error: 28: No space left on device
    Additional information: -1
    Additional information: 1048576

    When we checked the filesystem, it had plenty of free space. We pointed the archive logs to another directory, and I unmounted the filesystem and ran fsck.

    The current volume is: /dev/prodarch01lv
    File system is currently mounted.
    Primary superblock is valid.
    fsck: Performing read-only processing does not produce dependable results.
    *** Phase 1 - Initial inode scan
    Inode 4240 has bad size (NOT FIXED)
    Cannot repair inode 4240 (NOT RELEASED)
    Cannot repair inode 4397 (NOT RELEASED)
    *** Phase 2 - Process remaining directories
    *** Phase 3 - Process remaining files
    *** Phase 4 - Check inode allocation map
    File system inode map is corrupt (NOT FIXED)
    *** Phase 5 - Check block allocation map
    Block allocation map is corrupt (NOT FIXED)
    Block allocation map is corrupt
    File system is currently mounted.
    fsck: Performing read-only processing does not produce dependable results.
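
A read-only pass like the one above only reports problems; it repairs nothing. As a rough sketch, assuming the message wording shown in this thread, the Python snippet below summarizes saved fsck output: whether the run was against a mounted file system, which findings were left unrepaired, and whether fsck declared the file system clean. The helper name `summarize_fsck` and its result keys are hypothetical.

```python
# Sample copied from the read-only fsck run quoted above (phase lines omitted).
FSCK_OUTPUT = """\
The current volume is: /dev/prodarch01lv
File system is currently mounted.
Primary superblock is valid.
fsck: Performing read-only processing does not produce dependable results.
Inode 4240 has bad size (NOT FIXED)
Cannot repair inode 4240 (NOT RELEASED)
Cannot repair inode 4397 (NOT RELEASED)
File system inode map is corrupt (NOT FIXED)
Block allocation map is corrupt (NOT FIXED)
Block allocation map is corrupt
"""


def summarize_fsck(output):
    """Classify fsck output by the message strings seen in this thread."""
    lines = [ln.strip() for ln in output.splitlines() if ln.strip()]
    unrepaired = [
        ln for ln in lines
        if ln.endswith("(NOT FIXED)") or ln.endswith("(NOT RELEASED)")
    ]
    return {
        # fsck printed this when it was run against a mounted file system.
        "mounted": "File system is currently mounted." in lines,
        # Findings that the read-only pass reported but did not repair.
        "unrepaired": unrepaired,
        # Only printed once fsck considers the file system consistent.
        "clean": "File system is clean." in lines,
    }


summary = summarize_fsck(FSCK_OUTPUT)
print(summary["mounted"], len(summary["unrepaired"]), summary["clean"])
```

A result with `mounted` true and a non-empty `unrepaired` list suggests repeating fsck on the unmounted file system, as both posters ended up doing.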

    I made a backup copy of the existing logs in this filesystem, then ran fsck and let it fix the errors.

    The current volume is: /dev/prodarch01lv
    Primary superblock is valid.
    J2_LOGREDO:log redo processing for /dev/prodarch01lv
    Primary superblock is valid.
    *** Phase 1 - Initial inode scan
    Inode 4240 has bad size; FIX? y
    Cannot repair inode 4397; RELEASE? y
    Superblock marked dirty because repairs are about to be written.
    *** Phase 2 - Process remaining directories
    Inode 4397 is linked as: /arch/prod1/prod_redo1_79216_622224263.arc
    Directory inode 4128 has an invalid reference to inode 4397 in entry prod_redo1_79216_622224263.arc; REMOVE? y
    *** Phase 3 - Process remaining files
    *** Phase 4 - Check and repair inode allocation map
    File system inode map is corrupt; FIX? y
    *** Phase 5 - Check and repair block allocation map
    Block allocation map is corrupt; FIX? y
    File system is clean.
    Superblock is marked dirty; FIX? y
    All observed inconsistencies have been repaired.

    I ran fsck again:

    The current volume is: /dev/prodarch01lv
    Primary superblock is valid.
    J2_LOGREDO:log redo processing for /dev/prodarch01lv
    Primary superblock is valid.
    *** Phase 1 - Initial inode scan
    *** Phase 2 - Process remaining directories
    *** Phase 3 - Process remaining files
    *** Phase 4 - Check and repair inode allocation map
    *** Phase 5 - Check and repair block allocation map
    File system is clean.

    I tried copying back some of the old files to see if I continued to get errors in the errpt. All seemed fine so we pointed the logs back to this filesystem and so far no more issues.

    I logged a call with support today to try to find out what could have caused this, but support said that since it's fixed they have no way of knowing what caused it. They say it could be one of three things:

    1. heavy I/O
    2. filesystem corruption
    3. loss of communication between node and storage

    errpt didn't have anything to suggest #3, and my SVC and SAN reported no problems to that effect. About 3 hours earlier we had upgraded to TL9 SP4 using multibos. My gut tells me something there may have caused the corruption, but since I have no snap I can't be sure. Anyway, I'm posting this in case it helps someone else in the future, as this gentleman's post helped me.


  • 4.  Re: USER DATA I/O ERROR

    Posted Sat January 17, 2015 06:23 PM

    Originally posted by: drashan_433


    Hi All,

    I also faced the same issue; the errors from errpt are below.

    We have a VIO machine running two Oracle database LPARs on top of it.

    The databases abnormally go down with the errors shown below.

    Even after starting the database and listener, they are still not accessible until we either reboot the LPAR or unmount the file systems, run fsck, and remount them.

    If anyone has any idea about this, please suggest.

     

    -------------------------------------------------------------------

    LABEL:          J2_USERDATA_EIO
    IDENTIFIER:     EA88F829

    Date/Time:       Wed 14 Jan 16:01:20 2015
    Sequence Number: 89632
    Machine Id:      00F689344C00
    Node Id:         xxxxxxDBA01
    Class:           O
    Type:            INFO
    WPAR:            Global
    Resource Name:   SYSJ2          

    Description
    USER DATA I/O ERROR

    Probable Causes
    ADAPTER HARDWARE OR MICROCODE
    DISK DRIVE HARDWARE OR MICROCODE
    SOFTWARE DEVICE DRIVER
    STORAGE CABLE LOOSE, DEFECTIVE, OR UNTERMINATED

            Recommended Actions
            CHECK CABLES AND THEIR CONNECTIONS
            INSTALL LATEST ADAPTER AND DRIVE MICROCODE
            INSTALL LATEST STORAGE DEVICE DRIVERS
            IF PROBLEM PERSISTS, CONTACT APPROPRIATE SERVICE REPRESENTATIVE

    Detail Data
    JFS2 MAJOR/MINOR DEVICE NUMBER
    000A 0006
    FILE SYSTEM DEVICE AND MOUNT POINT
    /dev/hd9var, /var

    ---------------------------------------------------------------------------
    LABEL:          LVM_IO_FAIL
    IDENTIFIER:     E86653C3

    Date/Time:       Wed 14 Jan 16:01:20 2015
    Sequence Number: 89694
    Machine Id:      00F689344C00
    Node Id:         xxxxxDBA01
    Class:           H
    Type:            PERM
    WPAR:            Global
    Resource Name:   LVDD           
    Resource Class:  NONE
    Resource Type:   NONE
    Location:       

    Description
    I/O ERROR DETECTED BY LVM

    Probable Causes
    POWER, DRIVE, ADAPTER, OR CABLE FAILURE

            Recommended Actions
            RUN DIAGNOSTICS AGAINST THE FAILING DEVICE

    Detail Data
    PHYSICAL VOLUME DEVICE MAJOR/MINOR
    8000 0011 0000 0005
    ERROR CODE AS DEFINED IN sys/errno.h
               5
    BLOCK NUMBER
                  22007184
    LOGICAL VOLUME DEVICE MAJOR/MINOR
    8000 0023 0000 0001
    PHYSICAL BUFFER TRANSACTION TIME
                         0
    RESIDUAL COUNT
                      4096
    NUMBER OF BLOCKS
                      4096
    I/O TYPE
    USER DATA    
    SENSE DATA
    0000 0000 0000 A7E6 00F6 CDF9 0000 4C00 0000 0132 6489 73C9 00F6 CDF9 60D4 4AC9
    0000 0000 0000 0000
    ---------------------------------------------------------------------------

    Database logs:

     

    ORA-27072: File I/O error
    IBM AIX RISC System/6000 Error: 5: I/O error
    Additional information: 8
    Additional information: 3
    Errors in file /oracle/11g/app/oracle/diag/diag/rdbms/nmpp01a/nmpp01a/trace/nmpp01a_lgwr_9764906.trc:
    ORA-07445: exception encountered: core dump [_sigsetmask()+212] [SIGBUS] [ADDR:0x103527120] [PC:0x9000000007A8214] [unknown code] []
    ORA-00312: online log 3 thread 1: '/oradata/data2/nmpp01a/onlinelog/log3b_nmpp01a.rdo'
    ORA-27072: File I/O error
    IBM AIX RISC System/6000 Error: 5: I/O error
    Additional information: 8
    Additional information: 20483
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Use ADRCI or Support Workbench to package the incident.
    See Note 411.1 at My Oracle Support for error and packaging details.
    Wed Jan 14 16:01:27 2015
    Dumping diagnostic data in directory=[cdmp_20150114160127], requested by (instance=1, osid=18481358 (MMON)), summary=[abnormal process termination].
    Wed Jan 14 16:01:29 2015
    PMON (ospid: 6291684): terminating the instance due to error 470
    System state dump requested by (instance=1, osid=6291684 (PMON)), summary=[abnormal instance termination].
    System State dumped to trace file /oracle/11g/app/oracle/diag/diag/rdbms/nmpp01a/nmpp01a/trace/nmpp01a_diag_9109600.trc
    Dumping diagnostic data in directory=[cdmp_20150114160130], requested by (instance=1, osid=6291684 (PMON)), summary=[abnormal instance termination].
    Termination issued to instance processes. Waiting for the processes to exit
    Wed Jan 14 16:01:42 2015
    Instance termination failed to kill one or more processes
    Instance terminated by PMON, pid = 6291684
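
The Oracle log lines above carry the operating system error number (5 here, and 28 in the 2009 post earlier in this thread), and the LVM_IO_FAIL entry reports the same value 5 under "ERROR CODE AS DEFINED IN sys/errno.h". The sketch below extracts those numbers and maps them to symbolic names. The function name `os_errors` is hypothetical, and the names come from the errno table of whatever machine runs Python, so treat the mapping as an assumption to confirm against AIX's own sys/errno.h.

```python
import errno
import re

# Lines copied from the Oracle logs quoted in this thread.
LOG = """\
ORA-27072: File I/O error
IBM AIX RISC System/6000 Error: 5: I/O error
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 28: No space left on device
"""

# Matches the "Error: <number>: <message>" part of the OS error line.
OS_ERROR = re.compile(r"Error:\s*(\d+):\s*(.+)")


def os_errors(log_text):
    """Return (number, symbolic name, message) for each OS error line."""
    found = []
    for line in log_text.splitlines():
        m = OS_ERROR.search(line)
        if m:
            code = int(m.group(1))
            # errno.errorcode is the local platform's table, not AIX's.
            name = errno.errorcode.get(code, "UNKNOWN")
            found.append((code, name, m.group(2).strip()))
    return found


for code, name, message in os_errors(LOG):
    print(code, name, message)
```

Seeing the same error number in both the database log and errpt at the same timestamp suggests the database failure follows the I/O error rather than being a separate problem.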

     

     

    ============================================================

     

    This issue has now happened 4 times within 3 months.

    When we went into the /tmp directory and tried to create a file, we were not able to; it seems like the inodes are full.

    After unmounting the file system, running fsck, and mounting it again, it works.
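
To tell "out of blocks" apart from "out of inodes" when file creation fails, the free counts for both can be read in one call. This is a minimal sketch using Python's `os.statvfs`, which is available on Unix-like systems; the helper name `fs_usage` and the default path `/tmp` are illustrative choices only.

```python
import os


def fs_usage(path):
    """Report free space and inode counts for the file system holding path."""
    st = os.statvfs(path)
    return {
        # Bytes available to unprivileged users.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Total inodes and inodes available to unprivileged users.
        "total_inodes": st.f_files,
        "free_inodes": st.f_favail,
    }


usage = fs_usage("/tmp")
print(usage)
```

These figures are only as reliable as the file system's allocation maps. In the 2009 post earlier in this thread, the file system showed plenty of free space while writes failed with "No space left on device", and fsck then found the maps corrupt.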

     

    The fsck output is below:

     

    The current volume is: /dev/oracle11g-lv
    Primary superblock is valid.
    *** Phase 1 - Initial inode scan
    *** Phase 2 - Process remaining directories
    *** Phase 3 - Process remaining files
    *** Phase 4 - Check and repair inode allocation map
    File system inode map is corrupt (FIXED)
    Superblock marked dirty because repairs are about to be written.
    *** Phase 5 - Check and repair block allocation map
    Block allocation map is corrupt (FIXED)
    File system is clean.
    Superblock is marked dirty (FIXED)
    All observed inconsistencies have been repaired.

     

    If anyone has any idea about this, please let me know.

    Thanks

     

