AIX

  • 1.  HEA SOFTWARE ERROR

    Posted Thu January 07, 2010 09:50 AM

    Originally posted by: diesan


    I'm having the problem listed below.

    My system is up to date with AIX 5.3 TL11 (5300-11-01-0944), and I have already updated my microcode to the version recommended by the vendor.
    If you want, we can use Spanish for this post.

    In addition, when the error shows up I experience connectivity issues, such as losing my connection to the server even though it still answers pings from the outside network.

    LABEL: HEA_INFO
    IDENTIFIER: 0867107D

    Date/Time: Wed Jan 6 11:03:05 GRNLNDST 2010
    Sequence Number: 150
    Machine Id: 00055A7AD400
    Node Id: mglap002
    Class: S
    Type: INFO
    Resource Name: ent0

    Description
    HEA SOFTWARE ERROR

    Probable Causes
    OUT OF RESOURCES
    LOADABLE SOFTWARE MODULE
    OPERATING SYSTEM
    APPLICATION PROGRAM

    User Causes
    INSUFFICIENT MEMORY
    RESOURCE NOT AVAILABLE

    Recommended Actions
    PERFORM PROBLEM RECOVERY PROCEDURES
    RESTART PROGRAM

    Failure Causes
    SOFTWARE PROGRAM
    LOADABLE SOFTWARE MODULE

    Recommended Actions
    PERFORM PROBLEM RECOVERY PROCEDURES
    RESTART PROGRAM

    Detail Data
    FILE NAME
    line: 477 file: hea_intr.c
    SENSE DATA
    0000 0001 0000 0001 0000 0040 0061 081A 0000 0004 0000 0000 0000 0000 0001 8000
    0000 0300 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0A1C 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
    0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000 0000
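
For anyone who wants to compare several of these entries, for example to line up error timestamps with the connection drops, the report above can be split into header fields and sections with a short script. This is only a minimal sketch: the function names `parse_errpt_entry` and `source_location` are hypothetical, and the layout is assumed from the single entry quoted in this post, so other `errpt -a` entries may need adjustments.

```python
import re

# Section titles as they appear in the entry quoted above (assumed, not exhaustive).
SECTION_TITLES = {
    "Description", "Probable Causes", "User Causes",
    "Failure Causes", "Recommended Actions", "Detail Data",
}

def parse_errpt_entry(text):
    """Split one errpt -a style entry into header fields and sections."""
    header = {}
    sections = []      # list of (title, lines); a list keeps repeated titles
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line in SECTION_TITLES:
            current = (line, [])
            sections.append(current)
        elif current is not None:
            current[1].append(line)
        else:
            # Header lines look like "KEY: value"; split on the first colon only.
            key, sep, value = line.partition(":")
            if sep:
                header[key.strip()] = value.strip()
    return header, sections

def source_location(sections):
    """Return (file, line) from the Detail Data section, or None if absent."""
    for title, lines in sections:
        if title == "Detail Data":
            for line in lines:
                m = re.match(r"line:\s*(\d+)\s+file:\s*(\S+)", line)
                if m:
                    return m.group(2), int(m.group(1))
    return None

# Shortened copy of the entry from this post.
SAMPLE = """
LABEL: HEA_INFO
IDENTIFIER: 0867107D

Date/Time: Wed Jan 6 11:03:05 GRNLNDST 2010
Sequence Number: 150
Class: S
Type: INFO
Resource Name: ent0

Description
HEA SOFTWARE ERROR

User Causes
INSUFFICIENT MEMORY
RESOURCE NOT AVAILABLE

Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
RESTART PROGRAM

Failure Causes
SOFTWARE PROGRAM
LOADABLE SOFTWARE MODULE

Recommended Actions
PERFORM PROBLEM RECOVERY PROCEDURES
RESTART PROGRAM

Detail Data
FILE NAME
line: 477 file: hea_intr.c
SENSE DATA
0000 0001 0000 0001 0000 0040 0061 081A
"""

header, sections = parse_errpt_entry(SAMPLE)
print(header["LABEL"], header["Resource Name"], header["Date/Time"])
print(source_location(sections))
```

Sections are kept as a list of pairs rather than a dictionary because "Recommended Actions" appears twice in the report, once under the user causes and once under the failure causes.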


  • 2.  Re: HEA SOFTWARE ERROR

    Posted Mon January 11, 2010 10:36 AM

    Originally posted by: SystemAdmin


    Hi,
    The errpt has the answer: insufficient memory.
    I can't remember whether it was partition mobility or HEA, but one of them requires the server to have a minimum of 768 MB. If it already has more than 768 MB, try adding another 256 MB.

    A note on HEA: PLEASE DO NOT ASSIGN THE HEA TO THE VIO SERVER. The problem is that if the HEA fails, there is no way to remove it from the profile. The HEA failed on one of our 570 MMAs. You can't remove it dynamically, and if you bring down the server, remove it from the profile, and activate the server, it gives an error along the lines of "<location code of HEA> is not responding or not found". The workaround is to create a new partition profile (you can't edit the existing one) and then activate the VIO server. And this is the VIO server: if you don't have a redundant VIO server and production is running on the clients, you are in bad shape.

    Hence we added another I/O drawer with 4-port NICs and are using that instead of the HEA. Maybe down the road IBM will come up with a fix for this.

    Regards
    Najam Kazi


  • 3.  Re: HEA SOFTWARE ERROR

    Posted Tue January 12, 2010 06:36 AM

    Originally posted by: diesan


    Najam,

    I don't think it is a memory issue, because physical memory is 16 GB and paging space is the same size.
    Peak memory usage on the server is 2 GB; it runs only one Oracle DB and one application server.
    The server is a stand-alone machine mounted in a BladeCenter S. It is the only blade in that chassis.
    After I lose my connection to the server, it runs fine at the dumb console, without abnormal memory usage.
    There is only one Ethernet adapter.
    I think it's a sysplanar error; I've found a link describing this error.
    http://publib.boulder.ibm.com/infocenter/bladectr/documentation/index.jsp?topic=/com.ibm.bladecenter.js22.doc/dw1fx_r_errorcodes_ba.html (look for BA154010)

    I've been talking with technical support here, and they are going to replace the system board.

    Diego


  • 4.  Re: HEA SOFTWARE ERROR

    Posted Tue January 12, 2010 10:05 AM

    Originally posted by: Najam Kazi


    OK, so you were hit by the same brick. Replacing the HEA requires the entire CEC to be offline, since the HEA is designed as part of the system planar. For the 570 MMA they had to take out the entire enclosure. Downtime for the entire rack (CEC) should be around an hour and a half, plus the time for the apps/DB to come down and back up. We are not using HEA anymore, because if any server crashes with a failed HEA, the LPAR will not activate until a new profile is created (you can't copy the existing one and edit it) and activated.

    Good luck.
    Najam


  • 5.  Re: HEA SOFTWARE ERROR

    Posted Tue January 12, 2010 10:52 AM

    Originally posted by: diesan


    In this particular case they will turn off the blade and change the board.
    You have to shut down the server (apps and DB included), replace the system board, and bring everything back up (OS, DB, apps).

    I will let you know the result once the new board is working.

    Thx

    Diego


  • 6.  Re: HEA SOFTWARE ERROR

    Posted Mon January 11, 2010 11:11 AM

    Originally posted by: The_Doctor


    Funny thing, I have a JS21 Blade experiencing the same HEA errors.

    I'm not convinced "out of memory" is the root cause, since this VIO Server has 1.5 GB and only a single AIX client, but I'm still working the problem, so the jury is still out.

    You don't mention you're running VIOS so I assume you are just a stand-alone AIX system & thus our configurations are substantially different.

    FWIW, briefly my setup is:

    1. JS21 Blade
    2. VIOS 2.1 FP-21 (with 1.5GB) but just updated to FP-22 to see if FP-22 fixes the problem. Testing is underway.
    3. AIX 6.1 Client (with 4GB)

    Like I said, I'm still working the problem, but if I find anything definitive I'll let you know.


  • 7.  Re: HEA SOFTWARE ERROR

    Posted Mon January 11, 2010 08:50 PM

    Originally posted by: The_Doctor


    Oops, sorry for the typo: the blade is a JS12.