AIX

  • 1.  A checkstop condition occurred during the BIST

    Posted Wed February 21, 2007 03:17 PM

    Originally posted by: SystemAdmin


    hello,

    I run AIX 5.1, and from time to time the box hangs with LED code 185: A checkstop condition occurred during the BIST.

    What does this mean, and what should I do in order to avoid such events?

    Thank you in advance.

    Regards,
    Z
    #AIX-Forum


  • 2.  Re: A checkstop condition occurred during the BIST

    Posted Thu February 22, 2007 08:25 AM

    Originally posted by: SystemAdmin


    Checkstops are hardware errors, usually involving the CPU. First, check the error log for any entries corresponding to the checkstop. Then run diag to see whether it reports anything. For something like this you'll probably have to run standalone diagnostics as well, since the online version can't test everything. Once you've gathered whatever information all that turns up, your next step will probably be to contact IBM (or whoever does your hardware maintenance), because they may need to do some work on your box.
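
    As a rough sketch of the error-log step, assuming root access on the AIX box itself: `errpt` with `-d H` (hardware class) and `-a` (detailed report) is the standard error-report command, while the function name `show_hw_errors` is made up for illustration.

    ```shell
    #!/bin/sh
    # Sketch only: list hardware-class entries from the AIX error log.
    # show_hw_errors is a hypothetical helper name, not an AIX command.
    show_hw_errors() {
        echo "== hardware entries in the AIX error log =="
        if ! command -v errpt >/dev/null 2>&1; then
            # Not on AIX (or errpt not in PATH): nothing to query here.
            echo "errpt not found; run this on the AIX system itself"
            return 0
        fi
        # One line per entry: identifier, timestamp, type, class, resource.
        errpt -d H
        # Full detail for the same entries (can be long).
        errpt -a -d H
    }

    show_hw_errors
    ```

    Entries timestamped around the hang are the ones to note down. After that, `diag` started from the command line brings up the online diagnostics menus; standalone diagnostics are booted from the diagnostics CD instead.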

    HTH

    Jim Lane
    #AIX-Forum


  • 3.  Re: A checkstop condition occurred during the BIST

    Posted Mon March 05, 2007 04:53 PM

    Originally posted by: SystemAdmin


    Jim,

    could you please provide some commands or step-by-step instructions on what to run and which log to check in order to identify the problem?

    (It has just occurred again, after only 10 days.)

    Thank you in advance.

    Regards,
    Z
    #AIX-Forum


  • 4.  Re: A checkstop condition occurred during the BIST

    Posted Mon March 05, 2007 05:09 PM

    Originally posted by: orphy


    I suggest opening a PMR with IBM. They'll likely ask you to send in a testcase
    and, with that, they usually can determine the problem source. It's likely that
    you may have a piece of hardware that is failing.
    Orphy
    #AIX-Forum


  • 5.  Re: A checkstop condition occurred during the BIST

    Posted Mon March 05, 2007 06:32 PM

    Originally posted by: SystemAdmin


    hello,

    >I suggest opening a PMR with IBM.
    Is that something that I need to purchase? I do not have any support contract with IBM (I am running a second-hand RS/6000 with an eBay-ed AIX).

    Is there any do-it-yourself solution to my problem?

    Thank you in advance.

    Regards,
    Z
    #AIX-Forum


  • 6.  Re: A checkstop condition occurred during the BIST

    Posted Mon March 05, 2007 08:44 PM

    Originally posted by: dukessd


    I guess you'll have to do it the hard way then.
    By the looks of your posts, it only happens once in a while.
    Did the problem happen after you added some hardware?
    Did the problem happen after you upgraded or changed some software?

    BIST is probably failing on a CPU, memory or system board problem!

    Do you have a service processor? If you do, check the service processor logs for more information on the problem.

    Try (with the power disconnected) disconnecting any external devices and removing any unused internal devices and adapters.

    Does it still hang at 185?

    If so, then there is something in the machine that is still not passing the BIST (Built-In Self Test).

    Is the system firmware up to date?
    http://www14.software.ibm.com/webapp/set2/firmware/gjsn

    Next you'll have to remove everything not needed to boot!

    Get yourself a copy of the diag CD:
    https://www14.software.ibm.com/webapp/set2/sas/f/diags/home.html

    Remove all the CPUs you can, leaving only the minimum your hardware needs (1, 2, 3, 4 or 8, depending on the model). But DON'T remove CPUs from 7026, 7038, 7040 (and possibly other) machines unless you have nowhere else to go! They are one plug only unless you know the proper procedure!

    Remove all Ethernet and SCSI devices apart from the CD drive.

    Remove all memory possible and then see if it still hangs...

    Add back bits one at a time until you find the bit causing the hang...

    Hardware docs (POWER5 - p5xx):
    http://publib.boulder.ibm.com/infocenter/eserver/v1r3s/index.jsp
    (use the search box on the left for your machine type)

    Hardware docs (pre-POWER5 - p6xx and earlier):
    http://publib.boulder.ibm.com/infocenter/pseries/v5r3/index.jsp?topic=/com.ibm.pseries.doc/hardware_docs/selectbysystem.htm

    It could just be a dying adapter or device hanging some important part of the system on a cold boot.

    Being a "home brew" system, it could just be that a non-IBM disk drive or device is not ready to respond to the self test in time. Try adding the CD-ROM, floppy and Ethernet to the boot list so they get queried first; this may give the non-IBM devices time to wake up! It's unlikely, because the BIST shouldn't be too worried about device problems, but it's worth a try...
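
    A minimal sketch of that boot list change, assuming root on the AIX box and that `bootlist -m normal` with the `-o` display flag behaves on this model as it does on other AIX systems. The function name `set_boot_order` is made up for illustration, and the device names in the usage line come from the lscfg listing posted later in this thread.

    ```shell
    #!/bin/sh
    # Sketch only: show the normal-mode boot list, set a new order, show it again.
    # set_boot_order is a hypothetical helper name, not an AIX command.
    set_boot_order() {
        if [ "$#" -eq 0 ]; then
            echo "usage: set_boot_order device [device ...]" >&2
            return 1
        fi
        if ! command -v bootlist >/dev/null 2>&1; then
            # Not on AIX (or bootlist not in PATH): change nothing.
            echo "bootlist not found; run this on the AIX system itself"
            return 0
        fi
        echo "current normal-mode boot list:"
        bootlist -m normal -o
        # Devices are tried in the order given.
        bootlist -m normal "$@"
        echo "new normal-mode boot list:"
        bootlist -m normal -o
    }
    ```

    For example, `set_boot_order cd0 fd0 ent0 hdisk0` would put the CD-ROM, diskette and Ethernet ahead of the hard disk.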
    #AIX-Forum


  • 7.  Re: A checkstop condition occurred during the BIST

    Posted Tue March 06, 2007 03:31 AM

    Originally posted by: SystemAdmin


    dukessd,

    thank you VERY, VERY much for the competent and fast answer.

    By the way, this started to occur after a hard disk crash following a power cut.
    I replaced the original IBM disk with a Compaq one and reinstalled the whole box from scratch.

    The box works perfectly; I cannot see any strange behaviour except this hang with a BIST failure every 10-15 days.
    Is BIST a cron job? Is it possible to disable it?

    I'll upgrade the firmware and run the tests and I'll come back soon with the results of your suggested actions.

    Thank you again.

    Regards,
    Z


    #AIX-Forum


  • 8.  Re: A checkstop condition occurred during the BIST

    Posted Tue March 06, 2007 11:03 AM

    Originally posted by: orphy


    BIST is not a cron job. It's a set of tests that the box kicks off to
    self-check the most basic hardware components before starting up the
    boot process. In certain models, you could speed up BIST but I'm not
    aware that you could disable it. I don't think you should disable it
    anyway since if something causes BIST not to pass, you would normally
    want to find out why and try to fix it.
    Orphy
    #AIX-Forum


  • 9.  Re: A checkstop condition occurred during the BIST

    Posted Tue March 13, 2007 06:36 PM

    Originally posted by: SystemAdmin


    Hello,

    I spent a lot of time reading documents, running tests etc., but I found something weird with my system and I would like to ask for confirmation before I do any microcode upgrade.

    My system is:
    bash-3.00# uname -a
    AIX aix 1 5 000641284C00

    bash-3.00# uname -Ml
    104933452 IBM PPS Model 7043 (ED)

    bash-3.00# lscfg -vp | grep ROM
    cd0 04-B0-00-3,0 SCSI Multimedia CD-ROM Drive (650
    bash-3.00# lscfg -vp
    INSTALLED RESOURCE LIST WITH VPD

    The following resources are installed on your machine.

    Model Architecture: rspc
    Model Implementation: Multiple Processor, PCI bus

    sys0 00-00 System Object
    sysplanar0 00-00 System Planar
    bus0 00-00 PCI Bus
    bus2 04-C0 ISA Bus
    siota0 01-A0 Tablet Adapter
    ppa0 01-B0 Standard I/O Parallel Port Adapter
    sa0 01-C0 Standard I/O Serial Port 1
    sa1 01-D0 Standard I/O Serial Port 2
    paud0 01-E0 Ultimedia Integrated Audio
    sioka0 01-F0 Keyboard Adapter
    sioma0 01-G0 Mouse Adapter
    fda0 01-H0 Standard I/O Diskette Adapter
    fd0 01-H0-00-00 Diskette Drive
    pmc0 01-I0 Power Management Controller
    ent0 04-A0 IBM PCI Ethernet Adapter (22100020)

    Network Address.............08005A93E29E
    Displayable Message.........PCI Ethernet Adapter (22100020)

    scsi0 04-B0 Wide SCSI I/O Controller
    hdisk0 04-B0-00-0,0 Other SCSI Disk Drive

    Manufacturer................QUANTUM
    Machine Type and Model......ATLAS V 18 WLS
    ROS Level and ID............30323031
    Serial Number...............14101295
    Device Specific.(Z0)........000003025B00003E
    Device Specific.(Z1)........4119 000225

    cd0 04-B0-00-3,0 SCSI Multimedia CD-ROM Drive (650
    MB)

    Manufacturer................IBM
    Machine Type and Model......CDRM00203
    ROS Level and ID............8B08
    Device Specific.(Z0)........058002028F000018
    Part Number.................73H2600
    EC Level....................D75458A
    FRU Number..................73H2601

    bus1 00-00 PCI Bus
    bl0 04-02 GXT250P Graphics Adapter

    VPD data is not recognizable.

    mem0 00-00 Memory
    proc0 00-00 Processor
    L2cache0 00-00 L2 Cache
    proc1 00-01 Processor


    The MDT tool did not find any microcode that should be upgraded :(
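
    To compare installed levels against a microcode download listing by hand, the ROS levels can be pulled out of the `lscfg -vp` output above. A minimal sketch, assuming the output format shown in this post; the function name `list_ros_levels` is made up for illustration.

    ```shell
    #!/bin/sh
    # Sketch only: print "device ros-level" pairs from `lscfg -vp` output on stdin.
    # list_ros_levels is a hypothetical helper name, not an AIX command.
    list_ros_levels() {
        awk '
            # A resource line starts with a logical device name (hdisk0, cd0, ...)
            # and, unlike a VPD line, has no dot leader.
            $0 !~ /\.\.\.\./ && $1 ~ /^[A-Za-z][A-Za-z0-9]*[0-9]$/ { dev = $1 }
            # VPD lines look like "ROS Level and ID............8B08".
            /ROS Level and ID/ {
                level = $0
                sub(/^.*\.\.+/, "", level)   # drop the label and the dot leader
                print dev, level
            }
        '
    }
    ```

    Run it as `lscfg -vp | list_ros_levels`. Fed the listing above, it prints `hdisk0 30323031` and `cd0 8B08`. The disk value reads as hex-encoded ASCII for "0201", which may be the form a download listing uses, though that is a guess.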

    The questions are:
    • Where can I get microcode for QUANTUM hard disks?
    • I have a spare IBM IC36L*D210 disk that needs a microcode update. Is this update needed if MDT does not recommend it?
    • What is the microcode for my box (Model 7043)? Microcode is listed for 7043-150... etc., but not for plain 7043.
    • I have found a document that says that 7043 or 7044-270 needs SPH04194 microcode to run AIX 5.3. Does that mean that I can try to apply that microcode?
    • On the other hand, the current microcode SPH05195 for 7044-270 does not mention 7043 at all. May I apply that microcode?

    Please help me sort out these issues/questions.

    Thank you in advance.

    Regards,
    Z
    #AIX-Forum