AIX

  • 1.  System power status command?

    Posted Mon December 06, 2021 09:28 AM
    Back in AIX 4.3 we could query the power supply status with uesensor. More recent AIX versions have the machstat command, which provides very little information.

    While POWER and HMC reporting does a great job of alerting when there is a power problem, there appears to be no way to confirm at the command line that the problem has been solved. This also makes it difficult to monitor power status automatically with scripts.

    I asked in the HMC forum and other than some REST APIs for temperature and fan speed, I couldn't find operational health information.

    Does anyone know how to check the power supply health and the line-in status from AIX?

    It's embarrassing, after a power errpt message, to have to send an operator into the datacenter to take pictures of the back of the unit, especially when cheap whitebox Intel servers with web IPMI interfaces show that information and more statistics.

    ------------------------------
    Russell Adams
    ------------------------------


  • 2.  RE: System power status command?

    InnerCircle
    Posted Tue December 07, 2021 08:36 AM

    Hello Russell Adams,

    Below may help you ....

     

    The machstat command in AIX is not widely known. It displays the status of the Power Status Register and can therefore help identify issues with either power or cooling.

        # machstat -f
        0 0 0

    If it returns all zeroes, everything is fine. Anything else is not good. The first digit (the so-called EPOW Event) indicates the type of problem:

    EPOW Event    Description
    ---------------------------------------------
    0             normal operation
    1             non-critical cooling problem
    2             non-critical power problem
    3             severe power problem - halt system
    4             severe problems - halt immediately
    5             unhandled issue
    7             unhandled issue
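    For scripted monitoring, the table above can be turned into a small check. This is only a minimal sketch: the function names epow_describe and epow_check are made up for illustration, and it assumes machstat -f prints a single line of whitespace-separated numbers as in the example output.

```shell
#!/bin/sh
# Sketch only: interpret `machstat -f` output using the EPOW table above.
# epow_describe and epow_check are hypothetical helper names.

# Map an EPOW event value (first field) to its description.
epow_describe() {
    case "$1" in
        0) echo "normal operation" ;;
        1) echo "non-critical cooling problem" ;;
        2) echo "non-critical power problem" ;;
        3) echo "severe power problem - halt system" ;;
        4) echo "severe problems - halt immediately" ;;
        5|7) echo "unhandled issue" ;;
        *) echo "unknown EPOW value" ;;
    esac
}

# Takes one output line such as "0 0 0".
# Returns 0 if every field is zero, 1 if any field is non-zero,
# and 2 if the line is empty.
epow_check() {
    line=$1
    set -- $line          # split the line into fields on whitespace
    if [ $# -eq 0 ]; then
        echo "UNKNOWN: no output from machstat -f"
        return 2
    fi
    first=$1
    for field in "$@"; do
        if [ "$field" != "0" ]; then
            echo "WARNING: machstat -f returned '$line' (EPOW event $first: $(epow_describe "$first"))"
            return 1
        fi
    done
    echo "OK: normal operation"
    return 0
}

# Only call machstat where it exists (that is, on AIX).
if command -v machstat >/dev/null 2>&1; then
    epow_check "$(machstat -f)"
fi
```

    The exit status (0, 1 or 2) is meant to make the script easy to call from cron or a monitoring agent. It reports only what machstat exposes, so it cannot say which power supply or line input is affected.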

    Another way to determine whether the system may have a power or cooling issue is to look for a powerfail entry in the root user's crontab:

        # crontab -l root | grep -i powerfail
        0 00,12 * * * wall%rc.powerfail:2::WARNING!!! The system is now operating with a power problem. This message will be walled every 12 hours. Remove this crontab entry after the problem is resolved.

    If a powerfail message is present in the crontab of user root, this may indicate that there is an issue to be looked into. Contact your IBM representative to check the system out. Afterwards, make sure to remove the powerfail entry from the root user's crontab.
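    The crontab check can be scripted in the same way. Again, this is a sketch under assumptions: has_powerfail_entry is a hypothetical helper name, and the script must run as root on AIX for crontab -l root to work. Because the entry stays until someone removes it, a match means a power problem was logged at some point, not necessarily that it is still present.

```shell
#!/bin/sh
# Sketch only: detect an rc.powerfail entry in root's crontab.
# has_powerfail_entry is a hypothetical helper name.

# Reads crontab text on stdin; returns 0 if a powerfail entry is found.
has_powerfail_entry() {
    grep -i powerfail >/dev/null
}

# Only query the crontab on AIX; run this as root.
if [ "$(uname -s)" = "AIX" ]; then
    if crontab -l root 2>/dev/null | has_powerfail_entry; then
        echo "WARNING: powerfail entry present in root's crontab"
    else
        echo "OK: no powerfail entry in root's crontab"
    fi
fi
```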


     

    Thanks,

    Afzal Muhammad

     

    IBM Certified AIX System Administrator AIX 6.1

    IBM Certified AIX System Administrator AIX 4.3

    IBM Certified WebSphere 6.0 Administrator

    Red Hat Linux 7 Certified System Administrator (RHCSA)

    Global pSeries platform services – AIX

    Ford Motor Company,  Dearborn Michigan

    United States of America

    Tel: Cell 1-704-492-0586

    Email: mafzal10@ford.com

  • 3.  RE: System power status command?

    Posted Wed December 08, 2021 08:00 AM
    On Tue, Dec 07, 2021 at 01:35:50PM +0000, Muhammad Afzal via IBM Community wrote:
    > Not very much known is the machstat command in AIX that can be used to display the status of the Power Status Register, and thus can be helpful to identify any issues with either Power or Cooling.
    >
    > # machstat -f
    > 0 0 0
    >
    > If it returns all zeroes, everything is fine. Anything else is not good. The first digit (the so-called EPOW Event) indicates the type of problem:
    >
    > EPOW Event Description
    > ---------------------------------------------
    > 0 normal operation
    > 1 non-critical cooling problem
    > 2 non-critical power problem
    > 3 severe power problem - halt system
    > 4 severe problems - halt immediately
    > 5 unhandled issue
    > 7 unhandled issue
    >
    > Another way to determine if the system may have a power or cooling issue, is by looking at a crontab entry in the root user's crontab:
    >
    > # crontab -l root | grep -i powerfail
    > 0 00,12 * * * wall%rc.powerfail:2::WARNING!!! The system is now operating with a power problem. This message will be walled every 12 hours. Remove this crontab entry after the problem is resolved.
    >
    > If a powerfail message is present in the crontab of user root, this
    > may indicate that there is an issue to be looked into. Contact your
    > IBM representative to check the system out. Afterwards, make sure
    > to remove the powerfail entry from the root user's crontab.

    I'm aware; in my original post I referred to the machstat
    command. It provides very little information.

    I understand there may be a REST API now which can provide this
    information, but I'm bewildered why this basic information isn't
    provided by the HMC or in a supported AIX command.

    I can query the CPU frequency of every core, but I can't query my
    power supply status.

    ------------------------------------------------------------------
    Russell Adams Russell.Adams@AdamsSystems.nl
    Principal Consultant Adams Systems Consultancy
    http://adamssystems.nl/