Power 9 (9009-41A)
AIX 73.5.7
FSP VL950_136
No HMC
We tried swapping some hardware initially, with no success. Further research by IBM found an issue in the FSP/CIM software, and they suggested we reset the FSP and then disable/enable the CIM. While this corrected the current set of missing metrics, whenever the FSP restarts we have to check all the systems for missing metrics and reset the FSP/CIM. In most cases only a CIM reset is needed, but a few have also needed an FSP reset. So far the resets have resolved the issues.
With the CIM, the only way I'm aware of to get the metrics is to request the 2 to 3 hour metrics log from the FSP and parse it for the last entry of each metric. That is a lot of data to retrieve just 15 metrics. Not a big problem, but it takes much longer than simply returning the current value of each metric.
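The "last entry for each metric" step can be sketched in a few lines. This is a minimal sketch only: the real layout of the FSP metrics log is not shown in this thread, so the one-record-per-line "timestamp,metric,value" format, the ISO-8601 timestamps and the helper names `parse_csv_log` / `latest_values` are all hypothetical.

```python
from typing import Dict, Iterable, Iterator, Tuple


def parse_csv_log(text: str) -> Iterator[Tuple[str, str, float]]:
    """Yield (timestamp, metric, value) records.

    HYPOTHETICAL layout: one "timestamp,metric,value" record per line.
    The actual FSP metrics log format is not documented in this thread.
    """
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        ts, metric, value = line.split(",")
        yield ts, metric, float(value)


def latest_values(records: Iterable[Tuple[str, str, float]]) -> Dict[str, Tuple[str, float]]:
    """Keep only the most recent (timestamp, value) seen for each metric.

    Assumes ISO-8601 timestamps, so plain string comparison is chronological.
    """
    latest: Dict[str, Tuple[str, float]] = {}
    for ts, metric, value in records:
        prev = latest.get(metric)
        if prev is None or ts >= prev[0]:
            latest[metric] = (ts, value)
    return latest
```

A single pass with a dictionary keeps memory proportional to the number of metrics (about 15 here), not to the 2 to 3 hours of log data. The download itself still dominates the run time.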
I did some reading on the Redfish (REST) interface. The documentation mentions eBMC and firmware 10xx, which implies Power10, so Redfish may not apply to Power9/FSP.
I have tried some Redfish queries on my P9s using my P9 FSP admin ID and password; all return 400 (Bad Request) errors.
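For comparison when troubleshooting the 400 errors, here is a sketch of how a Redfish GET is typically formed: an absolute path under the DMTF service root `/redfish/v1/`, with HTTP Basic authentication (one of the schemes the Redfish standard allows; many services instead expect a session token from the SessionService). Whether the P9 FSP exposes Redfish at all, which port it uses and which authentication it accepts are NOT confirmed in this thread. The host name and the helper `redfish_request` are hypothetical. The function only builds the request and does not send it.

```python
import base64
import urllib.request


def redfish_request(host: str, path: str, user: str, password: str) -> urllib.request.Request:
    """Build (but do not send) an HTTPS GET for a Redfish resource.

    ASSUMPTIONS: the FSP answers Redfish on the default HTTPS port and
    accepts HTTP Basic auth. Neither is confirmed for Power9 FSPs.
    """
    if not path.startswith("/"):
        path = "/" + path  # Redfish URIs are absolute, rooted at /redfish/v1/
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return urllib.request.Request(
        f"https://{host}{path}",
        headers={
            "Authorization": f"Basic {token}",  # HTTP Basic credentials
            "Accept": "application/json",       # Redfish payloads are JSON
        },
        method="GET",
    )
```

Sending it would be `urllib.request.urlopen(req, context=ctx)`, where `ctx` is an `ssl.SSLContext` that trusts the FSP's certificate. Service processors often present self-signed certificates.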
Original Message:
Sent: 1/8/2024 7:38:00 AM
From: nigel griffiths
Subject: RE: Using CIM to access the Flexible Service Processor
Hi,
If you have faulty Power hardware, or even suspect hardware, then I strongly suggest you contact IBM Power hardware support to get that fixed as soon as possible.
I don't know anyone that uses the CIM interface to extract temperature or electrical power use.
This is because most Power customers have HMCs that are network-connected to the service processors, which means the service processors are on isolated private networks for security reasons. The HMC has a different and very complex REST API for that data and for performance statistics, with a separate set of issues. In case you are interested, Google "nextract plus" or find the link here: AIXpert Blog from Nigel Griffiths (@mr_nmon) (ibm.com). There are also YouTube videos on the setup.
Sorry that this does not help much in your case.
Cheers @mr_nmon
------------------------------
Nigel Griffiths
------------------------------
Original Message:
Sent: Fri January 05, 2024 12:54 PM
From: Michal Kozlowski
Subject: Using CIM to access the Flexible Service Processor
Hi Jim,
I've tried to use CIM for checking ambient temperature, but on some boxes CIM cannot be enabled, and on other boxes, depending on the firmware level, I received less data via CIM (the newer the firmware, the less data I received).
However, yesterday when I read the history for VL9xx levels, I found a potentially different method to collect the data I am interested in: the Redfish (REST) API.
I'm not familiar with Redfish, so I put it on my 'To read' list.
Details:
Power9 System Firmware Fix History - Release levels VL9xx
https://www.ibm.com/support/pages/node/6955591
(Part of details for VL950_119_045 / FW950.70)
Support for using a Redfish (REST) API to gather power usage for all nodes in watts and the ambient temperature for the system.
A sample Redfish response is shown below:
==>> GET redfish/v1/Systems/<>
...
"Oem": {
"IBMEnterpriseComputerSystem": {
...
...
"PowerInputWatts" : <> ( number in watts), <<<<============
"AmbientTemp" : <> (number in Celsius) <<<<============
}
},
...
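If the Redfish route works, pulling the two values out of the response is straightforward. The following is a minimal sketch that assumes the JSON nesting in the IBM sample above (`Oem` → `IBMEnterpriseComputerSystem` → `PowerInputWatts` / `AmbientTemp`). The function name `power_and_temp` is hypothetical. It returns `None` for any field the service does not report, which also makes "missing metric" cases easy to spot.

```python
import json
from typing import Optional, Tuple


def power_and_temp(body: str) -> Tuple[Optional[float], Optional[float]]:
    """Extract (PowerInputWatts, AmbientTemp) from a Redfish ComputerSystem body.

    Key names follow the IBM sample response; any missing level or field
    yields None instead of raising.
    """
    doc = json.loads(body)
    oem = doc.get("Oem", {}).get("IBMEnterpriseComputerSystem", {})
    watts = oem.get("PowerInputWatts")  # watts, per the sample
    temp = oem.get("AmbientTemp")       # degrees Celsius, per the sample
    return (
        float(watts) if watts is not None else None,
        float(temp) if temp is not None else None,
    )
```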
(Redfish documents):
https://www.dmtf.org/standards/redfish
Regards,
Michal
------------------------------
Michal Kozlowski
Original Message:
Sent: Tue July 25, 2023 11:59 AM
From: Jim Rinn
Subject: Using CIM to access the Flexible Service Processor
I recently found documentation for the FSP CIM interface. The CIM allows access to metrics such as inlet/outlet air temperatures, fan speeds and power usage. The metrics can be retrieved using the open-source wbemcli command compiled on AIX.
The question is: why are some of the servers not returning the full list of metrics? All these servers are identically configured small Power9 systems (Model 9009-41A / S914). The servers are all stand-alone, with no HMC involved. An IP connection is available to the HMC port on all FSPs. All servers were purchased at the same time. All servers run AIX updated to the same level, and the FSPs have the same firmware version. So one would expect each FSP to report the same list of metrics.
We recently had one server with an issue that was also causing a power supply fault, and it was missing metric data for the power supplies. We also noticed hardware elements, such as a power supply, missing from the CIM data. The missing hardware was likely the reason for the missing metrics. This was corrected by replacing a cable in the chassis.
Queries of other FSPs revealed a few servers were returning some but not all of the metrics.
The inlet air temp has a location code of D1 which would indicate the sensor is in the control panel.
The missing power metric is always at location E3 (the systems have two power supplies, E3 and E4).
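Because the servers are identical, the union of everything any FSP reports is a reasonable "expected" list, and each server can be checked against it. This is a minimal sketch that assumes the per-server metric names (for example, sensor location codes such as D1 or E3) have already been collected, e.g. from wbemcli output. The function `missing_metrics` and any server or metric labels passed to it are hypothetical.

```python
from typing import Dict, Set


def missing_metrics(reported: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Return, per server, the metrics that some identical server reports but it does not.

    Servers that report the full set are omitted from the result.
    """
    expected: Set[str] = set().union(*reported.values())  # empty set if no servers
    gaps = {server: expected - metrics for server, metrics in reported.items()}
    return {server: gap for server, gap in gaps.items() if gap}
```

Running this after every FSP restart would give the list of systems that need the CIM (or FSP) reset described at the top of the thread.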
Has anyone had any experience with using CIM data and what the missing data might mean?
This is likely a hardware issue, but it seems odd that 30% of these systems have some combination of bad control panels, power supplies or cables.
------------------------------
Jim Rinn
------------------------------