Performance metrics from your HMC to Prometheus

  • 1.  Performance metrics from your HMC to Prometheus

    Posted Tue November 05, 2024 04:00 AM

    Hi,

    For those of you who are using Prometheus.

    I have updated my opensource tool (HMC performance metrics exporter) to allow Prometheus to scrape metrics data.

    More info: https://github.com/mnellemann/hmci

    Feedback always welcome :)



    ------------------------------
    Best regards,
    Mark Nellemann
    Advisory Power Technical Specialist
    IBM Denmark
    ------------------------------

    #HMCandCMC


  • 2.  RE: Performance metrics from your HMC to Prometheus

    Posted Tue March 04, 2025 07:31 AM

    Hello Mark,

    thanks a lot. I just found your tool. It is great and fulfills my requirements perfectly for writing this data to Prometheus.

    I installed the rpm now on SLES 15, because I couldn't download the container image. It seems to be user / password protected.
    Just one question: Why are there influxdb datasources in the Prometheus dashboards for Grafana?

    Many thanks again, great work.
    Is there a list with datapoints like for nextractplus? That would help building my own dashboards.

    kind regards



    ------------------------------
    Joerg Kauke
    Unix Administrator
    COOP Switzerland
    ------------------------------



  • 3.  RE: Performance metrics from your HMC to Prometheus

    Posted Tue March 04, 2025 08:18 AM

    Hi Joerg,

    Glad to hear it's useful for you :)

    The container image was created by a colleague and I'm not sure why it requires authentication. I'll ask and see if we can fix it.

    The InfluxDB datasource in the Prometheus dashboards is a mistake. I hadn't noticed it before now, but it is probably because I created the InfluxDB dashboards a long time ago and modified a copy of them for Prometheus. I will clean it up for the next release.

    I don't have a list of datapoints, but the endpoint for Prometheus ( http://host-with-hmci:9040 ) will give you a list/overview of available metrics.
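
    As a rough illustration of getting such an overview, here is a minimal Python sketch that reads the text returned by the endpoint and lists the distinct metric names. The host name is the placeholder from the post, and whether a path such as /metrics is needed is an assumption to check against your own installation.

```python
import re
import urllib.request

# Placeholder address taken from the post; the exact path (for example
# "/metrics") is an assumption and may differ in your installation.
HMCI_URL = "http://host-with-hmci:9040"

# A sample line starts with the metric name, optionally followed by {labels}.
_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the sorted, de-duplicated metric names found in the text."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = _NAME.match(line)
        if match:
            names.add(match.group(1))
    return sorted(names)


def fetch_metric_names(url=HMCI_URL, timeout=10):
    """Fetch the endpoint and list the metric names it exposes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return metric_names(response.read().decode("utf-8"))
```

    Calling fetch_metric_names() against a running exporter should print-friendly list every metric once, regardless of how many partitions or systems report it.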



    ------------------------------
    Best regards,
    Mark Nellemann
    Advisory Power Technical Specialist
    IBM
    ------------------------------



  • 4.  RE: Performance metrics from your HMC to Prometheus

    Posted Wed March 05, 2025 01:39 AM

    Hi Mark,

    thanks for answering so quickly.
    After checking the datapoints, I have one further question. Does hmci collect all available data? It seems there are some data missing, or I couldn't find them until now, e.g. the LPAR partition ID.

    have a great day.

    Kind regards



    ------------------------------
    Joerg Kauke
    Unix Administrator
    COOP Switzerland
    ------------------------------



  • 5.  RE: Performance metrics from your HMC to Prometheus

    Posted Wed March 05, 2025 03:11 AM

    Hi Joerg,

    There is some data that can't be represented in Prometheus (at least with the Prometheus SDK I'm using) but is available with InfluxDB, for example LPAR name, OS and version. Basically anything that is not a number.
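
    For background: sample values in Prometheus are always numeric, so text such as an OS name cannot be a value. A common convention is to carry such text as labels on a series with the constant value 1. The sketch below only formats such a line by hand; the metric and label names are hypothetical, and this is not a description of how hmci works.

```python
def escape_label_value(value):
    """Escape backslash, double quote and newline for a label value."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def info_line(metric, labels):
    """Format one text-format line carrying text as labels, with value 1."""
    body = ",".join(
        f'{name}="{escape_label_value(val)}"'
        for name, val in sorted(labels.items())
    )
    return f"{metric}{{{body}}} 1"


# Hypothetical metric and label names, for illustration only.
print(info_line("partition_info", {"partition": "lpar1", "os": "AIX", "version": "7.3"}))
```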

    Best regards,

    Mark



    ------------------------------
    Best regards,
    Mark Nellemann
    Advisory Power Technical Specialist
    IBM
    ------------------------------



  • 6.  RE: Performance metrics from your HMC to Prometheus

    Posted Wed March 05, 2025 05:00 AM

    Hi Mark,

    thanks for the explanation. That's a bit unfortunate. I have to check whether we really need some of the missing data.
    But the partition ID is a number, isn't it? Or does it show up as string?



    ------------------------------
    Joerg Kauke
    Unix Administrator
    COOP Switzerland
    ------------------------------



  • 7.  RE: Performance metrics from your HMC to Prometheus

    Posted Thu March 06, 2025 03:11 AM

    Hi Joerg,

    The name of the LPAR is provided as a label, e.g.:

    partition_processor_entitled_units{partition="Alma-Image-100g-0171cf4e-00000011",system="Server-9009-42A-SN21F64EV"} 0.5

    If you are referring to the numeric ID of the Partition it could be included.

    https://github.com/mnellemann/hmci/blob/main/src/test/resources/pcm-data-logical-partition.json#L41

    But this ID will change if/when you LPM (Live Partition Mobility) your partition, so I'm not sure it's useful?
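
    To show how the names can be read back out of a sample line like the one quoted in this post, here is a small Python sketch that splits a line into metric name, labels and value. It is a simplified parser written for illustration: it leaves escape sequences in label values untouched and ignores any trailing timestamp.

```python
import re

# metric name, optional {labels}, then the numeric value
_LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)'
)
# one label pair: name="value" (escaped characters are kept as-is)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (metric name, labels dict, float value) for one sample line."""
    match = _LINE.match(line.strip())
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    labels = dict(_LABEL.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))
```

    In a dashboard, the partition and system labels are what you would group or filter by, which stays stable even if the numeric partition ID changes.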



    ------------------------------
    Best regards,
    Mark Nellemann
    Advisory Power Technical Specialist
    IBM
    ------------------------------



  • 8.  RE: Performance metrics from your HMC to Prometheus

    Posted Mon March 16, 2026 07:50 AM
    Edited by Joerg Kauke Mon March 16, 2026 07:55 AM

    Hello Mark,

    exactly one year later, I have to come back with a further question.
    Lately we integrated Nigel's nextract plus into our environment, just to see what data it delivers.
    I've created a Grafana dashboard and now I'm puzzled by the values it shows compared with hmci, for example for vio_network_virtual_received_bytes.

    Same Managed System, same VIOS, same VLAN.
    Do you have any idea where my mistake is?

    Many thanks in advance for your support.

    Kind regards



    ------------------------------
    Joerg Kauke
    Unix Administrator
    COOP Switzerland
    ------------------------------



  • 9.  RE: Performance metrics from your HMC to Prometheus

    Posted Tue March 17, 2026 05:15 AM

    Hi Mark,
    hi Joerg.

    We are using Nigel's nextract as well and it works pretty well. (Thanks Nigel)

    But it would be nice to have an officially supported way to get the data.
    Therefore I opened an IBM Idea in March 2024 to get an official Prometheus exporter.
    Currently it is in the "planned for future release" state.

    Maybe it could help to vote for it to accelerate the development.

    https://ideas.ibm.com/ideas/HMC-I-456

    Kind regards



    ------------------------------
    Sascha Wycisk
    ------------------------------



  • 10.  RE: Performance metrics from your HMC to Prometheus

    Posted Fri March 20, 2026 08:50 AM

    Hi, a few thoughts and places to check.

    I would first check the Grafana Units used for statistics, because Bytes are a "pain in the neck" with too many digits.

    Meaning the HMCi value might be in MB or KB. Even so, nextract 10,700,000 / 1024 = 10,449 and not the HMCi value of 168,000.

    Are the HMCi bytes actually packets? 10700000 / 168000 =  64 bytes per packet - Hmm! Plausible!
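
    The arithmetic above can be replayed with a few lines of Python. The two values are the ones quoted in this post; the list of candidate explanations is purely illustrative and not something either tool defines.

```python
nextract_value = 10_700_000  # value seen via nextract plus (from the post)
hmci_value = 168_000         # value seen via HMCi (from the post)

# Illustrative candidate explanations for the gap between the two values.
candidates = {
    "KB (divide by 1024)": 1024,
    "MB (divide by 1024 * 1024)": 1024 * 1024,
    "64 bytes per packet": 64,
}


def closest_factor(larger, smaller, factors):
    """Return (name, factor, observed ratio) for the nearest candidate."""
    observed = larger / smaller
    name, factor = min(factors.items(), key=lambda item: abs(item[1] - observed))
    return name, factor, observed


print(round(nextract_value / 1024))  # 10449, so a KB unit does not explain it
name, factor, observed = closest_factor(nextract_value, hmci_value, candidates)
print(name, round(observed, 1))      # 64 bytes per packet 63.7
```

    A ratio this close to a small packet size supports the packets-versus-bytes guess, but it does not prove it; comparing several samples would be more convincing.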

    The documentation for the HMC statistics is here:

        https://www.ibm.com/docs/en/power9/9009-22A?topic=specification-managed-system-processed-aggregated-metrics-json

    nextract plus is sending the statistics "as found" in the data structure returned from the HMC i.e. Python uses the labels and values found.

    HMCi renames them to vios_network_virtual_received_bytes, so this could be a coding mistake. I know I do this often while rapidly hacking code.

    Note: nextract refers to this statistic as vios_network_virtual_receivedBytes.  So, the "receivedBytes" come from the original data structure from the HMC.

    In fact, receivedBytes does not appear in the nextract code at all.

    Perhaps HMCi has confused receivedBytes with receivedPackets, receivedPhysicalBytes or receivedPhysicalPackets.

    That is a wild guess.

    It is complicated, but the bit for VIO networks is found under

    "viosUtil": [{
      . . .
      "network": {
        . . .
        "virtualEthernetAdapters": [{
          "physicalLocation": "string", "vlanId": "number", "vswitchId": "number",
          "isPortVLANID": "boolean",
          "receivedPackets": ["number", "number", "number"],
          "sentPackets": ["number", "number", "number"],
          "droppedPackets": ["number", "number", "number"],
          "sentBytes": ["number", "number", "number"],
          "receivedBytes": ["number", "number", "number"],
          "receivedPhysicalPackets": ["number", "number", "number"],
          "sentPhysicalPackets": ["number", "number", "number"],
          "droppedPhysicalPackets": ["number", "number", "number"],
          "sentPhysicalBytes": ["number", "number", "number"],
          "receivedPhysicalBytes": ["number", "number", "number"],
          "transferredBytes": ["number", "number", "number"],
          "transferredPhysicalBytes": ["number", "number", "number"] }],

    I hope this helps a bit. Willing to accept the blame if it's my code that is wrong.
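
    To check the packets-versus-bytes guess against raw data, one could walk a structure shaped like the fragment above and compare receivedBytes with receivedPackets for each adapter. The Python sketch below assumes the caller passes in the object that directly contains "viosUtil". What the three numbers in each array mean is not stated here, so the sketch simply divides them position by position. The sample data is made up.

```python
def bytes_per_packet(sample):
    """Return (vlanId, [receivedBytes / receivedPackets, ...]) per adapter."""
    results = []
    for vios in sample.get("viosUtil", []):
        adapters = vios.get("network", {}).get("virtualEthernetAdapters", [])
        for adapter in adapters:
            ratios = [
                received / packets if packets else None  # avoid dividing by zero
                for received, packets in zip(
                    adapter["receivedBytes"], adapter["receivedPackets"]
                )
            ]
            results.append((adapter.get("vlanId"), ratios))
    return results


# Made-up sample in the shape of the fragment, for illustration only.
example = {
    "viosUtil": [{
        "network": {
            "virtualEthernetAdapters": [{
                "vlanId": 1,
                "receivedBytes": [6400, 0, 12800],
                "receivedPackets": [100, 0, 200],
            }]
        }
    }]
}
print(bytes_per_packet(example))  # [(1, [64.0, None, 64.0])]
```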



    ------------------------------
    Nigel Griffiths - IBM retired
    London, UK
    @mr_nmon
    ------------------------------



  • 11.  RE: Performance metrics from your HMC to Prometheus

    Posted Mon March 23, 2026 03:33 AM
    Edited by Joerg Kauke Mon March 23, 2026 04:03 AM

    Hello Nigel,

    thanks for sharing your deep knowledge about these metrics. It clarifies some of my confusion.
    May I ask which API endpoint you are using in nextract plus? Is it Long Term or PCM?

    My intention was to use the HMC data in Prometheus and trigger some alerts if thresholds are reached.
    'njmon' would be the better solution, but the VIOS has a restricted network and it's difficult to get the data out of the VIO Servers.

    I will check a little further to find a way.
    Many Thanks to you.

    Kind regards.



    ------------------------------
    Joerg Kauke
    Unix Administrator
    COOP Switzerland
    ------------------------------



  • 12.  RE: Performance metrics from your HMC to Prometheus

    Posted Mon March 23, 2026 02:12 PM
    Hi,
    The HMC is an appliance device.
    It can perform many tasks, both in the background and for HMC users, making changes for the sysadmin team to the 64 possible Power servers it controls.
    Plus tasks for PowerVC users remotely.

    Gathering stats from the hardware (hypervisor) and VIOSs for monitoring is not exactly a high-priority task.
    VIOS performance can be critical to the best performance for the VMs it serves with I/O.
    Real-time stats aren't a priority compared to I/O for networks and storage.

    The HMC compromise is that performance stats are collected at 5-minute intervals to keep CPU cycles within acceptable limits. I imagine the design teams are cautious in their approach.

    I am not a spokesperson for the IBM HMC or VIOS teams, and the design and this information might be out of date.

    If you want faster statistics, you can monitor the VIOS directly, but you'll need to ensure you don't create performance problems on the VIOS.

    I hope this background information helps understanding, Nigel