
PowerVM Performance Monitoring – What's New In HMC R8 V8.6.0

By Dyutiman Chaudhuri posted Wed June 10, 2020 08:37 AM

  

The HMC R8 V8.6.0 release of the Hardware Management Console (HMC) brings some of the most sought-after PowerVM performance monitoring features. They include:

  1. Export of the PowerVM performance metrics in CSV format
  2. Energy monitoring support
  3. Shared Storage Pool monitoring support

This blog discusses each of the above three features in detail. The details cover the problem background, the current solutions that exist, what the 8.6.0 release brings with it and how the features can be used by customers.

PowerVM Metrics – Ready For Custom Data Analytics

Background

PowerVM Performance and Capacity Monitoring was first introduced in the Hardware Management Console (HMC) R8 V8.1.0 release. With Performance Monitoring, it is possible to get the compute, shared network and shared storage metrics of the PowerVM environment. The metrics are collected from the underlying platforms, aggregated on the management console and made available in the industry-standard JavaScript Object Notation (JSON) format. The R8 V8.1.0 release of the HMC also introduced visualization of these metrics in the HMC graphical user interface.

At the Power Technical Collaboration Council (TCC) meeting earlier this year, I met a customer who was trying to visualize the PowerVM metrics using the Splunk monitoring and analytics software. We discussed the issues he ran into while converting the JSON-formatted performance metrics into a format that Splunk can understand and interpret. It was an interesting discussion, and it got us thinking about why visualizing the PowerVM performance metrics in Splunk, or in any third-party tool, should be so difficult.

PowerVM Metrics Visualization Today

The HMC today allows for visualization of the metrics against several categories like the processor, memory, network and storage utilization trends. It also provides a snapshot view that gives a minute summary of the compute, shared network and shared storage utilization at the overall server level.

Data Visualization Gaps

While the PowerVM performance metrics are available in JSON format through the management console APIs and the graphical user interface, some gaps remain. The following have surfaced frequently in customer conversations on performance monitoring:

  • Exporting data for long-term warehousing, beyond the duration that the HMC supports today
  • Viewing the data at any point in the future
  • Easy visualization using third-party data analytics tools
  • Using exported data as-is, without having to adapt the input-output format
  • Running customized queries on the data instead of being restricted to static views

Export To CSV Format

To address the above pain points, the HMC R8 V8.6.0 release introduces CSV export of the PowerVM metrics. With this feature, the PowerVM performance metrics that were earlier available only in JSON format are now also available in CSV format. The processed and aggregated metrics for the Server, Partition and Virtual IO Server, which are stored in the management console database, can now be exported as CSV. The CSV-formatted metrics can be fetched from the management console over REST APIs similar to those used today to fetch the metrics in JSON format.

The data in the CSV format is a normalized representation of the data that exists in the JSON structure. A snapshot of the data in Excel is shown below. Please refer to the attachment PCMMetrics_SYS.csv for a sample of an exported CSV file.
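To make the idea of a normalized representation concrete, here is a minimal sketch of how a nested JSON record can be flattened into dotted column names and written out as CSV. It is only an illustration of the general technique, not the HMC's actual export logic; the function names and the sample field names (`processor.utilizedProcUnits`, `memory.assignedMem`) are hypothetical and do not reflect the real metrics schema.

```python
import csv
import io


def flatten(record, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def to_csv(records):
    """Render a list of nested records as CSV text, one row per record."""
    rows = [flatten(r) for r in records]
    # Union of all column names, kept in first-seen order.
    columns = list(dict.fromkeys(col for row in rows for col in row))
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, restval="", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical sample records; NOT the real HMC metrics schema.
SAMPLE = [
    {"timestamp": "t1",
     "processor": {"utilizedProcUnits": 1.5},
     "memory": {"assignedMem": 4096}},
    {"timestamp": "t2",
     "processor": {"utilizedProcUnits": 2.0},
     "memory": {"assignedMem": 4096}},
]

if __name__ == "__main__":
    print(to_csv(SAMPLE))
```

Each nested level becomes part of the column name, which is why columns in the exported file tend to look like `Entity.metricName`.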

Visualization By Third-Party Tools

The PowerVM performance metrics CSV file was imported into two data analytics tools:

  1. Watson Data Analytics
  2. Splunk

Watson Data Analytics

I registered for the Watson Data Analytics single-user 30-day trial license to test the CSV metrics exported from the HMC. Though this blog is not meant to be a review of the tool, I want to mention that Watson Data Analytics was intuitive and easy to use. I could get started with it in no time and had meaningful visualizations in less than 10 minutes.

 

On launching the trial application, the user is led to a home page that resembles the snapshot below. Clicking "New data" presents, among other options, the option to upload a CSV-formatted file from the local disk.

 

Once the CSV file has been uploaded and processed by the tool, it is possible to click the uploaded CSV file to go to the analytics page. Here, the tool automatically populates some starting points based on its analysis of the contents of the file.

Clicking on "What is the contribution of FiberChannelAdapter.numOfWrites..." took me to the following visualization:

 

From here, it is quite easy to choose what goes on the x-axis and what goes on the y-axis. 

 

 

Here, I changed from HEPhysicalPort.physicalLocation to FiberChannelAdapters.physicalLocation and voilà!

 

 

It is possible to filter for only those FiberChannelAdapters that are of interest. Should one wish to change the visualization, the tool recommends a number of alternative visualizations for the data:

 

 

It is possible to save the analysis and the visualizations to the dashboard and to share them conveniently as a link, by email, as a download or in a tweet. This information should help anyone wanting to get started with the Watson Data Analytics tool.
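The same kind of per-adapter breakdown can also be computed directly from the exported file when a custom query is needed. The following is a minimal sketch using only the Python standard library. The column names are borrowed from the names that appear in this walkthrough and may not match the exported header exactly; the location codes and values are made up for illustration.

```python
import csv
import io
from collections import defaultdict


def sum_by(csv_text, group_column, value_column):
    """Sum value_column per distinct group_column value, skipping blank cells."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        group = row.get(group_column, "")
        value = row.get(value_column, "")
        if group and value:
            totals[group] += float(value)
    return dict(totals)


# Assumed column names and made-up rows, for illustration only.
GROUP_COL = "FiberChannelAdapters.physicalLocation"
VALUE_COL = "FiberChannelAdapter.numOfWrites"
SAMPLE_CSV = (
    f"{GROUP_COL},{VALUE_COL}\n"
    "U1-P1-C1,100\n"
    "U1-P1-C2,250\n"
    "U1-P1-C1,50\n"
)

if __name__ == "__main__":
    totals = sum_by(SAMPLE_CSV, GROUP_COL, VALUE_COL)
    for location, writes in sorted(totals.items()):
        print(location, writes)
```

To run it against a real export, read the file contents into a string and pass the column names found in its header row.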

Splunk

Following are the basic Splunk education links that can help one get started using the tool. You can try the Splunk Enterprise trial version to start with.

Installing Splunk Enterprise
Getting Data In
Splunk Basic Search
Creating a Dashboard

 

Here is a snapshot that shows how Splunk could read the imported CSV-formatted data and plot graphs from it without requiring a data model to be created or changed:

 


 

Note: The CSV export of the Energy and SSP monitoring metrics will be available in the first Service Pack of the HMC R8 V8.6.0 release.

PowerVM Metrics – Energy Monitoring

Background

In the past, Power Systems energy monitoring was mostly achieved through the IBM Systems Director’s Active Energy Manager (AEM) plugin. With AEM no longer being enhanced or available, you have to fetch the energy monitoring metrics directly from the frame through the WBEM CLI CIM protocol. This is not a straightforward way of collecting, warehousing or visualizing the energy monitoring information.

Energy Monitoring By HMC

In the HMC R8 V8.6.0 release, energy monitoring has been included as part of the management console functions. The diagram below depicts how the energy metrics can now be fetched through the HMC’s REST API interface once Energy Monitoring has been enabled for the servers via the HMC. The HMC is responsible for collecting the energy metrics from the system’s Flexible Service Processor (FSP) and aggregating the information in its database. Users can collect the raw or the aggregated metrics from the HMC through the respective REST APIs.

 

 

Energy Metrics

There are two categories of Energy Metrics that are collected by the HMC:

  1. Power Metrics
  2. Thermal Metrics

The following Power and Thermal Metrics are made available for a server.

Power Energy Reading:

  1. Current, minimum, maximum and average Power over a collection period
  2. Timestamp
  3. Reporting Period
  4. Power Reading State

Thermal Energy Reading:

  1. Inlet temperature – entity ID, entity instance and temperature data
  2. CPU temperature – entity ID, entity instance and temperature data
  3. Baseboard temperature – entity ID, entity instance and temperature data

Energy Monitoring Use Cases

Following are some of the use cases where knowledge of the Power and Thermal metrics of the servers can be applied.

Power – Monitoring Watts Use Cases:

  1. Identifying which machines are expensive to run, electricity being a large unexpected expenditure for most customers
  2. Quantifying how moving to POWER8 has helped; for example, from POWER5 to POWER8, the electricity saved could fund the new box
  3. Deriving a chargeable cost per kilowatt-hour, that is, the cost per day in the actual local currency
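As a rough illustration of the cost-per-day use case, average power in watts converts to kilowatt-hours as watts × hours / 1000, and multiplying by the electricity tariff gives the cost. The sketch below assumes the average power reading is in watts; the function names and the sample figures (1200 W, 0.15 per kWh) are made up for illustration.

```python
def daily_energy_kwh(average_watts, hours=24.0):
    """Energy used over `hours` at a given average power, in kilowatt-hours."""
    return average_watts * hours / 1000.0


def daily_cost(average_watts, price_per_kwh, hours=24.0):
    """Cost of running at `average_watts` for `hours`, in the tariff's currency."""
    return daily_energy_kwh(average_watts, hours) * price_per_kwh


if __name__ == "__main__":
    # Made-up figures: a server averaging 1200 W at a tariff of 0.15 per kWh.
    watts, tariff = 1200.0, 0.15
    print(f"Energy per day: {daily_energy_kwh(watts):.1f} kWh")
    print(f"Cost per day:   {daily_cost(watts, tariff):.2f}")
```

Running the same calculation for an old and a new server gives a simple estimate of the electricity saved by a migration.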

Thermal – Monitoring In and Out Temperatures:

  1. Overworked machines – to avoid placing further workloads on them
  2. Underworked machines – good targets for more workloads
  3. Machines with heating problems – maybe the vents are covered
  4. Machines unexpectedly hot or overclocking – perhaps an LPAR is locked in a loop and needs investigation
  5. Determining whether the room is overheating or the server is at a hot end of the room

PowerVM Metrics – Shared Storage Pool Monitoring

Background

Shared Storage Pool (SSP) is a great way to achieve integrated storage virtualization. With that arises the complexity of effectively managing and troubleshooting the SSP entities like the node, tier, failure group and disk associated with a single SSP or a cluster. The monitoring of such a Shared Storage Pool thus becomes critical to an administrator for effective debugging of issues. The HMC R8 V8.6.0 release introduces Shared Storage Pool monitoring.

How Does It Work

Shared Storage Pool monitoring by the HMC works as shown in the figure. You can enable SSP monitoring through one of the Virtual IO Server (VIOS) nodes that is part of the cluster and managed by the HMC. The performance metrics are collected through that specific Virtual IO Server node. The node is responsible for aggregating the metrics from the other Virtual IO Server nodes that are part of the cluster, whether or not they are managed by this HMC. Thus, the metrics collected by an HMC through a Virtual IO Server node are representative of the traffic on the entire cluster and not just the traffic contributed by the collecting node. If the VIOS node goes down or becomes unreachable, the HMC tries to fail over to the next VIOS node of the same cluster that it manages. If no such node is available or healthy, the HMC periodically polls the failed node and resumes collection once the node is up.

 

 

In summary, the HMC enables SSP monitoring through a Virtual IO Server node (Step 1), the Virtual IO Server node aggregates metrics from all nodes that are active and are part of the cluster (Step 2) and the metrics are collected by the HMC (Step 3). The SSP REST APIs allow the users to enable and collect the SSP monitoring metrics. Both raw and aggregated metrics are available through the REST APIs.
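The failover behavior described above can be sketched as a small selection function. This is a simplified model of the described behavior under stated assumptions, not the HMC's implementation: the `ViosNode` type, its fields and the `pick_collector` function are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class ViosNode:
    name: str
    managed_by_this_hmc: bool  # only nodes this HMC manages can be collectors
    healthy: bool              # up and reachable


def pick_collector(nodes, current_name):
    """Return the name of the node to collect through, or None.

    Keeps the current collector if it is healthy; otherwise moves on to the
    next healthy node of the cluster that this HMC manages. None means no
    candidate is usable, so the caller should keep polling the failed node.
    """
    managed = [n for n in nodes if n.managed_by_this_hmc]
    names = [n.name for n in managed]
    start = names.index(current_name) if current_name in names else 0
    for offset in range(len(managed)):
        candidate = managed[(start + offset) % len(managed)]
        if candidate.healthy:
            return candidate.name
    return None


if __name__ == "__main__":
    cluster = [
        ViosNode("vios1", managed_by_this_hmc=True, healthy=False),
        ViosNode("vios2", managed_by_this_hmc=True, healthy=True),
        ViosNode("vios3", managed_by_this_hmc=False, healthy=True),
    ]
    print(pick_collector(cluster, "vios1"))
```

Note that a healthy node managed by a different HMC (`vios3` in the sample) is never chosen as collector, although its traffic is still included in the metrics aggregated by the collecting node.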

Shared Storage Pool Metrics

SSP monitoring enables monitoring at different SSP entity levels. The following metrics are collected at the SSP, node, failure group, tier and disk levels:

  1. ID
  2. Name
  3. Size
  4. Free
  5. Number of Reads/Writes
  6. Read/Written Bytes
  7. Read/Write Service Time (min, max, average)
  8. Number of Read/Write Request Timeouts
  9. Number of Read/Write Request Failures
  10. Number of Times Service Queue is Full

Some Useful Links

Performance Monitoring Concepts And REST APIs 
Performance Monitoring Metrics Specification

Export To CSV Format Using HMC GUI 
Performance Monitoring - Firmware And VIOS Compatibility 

Performance guru Nigel Griffiths, aka Mr. nmon, has written a very detailed example on using the REST API that's an excellent starting point.

Contacting the PowerVM Team

Have questions for the PowerVM team or want to learn more? Follow our discussion group on LinkedIn: IBM PowerVM
