
PowerVM Performance Monitoring - Tips and Tricks for Grafana

By Michal Wiktorek posted 14 days ago

  

Introduction

In this article, I would like to share my tips for using Grafana to monitor IBM Power platform statistics.

I use Grafana with an InfluxDB data source that is populated by the nextract_plus scripts by Nigel Griffiths. However, the tips may also be useful in other setups.

If you are interested in monitoring the performance statistics of the Power platform, I highly recommend familiarizing yourself with "Nextract Plus", as the tool offers powerful capabilities at minimal cost. More about the tool, as well as about installing InfluxDB, preparing the platform, and adding a Data Source to Grafana, can be found on the website: https://www.ibm.com/support/pages/nextract-plus-hmc-rest-api-performance-statistics

It's also worth mentioning another great tool by Nigel Griffiths, nimon, which also sends statistics directly to the InfluxDB database, but collects them at the AIX/Linux operating system level. If you know and like the NMON tool, you will be delighted with the ability to present its data in Grafana. More details: https://www.ibm.com/support/pages/njmon-intro-and-update

NOTE: ALL DATA SHOWN IN THE ARTICLE ARE FICTIONAL AND DO NOT COME FROM ANY REAL IT ENVIRONMENT

The intention of this article is to help you in creating your own dashboards tailored to your needs. I do not rely on pre-made dashboards available on the Grafana website.

Filtering machines and LPARs by type and location

In a large IT environment, with many physical servers of various types spread across different data centers, filtering with variables is very useful. I spent some time and effort figuring out a sensible way to filter LPARs and machines, so I'd like to share it with you. I think adding such filtering to your dashboards can greatly facilitate your daily work.

To illustrate how to set up a filter for servers and LPARs, I'll provide an example of a fictional environment. These names will be used in subsequent examples for variables in Grafana.

For example, let's assume that your Power platform consists of various servers, of different types, located in different Data Centers, divided into production and testing environments.

Fictional example of a collection of various Power servers

By default, you won't have information in the InfluxDB database about where a particular server is located, nor whether it is for production or testing purposes. Certainly, it would be very useful to be able to show server utilization, for example, only in one Data Center, or only from one environment. Without using variables in Grafana, this is not so simple, and manually setting server/LPAR names in Grafana panels can be very annoying.

The goal is to achieve a selection panel in the dashboard as below. Selecting the appropriate Data Center, environment type, and machine class automatically limits the visible dashboard charts to only the relevant machines and LPARs.

selection panel

Example of dashboard with selected DC, Environment and Machine type

To achieve this effect, you must go into Settings in your dashboard, select the Variables option, and create new variables. Note that some will be "custom" variables, while others will be "Query" types, which will query the InfluxDB database.

Dependencies of variables

I personally recommend creating one dashboard for machine statistics and a separate dashboard for LPAR statistics. So many data points can be collected that presenting everything in one place would force you to scroll through the dashboard, and it would lose its simplicity. In a dashboard with machine statistics only, the LPAR variable should, of course, be omitted.

WARNING: all text in the "CUSTOM OPTIONS" and "QUERY" fields must be entered as one continuous string, with no line break characters anywhere. Unfortunately, I had to wrap the text in the article, so the examples may look misleading.

"DATACENTER" custom variable

You can divide servers by location using a Custom type variable. Pay attention to the format of the entries. Of course, you need to replace the names with those of your real machines.

Correct syntax: GROUP1 : item1|item2|item3,GROUP2 : item4|item5|item6

Be careful with space characters and do not use line break characters. Everything must be in a single line.

DC1 : Server1_S1022|Server2_S1022|Server3_S1022|Server10_E1050|Server11_E1050|Server20_E1080|Server21_E1080, DC2 : Server4_S1022|Server5_S1022|Server6_S1022|Server12_E1050|Server13_E1050|Server22_E1080|Server23_E1080, DC3 : Server7_S1022|Server8_S1022|Server9_S1022|Server14_E1050|Server15_E1050|Server24_E1080|Server25_E1080
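Since a stray line break or separator is easy to introduce when editing such a long string by hand, the string can also be generated. Below is a minimal sketch in Python, assuming you keep the group-to-server mapping in a dictionary; the function name build_custom_variable and the shortened server lists are only illustrative and not part of Grafana or nextract_plus.

```python
def build_custom_variable(groups: dict[str, list[str]]) -> str:
    """Return 'GROUP1 : a|b,GROUP2 : c|d' as a single line for a Custom variable."""
    entries = []
    for label, servers in groups.items():
        for name in [label, *servers]:
            # Commas, pipes, colons and line breaks act as separators in this
            # format, so they must not appear inside a label or a server name.
            if any(ch in name for ch in ",|:\n\r"):
                raise ValueError(f"unsupported character in {name!r}")
        entries.append(f"{label} : {'|'.join(servers)}")
    return ",".join(entries)


# Shortened, fictional subset of the environment used in this article.
datacenters = {
    "DC1": ["Server1_S1022", "Server10_E1050", "Server20_E1080"],
    "DC2": ["Server4_S1022", "Server12_E1050", "Server22_E1080"],
}
custom_options = build_custom_variable(datacenters)
print(custom_options)
```

The printed line can be pasted into the "CUSTOM OPTIONS" field as is. The same helper works for the ENV and TYPE variables.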

Example in Grafana:

DC variable example

Remember to select the Multi-value option. Notice the value preview at the bottom of the page.

"ENV" custom variable

The rules are the same as for "DATACENTER"

TEST : Server1_S1022|Server2_S1022|Server4_S1022|Server5_S1022|Server7_S1022|Server8_S1022|Server10_E1050|Server12_E1050|Server14_E1050,PROD : Server3_S1022|Server6_S1022|Server9_S1022|Server11_E1050|Server13_E1050|Server15_E1050|Server20_E1080|Server22_E1080|Server24_E1080|Server21_E1080|Server23_E1080|Server25_E1080

Example in Grafana:

ENV variable

"TYPE" custom variable

The rules are the same as for "DATACENTER"

Scale-out : Server1_S1022|Server4_S1022|Server7_S1022|Server2_S1022|Server5_S1022|Server8_S1022|Server3_S1022|Server6_S1022|Server9_S1022,Midrange : Server10_E1050|Server12_E1050|Server14_E1050|Server11_E1050|Server13_E1050|Server15_E1050,Enterprise : Server20_E1080|Server22_E1080|Server24_E1080|Server21_E1080|Server23_E1080|Server25_E1080

Example in Grafana:

TYPE variable

"SERVERNAME" query

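In this case, the variable is of the QUERY type and uses information from the InfluxDB database. Pay attention to the conditions used in the query, such as "DATACENTER:pipe", "TYPE:pipe", and "ENV:pipe". The ":pipe" format option makes Grafana join all values selected for a variable with the "|" character, so each condition becomes a regular expression listing the allowed server names. Because the three conditions are combined with AND, the query returns only the servers that match the selection in the filtering panel.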

SHOW TAG VALUES WITH KEY = "servername" WHERE servername =~/^*(${DATACENTER:pipe})$*/ AND servername =~/^*(${TYPE:pipe})$*/ AND servername =~/^*(${ENV:pipe})$*/
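To make the effect of the three conditions easier to follow, here is a small Python sketch that mimics the filtering locally. It is only an illustration under my own assumptions: the pipe() helper imitates the ":pipe" formatting, the groups are a shortened version of the fictional environment, and the sketch uses a plain ^(...)$ anchor rather than the exact expression from the query above.

```python
import re

# Shortened fictional groups; each option value is a pipe-joined server list.
DATACENTER = {
    "DC1": "Server1_S1022|Server3_S1022|Server10_E1050",
    "DC2": "Server4_S1022|Server6_S1022",
}
ENV = {
    "TEST": "Server1_S1022|Server4_S1022|Server10_E1050",
    "PROD": "Server3_S1022|Server6_S1022",
}
TYPE = {
    "Scale-out": "Server1_S1022|Server3_S1022|Server4_S1022|Server6_S1022",
    "Midrange": "Server10_E1050",
}


def pipe(options, selected):
    """Imitate ${VAR:pipe}: join the values of the selected options with '|'."""
    return "|".join(options[label] for label in selected)


def filter_servers(all_servers, dc, env, mtype):
    """Keep only servers that match all three selections (logical AND)."""
    selections = ((DATACENTER, dc), (ENV, env), (TYPE, mtype))
    patterns = [re.compile(f"^({pipe(opts, sel)})$") for opts, sel in selections]
    return [s for s in all_servers if all(p.search(s) for p in patterns)]


all_servers = ["Server1_S1022", "Server3_S1022", "Server4_S1022",
               "Server6_S1022", "Server10_E1050"]
print(filter_servers(all_servers, ["DC1"], ["PROD"], ["Scale-out"]))
```

With DC1, PROD and Scale-out selected, only Server3_S1022 remains, because it is the only name present in all three lists.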

Example in Grafana:

SERVERNAME query


"LPAR" query

This query returns the names of LPARs, but only those that belong to the selected servers. It takes only the latest name for each LPAR to avoid duplicates (e.g., after migrating an LPAR between machines), and it is limited to LPARs seen in the last 30 days so that LPARs that no longer exist are not listed. Of course, you can choose to omit this last condition.

SELECT last("name") FROM "lpar_details" WHERE ("servername" =~ /^$SERVERNAME$/) AND time > now() - 30d GROUP BY "lparname"
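The logic of this query can be sketched in plain Python. This is only an illustration of the idea (filter by server and age, then keep the last name per LPAR) and not how InfluxDB executes it; the function name latest_lpar_names and the sample points are hypothetical.

```python
from datetime import datetime, timedelta


def latest_lpar_names(points, servers, now, max_age=timedelta(days=30)):
    """points: iterable of (time, servername, lparname, name) tuples.

    Returns {lparname: last name} for points on the selected servers
    that are newer than now - max_age.
    """
    latest = {}
    for time, servername, lparname, name in sorted(points, key=lambda p: p[0]):
        if servername in servers and time > now - max_age:
            latest[lparname] = name  # later points overwrite earlier ones
    return latest


now = datetime(2024, 6, 1)
points = [
    (datetime(2024, 5, 30), "Server1_S1022", "lpar_a", "lpar_a"),
    (datetime(2024, 5, 31), "Server1_S1022", "lpar_a", "lpar_a"),
    (datetime(2024, 3, 1), "Server1_S1022", "lpar_old", "lpar_old"),  # too old
    (datetime(2024, 5, 31), "Server4_S1022", "lpar_b", "lpar_b"),     # other server
]
print(latest_lpar_names(points, {"Server1_S1022"}, now))
```

Each LPAR appears once, the LPAR last seen three months ago is dropped, and LPARs on servers outside the selection are ignored.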

Example in Grafana:

LPAR query

All variables

The entire set of variables should look as follows. A "WARNING" sign might appear if some of the variables are not used in the dashboard or by another variable.

All variables

Queries in panels

In Grafana panels, you should use the variable name instead of hard-coding the LPAR or machine names. Do it as in the example below:

SELECT "currentVirtualProcessors" FROM "lpar_processor" WHERE ("lparname" =~ /^$LPAR$/) AND $timeFilter GROUP BY "lparname"
SELECT "availableProcUnits" FROM "server_processor" WHERE ("servername" =~ /^$SERVERNAME$/) AND $timeFilter GROUP BY "servername"

That's all regarding server filtering :) One complication is that the variables have to be defined in each dashboard separately, so it's worth cloning dashboards instead of creating them from scratch. Remember to update the variables when, for example, a new physical server appears in your environment or a server is moved between locations.

Some changes may not be visible until you save the dashboard. It's also common for the browser to remember old names in filters. In that case, try reloading the page without the browser cache (CTRL + F5).

Setting up the legend in a Grafana panel - Time series graph

In the case of the Time Series Graph, it's beneficial to utilize the ability to sort by legend values. This way, you can easily check, for example, which LPARs utilize the CPU the most or have the highest peaks. Change the legend to a table form and select the values that interest you.

Legend

Example of legends sorted by Max value:

Max value

Stacked or unstacked?

A very useful feature in Grafana is the ability to create STACK Groups, which are independent of the general STACK settings of the panel. This lets you sum selected series while leaving the other series unaffected.

For example, you may want the graph to sum the utilization of multiple LPARs across multiple machines and, at the same time, show a separate stack for the TOTAL CPU of the physical machines, without combining the two stacks (I hope I didn't confuse you too much :) ).

To achieve this, add an override for the selected query in the "OVERRIDE" options, set the STACK property in it, and give the stacking group an appropriate name, e.g., TOTAL for queries returning the total CPU of all machines, and UTIL for queries returning utilization data.
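If you prefer to edit the panel JSON instead of clicking through the editor, the overrides might look roughly like the structure generated below. This is a sketch based on my understanding of the Time series panel JSON (the "byFrameRefID" matcher and the "custom.stacking" property), so please compare it with the JSON of your own panel before relying on it. The query letters A and B and the helper name stack_group_override are assumptions for illustration.

```python
import json


def stack_group_override(ref_id: str, group: str) -> dict:
    """Build one field override that puts all series of a query into a stack group."""
    return {
        # Assumed matcher: selects the fields returned by the query with this letter.
        "matcher": {"id": "byFrameRefID", "options": ref_id},
        "properties": [
            # Assumed property id: stacking mode plus the name of the stack group.
            {"id": "custom.stacking", "value": {"group": group, "mode": "normal"}}
        ],
    }


# Query A: total CPU of the machines, query B: LPAR utilization (hypothetical letters).
overrides = [
    stack_group_override("A", "TOTAL"),
    stack_group_override("B", "UTIL"),
]
print(json.dumps(overrides, indent=2))
```

Series from query A are then stacked only with each other, and series from query B form their own stack.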

STACK

Change InfluxQL query editor to Text editor mode (RAW)

The editor is certainly a convenience, but I think when working with Grafana and InfluxDB, you'll find it beneficial to learn and perform more complex tasks using "RAW MODE". To switch between modes, click the pencil icon next to the query.

Query editor
Text editor mode (RAW)

Use tags

Notice that in the QUERY, in the "Alias by" field, you can refer to tags using the $tag_ prefix, for example:

  • $tag_servername
  • $tag_lparname
  • $tag_viosname

They work only if the query groups by the corresponding tags in its GROUP BY clause. The tag values are then displayed in the legend instead of the full series name, which is extremely useful.

Group by

Summary

I hope you found this text useful. If the topic is of interest to you and you would like more advice, or if you are interested in examples of dashboards for monitoring the Power platform, feel free to contact me :)


Comments

6 days ago

Thanks Michal for a clear article.
You add significant extra options with hands-on examples.

Glad you liked the nextract_plus tool enough to add value.

Cheers, Nigel @mr_nmon Griffiths