
Monitoring IBM Spectrum Archive™ Enterprise Edition with Icinga

By Nils Haustein posted Sat February 29, 2020 10:39 AM

  

I have created a program that checks various IBM Spectrum Archive EE components, analyzes their state and prints the result. The program can be integrated as a remote plugin with an Icinga monitoring server. In this blog article I explain the program and show how to integrate it with Icinga version 1 using the Nagios Remote Plugin Executor (NRPE).

The IBM Spectrum Archive EE components being checked and analyzed are:

  • status of software
  • nodes
  • tape drives
  • tapes
  • pools
  • tasks (running and completed)

 

The program is published on github.com at: https://github.com/nhaustein/check_spectrumarchive

The program is based on IBM Spectrum Archive EE version 1.3 and uses the eeadm command to obtain the status of the components in JSON format. To analyze the JSON output I use the tool jq (https://stedolan.github.io/jq/). This tool must be installed on all Spectrum Archive nodes that run the program. Using the JSON output as the source for the analysis is much more efficient than parsing the human-readable output because every value is addressed by its key. This avoids searching the output for tokens that may change over time.
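To illustrate the key-based approach, here is a minimal sketch of filtering JSON with jq. The sample document and its field names (payload, id, state) are made up for illustration and are not taken from the actual eeadm output, so inspect the real JSON on your system before adapting the filter.

```shell
#!/bin/bash
# Hypothetical sample document; the real eeadm JSON layout may differ.
sample_json='{"payload":[{"id":"D1","state":"ok"},{"id":"D2","state":"error"}]}'

# Read JSON from stdin and print the id of every element
# whose state is not "ok". Requires jq in the PATH.
list_not_ok() {
  jq -r '.payload[] | select(.state != "ok") | .id'
}
```

Calling `echo "$sample_json" | list_not_ok` selects entries by key rather than by their position in a text table, so the filter keeps working when the column layout of the human-readable output changes.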

 

Syntax

This program can be invoked with one parameter at a time and performs the appropriate checks:

Syntax:

./check_spectrumarchive.sh [ -s | -n | -t | -d | -p <util> | -a <r|c> | -e | -h ]

Options:
  -s             --> Verify IBM Spectrum Archive status
  -n             --> Verify node status
  -t             --> Verify tape states
  -d             --> Verify drive states
  -p <util>      --> Check whether tape pool utilization is above <util> percent
  -a <r|c>       --> Check for running or completed tasks that failed
  -e             --> Check all components above
  -h             --> Print this help screen


The program returns OK, WARNING or ERROR along with the corresponding return code 0, 1 or 2. For WARNING and ERROR results the program also prints the affected components and their status.

Only one option can be specified at a time. The combination of multiple options in one call of the program does not work.

The program can be used standalone or it can be integrated with an external Icinga or Nagios monitoring server.
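For standalone use, the return codes described above can be evaluated in a small wrapper. The following is only a sketch: the function names are hypothetical, and reporting any other return code as UNKNOWN is an assumption borrowed from the usual Nagios plugin convention, not something the program documents.

```shell
#!/bin/bash
# Map a return code to the status label described above.
rc_to_status() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "ERROR" ;;
    *) echo "UNKNOWN" ;;   # assumption: anything else is treated as unknown
  esac
}

# Run a check command, pass its output through and append a status line.
# Returns the return code of the check so that callers can react to it.
run_check() {
  local rc=0
  "$@" || rc=$?
  echo "status: $(rc_to_status "$rc") (rc=$rc)"
  return "$rc"
}
```

A call such as `run_check ./check_spectrumarchive.sh -s` prints the program output followed by the status line.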

 

Installation and prerequisites

Clone the github repo and transfer the check_spectrumarchive.sh program to all Spectrum Archive EE nodes that need to be monitored. Make the program executable and place it in a directory referenced by the PATH environment variable (e.g. /usr/local/bin).

Download the jq tool and install it on the Spectrum Archive nodes as well. Enter the installation path of the jq tool into the check_spectrumarchive.sh program (parameter: $JQ_TOOL). The default location is /usr/local/bin.

Once this is done, run the check_spectrumarchive.sh program on the Spectrum Archive EE nodes and make sure it works.

 

Integration with Icinga

Icinga monitors infrastructure and services. Its architecture is client-server based.

The server is the Icinga server providing the graphical user interface and the option to configure monitored objects such as host groups, hosts and services. The hosts to be monitored are the Spectrum Archive nodes. The services are checked with the check_spectrumarchive.sh program. The Icinga server essentially calls the program on the remote Spectrum Archive nodes using NRPE. More information about Icinga: https://Icinga.com/products/

The clients are the IBM Spectrum Archive nodes being monitored. The communication between the server and the clients can be based on the Nagios Remote Plugin Executor (NRPE). This requires NRPE to be installed and configured on the Spectrum Archive nodes. More information about NRPE: https://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details

 

Prepare the client

In order to monitor the Spectrum Archive nodes using NRPE, the NRPE packages and optionally the Nagios plugins have to be installed and configured on all Spectrum Archive nodes to be monitored. There are different ways to install NRPE and the Nagios plugins. Red Hat does not include these packages in the standard installation repository, but they can be downloaded from other sources (e.g. rpmfind). The following packages should be installed:

nrpe, nagios-common, nagios-plugin

An alternative way for installing NRPE and nagios-plugins can be found here: https://support.nagios.com/kb/article.php?id=8

After the installation of NRPE has finished, note the following important paths and configuration files:

  • NRPE configuration file (nrpe.cfg), default location is /etc/nagios/nrpe.cfg
  • nagios plugins (check_*), default location is /usr/lib64/nagios/plugins

 

Edit the NRPE configuration file (e.g. /etc/nagios/nrpe.cfg) and set the include directory:

include_dir=/etc/nrpe.d/

 

The check_spectrumarchive.sh program must be run with root privileges. NRPE, however, does not run as root but as a user that is defined in the nrpe.cfg file (nrpe_user, nrpe_group). The default user and group name is nrpe. Consequently, sudo must be configured on the Spectrum Archive nodes to allow the NRPE user to run the check_spectrumarchive.sh program. To configure sudo, perform these steps:

  1. In the NRPE configuration file (/etc/nagios/nrpe.cfg) set the command prefix to sudo:
command_prefix=/usr/bin/sudo

 

  2. Add the group of the NRPE user to the sudoers configuration (e.g. with visudo):
%nrpe          ALL=(ALL) NOPASSWD: /usr/local/bin/check_spectrumarchive.sh*, /usr/lib64/nagios/plugins/*

 

Now copy the executable program check_spectrumarchive.sh to /usr/local/bin.

Switch to the NRPE-user and test if the program works under the sudo context:

# /usr/bin/sudo check_spectrumarchive.sh -s

Note, if you are not able to switch to the NRPE-user you may have to specify a login shell for the user (temporarily).
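As an alternative to changing the login shell, the test can be run from a root shell in one step. This is a sketch under assumptions: the NRPE user is named nrpe, /bin/bash exists, and the function and variable names are only illustrative.

```shell
#!/bin/bash
# Assumptions: adjust the user name and program path to your installation.
NRPE_USER="${NRPE_USER:-nrpe}"
CHECK="${CHECK:-/usr/local/bin/check_spectrumarchive.sh}"

# Run as root. "su -s" supplies a shell for this single command, so the
# login shell of the NRPE user does not have to be changed.
# "sudo -n" fails instead of prompting for a password if the sudoers
# rule is missing or does not match.
test_as_nrpe_user() {
  su -s /bin/bash -c "/usr/bin/sudo -n $CHECK -s" "$NRPE_USER"
}
```

After sourcing the snippet, call `test_as_nrpe_user`. If sudo reports that a password is required, review the sudoers entry.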

 

Create the NRPE configuration for the Spectrum Archive specific checks that use this program as a remote plugin. Note that allowed_hosts must include the IP address of your Icinga server. Each check has a name given in [] and executes a particular command, such as /usr/local/bin/check_spectrumarchive.sh -s. Find an example below:

allowed_hosts=127.0.0.1,9.155.114.101
command[check_users]=/usr/local/nagios/libexec/check_users -w 2 -c 5
command[check_ee_state]=/usr/local/bin/check_spectrumarchive.sh -s
command[check_ee_nodes]=/usr/local/bin/check_spectrumarchive.sh -n
command[check_ee_tapes]=/usr/local/bin/check_spectrumarchive.sh -t
command[check_ee_drives]=/usr/local/bin/check_spectrumarchive.sh -d
command[check_ee_pools]=/usr/local/bin/check_spectrumarchive.sh -p 80
command[check_ee_rtasks]=/usr/local/bin/check_spectrumarchive.sh -a r
command[check_ee_ctasks]=/usr/local/bin/check_spectrumarchive.sh -a c

 

Find an example of the NRPE configuration in the github repo.

Now start and enable the NRPE service and check its status.
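On a node that uses systemd this could look like the following sketch. The unit name nrpe is an assumption and may differ depending on how NRPE was installed. The function name is only illustrative.

```shell
#!/bin/bash
# Assumption: systemd is used and the NRPE unit is named "nrpe".
NRPE_UNIT="${NRPE_UNIT:-nrpe}"

# Run as root on each Spectrum Archive node:
# enable the unit at boot, start it now and show its status.
start_nrpe() {
  systemctl enable "$NRPE_UNIT" &&
  systemctl start "$NRPE_UNIT" &&
  systemctl status "$NRPE_UNIT"
}
```

After sourcing the snippet, call `start_nrpe` as root.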

Continue with the configuration of the monitored objects on the Icinga server.

 

Configure Icinga server

I used a Docker container running Icinga. The default configuration of the Icinga server is located in /etc/icinga. The default location for the object definitions is /etc/icinga/objects.

First check that the Icinga server can communicate with the Spectrum Archive nodes using NRPE. For this purpose the check_nrpe plugin on the server can be used. Its default location is /usr/lib/nagios/plugins/check_nrpe. Find an example below:

# /usr/lib/nagios/plugins/check_nrpe -H <IP of Spectrum Archive node>

 

This command should return the NRPE version. If this is not the case, investigate the problem. Likewise you can execute a remote check:

# /usr/lib/nagios/plugins/check_nrpe -H <IP of Spectrum Archive node> -c check_ee_state

This command should return the status reported by the check_spectrumarchive.sh program on the node.
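To test all remote checks for a node in one go, the call can be wrapped in a loop. This is a sketch: the function and variable names are hypothetical, while the plugin path and check names are the ones used in this article.

```shell
#!/bin/bash
# Path of the check_nrpe plugin on the Icinga server; adjust if needed.
CHECK_NRPE="${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}"
# Check names as configured in the NRPE configuration of the nodes.
EE_CHECKS="check_ee_state check_ee_nodes check_ee_tapes check_ee_drives check_ee_pools check_ee_rtasks check_ee_ctasks"

# Run every remote check against one node ($1 = IP or host name) and
# print one line per check. Returns the highest return code seen.
run_remote_checks() {
  local host="$1" worst=0 rc out check
  for check in $EE_CHECKS; do
    rc=0
    out=$("$CHECK_NRPE" -H "$host" -c "$check") || rc=$?
    echo "$check (rc=$rc): $out"
    if [ "$rc" -gt "$worst" ]; then
      worst=$rc
    fi
  done
  return "$worst"
}
```

Call it with `run_remote_checks <IP of Spectrum Archive node>`. A non-zero return code means at least one check did not report OK.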

 

If the NRPE communication and remote commands work, allow external commands by opening the Icinga configuration file (/etc/icinga/icinga.cfg) and adjusting this setting:

check_external_commands=1

 

Now configure the objects for the Spectrum Archive nodes. It is recommended to create a new file in the directory /etc/icinga/objects. In the example below two Spectrum Archive hosts (eenode1 and eenode2) are assigned to a host group (eenodes). For this host group a number of services are defined, each within its own define service stanza. Each service has a name, a host group where it is executed and a check command. The check command specifies an NRPE check and the name of the check that was configured in the NRPE configuration of the client. For example, the check_command check_nrpe!check_ee_state will execute the command /usr/local/bin/check_spectrumarchive.sh -s on the hosts.

define hostgroup {
      hostgroup_name  eenodes
      alias           EE Nodes
      members         eenode1,eenode2
      }

define host {
      use                     generic-host
      host_name               eenode1
      alias                   EE Node 1
      address                 <ip of ee node 1>
      }

define host {
      use                     generic-host
      host_name               eenode2
      alias                   EE Node 2
      address                 <ip of ee node 2>
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Users logged on to the system
      check_command           check_nrpe!check_users
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE software state
      check_command           check_nrpe!check_ee_state
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE node state
      check_command           check_nrpe!check_ee_nodes
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE drive states
      check_command           check_nrpe!check_ee_drives
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE tape states
      check_command           check_nrpe!check_ee_tapes
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE pool state
      check_command           check_nrpe!check_ee_pools
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE running tasks
      check_command           check_nrpe!check_ee_rtasks
      }

define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     Check EE completed tasks
      check_command           check_nrpe!check_ee_ctasks
      }

 

Find an example of the object definition in the github repo.
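Since the service stanzas differ only in the description and the check name, they can also be generated. The following sketch prints the seven Spectrum Archive service definitions shown above. The function names and the NRPE_COMMAND variable are hypothetical, and the value of NRPE_COMMAND must match the command name used in your Icinga object definitions.

```shell
#!/bin/bash
# Name of the Icinga command that wraps the NRPE plugin (assumption).
NRPE_COMMAND="${NRPE_COMMAND:-check_nrpe}"

# Print one service stanza for the host group "eenodes".
# $1 = service description, $2 = name of the NRPE check
service_stanza() {
  cat <<EOF
define service {
      use                     generic-service
      hostgroup_name          eenodes
      service_description     $1
      check_command           ${NRPE_COMMAND}!$2
      }

EOF
}

# Print the stanzas for all Spectrum Archive checks.
generate_services() {
  service_stanza "Check EE software state"  check_ee_state
  service_stanza "Check EE node state"      check_ee_nodes
  service_stanza "Check EE drive states"    check_ee_drives
  service_stanza "Check EE tape states"     check_ee_tapes
  service_stanza "Check EE pool state"      check_ee_pools
  service_stanza "Check EE running tasks"   check_ee_rtasks
  service_stanza "Check EE completed tasks" check_ee_ctasks
}
```

Redirect the output of `generate_services` into a file in the object directory and review it before restarting Icinga.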

Once the object definition is complete and stored in the default object location /etc/icinga/objects, restart the Icinga process using systemctl or init.d. Afterwards log on to the Icinga GUI and enjoy the Spectrum Archive nodes being monitored. Here is an example screenshot:

Example for Icinga monitoring IBM Spectrum Archive EE

 

Of course, you can and should add more checks to the Spectrum Archive nodes. How about performing some checks for the underlying IBM Spectrum Scale software? Alex Saupp and Achim Christ have created a repo to perform checks on Spectrum Scale components (https://gitlab.com/itsmee/icinga/tree/master). These checks integrate well with the Spectrum Archive checks.

Have fun trying that code. Please forgive me for using the outdated Icinga 1 object definitions; this was all I had available for testing. Perhaps you can migrate and test the Icinga 2 definitions and contribute to the github repo.

 
