I have created a program that checks various IBM Spectrum Archive EE components, analyzes their state and prints the result. This program can be integrated as a remote plugin with an Icinga monitoring server. In this blog article I explain the program and show the integration with Icinga version 1 and the Nagios Remote Plugin Executor (NRPE).
The IBM Spectrum Archive EE components being checked and analyzed are:
- status of software
- nodes
- tape drives
- tapes
- pools
- tasks (running and completed)
The program is published on github.com at: https://github.com/nhaustein/check_spectrumarchive
The program is based on IBM Spectrum Archive EE version 1.3 and uses the eeadm command to obtain the status of the components in JSON format. To analyze the JSON output I use the tool jq (https://stedolan.github.io/jq/). This tool must be installed on all Spectrum Archive nodes that run the program. Using the JSON output as the source for the analysis is much more efficient because of the key-value pairs: the program addresses each value by its key and does not have to search the output for tokens that may change over time.
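To illustrate why key-value pairs are convenient, here is a minimal sketch of a jq filter. The JSON sample, its field names (payload, hostname, state) and the function name are assumptions made for this illustration; the real eeadm output and the filters used in check_spectrumarchive.sh may look different.

```shell
#!/bin/bash
# Hypothetical sample; the real eeadm JSON layout and field names may differ.
sample='{"payload":[{"hostname":"eenode1","state":"available"},{"hostname":"eenode2","state":"error"}]}'

# Read JSON on stdin and print the hostname of every node whose state
# is not "available" (field names are assumptions, see above).
list_unavailable_nodes() {
  jq -r '.payload[] | select(.state != "available") | .hostname'
}

if command -v jq >/dev/null 2>&1; then
  printf '%s\n' "$sample" | list_unavailable_nodes
else
  echo "jq is not installed" >&2
fi
```

The filter selects entries by the value of a key, so it keeps working even if the order of the fields or the layout of the human-readable output changes.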
Syntax
This program can be invoked with one parameter at a time and performs the appropriate checks:
Syntax:
./check_spectrumarchive.sh [ -s | -n | -t | -d | -p <util> | -a <r|c> | -e | -h ]
Options:
-s --> Verify IBM Spectrum Archive status
-n --> Verify node status
-t --> Verify tape states
-d --> Verify drive states
-p <util> --> Check tape pool utilization is above %util
-a<r|c> --> Check for running or completed tasks that failed
-e --> Check all components above
-h --> Print This Help Screen
The program returns OK, WARNING or ERROR along with the appropriate return code 0, 1 or 2 respectively. For WARNING and ERROR the program also prints the affected components and their status.
Only one option can be specified at a time. The combination of multiple options in one call of the program does not work.
The program can be used standalone or it can be integrated with an external Icinga or Nagios monitoring server.
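For standalone use, the return code can be evaluated by a small wrapper. The following is only a sketch: the function names state_name and run_check are made up for this example, and the mapping of any other return code to UNKNOWN follows the usual Nagios plugin convention rather than anything stated for this program.

```shell
#!/bin/bash
# Translate a plugin return code into the state name described above.
state_name() {
  case "$1" in
    0) echo "OK" ;;
    1) echo "WARNING" ;;
    2) echo "ERROR" ;;
    *) echo "UNKNOWN" ;;   # assumption: anything else is treated as unknown
  esac
}

# Run the given command, print one summary line and pass the return code on.
run_check() {
  local output rc
  output=$("$@" 2>&1)
  rc=$?
  echo "$(state_name "$rc"): $output"
  return "$rc"
}

# Example call (path is the default install location used in this article):
# run_check /usr/local/bin/check_spectrumarchive.sh -e
```

Because run_check passes the return code on, it can be used in a cron job or another script that reacts only to non-zero results.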
Installation and prerequisites
Clone the github repo and transfer the check_spectrumarchive.sh program to all Spectrum Archive EE nodes that need to be monitored. Make the program executable and place it in a directory referenced by the PATH environment variable (e.g. /usr/local/bin).
Download the jq tool and install it on the Spectrum Archive nodes as well. Enter the installation path of the jq tool into the check_spectrumarchive.sh program (parameter: $JQ_TOOL). The default location is /usr/local/bin.
Once this is done, run the check_spectrumarchive.sh program on the Spectrum Archive EE nodes and make sure it works.
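Before the first run it can help to confirm that both files are in place. This is a small sketch assuming the default locations mentioned above; the variable names CHECK_SCRIPT and JQ_BIN and the function verify_executables are invented for this example.

```shell
#!/bin/bash
# Paths are assumptions based on the defaults mentioned above; adjust as needed.
CHECK_SCRIPT=${CHECK_SCRIPT:-/usr/local/bin/check_spectrumarchive.sh}
JQ_BIN=${JQ_BIN:-/usr/local/bin/jq}

# Return 0 if every given path is a regular, executable file, 1 otherwise.
verify_executables() {
  local rc=0 f
  for f in "$@"; do
    if [ -f "$f" ] && [ -x "$f" ]; then
      echo "found: $f"
    else
      echo "missing or not executable: $f"
      rc=1
    fi
  done
  return "$rc"
}

verify_executables "$CHECK_SCRIPT" "$JQ_BIN" || echo "Fix the items above before continuing."
```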
Integration with Icinga
Icinga monitors infrastructure and services. Its architecture is client-server based.
The server is the Icinga server providing the graphical user interface and the option to configure monitored objects such as host groups, hosts and services. The hosts to be monitored are the Spectrum Archive nodes. The services are checked with the check_spectrumarchive.sh program. The Icinga server essentially calls the program on the remote Spectrum Archive nodes using NRPE. More information about Icinga: https://Icinga.com/products/
The clients are the IBM Spectrum Archive nodes being monitored. The communication between the server and the clients can be based on the Nagios Remote Plugin Executor (NRPE). This requires NRPE to be installed and configured on the Spectrum Archive nodes. More information about NRPE: https://exchange.nagios.org/directory/Addons/Monitoring-Agents/NRPE--2D-Nagios-Remote-Plugin-Executor/details
Prepare the client
In order to monitor the Spectrum Archive nodes using NRPE, the NRPE packages and optionally the Nagios plugins have to be installed and configured on all Spectrum Archive nodes to be monitored. There are different ways to install NRPE and the Nagios plugins. Red Hat does not include these packages in the standard installation repository, but they can be downloaded from other sources (e.g. rpmfind). The following packages should be installed:
nrpe, nagios-common, nagios-plugin
An alternative way for installing NRPE and nagios-plugins can be found here: https://support.nagios.com/kb/article.php?id=8
After the installation of NRPE has finished, note some important paths and configuration files:
- NRPE configuration file (nrpe.cfg), default location is /etc/nagios/nrpe.cfg
- nagios plugins (check_*), default location is /usr/lib64/nagios/plugins
Edit the NRPE configuration file (e.g. /etc/nagios/nrpe.cfg) and set the include directory:
include_dir=/etc/nrpe.d/
The check_spectrumarchive.sh program must be run with root privileges. NRPE, however, does not run as root but as the user defined in the nrpe.cfg file (nrpe_user, nrpe_group). The default user and group name is nrpe. Consequently, sudo must be configured on the Spectrum Archive nodes to allow the NRPE user to run the check_spectrumarchive.sh tool. To configure sudo, perform these steps:
- In the NRPE configuration file (/etc/nagios/nrpe.cfg) set the command prefix to sudo:
command_prefix=/usr/bin/sudo
- Add the NRPE user to the sudoers configuration:
%nrpe ALL=(ALL) NOPASSWD: /usr/local/bin/check_spectrumarchive.sh*, /usr/lib64/nagios/plugins/*
Now copy the executable program check_spectrumarchive.sh to /usr/local/bin.
Switch to the NRPE user and test whether the program works under the sudo context:
# /usr/bin/sudo check_spectrumarchive.sh -s
Note: if you are not able to switch to the NRPE user, you may have to (temporarily) specify a login shell for that user.
Create the NRPE-configuration for the Spectrum Archive specific checks using this program as a remote plugin. Note, the allowed_hosts must include the IP address of your Icinga server. Each check has a name given in [] which executes a particular command, such as /usr/local/bin/check_spectrumarchive.sh -s. Find an example below:
allowed_hosts=127.0.0.1,9.155.114.101
command[check_users]=/usr/local/nagios/libexec/check_users -w 2 -c 5
command[check_ee_state]=/usr/local/bin/check_spectrumarchive.sh -s
command[check_ee_nodes]=/usr/local/bin/check_spectrumarchive.sh -n
command[check_ee_tapes]=/usr/local/bin/check_spectrumarchive.sh -t
command[check_ee_drives]=/usr/local/bin/check_spectrumarchive.sh -d
command[check_ee_pools]=/usr/local/bin/check_spectrumarchive.sh -p 80
command[check_ee_rtasks]=/usr/local/bin/check_spectrumarchive.sh -a r
command[check_ee_ctasks]=/usr/local/bin/check_spectrumarchive.sh -a c
Find an example of the NRPE configuration in the github repo.
Now enable and start the NRPE service and check its status, for example with systemctl (on Red Hat systems the service is typically named nrpe).
Continue with the configuration of the monitored objects on the Icinga server.
Configure Icinga server
I used a docker container running Icinga. The default configuration of the Icinga server is located in /etc/icinga. The default location for the object definitions is /etc/icinga/objects.
First check that the Icinga server can communicate with the Spectrum Archive nodes using NRPE. For this purpose the check_nrpe plugin of the server can be used. The default location is: /usr/lib/nagios/plugins/check_nrpe. Find an example below:
# /usr/lib/nagios/plugins/check_nrpe -H <IP of Spectrum Archive node>
This command should return the NRPE version. If this is not the case, investigate the problem. Likewise you can execute a remote check:
# /usr/lib/nagios/plugins/check_nrpe -H <IP of Spectrum Archive node> -c check_ee_state
This command should also return an appropriate response.
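To test all remote checks in one go from the Icinga server, a loop like the following can be used. It is a sketch only: the variable names CHECK_NRPE and EE_CHECKS and the function run_remote_checks are made up here, and the plugin path is the default location mentioned above, which may differ on your server.

```shell
#!/bin/bash
# Plugin path as mentioned above for the Icinga server; override if it differs.
CHECK_NRPE=${CHECK_NRPE:-/usr/lib/nagios/plugins/check_nrpe}
# Check names as defined in the NRPE configuration on the client.
EE_CHECKS="check_ee_state check_ee_nodes check_ee_tapes check_ee_drives check_ee_pools check_ee_rtasks check_ee_ctasks"

# Run every remote check against one host, print the name in front of each
# result and return the highest return code seen.
run_remote_checks() {
  local host=$1 worst=0 name rc
  for name in $EE_CHECKS; do
    printf '%-16s ' "$name"
    "$CHECK_NRPE" -H "$host" -c "$name"
    rc=$?
    if [ "$rc" -gt "$worst" ]; then
      worst=$rc
    fi
  done
  return "$worst"
}

# Example call:
# run_remote_checks <IP of Spectrum Archive node>
```

If one of the checks fails with an NRPE error instead of a check result, the name printed in front of the line shows which command definition on the client needs attention.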
If the NRPE communication and remote commands work, allow external commands by opening the Icinga configuration file (/etc/icinga/icinga.cfg) and adjusting this setting:
check_external_commands=1
Now configure the objects for the Spectrum Archive nodes. It is recommended to create a new file in the directory /etc/icinga/objects. In the example below two Spectrum Archive hosts (eenode1 and eenode2) are assigned to a host group (eenodes). For this host group a number of services are defined, each in its own define service stanza. Each service has a name, a host group where it is executed and a check command. The check command specifies an NRPE check and the name of the check that was configured in the NRPE configuration of the client. For example, the check_command check_nrpe!check_ee_state will execute the command /usr/local/bin/check_spectrumarchive.sh -s on the hosts.
define hostgroup {
    hostgroup_name eenodes
    alias EE Nodes
    members eenode1,eenode2
}
define host {
    use generic-host
    host_name eenode1
    alias EE Node 1
    address <ip of ee node 1>
}
define host {
    use generic-host
    host_name eenode2
    alias EE Node 2
    address <ip of ee node 2>
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Users logged on to the system
    check_command check_nrpe!check_users
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE software state
    check_command check_nrpe!check_ee_state
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE node state
    check_command check_nrpe!check_ee_nodes
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE drive states
    check_command check_nrpe!check_ee_drives
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE tape states
    check_command check_nrpe!check_ee_tapes
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE pool state
    check_command check_nrpe!check_ee_pools
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE running tasks
    check_command check_nrpe!check_ee_rtasks
}
define service {
    use generic-service
    hostgroup_name eenodes
    service_description Check EE completed tasks
    check_command check_nrpe!check_ee_ctasks
}
Find an example of the object definition in the github repo.
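Since the service stanzas differ only in their description and check name, they can also be generated. The following sketch prints the seven Spectrum Archive stanzas to standard output; the function name print_service is made up for this example, and it assumes that the NRPE check command on your Icinga server is named check_nrpe.

```shell
#!/bin/bash
# Print one Icinga 1 service stanza for the eenodes host group.
# $1 = NRPE check name, $2 = service description
print_service() {
  cat <<EOF
define service {
    use generic-service
    hostgroup_name eenodes
    service_description $2
    check_command check_nrpe!$1
}
EOF
}

# name|description pairs, matching the stanzas shown above
while IFS='|' read -r name desc; do
  print_service "$name" "$desc"
done <<'LIST'
check_ee_state|Check EE software state
check_ee_nodes|Check EE node state
check_ee_drives|Check EE drive states
check_ee_tapes|Check EE tape states
check_ee_pools|Check EE pool state
check_ee_rtasks|Check EE running tasks
check_ee_ctasks|Check EE completed tasks
LIST
```

Redirect the output into a file in the object directory of your Icinga server and review it before restarting Icinga.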
Once the object definition is complete and stored in the default object location /etc/icinga/objects, restart the Icinga process using systemctl or init.d. Afterwards log on to the Icinga GUI and enjoy the Spectrum Archive nodes being monitored.
Of course, you can and should add more checks to the Spectrum Archive nodes. How about performing some checks for the underlying IBM Spectrum Scale software? Alex Saupp and Achim Christ have created a repo to perform checks on Spectrum Scale components (https://gitlab.com/itsmee/icinga/tree/master). These checks integrate well with the Spectrum Archive checks.
Have fun trying that code. Please forgive me for using the outdated Icinga 1 object definitions; this was all I had available for testing. Perhaps you can migrate and test the Icinga 2 definitions and contribute to the github repo.