
Watson for AIOps Event Manager - Using semi automated runbooks with ssh automation (1/3)

By Gurpreet Kaur posted Tue July 12, 2022 06:51 PM

  

The Watson for AIOps Event Manager - Runbook Automation blog mini-series covers the procedure for setting up semi-automated and fully automated runbooks.

IBM Runbook Automation (RBA) automates procedures that help IT staff solve everyday operational problems without requiring human interaction, which increases the efficiency of IT operational processes.

Runbooks typically start as documented manual procedures and can evolve into fully automated ones.

This first module focuses on the creation of a simple semi-automated runbook.





Scenario: An httpd server is running on <HOST>. The httpd process is monitored, and an alert is generated when the httpd API service goes UP or DOWN.

Objective: Create a runbook that restarts the httpd API service and associate it with the “Httpd API Service Down” alert.

  

By the end of this module, you will have a fully functional semi-automated runbook ready in the Watson AIOps Event Manager environment.

This module should take you about 20 minutes to complete and includes the following steps:

 

Step 1: Configure Integration with other systems (2 minutes)

Step 2: Create Automation (5 minutes)

Step 3: Create Runbook (4 minutes)

Step 4: Create Trigger (4 minutes)

Step 5: Test runbook (5 minutes)


Step 1: Configure Integration with other systems

To execute runbooks, you need to choose an automation provider: Ansible Tower, SCRIPT, or BigFix. A connection must be set up to your target endpoints.

The SCRIPT automation provider lets you run scripts (bash, ksh, perl) on a target system. In this example, we use the SCRIPT automation provider to open an SSH session to the target endpoint. To allow RBA to log in without a password, copy RBA's public key into the authorized_keys file of the user on the target host. You can get the RBA public key from the Watson AIOps Event Manager user interface as follows:

Navigate to:
Administration → Integration with other systems → Automation type → Script → Configure



Copy the public key to ~/.ssh/authorized_keys on the remote host where the command will be executed.
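One way to append the key on the target host is sketched below. The helper name install_rba_key is hypothetical and not part of RBA, and the sketch assumes a standard OpenSSH layout on the target host. Pass it the full public key line copied from the Event Manager UI.

```shell
#!/usr/bin/env bash
# Sketch: append a public key to a user's authorized_keys file, idempotently.
# install_rba_key is a hypothetical helper name, not part of RBA.
install_rba_key() {
  local pubkey="$1"                 # full public key line copied from the UI
  local ssh_dir="${2:-$HOME/.ssh}"  # defaults to the current user's ~/.ssh
  local auth_file="$ssh_dir/authorized_keys"

  # OpenSSH expects restrictive permissions on the directory and the file.
  mkdir -p "$ssh_dir" && chmod 700 "$ssh_dir" || return 1
  touch "$auth_file" && chmod 600 "$auth_file" || return 1

  # Append only if this exact line is not already present.
  if ! grep -qxF -- "$pubkey" "$auth_file"; then
    printf '%s\n' "$pubkey" >> "$auth_file"
  fi
}
```

Run it as the user that RBA will log in as, for example: install_rba_key "<PUBLIC_KEY_FROM_UI>". Calling it twice with the same key leaves a single entry in the file.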


Step 2: Create Automation

An automation is a unit of programmatic instructions in RBA. It can be a script that runs through an SSH session on a remote system, an HTTP(S) API call, a BigFix call, or an Ansible Tower job or job workflow call. A runbook is formed by combining one or more automations. You can create a new automation in the Watson AIOps Event Manager UI as follows:

Navigate to:
Automations → Runbooks → Automations → Create Automation

Script:

echo "Restarting the HTTPD service"
sudo systemctl restart httpd
sudo systemctl status httpd | grep Active
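The script above reports the status line but always exits with the result of grep. A slightly more defensive variant is sketched here, assuming you want the automation to fail when the service does not come back up. The function name restart_and_verify and the SYSTEMCTL override variable are illustrative, not RBA conventions.

```shell
#!/usr/bin/env bash
# Sketch: restart a service and fail loudly if it does not come back up.
# SYSTEMCTL is an illustrative override so the command can be swapped
# (for example, dropping sudo when already running as root).
restart_and_verify() {
  local service="${1:-httpd}"
  local systemctl_cmd="${SYSTEMCTL:-sudo systemctl}"

  echo "Restarting the ${service} service"
  # Intentionally unquoted so "sudo systemctl" splits into two words.
  $systemctl_cmd restart "$service" || {
    echo "restart of ${service} failed" >&2
    return 1
  }

  # is-active exits 0 only when the unit is in the "active" state.
  if $systemctl_cmd is-active --quiet "$service"; then
    echo "${service} is active"
  else
    echo "${service} is not active after restart" >&2
    return 1
  fi
}
```

Calling restart_and_verify httpd at the end of the automation script makes a failed restart surface as a non-zero exit code instead of being visible only in the output text.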


Automation Parameters:

target and user are default parameters that are required to execute the automation on the remote system. Add another parameter, Identifier, of type String; it can be used later to update the alarm based on the runbook results.



Save the automation. 


Step 3: Create Runbook
Navigate to:
Automations → Runbooks → Library → Create Runbook


Add automated step > Select the "Restart API Service" automation


Map Automation parameters as follows:


Publish the runbook by clicking Actions > Publish.

Step 4: Create Trigger

Triggers associate runbooks with alerts. A trigger looks for events whose Summary field matches a string pattern.
For more information on triggers, see: https://www.ibm.com/docs/en/noi/1.6.5?topic=triggers-create-trigger
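To get a feel for which Summary values a pattern would catch, the check can be mimicked locally. This is only an illustration: the regular expression below is an assumed example, and the real trigger condition is defined in the Event Manager UI, whose matching syntax may differ.

```shell
#!/usr/bin/env bash
# Illustration only: mimics a Summary pattern check locally.
# The regular expression below is an assumed example, not the trigger's
# actual configuration, which is defined in the Event Manager UI.
summary_matches_trigger() {
  local summary="$1"
  local pattern='API Server on .+ is DOWN'
  [[ "$summary" =~ $pattern ]]
}
```

For example, summary_matches_trigger "API Server on 9.30.91.78:8083 is DOWN" succeeds, while the same summary ending in "is UP" does not.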



Step 5:  Test runbook

Connect to the OCP infra node, then install and start the httpd service on it:
yum install httpd
systemctl start httpd
systemctl status httpd

Log in to the OCP cluster.

Now stop the httpd service and insert a test alert for it. (In live monitoring, this alert is generated automatically when the service goes down.)

systemctl stop httpd
systemctl status httpd

[root@api.aiops164.cp.fyre.ibm.com ~]# oc exec -it noi-ncoprimary-0 -- /bin/bash -c '/opt/IBM/tivoli/netcool/omnibus/bin/nco_sql -server AGG_P -user root -pass $OMNIBUS_ROOT_PWD'

insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server <HOSTNAME> 1',5,1,'Demo','<HOSTNAME>',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on <HOSTNAME>:8083 is DOWN');
go

-----Sample Output
Warning: Failed to find tar in the following directories : /bin /usr/bin
1> insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server 9.30.91.78 1',5,1,'Demo','9.30.91.78',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on 9.30.91.78:8083 is DOWN');
2> go
(1 row affected)
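To avoid editing the statement by hand for each host, the insert can be generated from a host and port. The helper name build_test_alert_sql is hypothetical, and the default port 8083 is taken from the sample above. The sketch only prints the statement; paste it at the nco_sql prompt and follow it with go.

```shell
#!/usr/bin/env bash
# Sketch: print the test-alert INSERT for a given host and port so the
# statement can be pasted at the nco_sql prompt (followed by "go").
# build_test_alert_sql is a hypothetical helper name.
build_test_alert_sql() {
  local host="$1"
  local port="${2:-8083}"   # default taken from the sample statement
  printf "insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server %s 1',5,1,'Demo','%s',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on %s:%s is DOWN');\n" \
    "$host" "$host" "$host" "$port"
}
```

For example, build_test_alert_sql 9.30.91.78 prints the same statement as in the sample output.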

Launch the Alert Viewer and look for the alert with Summary = "API Server on <HOST> is DOWN". A runbook should be associated with this alert automatically.



Execute runbook > Start Runbook



Connect to the OCP infra node and check the status of the httpd service:

systemctl status httpd

The service should be reported as running, because the runbook restarted it:

● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-07-12 15:30:36 PDT; 5s ago
Docs: man:httpd.service(8)
Process: 393254 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 408607 (httpd)
Status: "Started, listening on: port 8085"
Tasks: 213 (limit: 49514)
Memory: 42.5M
CGroup: /system.slice/httpd.service
├─408607 /usr/sbin/httpd -DFOREGROUND
├─408608 /usr/sbin/httpd -DFOREGROUND
├─408609 /usr/sbin/httpd -DFOREGROUND
├─408610 /usr/sbin/httpd -DFOREGROUND
└─408611 /usr/sbin/httpd -DFOREGROUND
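If you want to check the result from a script rather than by eye, the state can be pulled out of the Active: line. The helper name active_state is hypothetical. For a plain yes/no check, systemctl is-active httpd is a simpler alternative.

```shell
#!/usr/bin/env bash
# Sketch: read `systemctl status` output on stdin and print the state word
# from the "Active:" line (for example "active" or "inactive").
# active_state is a hypothetical helper name.
active_state() {
  awk '$1 == "Active:" { print $2; exit }'
}
```

Usage: systemctl status httpd | active_state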

For more details on Runbooks, see the documentation link: https://www.ibm.com/docs/en/noi/1.6.5?topic=systems-managing-runbooks-automations

You have now completed this module and are ready for module 2: Using a fully automated runbook with event journal update (2/3)
