Step 1: Configure Integration with other systems (2 minutes)
Step 2: Create Automation (5 minutes)
Step 3: Create Runbook (4 minutes)
Step 4: Create Trigger (4 minutes)
Step 5: Test runbook (5 minutes)
Step 1: Configure Integration with other systems
To execute runbooks, choose an automation provider: Ansible Tower, SCRIPT, or BigFix. A connection must then be set up so that the provider can reach your target endpoints.
The SCRIPT automation provider executes scripts (bash, ksh, perl) on a target system. In this example, we use the SCRIPT automation provider to open an SSH session to the target endpoint. To let RBA log in without a password, we need to copy RBA's public key into the SSH authorized_keys file of the user on the target host. We can get the RBA public key from the Watson AIOps Event Manager user interface as follows:
Navigate to:
Administration → Integration with other systems → Automation type → Script → Configure
Append the public key to ~/.ssh/authorized_keys on the remote host where we want to execute the command.
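The key installation on the remote host can be sketched as a small shell function. This is a minimal sketch, not part of RBA: the function name install_pubkey and its arguments are hypothetical, and it assumes the key has already been copied from the UI as a single line of text.

```shell
# Hypothetical helper: add a public key line to a user's authorized_keys
# file, creating the file with restrictive permissions if needed.
# Usage: install_pubkey "<public key line>" [home directory]
install_pubkey() {
  local key="$1"
  local home_dir="${2:-$HOME}"
  local ssh_dir="$home_dir/.ssh"
  local auth_file="$ssh_dir/authorized_keys"

  if [ -z "$key" ]; then
    echo "no public key given" >&2
    return 1
  fi

  mkdir -p "$ssh_dir"
  chmod 700 "$ssh_dir"
  touch "$auth_file"
  chmod 600 "$auth_file"

  # -x matches whole lines, -F treats the key as a fixed string.
  if grep -qxF -- "$key" "$auth_file"; then
    return 0   # key already present, nothing to do
  fi

  # Make sure the existing file ends with a newline before appending.
  if [ -s "$auth_file" ] && [ -n "$(tail -c 1 "$auth_file")" ]; then
    printf '\n' >> "$auth_file"
  fi
  printf '%s\n' "$key" >> "$auth_file"
}
```

Run as the target user, for example install_pubkey "<key copied from the UI>". Calling it twice with the same key leaves a single entry in the file.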
Step 2: Create Automation
An automation is a unit of programmatic instructions in RBA. An automation can be a script that is run through an SSH session on a remote system, an HTTP(S) API call, a BigFix call, or an Ansible Tower Job or Job Workflow call. A runbook is formed by combining one or more automations. You can create a new automation using the Watson AIOps Event Manager UI as follows:
Navigate to:
Automations → Runbooks → Automations → Create Automation
Script:
echo "Restarting the HTTPD service"
sudo systemctl restart httpd
sudo systemctl status httpd | grep Active
Automation Parameters:
target and user are default parameters that are required to execute the automation on the remote system. Add another parameter, Identifier, of type String, which we can use later to update the alarm based on the runbook results.
Save the automation.
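The script above prints the service status but does not signal failure if the restart does not work. A more defensive variant could look like the following sketch. The function name restart_and_verify and the SYSTEMCTL override variable are hypothetical and exist only for illustration; whether a non-zero exit status is surfaced in the runbook result is an assumption to check in your environment.

```shell
# Hypothetical variant of the automation script: restart a service and
# return a non-zero status if it is not active afterwards.
# SYSTEMCTL can be overridden (for example in tests); it defaults to
# "sudo systemctl", as used in the script above.
restart_and_verify() {
  local service="$1"
  local systemctl_cmd="${SYSTEMCTL:-sudo systemctl}"

  echo "Restarting the ${service} service"
  # $systemctl_cmd is left unquoted on purpose so "sudo systemctl" splits
  # into command and argument.
  if ! $systemctl_cmd restart "$service"; then
    echo "Restart of ${service} failed" >&2
    return 1
  fi

  # "systemctl is-active" exits with 0 only when the unit is active.
  if $systemctl_cmd is-active --quiet "$service"; then
    echo "${service} is active"
  else
    echo "${service} is not active after restart" >&2
    return 2
  fi
}

# Example call inside the automation script:
#   restart_and_verify httpd
```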
Step 3: Create Runbook
Navigate to:
Automations → Runbooks → Library → Create Runbook
Add automated Step > Select "Restart API Service" Automation
Map Automation parameters as follows:
Publish the runbook by clicking:
Actions > Publish
Step 4: Create Trigger
Triggers associate runbooks with alerts. A trigger looks for events whose Summary field matches a given string pattern.
For more information on triggers: https://www.ibm.com/docs/en/noi/1.6.5?topic=triggers-create-trigger
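To illustrate the idea of matching on the Summary field, here is a tiny shell sketch. It only demonstrates the concept with a plain substring match; the function summary_matches is hypothetical, and the actual trigger filter in Event Manager may use a different matching syntax.

```shell
# Hypothetical illustration: succeed when the event summary ($1)
# contains the pattern ($2) as a plain substring.
summary_matches() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# Example:
#   summary_matches "API Server on 9.30.91.78:8083 is DOWN" "API Server" \
#     && echo "runbook would be associated"
```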
Step 5: Test runbook
Connect to the OCP Infra node.
Install and start the httpd service on the OCP Infra node.
yum install httpd
systemctl start httpd
systemctl status httpd
Log in to the OCP cluster.
Now stop the httpd service and insert a test alert for it. (In live monitoring, this alert is generated automatically when the service goes down.)
systemctl stop httpd
systemctl status httpd
[root@api.aiops164.cp.fyre.ibm.com ~]# oc exec -it noi-ncoprimary-0 -- /bin/bash -c '/opt/IBM/tivoli/netcool/omnibus/bin/nco_sql -server AGG_P -user root -pass $OMNIBUS_ROOT_PWD'
insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server <HOSTNAME> 1',5,1,'Demo','<HOSTNAME>',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on 9.30.91.78:8083 is DOWN');
go
-----Sample Output
Warning: Failed to find tar in the following directories : /bin /usr/bin
1> insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server 9.30.91.78 1',5,1,'Demo','9.30.91.78',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on 9.30.91.78:8083 is DOWN');
2> go
(1 row affected)
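Typing the insert statement by hand is error-prone, so it can help to generate it. The following is a minimal sketch: the function build_test_alert_sql is hypothetical, the port default of 8083 is taken from the example above, and unlike the example it uses the same host value in the Identifier, Node, and Summary fields.

```shell
# Hypothetical helper: print the test-alert insert statement for a host.
# Usage: build_test_alert_sql <host> [port]
build_test_alert_sql() {
  local host="$1"
  local port="${2:-8083}"

  if [ -z "$host" ]; then
    echo "usage: build_test_alert_sql <host> [port]" >&2
    return 1
  fi

  # Reject single quotes so the values cannot break the SQL string literals.
  case "$host$port" in
    *"'"*)
      echo "host and port must not contain single quotes" >&2
      return 1
      ;;
  esac

  printf "insert into alerts.status (Identifier,Severity,Type,AlertGroup,Node,FirstOccurrence,LastOccurrence,Manager,Class,Summary) values('Demo API Server %s 1',5,1,'Demo','%s',getdate,getdate,'Tivoli EIF Probe',6601,'API Server on %s:%s is DOWN');\ngo\n" \
    "$host" "$host" "$host" "$port"
}
```

The printed statement can be pasted at the nco_sql prompt shown above. Piping it into the session should also be possible if oc exec is run with -i instead of -it, but that has not been verified here.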
Launch the Alert Viewer and find the alert with Summary = "API Server on <HOST> is DOWN". A runbook should be associated with this alert automatically.
Execute runbook > Start Runbook
Connect to OCP Infra node.
Check Status of httpd service on OCP Infra node.
systemctl status httpd
It should report a running status, because the runbook restarted the service automatically:
● httpd.service - The Apache HTTP Server
Loaded: loaded (/usr/lib/systemd/system/httpd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2022-07-12 15:30:36 PDT; 5s ago
Docs: man:httpd.service(8)
Process: 393254 ExecReload=/usr/sbin/httpd $OPTIONS -k graceful (code=exited, status=0/SUCCESS)
Main PID: 408607 (httpd)
Status: "Started, listening on: port 8085"
Tasks: 213 (limit: 49514)
Memory: 42.5M
CGroup: /system.slice/httpd.service
├─408607 /usr/sbin/httpd -DFOREGROUND
├─408608 /usr/sbin/httpd -DFOREGROUND
├─408609 /usr/sbin/httpd -DFOREGROUND
├─408610 /usr/sbin/httpd -DFOREGROUND
└─408611 /usr/sbin/httpd -DFOREGROUND
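Instead of checking the status by hand, the verification can be scripted. The following is a minimal polling sketch, assuming it runs on the OCP Infra node; the function name wait_for_active, its defaults, and the SYSTEMCTL override variable are hypothetical.

```shell
# Hypothetical helper: poll until a systemd service is active.
# Usage: wait_for_active <service> [attempts] [delay in seconds]
# SYSTEMCTL can be overridden (for example in tests); default is systemctl.
wait_for_active() {
  local service="$1"
  local attempts="${2:-10}"
  local delay="${3:-3}"
  local systemctl_cmd="${SYSTEMCTL:-systemctl}"
  local i=1

  while [ "$i" -le "$attempts" ]; do
    # "systemctl is-active" exits with 0 only when the unit is active.
    if $systemctl_cmd is-active --quiet "$service"; then
      echo "${service} is active (attempt ${i})"
      return 0
    fi
    # Wait before the next attempt, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done

  echo "${service} did not become active after ${attempts} attempts" >&2
  return 1
}

# Example call after starting the runbook:
#   wait_for_active httpd 10 3
```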