With the Ansible Tower Automation Provider, IBM Cloud Pak for Watson AIOps can connect to a remote Ansible Tower, retrieve all of its playbooks, and use those playbooks as automation actions for IBM Cloud Pak for Watson AIOps Runbook Automation.
Because Ansible Tower can also use Ansible Collections from Ansible Galaxy (the upstream community for sharing Ansible Collections), IBM Cloud Pak for Watson AIOps can leverage Ansible Collections as well.
Use Case
In this demonstration, I will share a use case of using an IBM Cloud Pak for Watson AIOps policy to monitor a Liberty app. If the Liberty app has a memory leak, the policy will trigger a runbook that opens a GitHub issue to report the memory leak via the Ansible Tower integration.
There will be two key roles in this demonstration: a Site Reliability Engineer (SRE) and a Developer.
Key concepts for the demonstration
Before I begin, I’d like to share some important concepts that will be used in the following demonstration:
- Policies: Policies are rules that contain multiple condition and action sets. They can be triggered to automatically promote events to alerts, reduce noise by grouping alerts into a story, and assign runbooks to remediate alerts.
- Runbooks: Use Runbook Automation to build and execute runbooks that can help IT staff to solve common operational problems. Runbook Automation can automate procedures that do not require human interaction, thereby increasing the efficiency of IT operations processes. Operators can spend more time innovating and are freed from performing time-consuming manual tasks.
- Actions: In runbooks, actions are the collection of several manual steps into a single automated entity. An action improves runbook efficiency by automatically performing procedures and operations.
SRE requirements
As an SRE, I’d like an automated memory leak investigation to start for a Liberty server whenever a memory leak is detected on that server.
Connect To Ansible Tower
- First, create an Ansible connector to connect IBM Cloud Pak for Watson AIOps with Ansible Tower. Go to the IBM Cloud Pak for Watson AIOps UI: Define -> Data and tool connections -> Add connection.
- Select Ansible Tower and create an Ansible Tower connection as follows.
- Configure Ansible Tower Connection (you need to configure the URL, User Name and Password to connect to Ansible Tower).
- After this is done, you will have IBM Cloud Pak for Watson AIOps connected to your Ansible Tower.
Create An Automation Runbook to Perform a Memory Leak Analysis
- Navigate to Runbooks via Operate -> Automations -> Runbooks tab -> Create Runbook.
- Create a runbook for Memory Leak Analysis for Liberty. Note that we are using playbooks from Ansible Tower. Because IBM Cloud Pak for Watson AIOps is already connected to Ansible Tower, it can retrieve all of the playbooks from Ansible Tower when you create a runbook, and the SRE can select the Memory Leak Analysis playbook and use it to create the runbook.
- The runbook is now ready to use.
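To get a feel for what "retrieving the playbooks" involves, here is a minimal Python sketch that lists job template names through the Ansible Tower REST API (`/api/v2/job_templates/`). It is not how IBM Cloud Pak for Watson AIOps implements the integration; the function names, the use of basic authentication, and the choice to read only the first page of results are assumptions made for illustration.

```python
import base64
import json
import urllib.request


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


def job_template_names(payload: dict) -> list[str]:
    """Extract template names from a job_templates list response."""
    return [item["name"] for item in payload.get("results", [])]


def list_job_templates(base_url: str, username: str, password: str) -> list[str]:
    """Fetch the first page of job templates from Tower (hypothetical helper).

    Assumes the Tower URL, user name and password are the same values
    entered in the Ansible Tower connection. Later pages are not followed.
    """
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/job_templates/",
        headers={"Authorization": basic_auth_header(username, password)},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return job_template_names(json.load(response))
```

Calling `list_job_templates("https://tower.example.com", "admin", "secret")` against a reachable Tower (the host and credentials here are placeholders) would return the names you can expect to see when picking a playbook for a runbook.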
Create Automation Policy to Execute the Runbook
- Now we need to create an automation policy to associate the runbook with the incoming alert. Click Policies -> Create Policy.
- Under Create policy, select Assign a runbook to alerts. This allows the policy to be triggered by alerts.
- A new window will pop up and ask you to assign a runbook to alerts. This is what actually creates the IBM Cloud Pak for Watson AIOps policy.
- When creating the policy, you are also asked to enter some conditions. These conditions decide when the runbook will be triggered. In this case, we are using three conditions:
  - summary, contains, all of: Log Anomaly found and demo-liberty-server1 (this is our server name).
  - state, equals to, only: open.
  - details, contains, any of: OutOfMemoryError.
- Now we need to select which runbook to use. Select the runbook we just created, and then select Automatically run the runbook.
- Now we have created both the runbook and the policy. Once there is an OutOfMemory error, an alert will trigger the automated memory leak investigation on that server.
- You should already have registered a Liberty app with your WebSphere Automation. (I will share more detail on how to register apps with WebSphere Automation in my next blog.) After the Liberty app has been running for a while, you realize something is wrong with the Liberty server: it is really slow to respond. Check the Liberty log in Humio. It indicates that there was an OutOfMemoryError.
- To find out what is wrong with your Liberty app, go to IBM Cloud Pak for Watson AIOps: Cloud Pak for Watson AIOps UI -> Operate -> Stories and alerts.
- An alert was sent out for the server demo-liberty-server1. Check the runbook activities via Cloud Pak for Watson AIOps UI -> Operate -> Automations -> Runbooks -> Activities.
- So far, the overall call flow in IBM Cloud Pak for Watson AIOps is as follows: the alert triggered the Ansible playbook from Ansible Tower to take action on the Liberty memory leak issue.
- We can see that the memory leak investigation runbook was triggered by the alert and has finished. Click Details to get some details on the issue.
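To make the policy logic from this walkthrough concrete, the three conditions configured earlier (summary contains all of, state equals, details contains any of) can be sketched as a simple predicate in Python. This is only an illustration of the matching semantics as I understand them: the flat dictionary shape of the alert and the case-sensitive substring matching are assumptions, not the product's actual data model or evaluation engine.

```python
# Values taken from the policy conditions in this demo.
REQUIRED_SUMMARY_TERMS = ("Log Anomaly found", "demo-liberty-server1")
REQUIRED_STATE = "open"
DETAIL_TERMS = ("OutOfMemoryError",)


def alert_matches_policy(alert: dict) -> bool:
    """Return True when an alert satisfies all three demo conditions.

    The alert is assumed to be a flat dict with 'summary', 'state'
    and 'details' string fields (a simplification for illustration).
    """
    summary = alert.get("summary", "")
    details = alert.get("details", "")

    # summary, contains, all of: every term must appear
    summary_ok = all(term in summary for term in REQUIRED_SUMMARY_TERMS)
    # state, equals to, only: exact match on a single value
    state_ok = alert.get("state") == REQUIRED_STATE
    # details, contains, any of: at least one term must appear
    details_ok = any(term in details for term in DETAIL_TERMS)

    return summary_ok and state_ok and details_ok
```

An open alert whose summary mentions both the log anomaly and demo-liberty-server1, and whose details contain OutOfMemoryError, satisfies the predicate. Changing the state to closed or dropping the server name from the summary makes it fail, which is why the runbook only runs for this one server.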
As a Developer
As a Liberty developer, I was assigned a GitHub issue that was opened by IBM Cloud Pak for Watson AIOps Runbook Automation. The GitHub issue included all the information about the memory leak: Java class, heap size, and the full analysis report file. I can download the report and check the log details to see what is wrong with Liberty, and then try to fix the issue based on this information.
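In the demo, the issue is opened by the Ansible playbook, and the playbook itself is not shown here. As a rough idea of what such an issue might carry, the sketch below builds a request for the GitHub REST API endpoint that creates issues (`POST /repos/{owner}/{repo}/issues`). The function names, the title wording, and the body layout are all hypothetical; only the fields the post mentions (Java class, heap size, analysis report) are taken from the demo.

```python
import json
import urllib.request


def build_issue_payload(server: str, java_class: str, heap_size: str, report_url: str) -> dict:
    """Assemble the issue title and body (layout is an assumption)."""
    body = "\n".join(
        [
            f"Server: {server}",
            f"Suspected Java class: {java_class}",
            f"Heap size: {heap_size}",
            f"Full analysis report: {report_url}",
        ]
    )
    return {"title": f"Memory leak detected on {server}", "body": body}


def build_issue_request(owner: str, repo: str, token: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) the create-issue request."""
    return urllib.request.Request(
        f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the returned request to `urllib.request.urlopen` would submit it. The sketch stops short of that so it can be inspected without a real repository or token.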
Summary
The above scenario is a typical use case of using IBM Cloud Pak for Watson AIOps and Ansible Tower to help detect and fix incidents for the applications running in your environments. The demo scenario can be extended to manage as many other applications as you want, as long as you already have Ansible playbooks for them. Please refer to our official knowledge center for more detail on how to leverage the Ansible Tower Automation Provider for more scenarios.
Credit to Arthur De Magalhaes, Neil Boyette and Chuan Huang. Thank you so much for your great support and help with this Ansible Tower integration demo case.