Using Ansible Tower Connector for Incident Management in IBM Cloud Pak for Watson AIOps

By Guangya Liu posted Mon May 16, 2022 12:01 PM

With the Ansible Tower Automation Provider, IBM Cloud Pak for Watson AIOps can connect to a remote Ansible Tower, retrieve all of its playbooks, and use those playbooks as automation actions in IBM Cloud Pak for Watson AIOps Runbook Automation.

Because Ansible Tower can also use Ansible Collections from Ansible Galaxy (the upstream community for sharing Ansible Collections), IBM Cloud Pak for Watson AIOps can leverage Ansible Collections as well.

Use Case

In this demonstration, I will share a use case in which an IBM Cloud Pak for Watson AIOps policy monitors a Liberty app. If the Liberty app leaks memory, the policy triggers a runbook that, through the Ansible Tower integration, opens a GitHub issue reporting the memory leak.

There will be two key roles in this demonstration: a Site Reliability Engineer (SRE) and a Developer.

Key concepts for the demonstration

Before I begin, I’d like to share some important concepts that will be used in the following demonstration:

  • Policies: Policies are rules that contain multiple condition and action sets. They can be triggered to automatically promote events to alerts, reduce noise by grouping alerts into a story, and assign runbooks to remediate alerts.
  • Runbooks: Use Runbook Automation to build and execute runbooks that help IT staff solve common operational problems. Runbook Automation can automate procedures that do not require human interaction, which increases the efficiency of IT operations processes. Operators are freed from time-consuming manual tasks and can spend more time innovating.
  • Actions: In runbooks, an action combines several manual steps into a single automated entity. An action improves runbook efficiency by performing procedures and operations automatically.

SRE requirements

As an SRE, I’d like an automated memory leak investigation to start whenever a Liberty server shows signs of a memory leak.

Connect To Ansible Tower

  • First, create an Ansible connector to connect IBM Cloud Pak for Watson AIOps to Ansible Tower. In the IBM Cloud Pak for Watson AIOps UI, go to Define -> Data and tool connections -> Add connection.
  • Select Ansible Tower and create an Ansible Tower connection.
  • Configure the Ansible Tower connection: provide the URL, user name, and password used to connect to Ansible Tower.
  • After this is done, you will have IBM Cloud Pak for Watson AIOps connected to your Ansible Tower.
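Before saving the connection, it can help to confirm that the URL, user name, and password actually work against Tower. The following minimal sketch is not part of IBM Cloud Pak for Watson AIOps. It assumes Ansible Tower's REST API v2 with HTTP Basic authentication, and it assumes (not confirmed by the product documentation) that the playbooks the connector surfaces correspond to Tower job templates. The helper names are hypothetical, and `fetch` is injectable so the pagination logic can be exercised without a live Tower.

```python
import base64
import json
import urllib.request
from typing import Callable, Dict, List, Optional
from urllib.parse import urljoin


def basic_auth_header(username: str, password: str) -> str:
    """Build an HTTP Basic Authorization header value (hypothetical helper)."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return f"Basic {token}"


def http_get_json(url: str, auth: str) -> Dict:
    """Perform a real GET against Tower and decode the JSON body."""
    req = urllib.request.Request(
        url, headers={"Authorization": auth, "Accept": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def list_job_templates(
    tower_url: str,
    username: str,
    password: str,
    fetch: Callable[[str, str], Dict] = http_get_json,
) -> List[Dict]:
    """Walk the paginated /api/v2/job_templates/ list and keep id, name, playbook."""
    auth = basic_auth_header(username, password)
    url: Optional[str] = urljoin(tower_url, "/api/v2/job_templates/")
    templates: List[Dict] = []
    while url:
        page = fetch(url, auth)
        for item in page.get("results", []):
            templates.append(
                {"id": item["id"], "name": item["name"], "playbook": item.get("playbook", "")}
            )
        next_link = page.get("next")
        # "next" is usually a relative path; urljoin also accepts absolute URLs.
        url = urljoin(tower_url, next_link) if next_link else None
    return templates
```

Against a reachable Tower you would call something like `list_job_templates("https://tower.example.com", "admin", "<password>")` (placeholder host and credentials). Wrong credentials surface as an `urllib.error.HTTPError` with status 401 raised by `urlopen`.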

Create An Automation Runbook to Perform a Memory Leak Analysis

  • Go to Operate -> Automations, open the Runbooks tab, and click Create runbook.
  • Create a runbook for Memory Leak Analysis for Liberty. Note that this runbook uses a playbook from Ansible Tower. Because IBM Cloud Pak for Watson AIOps is already connected to Ansible Tower, it retrieves all of the playbooks from Ansible Tower when you create a runbook, so the SRE can select the Memory Leak Analysis playbook and build the runbook from it.
  • The runbook is now ready to use.
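Running a playbook from Ansible Tower generally means launching a job template. The sketch below illustrates that step through Tower's REST API v2 endpoint `POST /api/v2/job_templates/<id>/launch/`; it is an assumption about what happens under the hood, not the product's implementation. The template ID and the `server_name` variable are hypothetical, and Tower typically applies `extra_vars` only when the job template is configured to prompt for variables on launch (or defines a survey).

```python
import base64
import json
import urllib.request
from typing import Any, Dict
from urllib.parse import urljoin


def build_launch_request(
    tower_url: str,
    username: str,
    password: str,
    template_id: int,
    extra_vars: Dict[str, Any],
) -> urllib.request.Request:
    """Prepare (but do not send) a POST to Tower's job template launch endpoint."""
    url = urljoin(tower_url, f"/api/v2/job_templates/{template_id}/launch/")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    body = json.dumps({"extra_vars": extra_vars}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Authorization": f"Basic {token}", "Content-Type": "application/json"},
    )


def launch(request: urllib.request.Request) -> Dict[str, Any]:
    """Send the prepared request; Tower responds with details of the new job."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)
```

Separating request construction from sending keeps the launch payload easy to inspect before anything runs against a real Tower.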

Create Automation Policy to Execute the Runbook

  • Now create an automation policy that associates the runbook with incoming alerts. Click Policies -> Create policy.
  • In Create policy, select Assign a runbook to alerts so that incoming alerts can trigger the policy.
  • A new window opens and asks you to assign a runbook to alerts; this step creates an IBM Cloud Pak for Watson AIOps policy.
  • When creating the policy, you also enter conditions that decide when the runbook is triggered. This example uses three conditions:
      • summary, contains, all of: Log Anomaly found, demo-liberty-server1 (the name of our server)
      • state, equals to, only: open
      • details, contains, any of: OutOfMemoryError
  • Next, select the runbook to use: choose the runbook we just created, and then select Automatically run the runbook.
  • Both the runbook and the policy are now in place. Once an OutOfMemoryError occurs, the resulting alert triggers the automated memory leak investigation for that server.
  • Suppose you have registered a Liberty app with WebSphere Automation (I will share more detail on how to register apps with WebSphere Automation in my next blog). After the Liberty app has been running for a while, you notice that something is wrong: the Liberty server responds very slowly. The Liberty log in Humio shows that an OutOfMemoryError occurred.
  • To find out what is wrong with your Liberty app, go to IBM Cloud Pak for Watson AIOps: in the UI, switch to Operate -> Stories and alerts.
  • An alert was raised for the server demo-liberty-server1. Check the runbook activities in the IBM Cloud Pak for Watson AIOps UI under Operate -> Automations -> Runbooks -> Activities.
  • At this point, the call chain in IBM Cloud Pak for Watson AIOps is complete: the alert matched the policy, the policy ran the runbook, and the runbook triggered the Ansible playbook in Ansible Tower to act on the Liberty memory leak.
  • The activities show that the memory leak investigation runbook was triggered by the alert and has finished. Click Details to see more about the issue.
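To make the three policy conditions concrete, here is a minimal sketch of how they could be evaluated against an alert. It is purely illustrative and is not how IBM Cloud Pak for Watson AIOps evaluates policies internally. It assumes a hypothetical alert represented as a flat dict with `summary`, `state`, and `details` fields, case-sensitive substring matching, and that all conditions must hold (logical AND).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Condition:
    """Hypothetical model of one policy condition row."""
    field: str         # alert attribute, e.g. "summary"
    operator: str      # "contains" or "equals"
    mode: str          # "all", "any", or "only"
    values: List[str]


def condition_matches(cond: Condition, alert: Dict[str, str]) -> bool:
    actual = str(alert.get(cond.field, ""))
    if cond.operator == "contains":
        hits = [value in actual for value in cond.values]
        if cond.mode == "all":
            return all(hits)
        if cond.mode == "any":
            return any(hits)
        raise ValueError(f"unsupported mode for contains: {cond.mode}")
    if cond.operator == "equals":
        # "only" with a single value reduces to an exact match.
        return actual in cond.values
    raise ValueError(f"unsupported operator: {cond.operator}")


def policy_matches(conditions: List[Condition], alert: Dict[str, str]) -> bool:
    """Assume every condition must hold for the runbook to be triggered."""
    return all(condition_matches(c, alert) for c in conditions)


# The three conditions from the example policy.
LIBERTY_MEMLEAK_POLICY = [
    Condition("summary", "contains", "all", ["Log Anomaly found", "demo-liberty-server1"]),
    Condition("state", "equals", "only", ["open"]),
    Condition("details", "contains", "any", ["OutOfMemoryError"]),
]
```

With this model, an open alert whose summary names `demo-liberty-server1` and whose details mention `OutOfMemoryError` matches, while the same alert in a closed state, or for a different server, does not.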

As a Developer

As a Liberty developer, I was assigned a GitHub issue that was opened by IBM Cloud Pak for Watson AIOps Runbook Automation. The GitHub issue includes all the information about the memory leak: the Java class, the heap size, and the full analysis report file. I can download the report, check its details to see what is wrong with Liberty, and fix the issue based on this information.
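In the demo, the Ansible playbook opens this issue. The sketch below shows an equivalent call to the GitHub REST API endpoint `POST /repos/{owner}/{repo}/issues`; it is a hypothetical illustration, not the playbook used in the demo. The repository, token, label, and field values are placeholders. GitHub's issue API does not upload file attachments, so this sketch only references the analysis report by name or URL.

```python
import json
import urllib.request


def build_issue_request(
    owner: str,
    repo: str,
    token: str,
    server: str,
    java_class: str,
    heap_size: str,
    report_ref: str,
) -> urllib.request.Request:
    """Prepare (but do not send) a GitHub 'create issue' request for a memory leak."""
    title = f"Memory leak detected on {server}"
    body = "\n".join([
        f"IBM Cloud Pak for Watson AIOps detected a memory leak on `{server}`.",
        "",
        f"- Suspected Java class: `{java_class}`",
        f"- Heap size: {heap_size}",
        f"- Full analysis report: {report_ref}",
    ])
    # "memory-leak" is a hypothetical label; it must exist or GitHub creates it.
    payload = {"title": title, "body": body, "labels": ["memory-leak"]}
    url = f"https://api.github.com/repos/{owner}/{repo}/issues"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
            "Content-Type": "application/json",
        },
    )
```

Sending the request with `urllib.request.urlopen` would create the issue; keeping construction separate makes the issue content easy to review first.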

Summary

The scenario above is a typical use case of IBM Cloud Pak for Watson AIOps and Ansible Tower working together to detect and fix incidents in the applications running in your environment. The same approach extends to many other applications, as long as you already have Ansible playbooks for them. Refer to our official knowledge center for more detail on how to leverage the Ansible Tower Automation Provider in other scenarios.

Credit to Arthur De Magalhaes, Neil Boyette and Chuan Huang; thank you for your great support and help with this Ansible Tower integration demo.
