Managing Change Risk with Infrastructure through Watson AIOps

By Khalid Ahmed posted Mon December 13, 2021 12:05 PM


Co-Authors: Chad Holliday - Senior Development Manager - IBM Cloud Pak for Watson AIOps Infrastructure Automation,
Amitabh Prasad - Software Architect - IBM Cloud Pak for Watson AIOps Infrastructure Automation

When infrastructure is modified, there’s a chance something will break or violate an essential compliance policy. Misconfiguration of servers, devices, or applications leads to outages, security holes, and many unanswered questions. For example, what happened when similar changes were applied to this type of infrastructure in the past? How can we manage configuration compliance at run-time instead of correcting configurations after deployment?


Tools like ServiceNow help IT teams manage the process of making infrastructure changes. But there is an opportunity to extend those capabilities by pairing AI with Infrastructure Automation to reduce the risk that a change leads to outages or security issues.


With IBM Cloud Pak for Watson AIOps, IT operators can assess the risk of a code deployment based on prior incidents, estimate the chances of a successful update and identify the risks, build logs, apply security policies, and more!


Here’s a good example of how:


  • The Change Risk module ingests unstructured data from ServiceNow, analyzes it for similar incidents and how past changes may have failed, then provides an assessment of the overall risk of the suggested change.
  • The Infrastructure Automation (IA) module automates the delivery of infrastructure services and integrates with ServiceNow: it can update the ServiceNow CMDB or create change requests.
  • IA is an end-to-end infrastructure provisioning and management capability built on open-source technologies such as Terraform and Ansible. After infrastructure is provisioned, ongoing checks can be made against the configuration to ensure systems are running as prescribed and adhere to system-wide compliance policies.
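
To make the idea behind the Change Risk bullet concrete, here is a minimal sketch of scoring a proposed change by looking at the outcomes of the most similar past changes. It is not the model that Watson AIOps actually uses: the `PastChange` record, the word-overlap (Jaccard) similarity, the `k` and `threshold` parameters, and the sample history are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PastChange:
    """Hypothetical record of an earlier change ticket and its outcome."""
    description: str
    failed: bool


def tokens(text):
    """Lower-case the text and split it into a set of words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-overlap similarity between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assess_risk(description, history, k=3, threshold=0.5):
    """Label a change 'high' or 'low' risk from its k most similar past changes.

    The confidence is simply the share of those neighbours that agree
    with the label; a trained model would estimate this differently.
    """
    query = tokens(description)
    ranked = sorted(
        history,
        key=lambda change: jaccard(query, tokens(change.description)),
        reverse=True,
    )
    nearest = ranked[:k]
    if not nearest:
        return "unknown", 0.0
    failure_rate = sum(change.failed for change in nearest) / len(nearest)
    label = "high" if failure_rate >= threshold else "low"
    return label, max(failure_rate, 1 - failure_rate)


# Made-up history, for illustration only.
history = [
    PastChange("add disk to virtual machine", True),
    PastChange("attach new disk to virtual machine", True),
    PastChange("resize disk on virtual machine", False),
    PastChange("rotate tls certificate on load balancer", False),
    PastChange("upgrade database minor version", False),
]

label, confidence = assess_risk("add new disk to virtual machine", history)
print(label, round(confidence, 2))
```

In this toy history, two of the three most similar past changes failed, so the proposed change is labelled high risk. A label and confidence pair like this is the kind of annotation that could then be written back to the change ticket.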

Now let's see this in action. Here’s a demonstration of a possible flow of interactions between operators, the Infrastructure Automation (IA) and Change Risk components of IBM Cloud Pak for Watson AIOps, and ServiceNow.


Let's say it is determined that an additional disk should be added to a virtual machine.


1) The operator invokes a day-2 operation in IA to change the server configuration by creating and attaching the new disk.




2) The day-2 operation will create a Change Request ticket in ServiceNow, populating it with details on the requested change (virtual machine ID, new disk size, etc.).


3) AIOps Change Risk will monitor ServiceNow tickets. Using its pre-trained model to analyze similar tickets, it classifies the request as a high- or low-risk change and adds the confidence level of the prediction to the ServiceNow ticket.


4) The ServiceNow operator will approve the change request based on the confidence level, and mark it ready for implementation.


5) The day-2 operation in IA will wait for the ServiceNow Change Request to be approved before proceeding.


6) After observing the change in state of the ServiceNow Change Request, IA will carry out the provisioning change to add the disk to the virtual machine.

7) After completing the provisioning action, IA will update the ServiceNow Change Request to mark it as complete, where it can be reviewed and closed by a ServiceNow operator.


8) IA policies will be applied against the configuration, and alerts are raised if the server configuration isn't in compliance. Both server VM parameters and application middleware packages can be checked for compliance against defined policies.
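
The eight steps above can be sketched as a small in-memory simulation. This is only an illustration of the approval gate and the post-change compliance check, not the ServiceNow or Infrastructure Automation API: the `ChangeRequest` and `VirtualMachine` classes, the function names, the risk values, and the policy limits are all hypothetical.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class State(Enum):
    NEW = "new"
    APPROVED = "approved"
    COMPLETE = "complete"


@dataclass
class ChangeRequest:
    """Hypothetical stand-in for a ServiceNow Change Request."""
    vm_id: str
    new_disk_gb: int
    state: State = State.NEW
    risk: Optional[str] = None
    confidence: Optional[float] = None


@dataclass
class VirtualMachine:
    """Hypothetical stand-in for the managed virtual machine."""
    vm_id: str
    disks_gb: List[int] = field(default_factory=list)


def annotate_risk(cr, risk, confidence):
    """Step 3: record the risk label and confidence on the ticket."""
    cr.risk, cr.confidence = risk, confidence


def approve(cr):
    """Step 4: an operator approves the request."""
    cr.state = State.APPROVED


def apply_when_approved(cr, vm):
    """Steps 5-7: provision only after approval, then mark the request complete."""
    if cr.state is not State.APPROVED:
        return False  # still waiting; a real integration would poll or subscribe
    vm.disks_gb.append(cr.new_disk_gb)
    cr.state = State.COMPLETE
    return True


def compliance_violations(vm, max_disks, max_total_gb):
    """Step 8: return a list of policy violations (empty means compliant)."""
    violations = []
    if len(vm.disks_gb) > max_disks:
        violations.append("too many disks")
    if sum(vm.disks_gb) > max_total_gb:
        violations.append("total disk capacity exceeds limit")
    return violations


vm = VirtualMachine("vm-42", disks_gb=[50])
cr = ChangeRequest(vm_id=vm.vm_id, new_disk_gb=100)  # steps 1-2
blocked = apply_when_approved(cr, vm)                # step 5: not approved yet
annotate_risk(cr, "low", 0.9)                        # step 3 (made-up values)
approve(cr)                                          # step 4
applied = apply_when_approved(cr, vm)                # steps 6-7
alerts = compliance_violations(vm, max_disks=4, max_total_gb=500)  # step 8
print(blocked, applied, cr.state.name, vm.disks_gb, alerts)
```

The point of the sketch is the ordering: the provisioning call is a no-op until the request reaches the approved state, and the compliance check runs against the configuration as it stands after the change.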



By combining AI with Infrastructure Automation, we help reduce the risk of infrastructure changes leading to outages or security issues. IBM Cloud Pak for Watson AIOps integrated with ServiceNow can give IT departments holistic control over how changes to their infrastructure are managed. The net result is a more resilient infrastructure and assurance that risks are mitigated.