Rapid Network Automation

Delivering true value through the integration of automation into everyday tasks

By Akshaya Gurlhosur posted Mon April 15, 2024 02:05 PM

  

The design and complexity of our networks are making it harder to ensure that our devices run the latest firmware versions needed to fend off security attacks. The volume and complexity of faults impacting our services mean that our engineers must pull information from an ever-increasing number of databases and sources to locate the actual problem behind a service impact.

Now wouldn’t it be great if I could provide a solution that combines our team's existing skills and knowledge and wraps automation around the activity in such a way that the solution manages itself? It can provide all the information required to ensure that our engineers only get involved in the last 1% of failures during a network-wide upgrade. Operations engineers are presented with all the information required to diagnose the fault and its impact, and automated alerts are sent both to the customers of the service and to our internal management teams, ultimately reducing the time taken to resolve the fault.

In order to minimize the security risks associated with large networks, we have to ensure that each device runs the latest supported version of software, so that known vulnerabilities have been removed. Updating access control lists automatically, especially in large Wi-Fi networks, can help to reduce the risk of security breaches.

An example problem 

Our network has over 10,000 devices from a range of vendors and a selection of models to deliver our many services to over 2 million customers. 

Each year, all 10,000 devices must be updated with the latest firmware, which helps us minimize security breaches and maintain the minimum firmware level required by the vendor's support criteria.

We already have some basic automation scripts in place to load the new firmware on the devices, but inevitably we see a fairly large number of failures. Each failed instance then requires a qualified engineer to investigate manually, which not only takes time but could also result in service failures.

Ideally we would have a solution that could deliver the complete end-to-end upgrade process with live updates on progress. 

The Solution 

IBM Rapid Network Automation delivers end-to-end automation through its low-code workflow editor. It can access physical devices via CLI and SSH. It has access to the open APIs of all the leading equipment vendors as well as the most common service platforms, such as ServiceNow, Slack, and email. Most importantly, it can take actions based on information gathered as the activities progress and provide real-time updates.

In our solution to upgrade all 10,000 devices, we would perform several steps as outlined below. 

Phase 1

  • Automatically retrieve device details such as make, version, IP address, login details, and firmware. 

  • In parallel, automatically access each device securely and perform a pre-check to ensure that it has the required previous version of the firmware. 

  • If the required firmware is not found, automatically create a ticket in ServiceNow informing the upgrade team that a device has failed a firmware check. 

  • Based on the firmware level discovered, several additional upgrades can be automated to arrive at the desired level of firmware to enable the upgrade to continue. 
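
The Phase 1 decision logic can be illustrated with a short sketch. This is not how IBM Rapid Network Automation is implemented (its workflows are built in the low-code editor); it only shows the branching described above in plain Python. The `Device` fields, the `UPGRADE_PATH` table, and the function names are hypothetical, and the ticket text stands in for a real ServiceNow call.

```python
from dataclasses import dataclass

# Hypothetical upgrade paths: each firmware version maps to the next version
# it can be upgraded to directly. Real paths would come from the vendor.
UPGRADE_PATH = {"1.0": "1.5", "1.5": "2.0", "2.0": "3.0"}


@dataclass
class Device:
    name: str
    vendor: str
    firmware: str


def plan_upgrades(device, required, path=UPGRADE_PATH):
    """Return the intermediate versions needed to reach `required`.

    An empty list means the device is already at the required version.
    None means no known path exists from the current version.
    """
    steps = []
    current = device.firmware
    seen = {current}
    while current != required:
        nxt = path.get(current)
        if nxt is None or nxt in seen:  # dead end, or a loop in the table
            return None
        steps.append(nxt)
        seen.add(nxt)
        current = nxt
    return steps


def phase_one(devices, required):
    """Sort devices into ready, staged for extra upgrades, and ticketed."""
    ready, staged, tickets = [], {}, []
    for device in devices:
        if device.firmware == required:
            ready.append(device.name)
            continue
        # Every device that fails the firmware check gets a ticket message.
        tickets.append(f"{device.name}: firmware {device.firmware} failed pre-check")
        steps = plan_upgrades(device, required)
        if steps:
            # A known path exists, so the extra upgrades can be automated.
            staged[device.name] = steps
    return ready, staged, tickets
```

With a required version of "2.0", a device on "1.0" would be staged for the upgrades "1.5" and then "2.0", while a device on an unknown version would only receive a ticket.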

Phase 2 

  • Once the required firmware level is confirmed, several automatic pre-checks can be made on the device, such as free memory and processor load, to ensure that the upgrade stands a good chance of completing.

  • If any of the checks fail, an update is automatically sent to ServiceNow informing the upgrade team that device pre-checks have failed. The update will include all the details of the actual check results, and a copy of the log file will be attached to the ticket.

  • The workflow will wait until the ticket is resolved by the engineer. 

  • Once the ticket is closed the firmware upload will take place. 

  • Periodic checks will be made automatically to see if the upload of the file has been completed and has been successful. 

  • If the upload fails, a new ServiceNow update will inform the engineer of the failed upload, supply all the available information on the failure captured by this part of the workflow, and attach the latest log file.

  • Once the upload ticket is closed, the new firmware version will be made active during the agreed outage time.

  • Periodic checks will be made to see if the firmware update was successful. 

  • If an update fails, the reason codes and the log files will be sent via ServiceNow to the engineers.

  • A certain number of detected failures will have known actions that the workflow can perform to achieve a successful update.

  • If all known options result in a failed update, the engineers are sent the complete list of activities performed and copies of the log files, along with a request for manual intervention.

  • For all successful updates, a log file and any other device-specific information will be added to the ServiceNow ticket. All relevant updates to inventory databases and systems will be performed automatically, including confirmation information, which can also be added to the ServiceNow ticket.
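
Two recurring patterns in Phase 2 are the periodic checks and the retry with known remedial actions. The following is a minimal sketch of both in plain Python, under stated assumptions: `check`, `activate`, `remediations`, and `notify` are hypothetical callables supplied by the caller, and `notify` stands in for a ServiceNow ticket update. It is an illustration of the logic, not the product's workflow engine.

```python
import time


def poll_until(check, interval_s=30.0, max_attempts=20, sleep=time.sleep):
    """Call `check()` periodically until it reports a final status.

    `check` is assumed to return "ok", "failed", or "pending".
    Returns the final status, or "timeout" if the attempts run out.
    """
    for attempt in range(max_attempts):
        status = check()
        if status in ("ok", "failed"):
            return status
        if attempt < max_attempts - 1:
            sleep(interval_s)
    return "timeout"


def activate_with_remediation(activate, remediations, notify):
    """Activate firmware; on failure, try the known fix for the reason code.

    `activate` is assumed to return (succeeded, reason_code).
    `remediations` maps a reason code to a callable that attempts a fix.
    Each fix is tried at most once; after that the engineers are notified.
    """
    history = []
    tried = set()
    while True:
        ok, reason = activate()
        history.append(("activate", reason))
        if ok:
            notify("success", history)
            return True
        fix = remediations.get(reason)
        if fix is None or reason in tried:
            # No known action left: hand over with the full activity list.
            notify("manual intervention required", history)
            return False
        tried.add(reason)
        fix()
        history.append(("remediate", reason))
```

Passing `sleep` as a parameter keeps the polling loop easy to test, and the `history` list mirrors the "complete list of activities performed" that is handed to the engineers when manual intervention is needed.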

A separate set of reports would be made available to show progress against the upgrade of the 10,000 devices. These can break down the failures at each step of the workflow and show the time taken per device, vendor, and model. The level of detail in the reports can be customized for every task or solution.
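
As a rough illustration of this kind of report, the sketch below aggregates per-device results into failures per workflow step and average time per vendor and model. The record fields (`vendor`, `model`, `minutes`, `failed_step`) are hypothetical and are not taken from the product's reporting schema.

```python
from collections import Counter, defaultdict
from statistics import mean


def summarize(records):
    """Summarize upgrade records.

    Each record is assumed to be a dict with the keys "vendor", "model",
    "minutes", and "failed_step" (None when the upgrade succeeded).
    """
    failures_by_step = Counter(
        r["failed_step"] for r in records if r["failed_step"] is not None
    )
    durations = defaultdict(list)
    for r in records:
        durations[(r["vendor"], r["model"])].append(r["minutes"])
    return {
        "total": len(records),
        "completed": sum(1 for r in records if r["failed_step"] is None),
        "failures_by_step": dict(failures_by_step),
        "mean_minutes": {key: mean(values) for key, values in durations.items()},
    }
```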

Updates to external or internal systems can be added at any stage of the process. Tasks can be scheduled in batches, or the number running at any given time can be limited, depending on the solution.
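
Batching and limiting the number of tasks running at once can be sketched with Python's standard `concurrent.futures` module. The `upgrade` callable, the batch size, and the parallelism cap are hypothetical values chosen for illustration; in practice they would depend on the network and the agreed outage windows.

```python
from concurrent.futures import ThreadPoolExecutor


def batched(items, size):
    """Yield consecutive batches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def run_in_batches(devices, upgrade, batch_size=500, max_parallel=25):
    """Run `upgrade(device)` batch by batch with a cap on parallel tasks.

    Returns a dict of device -> result. An exception raised for one device
    is stored as that device's result so the rest of the batch continues.
    """
    results = {}
    for batch in batched(devices, batch_size):
        # Leaving the `with` block waits for every task in the batch.
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            futures = {device: pool.submit(upgrade, device) for device in batch}
        for device, future in futures.items():
            error = future.exception()
            results[device] = error if error is not None else future.result()
    return results
```

Each batch finishes completely before the next one starts, so a batch can be aligned with a maintenance window, while `max_parallel` limits how many devices are touched at the same moment.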

Summary 

Our customers are looking for solutions that empower them to move beyond simple automation and look towards delivering complete solutions that can run independently. A solution that provides the capability to update the firmware of all devices should be capable of performing the complete activity including making decisions based on results received, providing updates on progress, raising incidents when required, and updating inventory repositories on completion.  

IBM Rapid Network Automation provides you with a simple-to-use, fully scalable, secure platform from which to create, manage and deploy these solutions across your organisation. 
