The main reason I like my job is that I can have a positive impact on someone’s business life. One example of helping a customer sleep better at night is when we solved a big problem for one of my favourite customers, who works at a financial services institution in Asia.
This company was (and still is) a SevOne user, relying on it to monitor their entire core IT estate: switches, routers, load balancers, firewalls, Wi-Fi, and so on. Of all these devices, the load balancers (specifically F5 devices) were the ones giving them trouble.

Example of an F5 report in Data Insight
The problem
The problem they were facing was that, even though their F5 devices triggered thousands of alerts per day, they couldn’t find the source of the issue. They were having daily outages, and our main contact at the company was under a lot of pressure from his manager. He had reached the point where he didn’t know what to do, and he was very close to resigning over this problem.
During a call reviewing some new functionality of the platform, the customer told us about this problem, thinking we wouldn’t be able to help (you don’t know what you don’t know). Fortunately, within one week we managed to sort out the situation.
The first thing we did was a platform assessment, basically a review of how the platform was being used. Among many other configurations that could be improved, the main issue we found with the F5 devices was the object rules.

Example of objects per device
Object rules are a setting that lets you choose which objects/metrics are monitored in SevOne. This is a very cool feature, but if it isn’t used properly, it can create nightmares like the one this customer had. The issue was that there were too many object rules and they were too restrictive, so we were severely limiting the metrics collected from the F5 devices.
They were doing this because they didn’t have enough licenses to monitor all the objects, so they became very restrictive and limited monitoring to well-known objects such as CPU, memory, and a few interfaces.
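To make the effect of a restrictive object rule concrete, here is a minimal sketch that models a rule as an allow-list of name patterns. This is not SevOne’s object-rule syntax or API; the object names and the `apply_object_rule` helper are hypothetical, chosen only to show how anything outside the allow-list (for example TMM memory or virtual server statistics) silently never gets collected.

```python
# Hypothetical sketch: NOT SevOne's object-rule syntax or API.
# An "object rule" is modelled as an allow-list of shell-style name patterns.
from fnmatch import fnmatch

# Hypothetical object names an F5 device might expose after discovery.
DISCOVERED_OBJECTS = [
    "CPU 1",
    "Memory",
    "Interface 1.1",
    "Interface 1.2",
    "TMM Memory",
    "Virtual Server /Common/app_vs",
    "Pool /Common/app_pool",
]


def apply_object_rule(objects, allow_patterns):
    """Split objects into (monitored, dropped) lists using the allow-list."""
    monitored, dropped = [], []
    for name in objects:
        # fnmatch matches the whole name, so "Memory" does not match "TMM Memory".
        if any(fnmatch(name, pattern) for pattern in allow_patterns):
            monitored.append(name)
        else:
            dropped.append(name)
    return monitored, dropped


# A restrictive rule in the spirit of the customer's: CPU, memory, one interface.
restrictive_rule = ["CPU*", "Memory", "Interface 1.1"]
monitored, dropped = apply_object_rule(DISCOVERED_OBJECTS, restrictive_rule)
print("monitored:", monitored)
print("dropped:", dropped)
```

The point of the sketch is that nothing in the dropped list ever raises an alert or appears in a report, which is exactly how the metrics that explain an outage can stay invisible.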
From experience, I know that devices such as F5 load balancers (and the same applies to firewalls, Wi-Fi, SDN, SD-WAN, …) hold a lot of valuable information beyond the typical CPU, memory, and interface metrics. If you don’t monitor those metrics, you will face issues like the ones my friend was having.
The solution
My solution was simple: why not migrate your license from object-based to device-based? That way the limit is not on the number of objects monitored but on the number of devices monitored, meaning you can monitor as many objects per device as you like.
The customer agreed to a proof of concept to see whether monitoring more objects would help, and two days later we found the problem.
The main issue was that one virtual server received lots of connections during the morning rush, and the TMM (Traffic Management Microkernel) memory grew during that time until it reached 100%. As a result, the load-balancing algorithm started failing, and for some of the services all connections were steered to a single server, hence the daily downtime.
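The pattern described above, memory climbing through the morning until it hits 100%, is the kind of thing that only becomes visible once the metric is collected. As a minimal sketch (not how SevOne implements thresholds or alerting), the function below takes a time-ordered series of hypothetical `(minute, percent)` samples, finds the first sample at or above a threshold, and walks back to where the uninterrupted climb began. The sample format, the 100% default threshold, and the `find_memory_exhaustion` name are assumptions for illustration.

```python
# Hypothetical sketch: not SevOne's alerting logic. Samples are assumed to be
# time-ordered (minutes since midnight, TMM memory utilisation %) tuples.
from typing import List, Optional, Tuple

Sample = Tuple[int, float]


def find_memory_exhaustion(
    samples: List[Sample], threshold: float = 100.0
) -> Optional[Tuple[int, int]]:
    """Return (climb_start_minute, breach_minute) for the first breach, or None."""
    for i, (minute, percent) in enumerate(samples):
        if percent >= threshold:
            start = i
            # Walk back while utilisation was strictly increasing.
            while start > 0 and samples[start - 1][1] < samples[start][1]:
                start -= 1
            return samples[start][0], minute
    return None


if __name__ == "__main__":
    # Synthetic values for illustration only, not the customer's data.
    synthetic = [(480, 55.0), (490, 62.0), (500, 75.0), (510, 88.0), (520, 100.0)]
    print(find_memory_exhaustion(synthetic))  # prints (480, 520)
```

Reporting where the climb started, not just when the limit was hit, is what ties the exhaustion back to a time window such as the morning rush.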

TMM Issue
If we hadn’t had the visibility we needed into the network, and in this case into the F5 devices, we would never have found the issue, and the customer would probably have lost his job.
And with device-based licensing, the company not only got unlimited objects per device, but also Data Insight (an amazing reporting engine), unlimited SevOne appliances (including high-availability servers), all the NetFlow data they want to consume, and integration with the Kafka and Pulsar buses, all included in the new licensing model.
Going the extra mile
It was great to find the problem, but the next question was: how can we fix it? And how can we fix it automatically?
We knew what the problem was and how to solve it, but we wanted to automate the fix. The issue with automation was that our main contact at the company wasn’t really a developer or a scripter. He had played with Python in the past, but he wasn’t able to write a script that automated everything he wanted to do. On top of that, he didn’t have time to learn how to configure F5 devices through the API; he didn’t even know how the F5 API works!
Luckily for him, once they transitioned to device-based licensing, they were entitled to trade up their SevOne platform to SevOne Automated Network Observability (SANO), which includes the no-code automation module called RNA (Rapid Network Automation).

No Code Automation Workflow
With this module we created a workflow that runs automatically once SevOne detects the problem, and we built it without any scripts, without knowing how the F5 API works, not even how its authentication works! All of this took just a couple of hours of trial and error.
After this exercise, the customer took five minutes to update his LinkedIn profile and add ‘automation’ to his skills 😊