Cost is an important part of every company's strategy and can define the success of a project or even of an entire company. High performance is important, but at what cost?
The most important objective of deploying Observability solutions is to improve the reliability, performance, and availability of applications, but how can we guarantee that we are not over-provisioning the infrastructure to deliver good performance?
We can find the root cause of the problems using Observability, but how can we identify exactly which infrastructure component we need to expand to improve the application performance? And how can we automate the infrastructure changes to do it?
This example shows one way to pursue performance and efficiency at the same time, helping teams and companies deliver solutions that not only perform well but also use the IT infrastructure efficiently.
To illustrate this, we need an Application Performance Management (APM) tool and an Application Resource Management (ARM) tool. I will use two IBM solutions for the demonstration: IBM Instana as the APM solution and IBM Turbonomic as the ARM solution.
The APM solution identifies bottlenecks and performance issues, lets us define SLOs, and so on. The ARM solution analyzes infrastructure utilization, compares cloud vendor prices, and so on.
The image below shows the application performance in the APM tool. As you can see, the application is running well: latency, error rate, and traffic all look healthy.
But let's go deeper to see whether the SLOs are being satisfied (this is why it is really important to set SLOs).
As you can see, the SLOs are also being met, but it is important to note that there is a peak in the graphs, and we can use it to identify the problem.
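To make the idea concrete, here is a minimal sketch of how a latency SLO can still be met while a short peak is visible in the data. It does not use the Instana API; the function names, the sample values, the 500 ms threshold, and the 90% target are all hypothetical and chosen only for illustration.

```python
def slo_compliance(latencies_ms, threshold_ms):
    """Return the fraction of requests that finished within threshold_ms."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    within = sum(1 for value in latencies_ms if value <= threshold_ms)
    return within / len(latencies_ms)


def slo_met(latencies_ms, threshold_ms, target):
    """True when the observed compliance reaches the SLO target."""
    return slo_compliance(latencies_ms, threshold_ms) >= target


# Hypothetical samples: mostly fast requests plus one short peak (900 ms).
samples = [120, 130, 110, 125, 900, 115, 118, 122, 127, 119]

compliance = slo_compliance(samples, threshold_ms=500)
print(f"{compliance:.0%} of requests finished within 500 ms")
print("SLO met" if slo_met(samples, 500, target=0.90) else "SLO violated")
```

With these made-up numbers the SLO is still met, yet the single slow request is exactly the kind of peak worth tracing further.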
Navigating into tracing details using the APM tool, we can identify exactly when it happened:
And going deeper into the specific transaction, we can see where the problem is:
Using the APM tool we identified the performance problem, which is a query running on a specific host. Is it a resource problem (saturation)? Not sure yet.
Now, let’s go to the other part of this demonstration, analyzing how the ARM tool will help us.
The ARM solution will help us identify two things: how efficiently the application uses its IT resources, and whether scaling those resources up would improve the application's performance.
First of all, it is worth showing how Turbonomic abstracts the Instana data, in other words, how Turbonomic maps the Instana entities to its own entities.
As you can see in the image, an Application Perspective (Instana entity) is mapped to a Business Application, Endpoints (Instana entity) are mapped to Business Transactions, a Service (Instana entity) is mapped to a Service, and Processes (Instana entity) are mapped to Application Components.
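The mapping described above can be written down as a small lookup table. This is only an illustrative sketch of the correspondence stated in the text, not code from either product; the dictionary and function names are hypothetical.

```python
# Instana entity type -> Turbonomic entity type, as described in the article.
INSTANA_TO_TURBONOMIC = {
    "Application Perspective": "Business Application",
    "Endpoint": "Business Transaction",
    "Service": "Service",
    "Process": "Application Component",
}


def to_turbonomic(instana_entity_type):
    """Look up the Turbonomic entity type for an Instana entity type."""
    return INSTANA_TO_TURBONOMIC[instana_entity_type]


for instana_type, turbonomic_type in INSTANA_TO_TURBONOMIC.items():
    print(f"{instana_type} -> {turbonomic_type}")
```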
Now let's look at the same application in the ARM tool. As you can see in the image below, on the left we have the topology created by the tool, from the data centers and infrastructure components up to the Business Application, with yellow, green, and red showing the status of each component in terms of efficiency and performance. On the right we can see, for example, some pending actions.
Clicking on the pending actions shows the complete list of possible infrastructure improvements (29 actions). Together, these actions would save 129 GB of virtual memory (vMem) and 20 vCPUs for this application alone (imagine the savings across the entire company). Once the actions are accepted, the changes can be applied automatically, and we end up with an application that delivers the same performance while using fewer IT resources.
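As a rough illustration of how such savings add up across a list of resize actions, here is a minimal sketch. The `ResizeAction` type, its fields, and the three sample actions are hypothetical; they are not the Turbonomic action format and do not reproduce the numbers above.

```python
from dataclasses import dataclass


@dataclass
class ResizeAction:
    target: str           # hypothetical name of the VM being resized
    vmem_gb_delta: int    # negative means memory is reclaimed
    vcpu_delta: int       # negative means vCPUs are reclaimed


def total_savings(actions):
    """Sum the reclaimed vMem (GB) and vCPUs, ignoring scale-up actions."""
    vmem = -sum(a.vmem_gb_delta for a in actions if a.vmem_gb_delta < 0)
    vcpu = -sum(a.vcpu_delta for a in actions if a.vcpu_delta < 0)
    return vmem, vcpu


# Made-up actions: two downsizes and one scale-up.
actions = [
    ResizeAction("vm-a", vmem_gb_delta=-8, vcpu_delta=-2),
    ResizeAction("vm-b", vmem_gb_delta=-4, vcpu_delta=0),
    ResizeAction("vm-c", vmem_gb_delta=2, vcpu_delta=1),
]

saved_vmem, saved_vcpu = total_savings(actions)
print(f"Reclaimable: {saved_vmem} GB vMem, {saved_vcpu} vCPUs")
```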
On the performance side, the ARM tool also recommended some actions, as we can see in the image below. The ARM tool can read the SLOs from the APM solution and use them to check when an SLO is violated.
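The combined reasoning (SLO status from the APM side, resource utilization from the ARM side) can be sketched as a simple decision rule. This is a deliberately simplified illustration, not how Turbonomic actually computes its actions; the function name and the 80% and 30% utilization thresholds are assumptions.

```python
def recommend(slo_met, utilization, high=0.80, low=0.30):
    """Suggest an action from SLO status and resource utilization (0.0 to 1.0)."""
    if not slo_met and utilization >= high:
        return "scale up"      # SLO violated and the resource is saturated
    if slo_met and utilization <= low:
        return "scale down"    # SLO met with plenty of headroom
    if not slo_met:
        return "investigate"   # SLO violated, but probably not a resource issue
    return "no action"


print(recommend(slo_met=False, utilization=0.92))  # scale up
print(recommend(slo_met=True, utilization=0.15))   # scale down
print(recommend(slo_met=False, utilization=0.40))  # investigate
print(recommend(slo_met=True, utilization=0.55))   # no action
```

The "investigate" branch reflects the question raised earlier: a slow query is not necessarily a saturation problem, so adding resources is only suggested when utilization is actually high.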
I hope this article helps you define Observability strategies that address not only performance but also efficiency, so that you can deliver solutions that are both high-performing and efficient.