“Is it the app or the network?” - IBM full stack observability and event management
Everybody in IT is talking about observability: why it is so crucial, what you can achieve with it and, more importantly, how it differs from simple monitoring. Observability, when done right, is the foundation of service performance management at every level, from applications to servers to the network.
In this blog post, we aim to explain why and how network performance management and application performance management systems can work together to build an enterprise observability solution. But there is more to it than bringing two separate tools together to evaluate the performance of applications, systems, and networks. Enterprise observability, also known as full-stack observability, is an important data provider for an event and incident management solution based on AI tools. AIOps, as the name implies, applies AI algorithms to IT operations and takes network and application observability to the next level:
- to group correlating events from independent senders,
- to detect performance (metric) anomalies,
- to show dependencies between apps, systems, and networks in a topology,
- to identify an extended blast radius of apps and networks,
- to highlight the root cause, either graphically or textually,
- to point to easy-to-understand resolution actions.
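To make the first capability in the list more concrete, here is a minimal sketch of temporal event grouping. It is a toy illustration only, not the algorithm Cloud Pak for AIOps uses; the `Event` fields, the `group_events` function, and the five-minute window are all assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Event:
    sender: str       # e.g. "SevOne" or "Instana"
    resource: str     # hypothetical resource identifier
    timestamp: float  # seconds since some reference point
    summary: str


def group_events(events, window_seconds=300.0):
    """Group events that follow the previous event within `window_seconds`."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        # Extend the current group if this event is close enough in time
        # to the last event already in it; otherwise start a new group.
        if groups and event.timestamp - groups[-1][-1].timestamp <= window_seconds:
            groups[-1].append(event)
        else:
            groups.append([event])
    return groups


# Hypothetical events from two independent senders.
sample_events = [
    Event("SevOne", "switch-a", 0.0, "Interface down"),
    Event("Instana", "app-server", 60.0, "Slow database calls"),
    Event("Instana", "app-server", 1000.0, "High CPU"),
]
sample_groups = group_events(sample_events)
```

With these sample events, the first two land in the same group because they occur within the window, even though they come from different senders. A real correlation engine would also take topology and event content into account.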
In today's highly connected hybrid multicloud data centers, successful businesses rely on network infrastructure as the technological highway for their essential applications. To ensure an optimal user experience, applications and networks must provide consistent service, reliable access, and continuous performance. Together, IBM SevOne for network observability and IBM Instana for application observability offer detailed application-centric insights to facilitate quicker identification and resolution of issues. Both products can be integrated with IBM Cloud Pak for AIOps via out-of-the-box connection technologies (webhooks) and generic REST APIs.
SevOne can be configured to send issue and incident information as events to Cloud Pak for AIOps, and AIOps can be configured to request topology information from SevOne via a topology observer. Instana can be configured to send issue and incident information as events, and hundreds of performance KPIs as metric data, to AIOps, while AIOps can in turn request topology information from Instana via a topology observer.
SevOne and Instana events can be forwarded as JSON key-value pairs to an AIOps webhook. The webhook receives these values as variables and inserts them into the alert text, so that the SRE can immediately spot a problematic device or failing object. In many cases, these variables already contain the root cause of an issue, such as a failing adapter or a defective port, and AIOps can use them to highlight the probable cause of an outage. The goal of AIOps here is to combine and further analyze all available information about an issue, which can come from not just one but several sources, called senders.
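The idea of forwarding key-value pairs and inserting them into alert text can be sketched as follows. This is an assumption-laden illustration, not the actual SevOne, Instana, or Cloud Pak for AIOps payload format: the field names, the device and interface names, and the alert template are all hypothetical.

```python
import json
from string import Template

# Hypothetical event as a sender might forward it; field names are illustrative.
payload = {
    "sender": "SevOne",
    "device": "router-lab-01",
    "object": "GigabitEthernet0/1",
    "severity": "critical",
    "message": "Interface down",
}

# The JSON document that would be sent as the webhook request body.
body = json.dumps(payload)

# Hypothetical alert template with placeholders for the received variables.
ALERT_TEMPLATE = Template("[$severity] $sender: $message on $device ($object)")


def render_alert(raw_body: str) -> str:
    """Parse the JSON body and insert its values into the alert text."""
    fields = json.loads(raw_body)
    # safe_substitute leaves unknown placeholders untouched instead of raising.
    return ALERT_TEMPLATE.safe_substitute(fields)


alert_text = render_alert(body)
print(alert_text)
```

Because the failing device and object travel with the event, the rendered alert already names the likely culprit before any further analysis takes place.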
In this blog post, we evaluate a common use case: an application performance problem caused by a network adapter failure, which we replicated in an IBM lab. A typical example is the connectivity between an application server and a database that run on different VMs or containers. Imagine there is a fast and a slower network connection between the two components. If the fast network fails and the slow network has to take over, the application might suffer from network performance problems because of its high transaction rate to the database.
SREs want to detect this kind of issue very quickly. We will show how IBM Cloud Pak for AIOps correlates multiple issues from different senders into an incident and highlights the root cause in the alert console and topology view.
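A rough back-of-the-envelope sketch shows why such a failover can hurt. It assumes the slower link mainly adds round-trip latency and that the application issues its database queries sequentially; the function name and all numbers are illustrative assumptions, not measurements from the lab.

```python
def transaction_time_ms(queries_per_transaction: int,
                        round_trip_ms: float,
                        db_time_ms: float) -> float:
    """Total time for one transaction whose queries run one after another."""
    return queries_per_transaction * (round_trip_ms + db_time_ms)


# Illustrative numbers only, not lab measurements.
fast = transaction_time_ms(queries_per_transaction=50, round_trip_ms=0.2, db_time_ms=1.0)
slow = transaction_time_ms(queries_per_transaction=50, round_trip_ms=2.0, db_time_ms=1.0)
print(f"fast path: {fast:.0f} ms, slow path: {slow:.0f} ms")
```

Under these assumptions, a small per-query penalty is multiplied by the number of queries, so a chatty application feels the slower network far more than a single ping would suggest.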