
Mean Time to Prevention

By Thomas Fisher posted Sat April 30, 2022 12:00 AM

  

The content on this page was originally published on Instana.com and has been migrated to the community as a historical asset. As such, it may contain outdated information on our products and features. Please comment if you have questions about the content. 

What It Means and Why It’s Important … to You

Mean Time to Prevention (MTTP) describes how long it takes Observability+AIOps to automatically prevent issues from negatively impacting application performance.

It’s a new term defined by Instana and IBM, and it joins the lineup of Mean Time To (MTTx) terms as the one that describes AIOps automated remediation.

As microservices architectures become more granular, automated remediation will take on a much more significant role. Microservices and their dependencies are already 10-100x more distributed and have smaller code footprints than SOA-based applications. This trend will likely continue until a minimum functional microservice code footprint is reached, at which point microservices won’t benefit from further minimization.

The goal of a minimum microservice code footprint (lean code) is to reduce the amount of code required to implement specific functions. The practical limit is reached when further minimization turns into an anti-pattern. With microservices distributed this widely, rapid issue detection is critical for service reliability.

One of the benefits of lean code is that it reduces debugging time and the potential problems inherent in each code block. The goal of MTTP is to reduce the time it takes to invoke a resolution for issues that can be resolved with automation.

There are problems that are not code-related, such as resource (CPU, memory, bandwidth, etc.) under-allocation or other factors affecting the application. They’re the sort of problems that can be solved through automatic remediation without requiring human intervention.

Why? Because human intervention takes far too long to adjust elements such as infrastructure resources. Microservices scale out and back in rapidly, especially in the cloud, and can only be managed effectively with automation.

The MTTP goal is to combine precise 1-second observability with AIOps to detect problems immediately and prevent them from impacting the application. Observability platforms that take 10 seconds or longer to surface a metric are too slow for rapid issue detection and are a limiting factor for issue prevention. That’s because prevention can’t occur until after the issue is detected.
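To make the detection argument concrete, here is a minimal sketch of the arithmetic. It assumes the worst case, where an issue begins right after a metric sample and so stays invisible for one full sampling interval. The function name and the 5-second remediation time are hypothetical values chosen for illustration, not figures from Instana or Turbonomic.

```python
def worst_case_prevention_seconds(sampling_interval_s: float,
                                  remediation_s: float) -> float:
    """Worst-case time from issue onset to completed automated remediation.

    Assumption: an issue that starts just after a sample is first visible
    one full sampling interval later, and remediation starts only then.
    """
    worst_case_detection_s = sampling_interval_s
    return worst_case_detection_s + remediation_s


# Hypothetical remediation time of 5 seconds, for illustration only.
for interval_s in (1, 10):
    total_s = worst_case_prevention_seconds(interval_s, remediation_s=5)
    print(f"{interval_s}-second sampling: up to {total_s} seconds to prevention")
```

Under these assumptions the remediation step is identical in both cases, so the whole difference in time to prevention comes from how quickly the issue becomes visible.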

AIOps prevention automation will reduce the number of issues that DevOps and SRE teams have to address. It’s the best way to achieve hyper-resilient cloud-native and hybrid cloud applications.

Without automation, every application issue, from complicated to trivial, requires human intervention to resolve. Continued manual intervention is unsustainable as applications become even more highly distributed with microservices, containers and endpoint scaling.

Certainly, code issues will continue to require human triage to repair. But more rudimentary issues, such as resource allocation or deallocation, can be effectively handled using AI-driven Automated Resource Management (ARM).

In IBM’s portfolio, that capability is provided by Turbonomic AIOps, which relies upon Instana’s precision metrics to rapidly provide ARM adjustments.

This combination facilitates AI-Driven Observability, for which MTTP is a new and significant AIOps measurement, now and into the future. The original goal of Observability was to continually reduce software and infrastructure issue remediation time to the minimum with unabridged visibility and AI-enhanced context.

Instana’s go-forward strategy is to continue providing the most rapid MTTR capabilities available and to reduce the need for manual software and infrastructure issue remediation wherever possible. That means AI-driven automation, such as Turbonomic and Watson AIOps, will be used to prevent as many issues as possible from ever impacting the application or infrastructure.

In some cases, issue remediation will be completely handled by Observability, ARM and AIOps. MTTP measures how long it takes for automated remediation to prevent application and infrastructure issues.
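As a rough sketch of how such a metric could be computed, the example below averages the time between when an issue is detected and when automated remediation completes. The article does not define the exact start and end points of the interval, so those endpoints, along with the `AutomatedRemediation` record and its field names, are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class AutomatedRemediation:
    """One automatically remediated issue (hypothetical record shape)."""
    detected_at: datetime      # when observability flagged the issue
    remediated_at: datetime    # when the automated action completed


def mean_time_to_prevention(events: list[AutomatedRemediation]) -> timedelta:
    """Average detection-to-remediation time across automated remediations."""
    if not events:
        raise ValueError("MTTP is undefined without any remediation events")
    seconds = mean(
        (e.remediated_at - e.detected_at).total_seconds() for e in events
    )
    return timedelta(seconds=seconds)


# Two made-up events: remediated 4 and 8 seconds after detection.
start = datetime(2022, 4, 30, 12, 0, 0)
events = [
    AutomatedRemediation(start, start + timedelta(seconds=4)),
    AutomatedRemediation(start, start + timedelta(seconds=8)),
]
mttp = mean_time_to_prevention(events)
print(f"MTTP: {mttp.total_seconds()} seconds")
```

Only issues resolved by automation are included here; issues that needed manual triage would be tracked under other MTTx metrics such as MTTR.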

At other times, runbook procedures enable semi-automatic remediation. In those cases, Instana’s 3-second Mean Time to Notification (MTTN) will continue to provide the most rapid “what to do next” runbook remediation.

This AIOps progression has been in the works for a while. Automated AIOps is the successor to AI-enhanced capabilities such as Smart Alerts, which use AI to intelligently filter all of the alerts generated by applications and infrastructure. Now, after receiving a Smart Alert, AIOps can automatically remediate and prevent the problems that a machine is able to respond to.

Manual triage will always be with us. Some problems are just too complex for AI to automatically remediate at this time. But automated AIOps can and will reduce the remediation workload for highly paid and/or overworked software technical staff.

MTTP is an increasingly important measure for software and infrastructure triage going forward. It will pave the way for much better SLI results, which will be instrumental in achieving SLO goals. It will also become a critical SRE measurement that indicates how fast issues are being resolved and how much that has helped improve SLIs and SLOs.

Start your MTTP journey today by checking out Instana’s industry-leading 1-second metrics Observability combined with Turbonomic AIOps Automated Resource Management. Together, they can ostensibly prevent application problems before other Observability platforms even detect them.

Try it out with a guided tour in our Play With environment.
