Tech Preview: Take Action from within Instana using our Action Framework

By Jeff Hamilton posted Tue December 13, 2022 12:00 AM

  

In previous posts, we have talked about how Instana aims to go beyond pure observability and help customers reduce MTTR for the incidents that it detects (see "Instana Wants To Help You Resolve Issues – Faster!" and "Automated Remediation: Fail Fast, Fix Fast"). Instana’s strength has always been capturing metrics at one-second granularity and alerting in real time, but imagine if it could also help diagnose and fix the problems that it finds…

We are excited to announce a tech preview of some work in this area and our new Instana action framework.

For those who attended AWS re:Invent this year, you may have seen our joint demo with the team from PagerDuty, where we invoked PagerDuty Process Automation directly from within Instana to resolve an incident affecting Instana itself.  In other words, we used Instana monitoring Instana in production to take action via PagerDuty to fix Instana.  Pretty cool!

Instana Action Framework

The basic idea is that Instana provides an action catalog. Customers can create new actions or reuse their existing automations. Integration can be via runbooks, webhooks, scripts, or third-party action providers (e.g., PagerDuty Process Automation, Ansible). These actions can then be associated with various Instana events and will be shown on each event occurrence as potential actions to run. Actions can then be run via an action sensor on the Instana agent, and the action output is collected and made visible without ever leaving Instana.
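To make the catalog idea concrete, here is a minimal in-memory sketch of actions being registered and associated with event types. It is only an illustration of the concept: the class names, fields, event type, and script paths are all hypothetical and do not reflect Instana's actual schema or API.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    # Hypothetical fields for illustration; not Instana's action schema.
    name: str
    kind: str    # e.g. "script", "webhook", "runbook"
    target: str  # script path or URL, depending on kind


@dataclass
class ActionCatalog:
    actions: dict[str, Action] = field(default_factory=dict)
    # Maps an event type to the names of the actions associated with it.
    associations: dict[str, list[str]] = field(default_factory=dict)

    def register(self, action: Action) -> None:
        self.actions[action.name] = action

    def associate(self, event_type: str, action_name: str) -> None:
        if action_name not in self.actions:
            raise KeyError(f"unknown action: {action_name}")
        names = self.associations.setdefault(event_type, [])
        if action_name not in names:
            names.append(action_name)

    def actions_for(self, event_type: str) -> list[Action]:
        # Actions to offer on an occurrence of this event type.
        return [self.actions[n] for n in self.associations.get(event_type, [])]


catalog = ActionCatalog()
catalog.register(Action("restart-service", "script", "/opt/actions/restart.sh"))
catalog.register(Action("collect-logs", "script", "/opt/actions/collect_logs.sh"))
catalog.associate("high_memory", "collect-logs")
catalog.associate("high_memory", "restart-service")

for action in catalog.actions_for("high_memory"):
    print(action.name, action.kind)
```

Keeping the associations separate from the actions themselves means one action can be reused across many event types, which matches the "create once, reuse everywhere" intent of a catalog.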

Why do I need an action framework?

Some customers already have external scripts or automations that they use to diagnose or fix problems. So how does an action framework within Instana help?

Take action from within Instana (manual or automatic) – By associating actions with events, people investigating an incident can hit the ground running and see which actions have been used in the past to work around or fix similar issues. This saves time in determining next steps, and actions can even be run directly on the agent that triggered the event.

Learn over time – Don’t lose the history!

  • Have we seen a similar event before?
  • What did we do last time? Who did it?
  • Are there related docs or best practices for fixing this problem?

The Power of AI – AI can also be a huge help here, not only in identifying similar problems and their solutions, but also in recommending fixes for new problems when the symptoms or root cause are similar. Using the action framework sets the stage for Instana to recommend the next action to take to resolve an issue.

How can AI help? Recommend or act

  • Recurring problems – Been there, solved that. Use the history and success of past actions to automatically run a diagnostic or recommend a next action.
  • New problems – Leverage Instana event context and history to look for similarities and recommend an action or documentation that can help.
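As a rough illustration of the "recurring problems" case, the sketch below ranks actions that were previously run for an event type by how often they succeeded. This is a plain frequency heuristic standing in for whatever AI-driven recommendation Instana may eventually provide; the record fields, event types, and action names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PastRun:
    # Hypothetical history record; not an Instana data structure.
    event_type: str
    action: str
    succeeded: bool


def recommend(history: list[PastRun], event_type: str) -> list[tuple[str, float]]:
    """Rank actions previously run for this event type by success rate."""
    totals: dict[str, int] = defaultdict(int)
    wins: dict[str, int] = defaultdict(int)
    for run in history:
        if run.event_type == event_type:
            totals[run.action] += 1
            wins[run.action] += int(run.succeeded)
    ranked = [(action, wins[action] / totals[action]) for action in totals]
    # Highest success rate first; ties broken by number of runs, then by name.
    return sorted(ranked, key=lambda item: (-item[1], -totals[item[0]], item[0]))


history = [
    PastRun("high_memory", "restart-service", True),
    PastRun("high_memory", "restart-service", True),
    PastRun("high_memory", "clear-cache", True),
    PastRun("high_memory", "clear-cache", False),
    PastRun("disk_full", "rotate-logs", True),
]

for action, rate in recommend(history, "high_memory"):
    print(f"{action}: {rate:.0%} success")
```

An event type with no history returns an empty list, which is where the "new problems" case would need similarity matching rather than a simple lookup.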

Enable automation – The action framework also paves the way for time-saving automation.

  • Diagnostic – You may want to automatically trigger a diagnostic once a particular event occurs and have that output ready for investigation.
  • Must gather – Let’s automatically run an action that gathers logs, diagnostic output, or other information to speed up root cause analysis.
  • Mitigate – Let’s automatically run an action that works around the issue until it can be fully resolved (e.g., add memory or increase a cache size).
  • Remediate – If there is a known fix for this issue then we may choose to automatically run it. Problem solved.
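One way to think about these four categories is as a policy decision: which kinds of action are safe to run without a human in the loop? The sketch below assumes a conservative policy in which only read-only categories (diagnostic and must gather) run automatically, while mitigation and remediation are suggested to an operator. The policy, class names, and action names are assumptions for illustration, not Instana defaults.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    DIAGNOSTIC = "diagnostic"
    MUST_GATHER = "must-gather"
    MITIGATE = "mitigate"
    REMEDIATE = "remediate"


# Assumed policy: only read-only categories run without operator approval.
AUTO_RUN = frozenset({Category.DIAGNOSTIC, Category.MUST_GATHER})


@dataclass(frozen=True)
class CategorizedAction:
    # Hypothetical structure for illustration.
    name: str
    category: Category


def plan(
    actions: list[CategorizedAction],
    auto_run: frozenset[Category] = AUTO_RUN,
) -> tuple[list[str], list[str]]:
    """Split actions into those to run automatically and those to suggest."""
    run_now = [a.name for a in actions if a.category in auto_run]
    suggest = [a.name for a in actions if a.category not in auto_run]
    return run_now, suggest


actions = [
    CategorizedAction("thread-dump", Category.DIAGNOSTIC),
    CategorizedAction("collect-logs", Category.MUST_GATHER),
    CategorizedAction("increase-cache", Category.MITIGATE),
    CategorizedAction("apply-known-fix", Category.REMEDIATE),
]

run_now, suggest = plan(actions)
print("run automatically:", run_now)
print("suggest to operator:", suggest)
```

A team that trusts a known fix could pass a wider `auto_run` set to `plan`, for example one that also includes `Category.REMEDIATE`, to move that category into the automatic path.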

Watch the Instana and PagerDuty Process Automation Integration Overview video

If you would like to see a demo of the tech preview of our new action framework, or would like to try it out for yourself, please contact jeffh@ca.ibm.com for more information.
