In the last blog post, I described at a high level the ultimate AIOps maturity state that IT Operations may eventually want to reach.
In this blog post, I will primarily describe the three main IT Operations types. In future posts, I will describe the gradual and iterative steps that IT software solution owners can take to reach a fully AIOps-infused production environment.
First, let us look at what AIOps is. As Gartner describes it, “an AIOps platform combines big data and machine learning functionality to support all primary IT operations functions through the scalable ingestion and analysis of the ever-increasing volume, variety and velocity of data generated by IT.”
As most IT professionals agree, if there is one constant in the IT world, it is constant change, driven by business requirements, ambitions, initiatives, stiff competition, and market opportunities, just to name a few. IT executives want to be able to support all these drivers while keeping the following ideal goal in mind:
Software will fail. However, when a failure happens, recovery must happen in such a way that the business is not negatively impacted. Examples of negative impact include customer dissatisfaction and financial or reputational cost. An IT production environment outage can be the worst kind of failure from a business perspective. Consider a production environment that supports millions of external users who purchase products worth billions of dollars. An outage in such an environment will be very costly.
Note that IT executives certainly have other goals, such as the speed of delivering production changes and the cost of production support. But I am focusing on the goal of recovery from a production failure in order to make the following points.
- For reactive IT Operations, the following key points are considered from a business perspective:
  - What value does each transaction supported by the environment have?
  - How long will a failure for each specific transaction be allowed to last?
  - What recovery mechanism must be in place to handle each specific failure?
- For predictive IT Operations, the following key points are considered from a business perspective:
  - What value does each transaction supported by the environment have?
  - What measures must we take to predict failures?
  - How long will a failure for each specific transaction be allowed to last?
  - What recovery mechanism must be in place to handle each specific failure?
- For proactive IT Operations, the following key points are considered from a business perspective:
  - What value does each transaction supported by the environment have?
  - What measures must we take to proactively avoid or minimize failures?
As can be seen, all types of IT Operations have the following in common:
- They all consider the value of business transactions the production environment supports.
- They all address failures.
What differentiates one IT Operations type from another is how it addresses failures. The first level of IT Operations maturity is reactive IT Operations, the second level is predictive IT Operations, and the third level is proactive IT Operations. However, going from one IT Operations maturity level to the next requires a change along four dimensions: culture, tools, process and data.
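To make the comparison concrete, the three types and the questions each one asks can be written down as a small data model. This is only an illustrative sketch: the names `OpsMaturity`, `QUESTIONS`, `shared_questions`, and `distinguishing_questions` are my own and do not come from any AIOps product or standard.

```python
from enum import IntEnum


class OpsMaturity(IntEnum):
    """The three IT Operations types, ordered by maturity level (hypothetical name)."""
    REACTIVE = 1
    PREDICTIVE = 2
    PROACTIVE = 3


# Questions taken from the lists above.
VALUE = "What value does each transaction supported by the environment have?"
DURATION = "How long will a failure for each specific transaction be allowed to last?"
RECOVERY = "What recovery mechanism must be in place to handle each specific failure?"
PREDICT = "What measures must we take to predict failures?"
AVOID = "What measures must we take to proactively avoid or minimize failures?"

# Which questions each IT Operations type considers.
QUESTIONS = {
    OpsMaturity.REACTIVE: [VALUE, DURATION, RECOVERY],
    OpsMaturity.PREDICTIVE: [VALUE, PREDICT, DURATION, RECOVERY],
    OpsMaturity.PROACTIVE: [VALUE, AVOID],
}


def shared_questions():
    """Return the questions that every IT Operations type asks."""
    common = set.intersection(*(set(qs) for qs in QUESTIONS.values()))
    return sorted(common)


def distinguishing_questions(level):
    """Return the questions asked by this type and by no other type."""
    asked_elsewhere = set()
    for other, qs in QUESTIONS.items():
        if other != level:
            asked_elsewhere.update(qs)
    return [q for q in QUESTIONS[level] if q not in asked_elsewhere]


if __name__ == "__main__":
    print("Shared by all types:", shared_questions())
    for level in OpsMaturity:
        print(level.name, "->", distinguishing_questions(level))
```

Running the sketch shows that the transaction-value question is the only one shared by all three types, while the prediction and avoidance questions are what set the predictive and proactive types apart.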