
We can build self-driving cars, but what about self-driving operations?

By Isabell Sippli posted Wed May 18, 2022 09:11 AM

Acknowledgements: Special thanks to @Rama Akkiraju and @PATRICK BUTLER for their review and insights. 

The media continues to be awash with information about, and different perspectives on, the state of the art and the future of autonomous vehicles. Perform a web search for “self-driving cars” and you will find a heady mix of material, from technical to legislative to ethical, and from marketing hype to outright cynicism.
It's unsurprising that such a potentially disruptive technology generates this level of attention, debate and expectation. Such excitement is in no small part responsible for the fact that both of us, deeply embedded in the world of IT operations, have often been asked the question:
If computers can drive cars, then why can't computers run IT Operations?
And the answer is … not simple.
To frame what the above question actually means, and the assumptions implicit within it, let's take a recent announcement by Mercedes-Benz of the "First internationally valid system approval for conditionally automated driving", making it "the first automotive company in the world to meet the demanding legal requirements of UN-R157 for a Level 3 system."
Assuming that this claimed first is representative of the state of the art, we need to understand what is meant by "conditionally automated driving" and "Level 3 system".
Then, perhaps, some more illuminating questions might be:
  1. To what degree and in which circumstances can cars drive themselves?
  2. How do the problems of self-driving cars and autonomous IT Operations compare?
  3. How do techniques for solutions to these problems compare?
  4. To what degree and in which circumstances can IT Operations be automated?
Those are the questions we would like to explore here.

Defining “autonomous” using the MAPE loop
The Merriam Webster dictionary defines “autonomous” as "existing or acting separately from other things or people".  
Automated and semi-automated control systems manage, command, direct, or regulate the behavior of other devices or systems using control loops, the fundamental building blocks of industrial control systems.
We consider the MAPE loop to be a conveniently abstract model to frame the requisite comparisons between driving a vehicle and operating an IT environment.
A MAPE loop models a system which:
  • (M)onitors its environment
  • (A)nalyses the signals
  • Makes a (P)lan on the next step(s)
  • (E)xecutes the plan

[Figure: the Monitor-Analyze-Plan-Execute loop]

The last step in the loop, execution, will often influence the environment, so the process continues in a loop in which the next state of the system serves as input to the next "Monitor" step.
A system that performs all four steps itself, without human input, is autonomous.
Of course, for such systems to be useful they will be subject to known constraints or objectives. For example, a domestic central heating control system's objective may be specified as a target temperature on a dial or control panel. It will:
  • Monitor the environment for current temperature(s), e.g. via a thermostat
  • Analyse the signal against the target temperature, e.g. determine its distance from the target temperature; determine its gradient
  • Plan the change, if any, required to the heating system, e.g. determine the required change in the temperature or flow of a thermal agent
  • Execute the planned change, e.g. actuate a hot water valve, activate or deactivate a heating circuit
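As a minimal sketch, the heating example above could be written as a MAPE loop in Python. The `Room` class, its toy temperature dynamics, and the 0.5-degree dead band are assumptions made purely for illustration; a real controller would read a sensor and drive an actuator instead.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Room:
    """Hypothetical environment: warms while heating, cools otherwise."""
    temperature: float
    heating_on: bool = False

    def step(self) -> None:
        # Assumed dynamics, chosen only to make the loop observable.
        self.temperature += 0.5 if self.heating_on else -0.3


def monitor(room: Room) -> float:
    """(M) Read the current temperature, as a thermostat would."""
    return room.temperature


def analyse(current: float, target: float) -> float:
    """(A) Distance from the target; positive means the room is too cold."""
    return target - current


def plan(error: float, heating_on: bool, band: float = 0.5) -> bool:
    """(P) Decide the desired heating state, with a small dead band."""
    if error > band:
        return True
    if error < -band:
        return False
    return heating_on  # inside the band: keep the current state


def execute(room: Room, heating_on: bool) -> None:
    """(E) Actuate the heating circuit."""
    room.heating_on = heating_on


def run_loop(room: Room, target: float, cycles: int) -> List[float]:
    """Run the loop; the effect of each cycle feeds the next Monitor step."""
    history = []
    for _ in range(cycles):
        current = monitor(room)
        error = analyse(current, target)
        desired = plan(error, room.heating_on)
        execute(room, desired)
        room.step()  # the environment reacts to the executed plan
        history.append(room.temperature)
    return history
```

Calling `run_loop(Room(temperature=17.0), target=21.0, cycles=40)` heats the room and then keeps it oscillating in a narrow range around the target, which is the feedback behaviour the loop is meant to illustrate.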
Car + autonomous control = ?
A self-driving car, also known as an autonomous vehicle [...], is a vehicle that is capable of sensing its environment and moving safely with little or no human input. 
Mapping the above definition to the MAPE loop, we might consider the objective to be "Arrive at a predetermined destination safely and legally". The MAPE steps are therefore:
Monitor: the vehicle
  • uses GPS and map data to determine its location
  • uses vehicular sensors to determine its velocity and the state of its control functions
  • uses environmental sensors to detect its immediate surroundings, e.g. cameras, radar, lidar
Analyse: the vehicle
  • determines its direction and orientation against its programmed destination and a calculated route
  • builds a model of the immediate surroundings to represent significant entities, e.g. the road boundaries, lanes, moving or stationary objects and their classification, road signs
Plan: the vehicle
  • determines the next change (if any) required to its control functions in order to:
    • Progress along the calculated route
    • Anticipate and avoid objects in the immediate vicinity
    • Proceed legally
Execute: the vehicle actuates its control functions: it accelerates, brakes, or steers the car
Note that those are essentially three potential actions. Spoiler: this is very different from the execute options in operations.

The level of autonomy, as referenced by Mercedes-Benz, is defined in a standard developed by the Society of Automotive Engineers (SAE). AI specialist Lance Eliot provides an explanation of each level:
  • Level 0: No [Driving] automation, human driver required to operate at all times, human driver in full control
  • Level 1: Driver assistance, automation adds layer of safety and comfort in very function-specific manner, human driver required for all critical functions, human driver in control
  • Level 2: Partial [Driving] automation, automation does some autonomous functions of two or more tasks, such as adaptive cruise control and automated lane changing, human driver in control
  • Level 3: Conditional [Driving] automation, automation undertakes various safety-critical driving functions in particular roadway conditions, human driver in partial control
  • Level 4: High [Driving] automation, automation performs all aspects of the dynamic driving task but only in defined use cases, and under certain circumstances, such as snow or foul weather, gives control back to the human, human driver in partial control
  • Level 5: Full [Driving] automation, automation performs all aspects of the dynamic driving task in all roadway and environmental conditions, no human driver required or needed.

So what level are we currently on with self-driving cars?
Besides the official confirmation that we are at Level 3, we're probably approaching Level 4: Waymo started an autonomous taxi service in Phoenix in October 2020. While this is certainly an impressive achievement, it does not qualify as full Level 5. Phoenix is known for its wide streets and very few pedestrians, and rain and snow are very rare there.
In February 2022, Cruise started to offer a driverless taxi service in San Francisco, which is certainly more challenging than Phoenix.
In March 2022, Waymo announced it would be taking its service to San Francisco as well.
Those developments align with predictions several scientists have made in HBR: "The most likely near-term scenario we'll see are various forms of spatial segregation: Self-driving cars will operate in some areas and not others."
Now, let's have a look at where we are with self-driving operations.
What are self-driving operations?
Searching for this term, you will not find many definitions. We therefore propose the following:
Self-driving operations, also known as autonomous operations, is an IT operations environment that is capable of monitoring the entirety of its managed services and keeping them within their SLOs with little or no human input. (Kristian Stewart/Isabell Sippli)

Applying the MAPE loop, self-driving operations:
  • Monitor the applications, services, and infrastructure against different quality measures (availability, performance, end-user behaviour, etc.)
  • Analyse the data, primarily to decide whether corrective action should be taken
  • Make a Plan, e.g. to prevent an outage or incident, or to raise a situation to the end user's attention
  • Execute the action, e.g. roll back a change or open a ticket
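To make the four steps concrete, here is a deliberately simplistic Python sketch of one pass through such a loop. Everything in it is hypothetical: the `ServiceSignal` fields, the action names, and the rule "a breach right after a change means roll back" are assumptions for illustration, not how any particular product decides.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceSignal:
    """Hypothetical monitoring sample (M) for one managed service."""
    service: str
    availability: float   # measured availability, e.g. 0.999
    slo_target: float     # agreed objective, e.g. 0.995
    recent_change: bool   # was a change deployed shortly before?


def analyse(signal: ServiceSignal) -> bool:
    """(A) Decide whether corrective action is needed at all."""
    return signal.availability < signal.slo_target


def plan(signal: ServiceSignal) -> str:
    """(P) Pick an action; the rules here are assumptions, not a standard."""
    if not analyse(signal):
        return "none"
    # A breach right after a change suggests rolling the change back;
    # anything else is raised to a human operator via a ticket.
    return "rollback_change" if signal.recent_change else "open_ticket"


def execute(action: str, signal: ServiceSignal) -> str:
    """(E) Carry out the action; this sketch only reports what it would do."""
    if action == "rollback_change":
        return f"rolled back latest change on {signal.service}"
    if action == "open_ticket":
        return f"opened ticket for {signal.service}"
    return f"no action for {signal.service}"


def run_cycle(signals: List[ServiceSignal]) -> List[str]:
    """One pass of the loop over signals that were already collected."""
    return [execute(plan(s), s) for s in signals]
```

Even in this toy form, the Plan step has to choose among qualitatively different actions, whereas the thermostat and the car only tune a handful of controls.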
And here is our attempt at adapting the SAE levels of autonomy to operations.
For now, we'll start with a simple summary:
  • Level 0: No Operational Automation
    • Human operators in charge of everything. Basic monitoring in place.
  • Level 1: Operational Assistance/Level 2: Partial Operational Automation
    • Operational automation adds a layer of comfort in a function-specific manner, e.g. through central event collection, rule-based event correlation or documented remediation actions.
    • Human operators required for all critical decisions and actions.
  • Level 3: Conditional Operational Automation
    • Operational automation in control of most detection and alerting, including correlation, e.g. through AI based event correlation. Human operators required for most critical decisions and actions, e.g. selecting and executing remediation actions.
  • Level 4: High Operational Automation
    • Operational automation performs all daily operational tasks, like issue detection, root cause recommendation and selection of remediation actions. Human operators mainly in charge of reviewing and executing remediation actions, or troubleshooting complex issues.
  • Level 5: Full Operational Automation
    • Self-driving operations, including root cause detection and automated, self-learning action execution. Human operators are no longer needed. This does not include automatically fixing defects in code, as fixing software bugs automatically and in all cases appears to require artificial general intelligence, which is not currently on the near horizon (source).
(This is, of course, a simplification. We will share the background of how we arrived at the levels in a future post.)
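One coarse way to read the summary above is as a mapping from level to the tasks that automation, rather than a human, owns. The Python sketch below encodes that reading; the enum names, the task names, and the exact cut-off per task are our own assumptions and simplify the level descriptions considerably.

```python
from enum import IntEnum
from typing import Set


class OpsLevel(IntEnum):
    """Operational automation levels, mirroring the summary above."""
    NO_AUTOMATION = 0
    ASSISTANCE = 1
    PARTIAL = 2
    CONDITIONAL = 3
    HIGH = 4
    FULL = 5


def automated_tasks(level: OpsLevel) -> Set[str]:
    """Tasks owned by automation at a given level (assumed cut-offs)."""
    tasks: Set[str] = set()
    if level >= OpsLevel.CONDITIONAL:
        tasks.add("detection_and_correlation")
    if level >= OpsLevel.HIGH:
        tasks.add("remediation_selection")
    if level >= OpsLevel.FULL:
        tasks.add("remediation_execution")
    return tasks


def human_required(level: OpsLevel) -> bool:
    """Humans stay in the loop at every level below full automation."""
    return level < OpsLevel.FULL
```

For example, `automated_tasks(OpsLevel.HIGH)` contains detection and remediation selection but not execution, matching the Level 4 description in which human operators still review and execute the actions.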
One could further constrain the levels by the environments that are being operated. Depending on that, one can think of examples of operations teams on Level 3, or even Level 4. 
In general, the following rules apply:
  • The more homogeneous and highly controlled the environment, the better the chances of reaching higher levels, as we can map the environment for structure and dependencies more easily, and the execution options are more limited. The less variation in the environment, the easier it is to react and establish common solutions. Also, a highly controlled environment has very little ad-hoc human intervention, which simplifies standard approaches to managing it.
  • The more observable and programmatically controllable application software is, the higher the level we can accomplish.
The rise and advancement of Kubernetes platforms, e.g. Red Hat OpenShift, as well as public cloud technologies like IBM Code Engine, can under certain circumstances help in adhering to such rules, as they enforce a certain level of homogeneity. Red Hat has even introduced maturity levels for its OperatorHub, allowing solutions like Couchbase or DataStax to provide auto-pilot capabilities to their users.
However, those are distinct examples within limited conditions, which closely mirrors the situation with self-driving cars.

Comparing autonomous vehicles and autonomous operations
When comparing autonomous vehicles and autonomous operations, it becomes apparent that even though the general principles are alike and can be reduced to a MAPE loop, the individual challenges differ greatly.
The comparison below shows selected aspects for illustrative and comparative purposes only; it is not a complete list.
Monitor
How is it done (Autonomous Vehicles): Sensor capture and sensor fusion, e.g. through LiDAR (1) (Light Detection and Ranging, see here)
Challenges (AV) (see this NYT article for most of the challenges):
  • Bad Weather, or "Where Did the Lines on the Road Go?"
  • Potholes, or "It Might Be a Puddle. Or Not."
How is it done (Autonomous Operations):
  • Polling
  • Observing/Introspection
  • Listening
Challenges (AO):
  • Heterogeneity of environments
  • Data volumes

Analyse and Plan
How is it done (Autonomous Vehicles):
  • Creating a virtual world model (e.g. through a 3D model)
  • System Action Plan: determine what to do next, bound to what the car can actually do (see here)
Challenges (AV):
  • Understanding the objects around the car, and the ones approaching (building a 3D model)
  • Understanding the current point in the journey towards the destination
  • Ethics on the Road, i.e. selecting the best choice out of multiple bad choices
  • Digital Mapping, i.e. factoring in detours and Rerouted Roads
How is it done (Autonomous Operations):
  • Discovery
  • Statistical Analysis
  • AI models for anomaly detection, signal correlation, root cause detection
Challenges (AO):
  • Build topology and AI models, and infuse domain models, to build something that is representative of the state and structure of the environment, taking heterogeneity into account

Execute
How is it done (Autonomous Vehicles): Activating controls, generally limited to these options:
  • Turn left (+degree)
  • Turn right (+degree)
  • Accelerate (how much)
  • Decelerate (how much)
Challenges (AV):
  • Reckless Drivers, i.e. Unpredictable Humans
How is it done (Autonomous Operations): A practically infinite set of options, some examples:
  • Changes to the hardware infrastructure (servers, network equipment, e.g. replace, patch, extend)
  • Changes to the software infrastructure (operating system upgrades, security patches)
  • Changes to the applications themselves (patch, restart, rollback ... )
Challenges (AO):
  • Determining the general options, out of a practically infinite set
  • Selecting the best option
Beyond the MAPE loop, there are several other differences that we would like to point out here:
  1. Location of the intelligence: In self-driving cars, all intelligence is in the car; the roads do not yet contain any intelligence. In operations, the intelligence/AI is in both the applications/services (the car) and the environment they operate in, including the management solutions (the roads).
  2. Timing: Autonomous driving decisions have to be split-second decisions made on the edge, i.e. in the car. A car does not have access to large computing resources. IT Operations usually has access to more compute power, and hence can afford to run more CPU- and memory-intensive operations.
  3. Points of failure: In an automobile situation, points of failure are a (relatively small) finite set: the physical parts of a vehicle and the interaction points with objects and people on the road. In IT, with an ever-growing number of micro-services, and with new and old workers, developers and ITOps folks with different skills coming and going, there are many more points of failure, and their number can grow arbitrarily large.
  4. Regulatory controls: those differ depending on the industry.

Based on the different levels of complexity, and the fact that there are far more execution options in autonomous operations than in autonomous vehicles, we believe one cannot conclude that having self-driving cars means we should also be able to achieve self-driving operations.
Based on what we've learned so far, can we really build self-driving cars today? The answer is: it depends. And the same answer applies to whether we can build self-driving operations.
Some of the early predictions on self-driving cars (see [1]) were quite wrong.
To the question "When will we arrive at self-driving operations?", our honest answer is: we don't know.
However, we do know there are real advancements in several areas, like AIOps, that will help you add more autonomy to your operations.
The IBM Cloud Pak for Watson AIOps is one option we are excited to be shaping.

[1] Early predictions on self-driving cars:
  • "From 2020, you will be a permanent backseat driver." Guardian, 2015
  • "10 million self-driving cars will be on the road by 2020." Business Insider, 2016