In this article, I describe the first-mile and last-mile problems of machine learning models, give examples of their occurrence in a few domains, and suggest practical strategies for dealing with these problems from my experience.
The first-mile and last-mile problem – an everyday example
“Head north-east on Kirwin Lane and then turn left on to De Anza Blvd” suggested the mapping software on my smartphone, but I had no idea which direction was north-east. I had to guess by the direction of the sun. Thank goodness it wasn’t a cloudy day!
Finally, after 20 minutes of driving, the map lady announced that I had arrived at my destination – right in the middle of a busy three-lane road near an intersection!
Where had I arrived? Where was that antique store I was looking for? To my right was a large shopping complex. I guessed the store I was looking for was in there somewhere. But I was not in the correct lane to make the turn. I had to miss the turn, drive further, and loop around to come back. Once in the complex, the map lady announced, “go north” and then immediately suggested, “go south”. Ugh! That was frustrating!
At this point, I turned the mapping software on my smartphone off. It was time to park the car somewhere in the parking lot and walk around or ask the nearby storekeeper for directions.
My experience that day was a classic example of the first-mile and last-mile problems with mapping software. Delivery companies lose billions of dollars annually due to last-mile delivery problems because the mapping software can’t finish the task of taking drivers all the way to their destinations.
Statistical machine learning algorithms suffer from the same first-mile and last-mile problems.
The first-mile and the last-mile problems with machine learning models
The use of statistical machine learning algorithms in real-world business applications is on the rise. From chatbots in customer care domains, doctor’s assistants in medical domains, attorney’s assistants in legal domains, face recognition in security domains, to social media monitoring in retail, media, and entertainment domains, IT operations management in IT domains, decision support in insurance and law enforcement domains and more, AI models powered by statistical machine learning are making their way into many business-to-business scenarios.
It is well known that most machine learning algorithms need a lot of representative data to learn the patterns they use to make predictions. Until such representative data is available, their predictions may not be accurate enough for many use cases. I call this phenomenon the first-mile problem because the system cannot get off the ground with the desired prediction accuracy – in the same way that mapping software doesn’t quite know how to guide a driver out of an unmapped neighborhood to the nearest street that it has mapped.
Suppose we get past the first-mile problem by providing enough representative data to train a machine learning model, and the model predicts well enough to be put into production for specific use cases. Soon enough, you will realize that while the model performs well for 80-85% of the cases, it starts to falter in the remaining 15-20%. These are typically corner cases for which it is not practically feasible to get enough representative samples during training because, by definition, they occur rarely. Yet these corner cases do occur, and the model needs to be able to deal with them. This is the last-mile problem. It is akin to the mapping software declaring that you have arrived at your destination when, as in my case, you are still in the middle of a three-lane road.
Log Anomaly Detection: An illustrative example
Let’s examine these first-mile and last-mile problems in the context of log anomaly detection, which uses a machine learning model to detect anomalies in IT application and system logs as part of IT operations management.
An anomaly is something that deviates from normal, standard, or expected behavior. The goal of log anomaly detection is to detect anomalies in IT application and system logs in real time. These may include logs written by an application, infrastructure, network device, operating system, middleware, and everything in between. Typically, organizations set either static thresholds or manual rules to define and manage deviations from normal behavior. The problem with static thresholds is that it takes subject matter experts (SMEs) a long time to distill them from their experience and to create them. Moreover, static thresholds don’t adapt easily to change, so they tend to become outdated and irrelevant quickly. For these reasons, it is better to use machine learning models to detect anomalies in logs.
Machine learning models are good at learning patterns. When faced with a pattern of log messages that does not conform to the normal pattern it has learned, a machine learning model can flag an anomaly. This relieves organizations of the need to create and manage static thresholds or to rely on SMEs to write rules for every possible anomalous condition, which might be hard to do.
Many techniques have been applied to log anomaly detection (after converting the unstructured logs into structured features via log parsing), such as ARIMA, Seasonal ARIMA, XGBoost, Exponential Smoothing, Principal Component Analysis (PCA), and deep-learning algorithms like LSTM. Many of these techniques still require ‘normal’ data to learn the patterns from. The challenge is in collecting representative normal data without human intervention in a reasonable amount of time. Not all IT environments are guaranteed to produce representative data in the first few minutes after an algorithm is turned on. Until the model sees enough variation, seasonality, and other patterns, its baseline is not stable. Predictions made during that time tend to be inaccurate, much as the mapping software says ‘go north … go south’ almost at the same time. Essentially, the model is still getting its bearings and establishing a baseline. This is the first-mile problem. When faced with it, it is best to enable continuous learning mechanisms so that the model can learn fast with customer data, in a customer environment. Below I share a few other strategies to better deal with the first-mile problem with machine learning models.
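Before turning to those strategies, here is a minimal sketch of the warm-up behavior described above. It assumes the logs have already been parsed down to one number per time interval (for example, an error count per minute). The class name `WarmupBaseline` and its parameters `window`, `min_samples`, and `z_threshold` are hypothetical, and a simple rolling z-score stands in for whichever model is actually used. The only point is that the detector abstains until its baseline has seen enough data.

```python
from collections import deque
from statistics import mean, stdev


class WarmupBaseline:
    """Rolling z-score detector that abstains until it has seen enough data.

    Illustrative only: the names and default values here are assumptions.
    """

    def __init__(self, window=60, min_samples=30, z_threshold=3.0):
        self.values = deque(maxlen=window)       # most recent baseline counts
        self.min_samples = max(2, min_samples)   # stdev needs at least 2 points
        self.z_threshold = z_threshold

    def score(self, count):
        """Return 'warming_up', 'normal', or 'anomaly' for one interval's count."""
        if len(self.values) < self.min_samples:
            verdict = "warming_up"               # baseline not stable yet
        else:
            mu = mean(self.values)
            sigma = stdev(self.values)
            if sigma > 0:
                z = abs(count - mu) / sigma
            else:
                # Flat baseline: any deviation at all is treated as extreme.
                z = 0.0 if count == mu else float("inf")
            verdict = "anomaly" if z > self.z_threshold else "normal"
        # Keep anomalous values out of the baseline so they do not distort it.
        if verdict != "anomaly":
            self.values.append(count)
        return verdict


detector = WarmupBaseline(window=20, min_samples=5)
for count in [10, 12, 11, 9, 10, 11, 50]:
    print(count, detector.score(count))
```

With these toy numbers, the first five intervals are reported as `warming_up`, the sixth as `normal`, and the spike to 50 as `anomaly`. Reporting an explicit warm-up state, rather than a guess, is one way to avoid the 'go north … go south' effect while the baseline settles.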
Strategies for dealing with the first-mile problem
- Build a broad-based base model: Whenever possible, build a good base model with as much representative data as can be obtained. The accuracy of such base models usually ranges from, say, 75% to 85% (+/- 5%-10%). Enriching a base model is a continuous activity that involves collecting and cleansing data and then training and fine-tuning the model. For example, in the case of log anomaly detection, a base model can be built with a week’s or a month’s worth of historic log data. This model can get you 75-85% of the way. Often, even this historic data fails to capture the variations triggered by user loads, seasonality, and other factors well enough to learn the patterns reliably. That’s why the ability to customize these base models and continuously improve them becomes critical to achieving the desired levels of accuracy.
- Enable Model Customization: Model customization is needed either when good-enough base models cannot be built ahead of time due to the special nature of the data (e.g. anomaly prediction in IT system logs for proprietary applications) or when a base model built from general-purpose data does not work well in company-specific environments (e.g. a general-purpose chatbot may not work well as a specialized drive-in order-taking chatbot). By exposing APIs for customizing the machine learning model, you make it easy for the model to take in external data, beyond the data with which it was initially trained, for training in the field. Model customization is the mechanism by which continuous learning happens.
- Enable Continuous Learning: Hooks for continuous learning are a must for any machine learning model, whether because the initial training data is insufficient and must be augmented or customized, because the model needs to stay fresh as input data patterns change, or for other reasons. Mechanisms for automatically retraining models with new data are a critical aspect of deploying machine learning models, especially for addressing the first-mile problem.
- Human AI-authoring & Feedback: Enable subject matter experts (SMEs) to guide various aspects of prediction tasks, including data selection, data preparation, and annotation of unknown patterns, templates, and samples, to expedite learning. This is, in essence, the human authoring of AI. There is no shame in doing this; in fact, it is the best way to bootstrap the models and get them going in the right direction. It’s like asking a local person for directions to the nearest known main street when you get lost. It works! After all, you want the job to get done rather than to sit on a high tower of full automation. SMEs can further accelerate learning by giving the models regular feedback from which they can learn continuously.
- Have realistic expectations: Machine learning models can’t do magic. Having realistic expectations goes a long way toward avoiding premature disappointment with technology that can improve over time with the right feedback and more representative data. After all, you often know the way to the nearest street that is on the map. So, if the map software gets it wrong, you use your built-in sense of orientation to get going until the map software can guide you better. That is, treat the initial model as an intern-in-training until enough representative data can be collected to improve its accuracy.
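To make the interplay between a base model, continuous learning, and SME feedback a little more concrete, here is a toy sketch. It assumes log lines have already been parsed into templates, and that a template counts as normal once it has been seen often enough. The class `TemplateModel`, its methods, and the `min_count` parameter are all hypothetical names chosen for illustration, not the API of any real product.

```python
from collections import Counter


class TemplateModel:
    """Toy model: a log template is normal once seen often enough or SME-approved.

    Hypothetical sketch; real systems use far richer features than raw counts.
    """

    def __init__(self, min_count=3):
        self.counts = Counter()   # how often each template has been seen
        self.approved = set()     # templates an SME labeled as normal
        self.rejected = set()     # templates an SME labeled as anomalous
        self.min_count = min_count

    def fit(self, templates):
        """Train on historic data; calling it again adds new data (continuous learning)."""
        self.counts.update(templates)

    def predict(self, template):
        """SME labels take priority over what was learned from counts."""
        if template in self.rejected:
            return "anomaly"
        if template in self.approved or self.counts[template] >= self.min_count:
            return "normal"
        return "anomaly"

    def feedback(self, template, is_normal):
        """Record an SME's judgment about a template."""
        if is_normal:
            self.approved.add(template)
            self.rejected.discard(template)
        else:
            self.rejected.add(template)
            self.approved.discard(template)


model = TemplateModel(min_count=2)
model.fit(["login ok", "login ok", "cache miss"])   # base model from historic logs
print(model.predict("cache miss"))                  # seen too rarely to be trusted
model.feedback("cache miss", is_normal=True)        # SME says this is fine
print(model.predict("cache miss"))
```

In this sketch a single piece of SME feedback changes the prediction immediately, whereas learning the same thing from data alone would require waiting for more occurrences. That is the sense in which asking for directions gets you to the main street faster.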
Continuing with log anomaly prediction as the example, let’s examine how the last-mile problem manifests. Say that you have built a good anomaly prediction model, done all the right things to get past the first-mile problem, achieved the desired prediction accuracy, and deployed the model in production. Even so, the model might make mistakes every so often because of infrequently occurring, long-tail scenarios. For example, seasonality that occurs once a year, or maintenance periods that trigger different behaviors in IT systems, may confuse the model and lead to inaccurate predictions. These are examples of the last-mile problem. In such cases, the best course of action might be to write a rule to deal with seasonality and maintenance windows. It takes too long, too much data, and too many repetitions for the model to learn these types of patterns. It is much easier to deal with them via rules. Here are a few other strategies to better deal with the last-mile problem with machine learning models.
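As an illustration of the maintenance-window rule mentioned above, here is a minimal sketch of a post-processing step that sits after the model. It assumes the model emits `(timestamp, label)` pairs and that maintenance windows are known in advance as `(start, end)` pairs. The function names and the `"suppressed"` label are hypothetical.

```python
from datetime import datetime


def in_any_window(ts, windows):
    """True if ts falls inside any (start, end) window; end is exclusive."""
    return any(start <= ts < end for start, end in windows)


def apply_maintenance_rule(predictions, windows):
    """Relabel anomalies raised during a maintenance window as 'suppressed'.

    predictions: list of (timestamp, label) pairs produced by the model.
    The model itself is untouched; the rule only post-processes its output.
    """
    result = []
    for ts, label in predictions:
        if label == "anomaly" and in_any_window(ts, windows):
            label = "suppressed"
        result.append((ts, label))
    return result


# Hypothetical example data: one planned window from 01:00 to 03:00.
windows = [(datetime(2024, 1, 6, 1, 0), datetime(2024, 1, 6, 3, 0))]
predictions = [
    (datetime(2024, 1, 6, 0, 30), "anomaly"),   # before the window: kept
    (datetime(2024, 1, 6, 2, 0), "anomaly"),    # inside the window: suppressed
    (datetime(2024, 1, 6, 2, 30), "normal"),    # normal stays normal
]
for ts, label in apply_maintenance_rule(predictions, windows):
    print(ts.isoformat(), label)
```

Relabeling rather than dropping the prediction keeps a record of what the model said, which is useful later for error analysis.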
Strategies for dealing with the last-mile problem
- Develop good test datasets with accurate ground truth: How does one know that there is a last-mile problem or a model accuracy problem? One must first develop good test datasets with accurate ground truth to identify that there is a prediction problem. In some domains, there can be a lot of gray areas in the ground truth. What is an anomaly for one person may not be one for another. For example, in sentiment prediction, what one person reads as a negative statement another may read as neutral. So, establishing accurate ground truth is critical to identifying the model’s weaknesses.
- Perform Error Analysis: Once the areas of the model’s weaknesses are identified, it is important to perform systematic error analysis. If possible, classify the errors manually or automatically, understand the source of errors, and have different strategies for fixing each error as needed.
- Augment training data, if possible: If you know the exact patterns of mistakes, see if it is possible to collect more samples in those specific areas to augment the training data. Sometimes, this may not be possible if the errors are caused by corner cases that don’t occur that frequently.
- Log payload & continuously learn: Whenever possible, and when privacy policies allow it, log the input data of the machine learning model. Correlate the input data with the timing and error analysis to identify the input samples on which the model is making prediction errors. SMEs then need to annotate this data so that the model’s weaknesses can be corrected in the next iteration of learning.
- Short-circuit learning with rules or micro-models: Often, the best way to deal with corner cases is by adding rules or by building targeted micro-models. Gathering enough samples of rarely occurring cases to teach the system might take too long and is not guaranteed to happen in a reasonable amount of time. Moreover, if you know the specific corner cases where the model is making mistakes, you already know the patterns. So, it’s much simpler to write a rule for these patterns than to collect many examples and let the model learn a rule that you already know. For example, in the anomaly prediction domain, a high-severity anomaly might not be distinguishable from a low-severity one without further context. Whether an anomaly should be treated as an incident may depend on external factors such as how many times the problem occurs (i.e., error rate), how many other incidents have occurred this month, whether these incidents violate service level objectives, etc. These are best dealt with via post-processing rules. Also, sometimes, micro-models can be developed for targeted long-tail cases.
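The first two strategies in this list, a test set with ground truth and systematic error analysis, can be sketched in a few lines. The example below assumes each test record carries the SME's ground-truth label, the model's prediction, and a free-form `context` tag describing when the sample occurred. The record layout, the `evaluate` function, and the sample data are all hypothetical.

```python
from collections import Counter


def evaluate(records):
    """Compute precision and recall, and count errors by (context, error type).

    records: list of dicts with boolean 'truth' and 'pred' (True = anomaly)
    and a string 'context'.
    """
    tp = sum(1 for r in records if r["truth"] and r["pred"])
    fp = sum(1 for r in records if not r["truth"] and r["pred"])
    fn = sum(1 for r in records if r["truth"] and not r["pred"])
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Group the mistakes so that each cluster can get its own fix.
    errors = Counter(
        (r["context"], "false_positive" if r["pred"] else "false_negative")
        for r in records
        if r["truth"] != r["pred"]
    )
    return precision, recall, errors


# Hypothetical labeled test set.
records = [
    {"truth": True, "pred": True, "context": "weekday"},
    {"truth": False, "pred": True, "context": "maintenance"},
    {"truth": False, "pred": True, "context": "maintenance"},
    {"truth": True, "pred": False, "context": "year_end"},
    {"truth": False, "pred": False, "context": "weekday"},
]
precision, recall, errors = evaluate(records)
print(f"precision={precision:.2f} recall={recall:.2f}")
for (context, kind), n in errors.most_common():
    print(context, kind, n)
```

In this made-up data the false positives cluster in the maintenance context, which points toward a rule, while the year-end miss is the kind of rare case where more samples may never arrive in time.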
Machine learning models are susceptible to first-mile and last-mile problems similar to those of mapping software. Dealing effectively with these problems requires purposeful data collection, preparation, and planning, diligent error analysis, and subject matter expert input and guidance. By working together, humans and machines can co-create effective machine learning models.