
Model Monitoring

By Austin Eovito posted Mon November 04, 2019 11:56 AM

  


By Austin Eovito and Vikas Ramachandra

Other blogs in MLOps series:
Operationalizing Data Science 
Infrastructure for Data Science
Model Development and Maintenance   
Model Deployment
Model Monitoring


Introduction

Model monitoring is the process of ensuring the continued performance of a model in real time after it reaches production. In other words, model monitoring entails checking how a model's predictions (John Yick et al., 2014) hold up against actual recent outcomes as they emerge. Getting a model from a "research prototype" to a production deployment requires significant effort and resource coordination, but the work does not end at deployment: the data scientist must also monitor the model to see how it performs in production. Putting a model into production can unearth significant differences between the data scientist's intuitions and real-life behavior. To account for these differences, a professional data scientist has to monitor the model's prediction performance, latency, and throughput, responding quickly and appropriately to the edge cases encountered along the way. A central goal of monitoring is to ensure that the model's inputs look similar to the ones used in training. Tracking the model's performance can be automated with software that checks whether production inputs correspond to those seen during training, and techniques such as coarse functional monitors (Ted Dunning, 2019), non-linear latency measurement, and histogramming can also be applied. Monitoring makes it possible to track shifts in the distributions of inputs and outputs (Robert Chu et al., 2007), automate routine monitoring tasks, and more.
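As a minimal illustration of checking that production inputs resemble the training data, the sketch below compares the distribution of each numeric feature in a recent batch of inputs against the training distribution using a two-sample Kolmogorov-Smirnov test. The feature names, the p-value threshold, and the alerting hook are illustrative assumptions, not part of any specific tool mentioned in this post.

from scipy.stats import ks_2samp

def check_input_drift(training_data, recent_data, features, p_threshold=0.01):
    """Return features whose recent distribution differs significantly from training."""
    drifted = []
    for feature in features:
        stat, p_value = ks_2samp(training_data[feature], recent_data[feature])
        if p_value < p_threshold:
            drifted.append((feature, stat, p_value))
    return drifted

# Example usage (works with dicts of arrays or pandas DataFrames keyed by column):
# drifted = check_input_drift(train_df, last_day_inputs, ["age", "income"])
# if drifted:
#     notify_data_science_team(drifted)  # hypothetical alerting hook

A scheduled job running a check like this over each day's scoring traffic gives an early signal that the inputs no longer look like the training data, before prediction quality visibly degrades.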

Prediction Anomalies and Response

Several factors contribute to inaccurate or varying predictions from a deployed model. For instance, prediction performance may diverge from the data scientist's intuition because of data points with extremely high or low values (Sayak Paul, 2019), which deviate from what the model saw during training. Model performance can also degrade if the relationships the model learned have drifted over time or changed abruptly due to recent events. To guard against changes in post-deployment prediction performance, data scientists can adopt Automated Machine Learning (Hutter F et al., 2014; Kotthoff L et al., 2017; Truong A et al., 2019), which automates model selection and retraining so the deployed model can keep pace with real-world data, rather than relying on manual correction. A constraint of AutoML is that it introduces hyperparameters of its own (Frank Hutter et al., 2018), which still require expertise to set. In addition, well-maintained data science infrastructure and tools can approximate correctness checks by automatically comparing the predictions of the deployed model against those of previous stable versions on the same data.
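One simple way to implement this check is to score the same batch of records with both the deployed model and the last stable version and measure how often their predictions disagree. The sketch below is a minimal illustration, assuming two models that expose a scikit-learn-style predict method and produce discrete (classification-style) outputs; for regression, one would compare the size of the differences instead. The threshold and alerting hook are assumptions for illustration.

import numpy as np

DISAGREEMENT_THRESHOLD = 0.05  # assumed tolerable fraction of differing predictions

def compare_model_versions(current_model, previous_model, X_batch):
    """Score the same data with both versions and return the disagreement rate."""
    current_preds = np.asarray(current_model.predict(X_batch))
    previous_preds = np.asarray(previous_model.predict(X_batch))
    return float(np.mean(current_preds != previous_preds))

# disagreement = compare_model_versions(new_model, stable_model, recent_batch)
# if disagreement > DISAGREEMENT_THRESHOLD:
#     notify_data_science_team(disagreement)  # hypothetical alerting hook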

 

In cases where either the input or the output differs significantly from its benchmark (the test inputs and the older model's outputs, respectively), monitoring software such as KNIME, Apache Spark, and Pentaho can trigger an alert to the data science team. The alert should prompt the team to investigate the root cause of the difference in input or output and to prepare a response. In some cases, the eventual response may require building a new version of the model, whereas in other instances, rolling back to an older version of the model may suffice. Adopting robust deployment tools and methodologies lets the data science team respond efficiently and reduces the time spent serving inferior predictions.
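As a rough sketch of how such a response might be wired together, the drift and disagreement signals from the sketches above could feed a simple decision rule; the thresholds and the rollback step are assumptions for illustration, not features of KNIME, Spark, or Pentaho.

def respond_to_anomaly(drifted_features, disagreement_rate,
                       alert_threshold=0.05, rollback_threshold=0.20):
    """Alert the team on any anomaly; suggest a rollback on severe divergence."""
    if drifted_features or disagreement_rate > alert_threshold:
        # In practice this would page the team (email, chat, incident tooling).
        print(f"ALERT: drift={drifted_features}, disagreement={disagreement_rate:.2%}")
    if disagreement_rate > rollback_threshold:
        # Severe divergence: fall back to the last stable model version
        # while the root cause is investigated.
        print("Recommend rollback to previous stable model version")

respond_to_anomaly(drifted_features=[("income", 0.31, 0.004)], disagreement_rate=0.27)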

Latency, Throughput Anomalies, and Response

Anomalies in latency and throughput are generally easier to accommodate than anomalies in prediction quality. Latency and throughput are widely understood performance indicators, and for the same reason, responses to their anomalies are easier to automate. In most cases, latency and throughput benchmarks for a model can be derived from the performance observed during the validation and live-testing phases. Benchmarks can also be obtained from previous versions of the model or from similar models serving other applications. When measurements fall outside these benchmarks, monitoring software can trigger alerts and can also execute additional responses, such as expanding the capacity (storage, compute, network bandwidth) available to the model. In practice, such automated actions are accompanied by alerts to data scientists and DevOps, and human intervention may still be needed if the pre-configured capacity expansions do not resolve the anomaly.
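As a minimal sketch, latency benchmarks captured during validation can be compared with recent production measurements at a few percentiles. The benchmark values, tolerance, and the scaling and alerting hooks below are illustrative assumptions rather than settings from any particular monitoring tool.

import numpy as np

# Assumed benchmarks recorded during validation / live testing (milliseconds).
LATENCY_BENCHMARK_MS = {"p95": 120.0, "p99": 250.0}

def check_latency(recent_latencies_ms, benchmark=LATENCY_BENCHMARK_MS, tolerance=1.2):
    """Return the percentiles that exceed their benchmark by more than `tolerance` times."""
    p95, p99 = np.percentile(recent_latencies_ms, [95, 99])
    observed = {"p95": p95, "p99": p99}
    return {name: value for name, value in observed.items()
            if value > tolerance * benchmark[name]}

# breaches = check_latency(last_hour_latencies_ms)
# if breaches:
#     expand_serving_capacity()  # hypothetical auto-scaling hook
#     notify_data_scientists_and_devops(breaches)  # hypothetical alerting hook

The same pattern applies to throughput, with the comparison reversed: an alert fires when the number of requests served per second falls below the benchmark rather than above it.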

 

References

Chu R, Duling D, Thompson W (2007). Best Practices for Managing Predictive Models in a Production Environment, pp. 6–7.

Hutter F, Caruana R, Bardenet R, Bilenko M, Guyon I, Kegl B, Larochelle H (2014). "AutoML 2014 @ ICML". AutoML 2014 Workshop @ ICML. Retrieved 2018-03-28.

Hutter F, Kotthoff L, Vanschoren J, eds. (2018). Automated Machine Learning: Methods, Systems, Challenges.

Kotthoff L, Thornton C, Hoos HH, Hutter F, Leyton-Brown K (2017). "Auto-WEKA 2.0: Automatic model selection and hyperparameter optimization in WEKA". Journal of Machine Learning Research. 18 (25): 1–5.

Truong A, Walters A, Goodsitt J, Hines K, Bruss B, Farivar R (2019). "Towards Automated Machine Learning: Evaluation and Comparison of AutoML Approaches and Tools". arXiv:1908.05557.

Yick J, McLean M (2014). To freak out or chill out? A guide to model monitoring.



#GlobalAIandDataScience
#GlobalDataScience