By Austin Eovito and Vikas Ramachandra
Other blogs in MLOps series:
Operationalizing Data Science
Infrastructure for Data Science
Model Development and Maintenance
Data science is the practice of using mathematics, chiefly statistics, and computational tools to programmatically identify latent patterns in data. The emergence of applied data science in both the public and private sectors has increased interest in the research and development of the technologies and tools that data scientists use. This blog is the first in a series of conversations. It introduces why firms trying to operationalize their data science practices should implement scalable infrastructure, master model development and deployment, and act strategically with their model monitoring and maintenance.
A series of infrastructure problems prevents firms from putting their data science models into production. First, the model training and model production infrastructure are often disjoint, so there is no streamlined system to guide models from development (or research) to production. Second, production infrastructure often does not satisfy business imperatives surrounding security, compliance, logging, and chargeback, which strains model deployment timelines. Finally, training and production infrastructure may reflect legacy architectural choices that prevent the technical flexibility most contemporary data scientists need. For example, the data science infrastructure may lack support for multi-cloud or private cloud environments, for the programming languages used by certain groups of data scientists, or for managing virtual machines (VMs) and containers for maximum availability. The next blog in this series will examine these infrastructure problems in depth.
Model Development and Maintenance
One of the more time-intensive challenges in operationalizing data science in business stems from the practices of model development and maintenance. Let us first consider the transition from model development to production. Because of a mix of the infrastructure issues described above, current practices have led enterprises to redevelop and refactor entire codebases in order to adequately explain and gauge their model behavior. In other cases, a model is too tightly coupled to an application and cannot be easily reused. Finally, in firms with multiple, disparate data science teams, it is non-trivial to unearth the models that are in varied states of their development lifecycle. Even once the models are identified, there may be no simple way of tracking them across systems, as multiple models may access heterogeneous data as inputs, which can break models further downstream. Future blogs in this series will address these issues and more, covering the model lifecycle, model deployment, and model operationalization.
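As a rough illustration of what tracking models across teams and lifecycle states could look like, here is a minimal in-memory registry sketch. Everything in it is hypothetical and chosen for illustration: the `Stage` values, the `ModelRecord` fields, and the `ModelRegistry` methods are assumptions, not a description of any particular product or of the tooling discussed later in this series.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # Hypothetical lifecycle states; real organizations may define others.
    DEVELOPMENT = "development"
    STAGING = "staging"
    PRODUCTION = "production"
    RETIRED = "retired"


@dataclass
class ModelRecord:
    name: str
    version: int
    owner_team: str
    stage: Stage = Stage.DEVELOPMENT
    # Names of the data sources this model reads from (illustrative).
    input_sources: list = field(default_factory=list)


class ModelRegistry:
    """In-memory catalog keyed by (model name, version)."""

    def __init__(self):
        self._records = {}

    def register(self, record):
        key = (record.name, record.version)
        if key in self._records:
            raise ValueError(f"{record.name} v{record.version} is already registered")
        self._records[key] = record

    def by_stage(self, stage):
        # Every model currently in the given lifecycle stage, sorted by name.
        found = [r for r in self._records.values() if r.stage is stage]
        return sorted(found, key=lambda r: (r.name, r.version))

    def consumers_of(self, source):
        # Every model that would be affected if this data source changed.
        found = [r for r in self._records.values() if source in r.input_sources]
        return sorted(found, key=lambda r: (r.name, r.version))
```

Even a catalog this small answers the two questions raised above: which models exist and in what state (`by_stage`), and which models are exposed when a shared input changes (`consumers_of`).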
Now that our firm has obtained the infrastructure needed for its project, procured the relevant technologies, and optimized its model, it is ready for deployment. Deployment comes with a caveat, however: once the deployed model becomes mission-critical, the firm needs a streamlined process and a set of supporting tools to test and validate deployed models throughout their lifecycle. Candidate models need to be assessed for their predictive validity, latency, and throughput characteristics under load. To ensure model health post-deployment, a standard operating procedure must be put in place to easily split traffic between multiple candidate models, as well as to upgrade, downgrade, and retire models. While such deployment tools and processes are common in software applications, their use in data science operations remains infrequent and inconsistent.
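To make the idea of splitting traffic between candidate models concrete, the following is a minimal sketch of a weighted router that also records per-request latency. The `TrafficSplitter` class, its method names, and the treatment of each model as a plain Python callable are all assumptions made for illustration; a production system would typically do this in a serving layer or load balancer rather than in application code.

```python
import random
import time


class TrafficSplitter:
    """Routes each request to one candidate model according to fixed weights."""

    def __init__(self, weights, seed=None):
        # weights: mapping of model name -> non-negative relative weight.
        if not weights:
            raise ValueError("at least one model is required")
        if any(w < 0 for w in weights.values()) or sum(weights.values()) <= 0:
            raise ValueError("weights must be non-negative and sum to more than zero")
        self._names = list(weights)
        self._weights = [weights[name] for name in self._names]
        self._rng = random.Random(seed)

    def choose(self):
        # Weighted random choice of a model name.
        return self._rng.choices(self._names, weights=self._weights, k=1)[0]

    def route(self, models, features):
        # models: mapping of model name -> callable that returns a prediction.
        name = self.choose()
        start = time.perf_counter()
        prediction = models[name](features)
        latency_seconds = time.perf_counter() - start
        return name, prediction, latency_seconds


if __name__ == "__main__":
    # Send roughly 10% of traffic to a hypothetical challenger model.
    splitter = TrafficSplitter({"champion": 0.9, "challenger": 0.1}, seed=0)
    models = {
        "champion": lambda x: x * 2,
        "challenger": lambda x: x * 2 + 1,
    }
    print(splitter.route(models, 21))
```

Upgrading, downgrading, or retiring a model then amounts to changing the weights: setting a weight to zero removes that model from rotation without touching the calling code.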
With a performant model in hand, our team now needs to monitor several performance metrics, one of which is the model’s statistical validity. Doing so requires the ability to adequately observe, log, and analyze the model’s parameters, which can be obscured by the model’s internal structure (white box vs. black box). Depending on the nature of the model (mission-critical sensor system vs. chatbot), the data science team must build it with the appropriate level of maintenance and monitoring in mind. Compounding these issues are models built with multiple inputs from a multitude of data sources, where drift or poor governance can leave a model deprecated, so that it must be shut down and reimplemented within the model lifecycle. To accomplish these tasks, whether a business has a single model serving multiple intents or many models serving one intent, the tech stacks, DevOps, and practices of yesterday must be supplemented with the machine learning operations of tomorrow.
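One way a team might watch for drift in a single numeric input is the population stability index (PSI), which compares the binned distribution of recent data against a baseline sample. The sketch below is an illustrative choice rather than a method prescribed here: the function name, the equal-width binning, the `bins` and `eps` parameters, and the decision to clamp out-of-range values into the edge bins are all assumptions.

```python
import math


def population_stability_index(expected, actual, bins=10, eps=1e-6):
    """PSI between a baseline sample (expected) and a recent sample (actual).

    Bins are equal-width over the range of the baseline sample. Values in
    the recent sample that fall outside that range are counted in the
    nearest edge bin.
    """
    if not expected or not actual:
        raise ValueError("both samples must be non-empty")
    lo, hi = min(expected), max(expected)
    if hi == lo:
        raise ValueError("baseline sample has no spread to bin")
    width = (hi - lo) / bins

    def proportions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            idx = min(max(idx, 0), bins - 1)  # clamp into the edge bins
            counts[idx] += 1
        # Floor each proportion at eps so the logarithm below is defined.
        return [max(c / len(values), eps) for c in counts]

    e = proportions(expected)
    a = proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    baseline = [i / 100 for i in range(100)]
    recent = [0.5 + i / 200 for i in range(100)]  # shifted toward higher values
    print(population_stability_index(baseline, baseline))
    print(population_stability_index(baseline, recent))
```

A PSI of zero means the two binned distributions match, and larger values indicate a larger shift. Any alerting threshold is a judgment call that depends on the model and how critical it is, so it is left out of the sketch.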