As part of the IBM Data Science Elite (DSE) engagements, we've had many conversations with customers about enabling CI/CD in the machine learning (ML) pipeline, so I've decided to summarize the steps in a blog. The blog outlines the steps for implementing CI/CD workflows using IBM Watson Studio and IBM Watson Machine Learning. Before diving into the ‘how’ aspect, let me address why enterprises are looking to adopt CI/CD practices for building AI applications.
Enterprises use proven software engineering practices to develop and deploy high-quality software applications. Software teams implement multiple stages of the Software Development Life Cycle (SDLC), with each stage focusing on development, QA or deployment tasks. Developing and deploying software applications requires moving software assets across multiple environments, one for each phase of the SDLC. While this practice ensures software quality, the siloed phases slow down the pace of software delivery.
DevOps Accelerates Time-to-Value for Software Applications
In recent years, the emergence of microservices architecture and cloud-native technologies has made DevOps the de facto standard for developing and deploying software applications. DevOps practices combine software development (Dev) and IT operations (Ops) to shorten the SDLC. Compared to traditional software development, the combination of processes, practices and tools used in the DevOps model helps organizations deliver new capabilities in enterprise applications at a much faster pace.
CI/CD Workflows Automate DevOps
Continuous Integration (CI) and Continuous Delivery (CD) offer a set of best practices to automate DevOps for software applications.
The CI tasks focus on building newly added software components and running integrated unit tests on them as they are pushed from development toward production environments. The CD tasks focus on releasing and deploying the fully tested code to production environments. Together, the end-to-end CI/CD tasks automate the DevOps process.
CI/CD Workflows for ML DevOps
As enterprises increasingly adopt AI and build out Machine Learning (ML) applications, integrating CI/CD workflows into the end-to-end ML life cycle is quickly becoming a necessity. We help customers apply the same process and standardization that CI/CD pipelines afford their software development practice to their ML application development steps.
Enabling CI/CD workflows in the ML life cycle requires automating multiple steps from the build phase to the deployment phase. A typical CI/CD workflow for ML should implement the following functionality:
- The data scientist or ML engineer submits ML project assets from their development environment into a central version control system like GitHub or Bitbucket. The ML project assets could include trained models, notebooks, scripts, Shiny apps and flows. Optionally, if the ML assets are ready for deployment, they assign a ‘Deployment Ready’ tag to the committed version.
- The Git repository triggers a preconfigured job in a pipeline engine like Jenkins for every commit action associated with the ‘Deployment Ready’ tag.
- The triggered Pipeline job will build, test and deploy/update ML assets in the staging or deployment environments.
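To make the three steps above concrete, here is a minimal Python sketch of the control flow only. It is not Jenkins, Watson Studio or Watson Machine Learning code: the `Commit` class, `should_trigger`, `run_pipeline` and the stage callables are hypothetical names introduced for illustration, and the only detail taken from the text is the ‘Deployment Ready’ tag. The sketch assumes a job runs only for tagged commits and that the stages run in order, stopping at the first failure.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Tag name taken from the workflow description; everything else is hypothetical.
DEPLOY_TAG = "Deployment Ready"


@dataclass
class Commit:
    """A commit pushed to the central Git repository (illustrative only)."""
    sha: str
    tags: List[str] = field(default_factory=list)


# A stage receives the commit and returns True on success, False on failure.
Stage = Tuple[str, Callable[[Commit], bool]]


def should_trigger(commit: Commit) -> bool:
    """Step 2: the job fires only for commits carrying the deployment tag."""
    return DEPLOY_TAG in commit.tags


def run_pipeline(commit: Commit, stages: List[Stage]) -> List[str]:
    """Step 3: run the stages in order and stop at the first failure.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    if not should_trigger(commit):
        return completed
    for name, stage in stages:
        if not stage(commit):
            break  # a failed stage blocks every later stage, including deploy
        completed.append(name)
    return completed
```

In a real setup the stage callables would wrap whatever the pipeline engine actually executes, such as building the project assets, running model tests, and deploying to the staging or deployment environment. Stopping at the first failure is the design choice that keeps untested assets from reaching deployment.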