Global Data Science Forum

DevOps for ML Applications: CI/CD Workflows

By Sivakumar Anne posted Thu September 26, 2019 05:08 PM

  

As part of IBM Data Science Elite (DSE) engagements, we've had many conversations with customers about enabling CI/CD in the machine learning (ML) pipeline, so I've decided to summarize the steps in a blog post. This post outlines the steps for implementing CI/CD workflows using IBM Watson Studio and IBM Watson Machine Learning. Before diving into the ‘how’, let me address why enterprises are looking to adopt CI/CD practices for building AI applications.
 

Traditional SDLC

Enterprises use proven software engineering practices to develop and deploy high-quality software applications. Software teams implement multiple stages of the Software Development Life Cycle (SDLC), with each stage focusing on development, QA or deployment tasks. Developing and deploying software applications requires moving software assets across multiple environments, one for each phase of the SDLC. While the practice ensures software quality, the siloed phases slow down the pace of software delivery.

 

DevOps Accelerates Time-to-Value for Software Applications

Over recent years, the emergence of microservices architecture and cloud-native technologies has made DevOps the de facto standard for developing and deploying software applications. DevOps practices combine software development (Dev) and IT operations (Ops) to shorten the SDLC. Compared to traditional software development, the combination of processes, practices and tools used in the DevOps model helps organizations deliver new capabilities in enterprise applications at a much faster pace.

 

CI/CD Workflows Automate DevOps

Continuous Integration (CI) and Continuous Delivery (CD) offer a set of best practices to automate DevOps for software applications.

The Continuous Integration (CI) tasks focus on building newly added software components and running integrated unit tests on them as they are pushed from development to production environments. The Continuous Delivery (CD) tasks focus on releasing and deploying the fully tested code to production environments. The combination of end-to-end CI/CD tasks automates the DevOps process.

CI/CD Workflows for ML DevOps

As enterprises increasingly adopt AI and build out Machine Learning (ML) applications, integrating CI/CD workflows into the end-to-end ML life cycle is quickly becoming a necessity. We help customers take the process and standardization that CI/CD pipelines bring to their software development practice and apply them to ML application development.

 

Enabling CI/CD workflows in the ML life cycle requires automating multiple steps from the build phase to the deployment phase. A typical CI/CD workflow for ML should implement the following functionality:

  1. The data scientist or ML engineer submits ML project assets from their development environment into a central version control system such as GitHub or Bitbucket. The ML project assets can include trained models, notebooks, scripts, Shiny apps and flows. Optionally, if the ML assets are ready for deployment, they assign a ‘Deployment Ready’ tag to the committed version.
  2. The Git repository triggers a preconfigured job in a pipeline engine such as Jenkins for every commit associated with the ‘Deployment Ready’ tag.
  3. The triggered pipeline job builds, tests and deploys or updates the ML assets in the staging or deployment environments.
BlogCapture.jpg
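
The gating in step 2 can be sketched as a small shell check that a pipeline job might run on the pushed Git ref. This is only an illustration: the `deploy-ready-` tag prefix and the sample ref names are hypothetical conventions, not something Watson Studio, Bitbucket or Jenkins prescribes.

```shell
#!/usr/bin/env bash
# Sketch: decide whether a pushed Git ref should trigger a deployment.
# The "deploy-ready-" tag prefix is a hypothetical naming convention.

is_deployment_ready() {
  case "$1" in
    refs/tags/deploy-ready-*) return 0 ;;  # tagged as ready: deploy
    *) return 1 ;;                         # branch push or other tag: skip
  esac
}

# Made-up refs, as a webhook payload might report them.
for ref in refs/heads/master refs/tags/deploy-ready-1.4 refs/tags/experiment-7; do
  if is_deployment_ready "$ref"; then
    echo "$ref: deploy"
  else
    echo "$ref: skip"
  fi
done
```

With these sample refs, only `refs/tags/deploy-ready-1.4` is reported as `deploy`; the branch push and the unrelated tag are skipped.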


Enabling CI/CD Workflows for Watson Studio and Watson Machine Learning

IBM Watson Studio (WS) and Watson Machine Learning (WML) offer an end-to-end data science platform to build and deploy ML models. Built-in support for Git repositories such as GitHub and Bitbucket, together with a CLI library that drives WML administration tasks from custom scripts, enables the integration of CI/CD workflows into DevOps for ML.

 

Now that we’ve covered the full workflow and its principles, the following sections describe the steps to implement a simple CI/CD workflow for an ML project built in Watson Studio and deployed to Watson Machine Learning.

 

CI/CD Workflow Scenario

In the sample workflow, we demonstrate how to automate updates to a deployment release in Watson Machine Learning with newly versioned and tagged ML assets from a Watson Studio project. The scenario uses Bitbucket Server as the Git host and Jenkins as the pipeline engine to implement the CI/CD workflow.

BlogCapture2.jpg

Configure Watson Studio (WS) for Bitbucket Server Integration
In this step, set up an ML project in Watson Studio by pulling assets from the Bitbucket Server repository. The sample project in the repo includes a notebook, a dataset and a trained model.

a. Configure Watson Studio with the Bitbucket integration parameters: Git host URL, Bitbucket username, access token and token name.
BlogCapture2a.jpg 

The access token is the one generated for the user in Bitbucket Server.

b. Create an ML project in Watson Studio from the Git repository in Bitbucket Server.

BlogCapture4.jpg

To test the integration with Bitbucket Server, edit any ML asset (e.g. a notebook), commit the changes and push the project to the remote Git repository. Then verify the changes to the repository contents from the Bitbucket UI.

 

Configure Bitbucket to Trigger the Pipeline Job in Jenkins

In this step, configure the Bitbucket repository to trigger a Jenkins job in response to ‘Push’ and ‘Tag’ events.

a. Install “Post Webhook for Bitbucket” from the Bitbucket Server UI. Navigate to Repository Settings > Hooks > Add Hook > “Post Webhook for Bitbucket”.
BlogCapture5.jpg

b. Create a Post Webhook in the repository to trigger a Jenkins job on Push and Tag events. Configure the Jenkins server URL in the format http://xxxxx.com:8080/bitbucket-hook/.
BlogCapture6.jpg
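
The URL format above ends with a trailing slash. A small helper like the following sketch can keep that format consistent when the Jenkins base URL is supplied with or without a trailing slash. The host name is a placeholder and the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: build the webhook URL from a Jenkins base URL.
# The host name used below is a placeholder.

jenkins_hook_url() {
  local base="${1%/}"                     # drop one trailing slash, if present
  printf '%s/bitbucket-hook/\n' "$base"   # always end with a single slash
}

jenkins_hook_url "http://jenkins.example.com:8080"
jenkins_hook_url "http://jenkins.example.com:8080/"
```

Both calls print `http://jenkins.example.com:8080/bitbucket-hook/`, which is the value to paste into the webhook configuration.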


Configure and Build a Pipeline Job in Jenkins to Update Deployed Project Release in WML

In this step, build a pipeline job that uses the WML CLI to update a deployed project release with newly committed changes.

a. Install the ‘Bitbucket Plugin’ for Jenkins. From the Jenkins UI, navigate to Manage Jenkins > Manage Plugins > Available tab and type Bitbucket in the Filter box. Select the plugin and install it.
BlogCapture7.jpg
b. Create a new pipeline job. From the Jenkins UI, navigate to New Item, select Freestyle project and give it a name.
BlogCapture8.jpg
c. Configure the Bitbucket repository in the SCM section. From the Jenkins UI, navigate to Source Code Management. Select Git and enter the repository URL and credentials. Update ‘Refspec’ and ‘Branches to build’ as shown here to receive tag info for committed projects.

d. Configure build trigger to be a push event from Bitbucket.

e. Configure Build Environment to pass WML host, user and password details as environment variables for WS CLI.

BlogCapture10.jpg
f. Configure and test the WS CLI on the Jenkins host using the instructions outlined in https://www.ibm.com/support/knowledgecenter/SSHGWL_1.2.3/local/admincli.html. Verify that the CLI works by invoking ‘java -jar wscli-1.0.jar -h’ on the Jenkins host.

g. Configure the Build step to execute a shell script. Add code to extract the project and tag details from the Bitbucket payload and update the project release in WML using the CLI. The CLI call requires the project release, Git tag, repository URL and token name as arguments.

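echo "Running wscli command"

# GIT_URL and GIT_BRANCH are taken from the Jenkins job environment.
gitrepo="$GIT_URL"

# Project release name: last path component of the repo URL, without a trailing .git
project="${gitrepo##*/}"
project="${project%.git}"
echo "project-release to update: $project"

# Tag name: last path component of the ref that triggered the build
fulltag="$GIT_BRANCH"
tag="${fulltag##*/}"
echo "updated tag: $tag"

echo "Updating Project release in WML"
java -jar /wsclidir/wscli-1.0.jar edit projectrelease --project-release "$project" --git-tag "$tag" -hs bitbucket -ru "$GIT_URL" -tn dse-sanne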
BlogCapture9.jpg
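
Before wiring the build step to a live WML instance, it can help to dry-run the same logic with sample values. The sketch below mirrors the name derivation from the build step and prints the CLI call instead of executing it. It also adds a check that the environment variables from step e are present. The variable names `WML_HOST`, `WML_USER` and `WML_PASSWORD`, the function names, and the sample URL, ref and token name are all hypothetical; substitute whatever the job actually defines.

```shell
#!/usr/bin/env bash
# Sketch: preflight check plus a dry run of the build step.
# WML_HOST, WML_USER, WML_PASSWORD and all sample values are hypothetical.

# Fail when a required variable is missing from the process environment
# (or empty). Example: require_env WML_HOST WML_USER WML_PASSWORD || exit 1
require_env() {
  local name missing=0
  for name in "$@"; do
    if [ -z "$(printenv "$name")" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Repository name without a trailing .git: .../churn-model.git -> churn-model
project_from_url() {
  local name="${1##*/}"
  printf '%s\n' "${name%.git}"
}

# Last path component of the built ref: origin/tags/v1.2 -> v1.2
tag_from_ref() {
  printf '%s\n' "${1##*/}"
}

# Print the CLI call instead of running it, so it can be reviewed in the job log.
print_update_command() {
  local url="$1" ref="$2" token_name="$3"
  printf 'java -jar /wsclidir/wscli-1.0.jar edit projectrelease --project-release %s --git-tag %s -hs bitbucket -ru %s -tn %s\n' \
    "$(project_from_url "$url")" "$(tag_from_ref "$ref")" "$url" "$token_name"
}

# Dry run with made-up values.
print_update_command "http://bitbucket.example.com/scm/dse/churn-model.git" "origin/tags/v1.2" "my-token"
```

For the sample values, the dry run prints a command with `--project-release churn-model` and `--git-tag v1.2`. Comparing that line against the expected release and tag is a cheap way to catch a misconfigured refspec before the job touches a deployed release.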

Test the CI/CD Workflow

In this step, we commit a change in Watson Studio and push the project with a new tag to the Git repository. We can then verify that Jenkins triggers the pipeline job to update the deployed release in WML. This scenario assumes that an older version of the deployed project release already exists in WML.

a. Commit, tag and push the project changes from Watson Studio.

b. Verify pipeline job is triggered in Jenkins 
BlogCapture11.jpg

c. Verify updated project release in WML
BlogCapture12.jpg

Conclusion

IBM Watson Studio (WS) and Watson Machine Learning (WML) offer essential capabilities to automate DevOps for ML with CI/CD workflows. Enterprises can accelerate the time-to-value of AI applications by integrating CI/CD workflows with the ML life cycle. The automated and standardized testing of the traditional SDLC helps give our customers peace of mind as they publish and update the AI applications that are integral to the bottom-line success of their businesses.

For a more detailed discussion of AI Ops (Managing the End-to-End Lifecycle of AI), please refer to https://ibm.co/AI-Ops. For information on the IBM Data Science Elite team, refer to https://www.ibm.com/community/datascience/elite/.

 

