IBM Data Science and AI Elite Team Authors:
Tim Bohn - Data Scientist / Sr. Solution Architect
Brian Calder - Lead User Experience Designer
Anup Nair - Sr. Solutions Architect & Industry Lead
Rakshith Dasenahalli Lingaraju - Data Scientist
Elie Elisee Paul - Software Developer
Amber Arriaga - Data Scientist
Photo by Diego D'Ambrosio on Unsplash
Introducing IBM Industry Accelerators
The IBM Data Science and AI Elite (DSE) team was created over two years ago to work with clients in every industry, helping them bring value to all aspects of their business by harnessing data science and machine learning. After more than 160 engagements with clients worldwide, the DSE team drew on what it learned to create templated packages for IBM's Cloud Pak for Data (CPD) that cover some of the top use cases. We call these Industry Accelerators.
The accelerators are great learning assets, but they are much more than that. They are usable components that help organizations kickstart their own implementation, leveraging the differentiation of Cloud Pak for Data so that an organization can apply the solution to its own data and reach productive use in an accelerated timeframe.
The Telco Customer Churn accelerator is designed for a wide range of audiences, from executive decision makers to data scientists to application developers. Glossaries and Terms provide the information architecture that you need to catalog and analyze your data effectively. The data science project includes assets covering data visualization and different types of machine learning models, so the data scientists on the team can collaborate and extend the template models based on the available data. The accelerator also covers how to operationalize the implemented data science models, and includes a sample application that demonstrates how an application developer might embed the deployed models within a business workflow.
Customer Churn accelerator gives you a quick start
Customer churn is when a current customer ends their relationship with the company. This is costly for the company because it costs five times as much to attract a new customer as to keep an existing one [1]. The problem of customer churn is well known, so this article does not describe it in detail. Instead, it shows how this Cloud Pak for Data accelerator can get a project going more quickly with a range of assets. We will describe the user story we built the assets against, the data preparation steps, the analytics project where the data science and machine learning are done, and the web application built to show how the model might be used.
The User Story gives the context
We imagined a customer service representative for a telco (in this case the fictitious BlueCo Communications) who is actively talking to a customer, either on the phone using a desktop or in a store using a mobile device, and who wants to determine how likely that customer is to cancel their subscription. An application lets the representative submit the customer's unique ID and then displays that customer's details (billing and profile information, current subscriptions, and service usage profile). The customer details interface also includes a propensity-to-churn percentage calculated by the machine learning model. The representative can then see recommended promotional offers that might help retain the individual as a customer.
Data Preparation is key
“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.”
― Abraham Lincoln
Data preparation is like sharpening that axe. It accounts for about 80% of the work of data scientists [2]. Data is the heart of any analytics project, and a rich data set can yield invaluable insights. For this project we used data from various sources and joined it into a single data set that was sufficient for deriving customer churn results. This data was stored in the CPD data lake by cataloging it in the Watson Knowledge Catalog (WKC). WKC stores the metadata for data that is held remotely, and it also lets you store local data along with its metadata. WKC lets data stewards and data owners apply access restrictions to the data by applying business terms and rules. It also lets you mask personally identifiable information (PII) so that only people with the correct privileges can see it. Once the data is available in the data catalog, data engineers can search for it and promote it for use in their data science projects.
The data available in the catalog may hold more information in a single data file than the project needs, or it may be scattered across multiple data files. In either case we can use the Watson Data Refinery in CPD to filter out only the relevant data needed for the project. The Data Refinery lets data engineers create refinery flows that apply various database operations to derive a data set that is relevant to the project and easily consumable by the data scientist.
Here is the data flow from source to destination in the context of the Telco churn project:
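To make the join-and-filter idea concrete, here is a minimal sketch in plain Python. It is not how a refinery flow is implemented; it only illustrates the kind of operation such a flow performs. The sample rows, the column names (customer_id, plan, monthly_gb, internal_note) and the function join_and_select are all hypothetical.

```python
# Hypothetical sample rows; the real accelerator data and column names may differ.
customers = [
    {"customer_id": "C001", "name": "Ana", "plan": "Basic"},
    {"customer_id": "C002", "name": "Ben", "plan": "Premium"},
    {"customer_id": "C003", "name": "Cy", "plan": "Basic"},
]
usage = [
    {"customer_id": "C001", "monthly_gb": 120.5, "internal_note": "x"},
    {"customer_id": "C002", "monthly_gb": 15.0, "internal_note": "y"},
]

# Only these columns are assumed to be relevant to the churn project.
RELEVANT_COLUMNS = ("customer_id", "plan", "monthly_gb")


def join_and_select(customers, usage, columns):
    """Inner-join two row lists on customer_id and keep only the given columns."""
    usage_by_id = {row["customer_id"]: row for row in usage}
    joined = []
    for customer in customers:
        match = usage_by_id.get(customer["customer_id"])
        if match is None:
            continue  # inner join: skip customers that have no usage record
        merged = {**customer, **match}
        joined.append({column: merged[column] for column in columns})
    return joined


dataset = join_and_select(customers, usage, RELEVANT_COLUMNS)
```

In this sketch the customer without a usage record (C003) is dropped, and the name and internal_note columns are filtered out of the result.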
Analytics Project – where the data science work is done
From the data scientist's perspective, this accelerator supplies an Analytics project within Cloud Pak for Data. The project gives data scientists a way to work as a collaborative team, building out the assets needed for the use case. Within the project, the team can work on all aspects of the data science effort, from continued data preparation and data understanding to machine learning model building.
Jupyter Notebooks in the project
For the Data Scientist who prefers writing code, there are open source Jupyter notebooks. The DataExploration notebook was created to explore statistics, relationships and patterns in the data, in order to gain a deeper understanding of the data.
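As an illustration of the kind of question an exploration notebook asks, the sketch below computes the churn rate per group. It is not taken from the DataExploration notebook. The column names (contract, churned) and the function churn_rate_by are hypothetical, and a real notebook would more likely use a data frame library than plain Python.

```python
from collections import defaultdict


def churn_rate_by(rows, key):
    """Return {group value: fraction of rows in that group where churned is true}."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for row in rows:
        totals[row[key]] += 1
        if row["churned"]:
            churned[row[key]] += 1
    return {group: churned[group] / totals[group] for group in totals}


# Hypothetical sample data for illustration only.
rows = [
    {"contract": "monthly", "churned": True},
    {"contract": "monthly", "churned": False},
    {"contract": "yearly", "churned": False},
    {"contract": "yearly", "churned": False},
]
rates = churn_rate_by(rows, "contract")
```

A large gap between groups, such as the one between the two contract types in this made-up sample, is the sort of pattern that would suggest a useful model feature.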
The data scientist then uses this data in the ModelCreationAndDeployment notebook to build a pipeline that applies feature engineering to put the data into the format the model requires. Some of these feature engineering steps are custom built, such as calculating age from date of birth, mapping industry-specific business terms, and handling missing values for specific columns. They are defined in the Telco Custom Transformer Script, which is saved as a zip file and imported into the ModelCreationAndDeployment notebook.
Once the pipeline is trained, validated and saved, a custom wrapper function is written for deployment. When deployed, it takes care of downloading the pipeline and the custom pre-processing script, along with any other details needed. The deployed wrapper function can then be accessed through an API, which is consumed by the web application.
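Two of the steps named above, calculating age from date of birth and filling missing values, can be sketched as follows. This is not the Telco Custom Transformer Script. The class TelcoFeatureTransformer, the column names and the choice of median imputation are assumptions made for illustration. The class follows the fit/transform convention used by scikit-learn pipelines but does not depend on that library.

```python
from datetime import date
from statistics import median


def age_in_years(date_of_birth, today):
    """Whole years elapsed between date_of_birth and today."""
    had_birthday = (today.month, today.day) >= (date_of_birth.month, date_of_birth.day)
    return today.year - date_of_birth.year - (0 if had_birthday else 1)


class TelcoFeatureTransformer:
    """Hypothetical transformer following the fit/transform convention."""

    def __init__(self, reference_date):
        self.reference_date = reference_date
        self.median_usage_ = None  # learned in fit()

    def fit(self, rows):
        # Learn the fill value from the rows that actually have a usage figure.
        observed = [r["monthly_gb"] for r in rows if r["monthly_gb"] is not None]
        self.median_usage_ = median(observed)
        return self

    def transform(self, rows):
        transformed = []
        for r in rows:
            usage = r["monthly_gb"] if r["monthly_gb"] is not None else self.median_usage_
            transformed.append({
                "age": age_in_years(r["date_of_birth"], self.reference_date),
                "monthly_gb": usage,
            })
        return transformed


# Hypothetical sample rows; one has a missing usage value.
rows = [
    {"date_of_birth": date(1990, 6, 15), "monthly_gb": 10.0},
    {"date_of_birth": date(1985, 1, 1), "monthly_gb": None},
    {"date_of_birth": date(2000, 12, 31), "monthly_gb": 30.0},
]
transformer = TelcoFeatureTransformer(reference_date=date(2020, 6, 14)).fit(rows)
features = transformer.transform(rows)
```

Learning the fill value in fit and applying it in transform means the same value is reused at scoring time, so the deployed pipeline treats new records the same way as the training data.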
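A wrapper of this kind is often written as an outer function that does the one-time setup and returns an inner scoring function. The sketch below shows that shape under stated assumptions: StubPipeline and load_pipeline stand in for the real saved pipeline and its download step, and the payload layout (an input_data list of fields and values, answered by a predictions list) is assumed here rather than taken from the accelerator.

```python
class StubPipeline:
    """Stand-in for the trained pipeline; applies a fixed rule instead of a model."""

    def predict_proba(self, rows):
        # Each result is [P(stay), P(churn)], mirroring a common classifier layout.
        results = []
        for row in rows:
            p_churn = 0.8 if row[0] == "monthly" else 0.2
            results.append([1.0 - p_churn, p_churn])
        return results


def load_pipeline():
    # Assumption: the real wrapper would download the saved pipeline and the
    # custom pre-processing script here. This sketch returns a stub instead.
    return StubPipeline()


def make_scoring_function():
    """Outer function: runs setup once and returns the function that scores requests."""
    pipeline = load_pipeline()

    def score(payload):
        batch = payload["input_data"][0]
        probabilities = pipeline.predict_proba(batch["values"])
        return {
            "predictions": [{
                "fields": ["churn_probability"],
                "values": [[p[1]] for p in probabilities],
            }]
        }

    return score


score = make_scoring_function()
result = score({
    "input_data": [{"fields": ["contract"], "values": [["monthly"], ["yearly"]]}]
})
```

Keeping the loading step in the outer function means the pipeline is fetched once per deployment and not on every request.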
SPSS Modeler Flow for visual modeling
For the data scientist who would rather use a visual approach, an SPSS Modeler Flow is included. This flow shows how to wire together some powerful "nodes" to create a pipeline. The flow performs steps similar to those in the Jupyter notebooks and likewise produces a pipeline that can be deployed.
For those who want to try autogenerating a model from the data, there is AutoAI. Click "New AutoAI experiment", give it a name and then click the "Create" button. The AutoAI model generator automatically analyzes the data and generates candidate model pipelines customized for your predictive modeling problem. These model pipelines are created iteratively as AutoAI analyzes your dataset and discovers the data transformations, algorithms, and parameter settings that work best for your problem. The results show the automatically generated model pipelines ranked according to your optimization objective.
Business Web App
The sample application developed to demonstrate the Telco Churn accelerator mimicked a simple interface that a customer service representative might use while fielding calls from customers. We wanted it to be clean and easy to understand, so we made it a single-page interface divided into four clear panels. The representative could see the general customer information in one panel, so they knew to whom they were speaking. They could see the customer's "Usage Profile" in another, so it was clear which services the customer was using. The Likelihood of Churn panel showed, as a percentage in a graphical display, how likely a given customer was to cancel their services. Finally, the Recommended Offers panel provided some potential promotions that the representative could offer the customer if they were likely to churn, in an effort to convince them to stay. These offers represent real value to the particular customer, as they are based on their usage profile. For example, for a customer with a slower Internet service but high streaming usage, the interface might recommend faster Internet download speeds at a discounted introductory rate.
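The offer example above can be expressed as a simple rule. The sketch below is illustrative only and is not the accelerator's recommendation logic. The function recommend_offers, the profile keys, and every threshold value are assumptions.

```python
def recommend_offers(profile, churn_probability, churn_threshold=0.5):
    """Return promotional offers for a customer who is likely to churn.

    All thresholds below are hypothetical values chosen for illustration.
    """
    if churn_probability < churn_threshold:
        return []  # customer is unlikely to churn, so no retention offer is needed

    offers = []
    slow_connection = profile["internet_speed_mbps"] < 50
    heavy_streaming = profile["streaming_hours_per_month"] > 40
    if slow_connection and heavy_streaming:
        offers.append("Faster Internet download speeds at a discounted introductory rate")
    return offers


# Hypothetical usage profile for one customer.
profile = {"internet_speed_mbps": 25, "streaming_hours_per_month": 60}
offers = recommend_offers(profile, churn_probability=0.72)
```

Tying each rule to a field in the usage profile keeps the offer relevant to what the customer actually uses, which is the point the panel is meant to convey.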
The sample application was designed to illustrate new components that could be introduced to existing systems to leverage the artificial intelligence output of the model.
Once models are ready and tested, they can be deployed as web services and embedded into our customer application. When using IBM Cloud Pak for Data, deploying models in Watson Machine Learning is a matter of a couple of clicks. Each model then becomes available through a REST API endpoint. The final step is to build and deploy a web application (e.g. Node.js), or embed the services in an existing one, and obtain the customer's churn probability by calling the deployed ML model.
The first page of the sample application is a simple form where the customer service representative enters the ID of the customer they want to look up.
When the customer ID is submitted, the application retrieves the customer information (from a CSV file in this demo, or from wherever your customer information is stored). The application then communicates with the deployed ML model through its REST API endpoint: it first submits the customer information to the model, then receives the response, which contains the churn probability for this particular customer.
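The lookup-then-score flow described above can be sketched in Python as follows (the sample application itself may be written in another language such as Node.js). Everything specific here is an assumption: the URL in SCORING_URL, the bearer token, the CSV columns, and the request and response layouts. The sketch only builds the HTTP request and does not send it.

```python
import csv
import io
import json
import urllib.request

# Hypothetical endpoint; a real deployment has its own URL and credentials.
SCORING_URL = "https://example.com/ml/deployments/churn/predictions"

# Hypothetical CSV content standing in for the demo's customer file.
SAMPLE_CSV = "customer_id,contract,monthly_gb\nC001,monthly,120.5\nC002,yearly,15.0\n"


def find_customer(csv_text, customer_id):
    """Return the CSV row for customer_id as a dict, or None if it is absent."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["customer_id"] == customer_id:
            return row
    return None


def build_payload(customer):
    """Arrange one customer's fields into the assumed scoring-request layout."""
    fields = [name for name in customer if name != "customer_id"]
    # Note: values read from CSV are strings; a real app may need to convert them.
    return {"input_data": [{"fields": fields, "values": [[customer[f] for f in fields]]}]}


def build_request(payload, token):
    """Create (but do not send) the POST request for the scoring endpoint."""
    return urllib.request.Request(
        SCORING_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def extract_churn_probability(response_body):
    """Read the probability from the assumed response layout."""
    return response_body["predictions"][0]["values"][0][0]


customer = find_customer(SAMPLE_CSV, "C002")
payload = build_payload(customer)
request = build_request(payload, token="hypothetical-token")
```

To actually call a deployment, the request would be passed to urllib.request.urlopen and the JSON body of the reply handed to extract_churn_probability.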
After receiving the ML response, the second page displays the customer's information, which consists of the customer profile, the services they use, their likelihood of churn, and suggested offers.
The Data Science & AI Elite Team
There are many Industry Accelerators today, and more are coming throughout the year. The best part: these are absolutely FREE and available for your consumption on the IBM Data Science Community.
Industry Accelerators run on the IBM Cloud Pak for Data platform. To find out more about the capabilities of the platform and to start a free trial, visit: https://www.ibm.com/products/cloud-pak-for-data
Interested in learning how to kick-start your data science project with the right expertise, tools and resources? The Data Science Elite team can plan, co-create and prove the project with you based on our proven Agile AI methodology. Request a free consultation: ibm.co/DSE-Consultation
Visit ibm.co/DSE-Community to connect with us, explore our resources and learn more about Data Science and AI Elite.