Feature Store: An accelerator for AI adoption
How could a feature store help accelerate data and AI adoption across the enterprise?
What is a Feature store?
As data scientists, we often duplicate work because we lack a centralized repository. The cost is a longer time to production, repeated and inconsistent work, and no way to share or reuse artefacts. In this blog, we look at how a feature store can reduce the time to market (production) for Machine Learning models and how it addresses these problems.
There is increasing demand and acceptance among business users for leveraging Data Science & AI in their enterprises to achieve the following objectives:
- To optimise decision making
- To optimise business operations
- To create new business opportunities
On one side are the business stakeholders, and on the other are the technical people: data (science) professionals. I use this term to encompass all data workers, including data scientists, data engineers, data visualisation experts, data architects, etc. Business owners make decisions and run the business, and they are after Return on Investment (ROI), while data workers are after getting their technical artefacts running and working. In this race, data scientists build different Machine Learning (ML) and Deep Learning (DL) models with different datasets, which they or other data workers ingest from disparate sources. More often than not, the process of preparing the data is similar for different models, and much of this effort can be streamlined and reused across different AI endeavours.
Therefore, to make the process of building AI models more efficient, reusable and scalable, enterprises must build or leverage enterprise-wide AI platform(s). Such platforms can substantially cut the time-to-production cycle and reduce the costs of developing, deploying and sharing AI models across the enterprise.
A typical ML flow looks like the following:
Figure: A typical Machine Learning workflow
Data scientists spend most of their time preparing the data and extracting features before they have something ready to be cooked (as an ML model). This time, effort and resources could instead be spent on getting valuable insights much quicker, by making the data-to-feature pipeline easily reusable and scalable.
In cross-organisation and inter-departmental AI initiatives, the features extracted in the data preparation step can be shared across multiple projects. Storing and sharing the extracted features and other data and AI artefacts (or assets) can greatly help in speeding up AI efforts and Proofs of Concept (POCs). In addition, enterprises gain more scalable ways to manage data and AI assets.
Feature Store:
The intended benefits discussed above (scalability, reusability, etc.) are what motivate building such a platform, which we call a Feature Store, to help organisations share data and AI artefacts across different efforts and teams. This speeds up running and deploying models, because the bottlenecks in the ML pipeline that take most of the time are largely reduced by reusing existing artefacts.
“a feature store is a repository that allows teams to share, discover, and use a highly curated set of features for their machine learning problems”.
We can define it as follows:
It is a centralized software repository (library) that contains many functions (artefacts), each of which creates a feature from the input data. You can extend this definition to storing, creating and sharing any of the other data & AI artefacts (models, notebooks, code files, etc.) in addition to the features.
It can serve as a bridge between teams and help reduce silos across your organization. A large number of data and AI artefacts can be stored in the feature store, and they can be updated, versioned and catalogued for different purposes. When building new models or enhancing existing ones, data scientists can use the readily available artefacts and add new features (artefacts) to the store.
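The definition above, a repository of functions that each create a feature from input data, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `FeatureRegistry` class, the feature names and the record fields are hypothetical and do not come from any particular feature store product.

```python
from typing import Callable, Dict, List

# Hypothetical types for this sketch: a record is one row of raw input data,
# and a feature function turns a record into a single numeric feature.
Record = Dict[str, float]
FeatureFn = Callable[[Record], float]


class FeatureRegistry:
    """Toy in-memory repository mapping feature names to feature functions."""

    def __init__(self) -> None:
        self._features: Dict[str, FeatureFn] = {}

    def register(self, name: str) -> Callable[[FeatureFn], FeatureFn]:
        """Decorator that stores a feature function under a unique name."""
        def decorator(fn: FeatureFn) -> FeatureFn:
            if name in self._features:
                raise ValueError(f"feature {name!r} is already registered")
            self._features[name] = fn
            return fn
        return decorator

    def list_features(self) -> List[str]:
        """Let other teams discover which features already exist."""
        return sorted(self._features)

    def compute(self, names: List[str], record: Record) -> Dict[str, float]:
        """Compute the requested features for one input record."""
        return {name: self._features[name](record) for name in names}


registry = FeatureRegistry()


@registry.register("order_value_per_item")
def order_value_per_item(record: Record) -> float:
    # Guard against division by zero for empty orders.
    return record["order_total"] / max(record["item_count"], 1)


@registry.register("is_large_order")
def is_large_order(record: Record) -> float:
    return 1.0 if record["order_total"] >= 100.0 else 0.0


example = {"order_total": 120.0, "item_count": 4}
features = registry.compute(["order_value_per_item", "is_large_order"], example)
print(features)
```

A second team could call `registry.list_features()` to see what already exists and reuse `order_value_per_item` rather than writing it again.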
You can leverage a feature store for both online and offline training of models. In the former case, you compute the required features (in real time or as a batch pre-compute) and store them in the feature store. In the latter scenario, you have all types of data, as well as other characteristics associated with it, which you can leverage for training the Machine Learning models.
Data scientists thus gain by automating data preparation steps that would otherwise be computed repeatedly. Standardised data processing and versioned features offer consistency across different models, while still giving data scientists the opportunity to customize as and where needed. This makes it easier to share knowledge, models and other artefacts across the entire business.
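One way to picture the two access patterns is a store that is written to once and read in two ways: a fast lookup of the freshest pre-computed features for an entity, and a full history of feature rows for building a training set. The sketch below is only an illustration under that assumption; `TinyFeatureStore`, its method names and the `orders_7d` feature are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

FeatureRow = Dict[str, float]


class TinyFeatureStore:
    """Toy store: keeps full history plus the latest row per entity."""

    def __init__(self) -> None:
        # Historical rows per entity, used to assemble training data.
        self._history: Dict[str, List[Tuple[int, FeatureRow]]] = defaultdict(list)
        # Most recent row per entity, used for low-latency lookups.
        self._latest: Dict[str, Tuple[int, FeatureRow]] = {}

    def write(self, entity_id: str, timestamp: int, features: FeatureRow) -> None:
        """Store pre-computed features (from a batch job or a real-time job)."""
        self._history[entity_id].append((timestamp, dict(features)))
        latest = self._latest.get(entity_id)
        if latest is None or timestamp >= latest[0]:
            self._latest[entity_id] = (timestamp, dict(features))

    def get_online(self, entity_id: str) -> FeatureRow:
        """Return the freshest features for one entity."""
        return dict(self._latest[entity_id][1])

    def get_offline(self) -> List[Dict[str, object]]:
        """Return every stored row, ordered by time, for model training."""
        rows: List[Dict[str, object]] = []
        for entity_id, entries in self._history.items():
            for timestamp, feats in entries:
                rows.append({"entity_id": entity_id, "timestamp": timestamp, **feats})
        return sorted(rows, key=lambda r: (r["timestamp"], r["entity_id"]))


store = TinyFeatureStore()
store.write("customer_1", 1, {"orders_7d": 2.0})
store.write("customer_1", 2, {"orders_7d": 3.0})
store.write("customer_2", 1, {"orders_7d": 0.0})

online_row = store.get_online("customer_1")
training_rows = store.get_offline()
print(online_row)
print(len(training_rows))
```

Both read paths draw on the same stored values, so a feature is computed once and reused rather than recomputed by each consumer.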
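Versioning is what lets a feature stay consistent for existing models while still being customised for new ones. As a rough sketch, assuming a hypothetical `VersionedFeatures` catalogue and an invented `scaled_income` feature, publishing a changed definition creates a new version instead of overwriting the old one:

```python
from typing import Callable, Dict, Optional, Tuple

FeatureFn = Callable[[float], float]


class VersionedFeatures:
    """Toy catalogue that keeps every published version of a feature."""

    def __init__(self) -> None:
        self._defs: Dict[Tuple[str, int], FeatureFn] = {}

    def latest_version(self, name: str) -> int:
        """Highest published version for a feature, or 0 if none exists."""
        versions = [v for (n, v) in self._defs if n == name]
        return max(versions, default=0)

    def publish(self, name: str, fn: FeatureFn) -> int:
        """Add a new version; earlier versions remain available."""
        version = self.latest_version(name) + 1
        self._defs[(name, version)] = fn
        return version

    def get(self, name: str, version: Optional[int] = None) -> FeatureFn:
        """Fetch a pinned version, or the latest one when none is given."""
        if version is None:
            version = self.latest_version(name)
        return self._defs[(name, version)]


catalog = VersionedFeatures()
v1 = catalog.publish("scaled_income", lambda income: income / 1000.0)
v2 = catalog.publish("scaled_income", lambda income: income / 100_000.0)

# An existing model pins version 1; a new model takes the latest definition.
model_a_value = catalog.get("scaled_income", version=1)(50_000.0)
model_b_value = catalog.get("scaled_income")(50_000.0)
print(v1, v2, model_a_value, model_b_value)
```

Pinning a version keeps an already deployed model's inputs stable, while new work can adopt the customised definition.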
Advantages of using a Feature Store:
You might be wondering what benefits you would get as an organisation after implementing a feature store.
Adding a feature store to your data and AI strategy could bring you the following benefits:
- Scalability: The Feature Store can grow with the rate of ML projects delivered. You can use existing features, and possibly other data and AI artefacts, for sharing and bootstrapping AI projects instead of reinventing the wheel every time.
- Economy of Scale: It becomes easier and faster for organisations to develop AI models. New models require fewer resources to build because they can re-use features that already exist in the Feature Store. Using a shared Feature Store can therefore enable organizations to achieve economies of scale.
- Less time to production: It takes less time to get valuable insights, because you save the time, effort and resources otherwise spent redoing data preparation and feature engineering for every project. This does not mean you do not have to do any of that work, but its scale and intensity would be much less, since you can reuse and modify features as and where needed.
- Consistency and Standardisation: The Feature Store addresses inefficiency in the ML project pipeline. It benefits data scientists by automating repeatedly conducted data preparation work and standardising the data processing, so that features are consistent across different models.