Data collaboration is in high demand across the data science field in many industries. There are many situations where groups would like to collaborate on gaining insight from pools of data, but security and privacy concerns prohibit the sharing of proprietary data sources. IBM has found a solution to this dilemma. IBM Federated Learning, a component of Watson Studio, allows multiple parties to train a common model using remote sources of siloed data. Each party's data is never moved, combined, or shared, yet the model benefits from being trained with the pooled data sources, so the final model produces more accurate results.
Scenario: Pool industry knowledge on insurance risk
Consider a case where a consortium of insurance companies wants to do some predictive modeling on how climate events might impact future underwriting. Each party has their own data, which they do not want to expose or share. They can still get the benefit of insights from the entire pool of data without combining or sharing their data.
This is done by creating a common machine learning model. Each party trains their own model using their own data locally, then sends the local model to an aggregator that combines the results to generate a global model. Each party can then use the global model to generate insights for their business needs.
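The aggregation step described above is commonly implemented as a weighted average of the parties' local model parameters (often called federated averaging). Here is a minimal sketch in plain Python; the function and data structures are illustrative, not the Watson Studio API:

```python
def federated_average(party_updates):
    """Combine local model weights into a global model.

    party_updates: a list of (weights, n_samples) pairs, one per party,
    where weights is a list of floats and n_samples is how many local
    training examples produced them (illustrative structure).
    """
    total = sum(n for _, n in party_updates)
    n_params = len(party_updates[0][0])
    # Weight each party's parameters by its share of the total data pool.
    return [
        sum(w[i] * n for w, n in party_updates) / total
        for i in range(n_params)
    ]

# Three insurers contribute local models with two parameters each.
updates = [([0.2, 1.0], 100), ([0.4, 2.0], 300), ([0.6, 3.0], 100)]
global_weights = federated_average(updates)
```

Note that only model parameters travel to the aggregator; each party's raw data never leaves its own environment.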
This illustration shows the federated learning process.
A recent article in Forbes magazine makes the case for federated model training and delves into the evolution of this technology with Heiko Ludwig, a principal research staff member and senior manager of AI platforms at IBM Research AI.
How Federated Learning can enhance your analytics
Federated Learning includes a range of features that target the issues data scientists encounter with collaborative model training, with the goal of streamlining the process.
As with other types of machine learning, Federated Learning lets you choose frameworks for predictive, neural-network, or unsupervised learning. You can also fine-tune your experiment with hyperparameters: specifying the number of training rounds, setting a final accuracy threshold, requiring a percentage of parties to participate in each round, and more.
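The kinds of hyperparameters mentioned above can be pictured as a small experiment configuration plus a stopping rule. This is a hedged sketch with hypothetical key names, not the exact Watson Studio parameter names:

```python
# Illustrative experiment settings; the keys are hypothetical stand-ins
# for the kinds of hyperparameters a federated experiment exposes.
hyperparams = {
    "rounds": 10,                  # maximum number of global training rounds
    "termination_accuracy": 0.90,  # stop early once the model reaches this
    "party_quorum": 0.75,          # fraction of parties required per round
}

def should_stop(round_idx, accuracy, hp):
    """Finish when the round budget is spent or accuracy is good enough."""
    return round_idx >= hp["rounds"] or accuracy >= hp["termination_accuracy"]
```

In practice such a rule lets a consortium trade training time against model quality without any party inspecting another's data.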
Unlike distributed learning, which assumes that multiple homogeneous data sets reside on different servers, federated learning can account for data heterogeneity: significantly different data sets are accommodated by implementing a data handler class, and weights can be applied to balance out the data in the resulting model.
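The role of a data handler is to adapt each party's local data format to the common schema the shared model expects. The following is a minimal sketch under that assumption; the class name and method signature are illustrative, and Watson Studio defines its own handler interface:

```python
# Hypothetical data handler: each party subclasses or configures one of
# these so heterogeneous local schemas yield uniform training input.
class CsvDataHandler:
    def __init__(self, rows, feature_keys, label_key):
        self.rows = rows                  # local records as dicts
        self.feature_keys = feature_keys  # agreed-upon feature order
        self.label_key = label_key        # column holding the target

    def get_data(self):
        """Return (features, labels) in the order the common model expects."""
        features = [[float(r[k]) for k in self.feature_keys]
                    for r in self.rows]
        labels = [float(r[self.label_key]) for r in self.rows]
        return features, labels

# One insurer stores its columns in a different order and as strings;
# the handler hides that difference from the shared model.
rows = [{"claims": "2", "wind_mph": "95", "loss": "1.0"}]
handler = CsvDataHandler(rows, ["wind_mph", "claims"], "loss")
features, labels = handler.get_data()  # → ([[95.0, 2.0]], [1.0])
```

Because every party maps its own data into the same feature order locally, the aggregator never needs to see or reconcile the raw sources.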
Is Federated Learning right for you?
Find out if Federated Learning is the best solution for your machine learning needs.
About the authors
This post was created by Ashley Zhao, Julianne Forgo, and Ryan Wong, members of the IBM Watson Studio content team.