An AI model trained on data that looks real but won’t leak personal information

By NICK PLOWDEN posted Thu January 18, 2024 10:34 AM

  

IBM unveils a new method for bringing privacy-preserving synthetic data closer to its real-world analog to improve the predictive value of models trained on it.

A revolution in how businesses handle customer data could be around the corner, and it’s based entirely on made-up information.

Banks, health care providers, and other organizations in highly regulated fields are sitting on piles of spreadsheet data that could be mined for insights with AI, if only it were easier and safer to access. Sharing data, even internally, comes with a high risk of leaking or exposing sensitive information. And the risks have only increased with the passage of new data-privacy laws in many countries.

Synthetic data has become an essential alternative. This is data that’s been generated algorithmically to mimic the statistical distribution of real data, without revealing information that could be used to reconstruct the original sample. Synthetic data lets companies build predictive models, and quickly and safely test new ideas before going to the effort of validating them on real data.

The standard privacy guarantee for synthetic data is something called differential privacy. It’s a mathematical framework for provably limiting how much any one person’s record can influence the generated data, so that synthetic data can be analyzed without revealing personal or sensitive information about the individuals in the original sample.
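
As a rough illustration of what a differential privacy guarantee looks like in practice, here is a minimal sketch of the textbook Laplace mechanism applied to a simple counting query. It is not the method IBM describes in the linked story, which this post does not detail; the function names, the toy account balances, and the epsilon value are all assumptions made up for the example. Formally, a randomized mechanism M is ε-differentially private if, for any two datasets D and D' that differ in one person's record and any set of outputs S, Pr[M(D) ∈ S] ≤ e^ε · Pr[M(D') ∈ S].

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponentials with mean `scale`
    follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon: float, rng: random.Random) -> float:
    """Return an epsilon-differentially-private count of matching records.

    A counting query has sensitivity 1: adding or removing one person's
    record changes the true count by at most 1, so the Laplace noise
    scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy data: account balances (not from the article).
rng = random.Random(0)
balances = [120, 4500, 87, 9900, 300, 15000, 42]

# epsilon=0.5 is an arbitrary illustrative privacy budget.
noisy = private_count(balances, lambda b: b > 1000, epsilon=0.5, rng=rng)
print(f"noisy count of balances over 1000: {noisy:.2f}")
```

A smaller epsilon adds more noise and gives a stronger privacy guarantee; a larger epsilon gives a more accurate answer with a weaker guarantee. Differentially private synthetic data generators apply the same idea to the statistics they learn from the real data, rather than to a single query.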

Read the full story here.

If you have any questions or comments, please reach out to me on the community.

Thank you,

Nick

AI Community Lead


#watsonx.data

Comments

Fri January 19, 2024 08:45 AM

Very worth reading, especially the link on synthetic data, but bear in mind the comment from the linked article: “Real world data is rarely problem-free.” The risk and challenge is that synthetic data is less problematic than real data and therefore may not fully represent the complexity, vagaries, and diversity of real-world financial data. Moreover, when audited and examined by regulators, there is also a risk that synthetic data will be challenged.