
GA of Differential Privacy in watsonx.ai's Synthetic Data Generator

By Jordan Youngblood posted 13 days ago

  

tl;dr: With today’s v2.0 release of watsonx.ai, and already available through watsonx.ai SaaS on IBM Cloud, watsonx.ai's Synthetic Data Generator now lets you enable differential privacy when generating synthetic tabular data.

Why should I care about Differential Privacy and Synthetic Data?

Many organizations are exploring how AI can improve their productivity and support their business goals. A major challenge for many of them is getting value out of the data in their data stores. The early stages of this process involve organizing and preparing data for analysis or another use case aligned with business objectives. Unfortunately, many hit roadblocks even at this point. When the data is sensitive tabular data, privacy and regulatory restrictions can mean it takes weeks to get the data into the right people's hands for analysis or other data science tasks. That delay keeps businesses from making efficient use of the data they already have.

This is where synthetic data generation comes in. With synthetic data generation techniques, businesses can produce synthetic versions of their tabular data that maintain the characteristics of the original data (known as data fidelity) while removing the sensitive values from the data set that is produced. Even this carries risk, however, because these techniques are often vulnerable to re-identification attacks, which pushes many organizations to mask their sensitive data altogether. Masking, in turn, can be time-consuming and often reduces the utility of the original data set.

This is where differential privacy comes in. Differential privacy is a mathematical framework that limits how much any single record can influence the output, and so protects the sensitive information in a data set. Here, it works by adding noise to the original data set prior to the generation process. Compared with masking approaches, this preserves more of the data's utility, while also providing privacy guarantees that traditional synthetic generation approaches cannot offer.
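To make the idea of "adding noise" concrete, here is a minimal sketch of the textbook Laplace mechanism applied to a simple count. It is an illustration only, and it is not a description of how the Synthetic Data Generator works internally. The function names (`laplace_noise`, `private_count`), the sample ages, and the choice of a count query are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace(0, scale) distribution.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(values, predicate, epsilon, rng):
    """Return a noisy count of the values that satisfy `predicate`.

    Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon gives
    epsilon-differential privacy for this single query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical sensitive column: ages of ten people.
ages = [23, 35, 41, 29, 62, 57, 38, 44, 31, 50]
rng = random.Random(0)

# A smaller epsilon means more noise: stronger privacy, lower accuracy.
for epsilon in (0.1, 1.0, 10.0):
    noisy = private_count(ages, lambda a: a >= 40, epsilon, rng)
    print(f"epsilon={epsilon}: noisy count of ages >= 40 is {noisy:.2f}")
```

The true count here is 5. Each run returns a value scattered around it, and the scatter widens as epsilon shrinks, which is the privacy versus utility trade-off in miniature.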

Enabling Differential Privacy for watsonx.ai's Synthetic Data Generator

As of today, watsonx.ai’s Synthetic Data Generator includes the ability to enable differential privacy when generating data. Getting started is easy. Check out our documentation for more details, but I will highlight the important pieces here.

First, once you have created a Synthetic Data Generation flow in watsonx.ai, edit the Mimic node that is connected to your tabular data.

While editing the Mimic node, you can enable differential privacy and use the two customizable variables (Privacy Budget and Privacy Leakage) to tailor the privacy level of the output data to your privacy requirements and the sensitivity of the underlying information.
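For intuition about how two settings like these interact, the sketch below uses the classic Gaussian mechanism from the differential privacy literature. It assumes that Privacy Budget plays the role of epsilon and Privacy Leakage the role of delta in an (epsilon, delta) guarantee. That mapping, the function name `gaussian_sigma`, and the sample values are assumptions for illustration, and the formula is not claimed to be the one watsonx.ai uses.

```python
import math


def gaussian_sigma(epsilon, delta, sensitivity=1.0):
    """Noise standard deviation for the classic Gaussian mechanism.

    Uses the standard bound sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which is stated for 0 < epsilon < 1 and 0 < delta < 1.
    """
    if not 0 < epsilon < 1:
        raise ValueError("this bound assumes 0 < epsilon < 1")
    if not 0 < delta < 1:
        raise ValueError("delta must be between 0 and 1")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


# Hypothetical settings: (epsilon, delta) pairs to compare.
settings = [(0.1, 1e-5), (0.5, 1e-5), (0.9, 1e-5), (0.5, 1e-3)]
for epsilon, delta in settings:
    sigma = gaussian_sigma(epsilon, delta)
    print(f"epsilon={epsilon}, delta={delta}: noise std dev = {sigma:.2f}")
```

Under these assumptions, lowering either value increases the noise. Epsilon has the larger effect because the noise scales with 1/epsilon, while delta enters only through a logarithm.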

Finally, run the flow to create a Generate node that contains your synthetic data, which can then be evaluated or exported. After the run, the Generate node includes a shield icon indicating that differential privacy is enabled:

Generate Node with Differential Privacy Enabled

Conclusion

When differential privacy is appropriately configured in watsonx.ai’s Synthetic Data Generator, businesses can better address both data privacy and data utility concerns. This in turn reduces privacy and regulatory risk, making it easier for businesses to analyze and gain insights from their data. If you think this could be valuable for your business, check out the watsonx.ai page to get started with a free trial, and reach out to us to learn more today!

