At our March virtual meetup (check the blog to watch the replay), we welcomed Kush R. Varshney to talk about trustworthy machine learning. When we asked him where our attendees might find opportunities to play with some of these concepts, he suggested the IBM Research Data Quality for AI API, which comes with a free trial:
Data practitioners spend a considerable amount of time iteratively pre-processing data before it is considered to be of adequate quality for downstream machine learning tasks. Although time-consuming, pre-processing is an essential step because the quality of training data directly impacts the complexity as well as the accuracy of AI models. Getting insights into the quality of data before it enters a machine learning pipeline can significantly reduce model building time, streamline data preparation efforts, and improve the overall reliability of the AI pipeline.
Data Quality for AI is an integrated toolkit that provides various data profiling and quality estimation metrics to assess the quality of ingested data in a systematic and objective manner. Each metric quantifies data issues as a score between 0 and 1, where 1 indicates that no issues were detected. The metrics are designed for tabular datasets and accept input in the form of a comma-separated values (CSV) file.
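To make the 0-to-1 scoring idea concrete, here is a minimal sketch of what one such metric could look like for a CSV input. This is not the Data Quality for AI API or any of its actual metrics: the function name `completeness_score`, the list of tokens treated as missing, and the sample data are all assumptions made up for illustration. It only shows the general shape of a score where 1 means no issues were detected.

```python
import csv
import io


def completeness_score(csv_text, missing_tokens=("", "NA", "NaN", "null")):
    """Hypothetical metric: fraction of non-missing cells, as a score in [0, 1].

    A score of 1.0 means no missing values were detected.
    """
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, None)
    if header is None:
        return 1.0  # no data at all, so nothing to flag

    total = 0
    missing = 0
    for row in reader:
        if not row:
            continue  # skip completely blank lines
        # Pad short rows so every row is judged against the header width.
        cells = (row + [""] * len(header))[: len(header)]
        for cell in cells:
            total += 1
            if cell.strip() in missing_tokens:
                missing += 1

    if total == 0:
        return 1.0
    return 1.0 - missing / total


# Illustrative sample: 9 cells, 3 of them missing.
sample = "age,income,city\n34,52000,Austin\n,48000,NA\n29,,Boston\n"
print(round(completeness_score(sample), 2))
```

A real quality metric would likely weigh columns differently or look at issues beyond missing values (class imbalance, outliers, label noise); the point here is only that every check reduces to a single comparable number.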
If you're interested in exploring this API, we invite you to do so over the next couple of weeks and to join us here in the community to share your impressions and feedback, ask questions, and answer those of others.
We got in touch with our colleagues at IBM Research in India, and they have kindly committed to making themselves available during this time, checking for new replies to this thread and responding where they can.
In addition, we've set up a check-in call on Thursday, April 28 at 8am Pacific, in case you'd like to meet them and go into more depth.
In order to participate in this "community trial" and to join our April 2022 cohort, simply follow these steps:
See you soon!
------------------------------
Tim Bonnemann
------------------------------
#GlobalAIandDataScience #GlobalDataScience