You have likely heard this many times before, but it is worth repeating: more than 80% of all data collected by organizations is not in a standard relational database. Instead, it is trapped in unstructured documents, social media posts, machine logs, images, and other formats. Many organizations struggle to manage this deluge of unstructured data. For example, if you want to run large-scale analytics on this data to gain insights for your business priorities, how do you pinpoint and activate the relevant data? Furthermore, how do you identify and classify sensitive data while removing data that is redundant or obsolete?
IBM Spectrum Discover can help you manage your unstructured data by reducing data storage costs, uncovering hidden data value, and reducing the risk posed by massive data stores. It is modern metadata management software that provides data insight for petabyte-scale file and object storage, both on premises and in the cloud. It enables organizations to make better business decisions, and to gain and maintain a competitive advantage.
The IBM Redbooks team recently published two Redbooks publications that cover practical AI use cases with IBM Spectrum Discover and other IBM Storage software:
The book explores the following six use cases in technical depth. Summaries of all use cases can be found in Section 1.8, “Overview of the use cases” on page 25, so you may want to read that section first before going into the details of the scenarios.
- Categorizing medical imaging data with content-search capability
- Extracting metadata from LIDAR imagery with custom applications
- Organizing training data sets for artificial intelligence
- Using artificial intelligence in medical imaging - JFR Challenge
- Data Governance use case: Data staging for high performance processing
- Data Optimization use case: Data migration to tape for cost-efficient archiving
In addition, Chapter 3 presents a reference architecture for designing and implementing an AI data pipeline using IBM Spectrum Discover.
- Cataloging Unstructured Data in IBM Watson Knowledge Catalog with IBM Spectrum Discover, REDP-5603
This paper explains how IBM Spectrum® Discover integrates with the IBM Watson® Knowledge Catalog (WKC) component of IBM Cloud® Pak for Data (IBM CP4D) so that the enriched catalog content in IBM Spectrum Discover, along with the associated data, is available in WKC and IBM CP4D.
Several in-depth use cases (see chapters 5 and 6) show examples from healthcare, life sciences, and financial services.
This integration enables storage administrators, data stewards, and data scientists to efficiently manage, classify, and gain insights from massive amounts of data. The integration improves storage economics, helps mitigate risk, and accelerates large-scale analytics to create competitive advantage and speed critical research.
We hope you enjoy these books, and we would love to hear from you. Let us know if you have any questions or comments.