Rethink data quality management with Cloud Pak for Data 4.7

By Marcus Boone posted Mon June 26, 2023 12:00 PM


We are happy to announce that version 4.7 of IBM Cloud Pak for Data, and with it a new release of IBM Knowledge Catalog, is coming June 28.

Cloud Pak for Data is designed to help our clients implement a data fabric architecture that serves as a foundation for our recently announced IBM watsonx capabilities. This release packs several new features, integrations and enhancements to the IBM Knowledge Catalog service of Cloud Pak for Data to support core governance use cases for our clients. 

Here are the highlights of the new and improved capabilities designed to help clients with their data quality, data privacy and data discovery scenarios. 

1.     Streamlined data quality management

Managing data quality continues to be a critical priority for our clients, and with the new release we have made step changes to the data quality capabilities within IBM Knowledge Catalog.

At the center of the new data quality features is the all-new data quality tab in projects and catalogs. Gone are the days when users needed to search multiple tools to get a view of data quality. The new tab brings data quality information together in one place. Users can review data quality dimension scores for completeness, uniqueness, validity, and others, and can drill down into the data quality checks, their logic, and the data records that contributed to poor data quality scores. All of this is designed to expedite identification and remediation of quality issues. We have an upcoming Chat with the Labs session on July 5. Register for the series to join live or watch the replay and see the features in action.

Additionally, to help organizations get a single, trusted view of organizational entities, we have integrated a new “entity confidence” dimension for Match 360 into the data quality tab, so users can verify the percentage of entities that require manual matching and remediation.

We have enhanced our data quality scoring mechanism for data assets to combine results from metadata enrichment and data quality rules and, optionally, from Match 360. Users can also integrate the enhanced Data Quality API with Databand and other third-party tools.
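To make the idea of dimension scores concrete, here is a minimal Python sketch of how completeness, uniqueness, and validity could be computed for one column and rolled up into a single score. It is an illustration only: the function names, the simple averaging, and the sample values are assumptions made for this example, not the scoring formula used by IBM Knowledge Catalog.

```python
from statistics import mean

def _present(values):
    """Keep only values that are neither None nor an empty string."""
    return [v for v in values if v is not None and v != ""]

def completeness(values):
    """Share of all values that are present."""
    return len(_present(values)) / len(values) if values else 0.0

def uniqueness(values):
    """Share of present values that are distinct."""
    present = _present(values)
    return len(set(present)) / len(present) if present else 0.0

def validity(values, is_valid):
    """Share of present values that pass a caller-supplied check."""
    present = _present(values)
    if not present:
        return 0.0
    return sum(1 for v in present if is_valid(v)) / len(present)

def overall_score(dimension_scores):
    """Hypothetical roll-up: unweighted average of the dimension scores."""
    return mean(dimension_scores.values())

# Illustrative sample column (not real data).
emails = ["a@example.com", "b@example.com", "a@example.com", None, "not-an-email"]

scores = {
    "completeness": completeness(emails),              # 4 of 5 values present
    "uniqueness": uniqueness(emails),                  # 3 distinct of 4 present
    "validity": validity(emails, lambda v: "@" in v),  # 3 of 4 present pass
}
overall = overall_score(scores)
```

A real scoring mechanism would likely weight dimensions differently and draw on rule results as well as profiling output; the point here is only that each dimension is a ratio over the column's values.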

Finally, for data quality rules that combine multiple checks, the data quality output table for exception record handling will indicate the relevant data quality rule definition.

2.     Enhanced data privacy

To help clients protect data outside of the Cloud Pak for Data platform, we continue to improve the integration with IBM Security Guardium Data Protection. With this integration, data protection rules defined in IBM Knowledge Catalog can be enforced as data is accessed across key endpoints outside of Cloud Pak for Data. The integration supports all masking methods and currently works with Teradata, Hadoop/Hive, Oracle, and MySQL. Join us in mid-July for a Chat with the Labs session to see this new integration in action.

The ability for users to view asset metadata without having access to actual data further strengthens enforcement of data access controls.  IBM Knowledge Catalog can be configured to deny all users access to actual data by default, and only allow specific users to access actual data through data protection rules.

3.     Other key features and improvements

To enable discovery of relationships between assets, we have introduced automatic detection of primary and foreign keys and relationships, with support for multi-column primary keys and overlap analysis on multiple columns.
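The general idea behind key detection and overlap analysis can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the function names, the brute-force search over column combinations, and the sample rows are all invented for this example and do not describe how IBM Knowledge Catalog implements the feature.

```python
from itertools import combinations

def is_unique_key(rows, columns):
    """True if the column combination has no nulls and no duplicate values."""
    seen = set()
    for row in rows:
        key = tuple(row[c] for c in columns)
        if any(v is None for v in key) or key in seen:
            return False
        seen.add(key)
    return True

def primary_key_candidates(rows, columns, max_width=2):
    """Return minimal column combinations that uniquely identify every row."""
    found = []
    for width in range(1, max_width + 1):
        for combo in combinations(columns, width):
            # Skip combinations that merely extend an already-found key.
            if any(set(key) <= set(combo) for key in found):
                continue
            if is_unique_key(rows, combo):
                found.append(combo)
    return found

def overlap(child_rows, child_cols, parent_rows, parent_cols):
    """Fraction of distinct child values that also occur in the parent."""
    child = {tuple(r[c] for c in child_cols) for r in child_rows}
    parent = {tuple(r[c] for c in parent_cols) for r in parent_rows}
    return len(child & parent) / len(child) if child else 0.0

# Illustrative sample data (hypothetical tables).
orders = [
    {"order_id": 1, "line": 1, "customer": "C1"},
    {"order_id": 1, "line": 2, "customer": "C1"},
    {"order_id": 2, "line": 1, "customer": "C2"},
    {"order_id": 3, "line": 1, "customer": "C9"},
    {"order_id": 4, "line": 1, "customer": "C1"},
]
customers = [{"id": "C1"}, {"id": "C2"}, {"id": "C3"}]

pk = primary_key_candidates(orders, ["order_id", "line", "customer"])
fk_overlap = overlap(orders, ["customer"], customers, ["id"])
```

In this sample no single column is unique, but `order_id` and `line` together are, which is the multi-column primary key case. The overlap value shows that two of the three distinct customer values in `orders` exist in `customers`, so the column is a plausible but imperfect foreign key candidate.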

To give you more detail about column profiles, we have made detailed profiling information available from within a metadata enrichment or from an asset’s Profile tab in a project or a catalog. For each column, you can view and visualise statistical information about the column data, information about data classes, data types and formats, and the frequency distribution of values in the column. For the statistical information, you can choose between several types of visualisations.
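As a rough illustration of what a column profile contains, the following Python sketch computes basic statistics, observed types, and a frequency distribution for one column. The `profile_column` function and its output fields are hypothetical names chosen for this example, not the structure returned by IBM Knowledge Catalog.

```python
from collections import Counter
from statistics import mean

def profile_column(values):
    """Summarise one column: counts, observed types, value frequencies, basic stats."""
    present = [v for v in values if v is not None]
    # Treat ints and floats as numeric, but not booleans.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    profile = {
        "count": len(values),
        "nulls": len(values) - len(present),
        "distinct": len(set(present)),
        "types": Counter(type(v).__name__ for v in present),
        "frequency": Counter(present).most_common(),
    }
    if numeric:
        profile.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return profile

# Illustrative sample column (not real data).
ages = [34, 41, 34, None, 29, 34]
profile = profile_column(ages)
```

The `frequency` list is ordered from most to least common value, which is the shape a bar chart of the value distribution would typically be drawn from.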

Users can define custom attributes and relationships to extend the default set of properties and relationships for catalog assets and governance artifacts. For catalog assets, users are no longer restricted to the API: custom properties and relationship types can now be defined in the new UI. For governance artifacts, the experience is smoother than before and offers a broader set of capabilities.

We continue to build out the reporting data mart with new updates supporting enhancements in IBM Knowledge Catalog. These include the distinction between tables, views, and aliases; custom properties for assets; metadata enrichment statistics (for example, min, max, and frequency); data quality; workflows; and support for draft artifacts. Our strategy continues to be one where users can build the reports they need, in the tools of their choice, based on what is in the data mart. Moreover, when you send your IBM Knowledge Catalog data to an external database to generate reports, you can now choose an Oracle database in addition to PostgreSQL and Db2 databases.

To support advanced use cases for reference data sets, you can now specify multiple columns to create a composite key. Previously, reference data values relied on a unique code column. Now the values in the code column no longer need to be unique on their own, as long as they are unique in combination with the specified columns. Additionally, the reference data sets UI provides a more robust interface for data creation.
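The difference between a single-column key and a composite key can be shown with a short Python sketch. The `duplicate_keys` helper, the column names, and the sample rows are assumptions made for this illustration and are not part of the product.

```python
from collections import Counter

def duplicate_keys(rows, key_columns):
    """Return the key values that occur more than once for the given columns."""
    counts = Counter(tuple(row[c] for c in key_columns) for row in rows)
    return [key for key, n in counts.items() if n > 1]

# Illustrative reference data: region codes that repeat across countries.
regions = [
    {"code": "01", "country": "US", "value": "Alabama"},
    {"code": "01", "country": "DE", "value": "Schleswig-Holstein"},
    {"code": "02", "country": "US", "value": "Alaska"},
]

code_only = duplicate_keys(regions, ["code"])             # code alone collides
composite = duplicate_keys(regions, ["code", "country"])  # code + country is unique
```

With `code` as the only key, the value `01` collides. Adding `country` to form a composite key removes the collision, which is the situation the composite key support is meant to handle.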

Get started today with a more efficient footprint

To help organizations get more out of their Cloud Pak for Data investments, we have made significant progress in driving efficiencies by reducing the number of virtual CPUs required to run core governance tasks on the Cloud Pak for Data platform. With version 4.7, the IBM Knowledge Catalog service will require up to 25% fewer vCPUs, giving enterprises the ability to scale their workloads within the platform.

Register to watch our webinar, where we demonstrate these new capabilities to help you get the most out of IBM Knowledge Catalog on Cloud Pak for Data.

Get started easily by taking the Data Governance trial on Cloud Pak for Data to explore these new features as they are pushed to IBM Cloud SaaS instances over the coming weeks.