What's New in Watson Knowledge Catalog for Cloud Pak for Data v4.0.2

By Corey Keyser posted Tue September 28, 2021 12:08 PM


The new release of Watson Knowledge Catalog will bring substantial improvements in data privacy, platform user experience, and data synchronization. In particular, this version will expand Watson Knowledge Catalog's support for Power Systems while also giving customers more options for connection synchronization. This is all in addition to some exciting new governance features like format-preserving data masking and out-of-the-box support for the classification of personal information.

Additional Data Classes to support Format-Preserving Masking for Data Virtualization
As part of Watson Knowledge Catalog's commitment to building industry-leading data protection and data privacy support that spans across the platform, users will now have the ability to use format-preserving data masking for the following new data classes:
  1. Mask names while preserving gender
  2. Mask email address while preserving domain name
  3. Mask zip code by masking with zip code in proximity
  4. Mask phone number
  5. Mask city name while masking with valid alternate city
  6. Mask street name
  7. Mask SSN
  8. Mask driver's license
These tools will give data consumers the ability to meaningfully analyze data for patterns while still keeping the underlying sensitive values protected.
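To make the idea of format preservation concrete, here is a minimal Python sketch of two of the cases listed above: masking an email address while keeping its domain, and masking a phone number while keeping its punctuation. It is only an illustration and not how Watson Knowledge Catalog implements masking. The function names, the `salt` parameter, and the hash-based substitution are all assumptions made for this example, and a production system would use a vetted format-preserving scheme instead.

```python
import hashlib
import string


def _digest(value: str, salt: str) -> bytes:
    """Derive deterministic pseudo-random bytes from the input (illustrative only)."""
    return hashlib.sha256((salt + value).encode("utf-8")).digest()


def mask_email_keep_domain(email: str, salt: str = "demo-salt") -> str:
    """Replace the local part with same-length letters; keep the domain as-is."""
    local, sep, domain = email.rpartition("@")
    if not sep or not local:
        raise ValueError("not an email address: %r" % email)
    digest = _digest(local, salt)
    letters = string.ascii_lowercase
    masked_local = "".join(
        letters[digest[i % len(digest)] % len(letters)] for i in range(len(local))
    )
    return masked_local + "@" + domain


def mask_digits_keep_format(value: str, salt: str = "demo-salt") -> str:
    """Replace every ASCII digit; leave separators such as '-' or ' ' untouched."""
    digest = _digest(value, salt)
    out = []
    used = 0
    for ch in value:
        if ch in string.digits:
            out.append(str(digest[used % len(digest)] % 10))
            used += 1
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(mask_email_keep_domain("jane.doe@example.com"))
    print(mask_digits_keep_format("555-123-4567"))
```

Because the masked value has the same length and shape as the original, downstream queries that group by email domain or validate phone-number patterns keep working on the masked data.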

Improved UX Features for Profiling Governance Artifacts
When adding a governance artifact as a related artifact, or an asset as a related asset, users will see a modal appear in the right-hand panel that lets them view a profile of the artifact before making a decision about that artifact's relationship. This will make it quicker for users to profile artifacts and will help ensure that they are selecting the right object.

Out-of-the-Box Support for Classification of Personal Information
In order to support the classification of personal data, a new default classification called Personal Information (PI) is being added to Watson Knowledge Catalog. This is in line with data privacy efforts that are broader than the PII (personally identifiable information) and SPI (sensitive personal information) classifications originally introduced in GDPR. The trend in regulations (as well as IBM’s decision soon after GDPR came out) is to use Personal Information rather than PII, because PII is too limiting and doesn’t really address the scope of “data privacy”. Watson Knowledge Catalog will continue to support classification for PII and SPI.

REST APIs to Manage Custom Attributes for Out-of-the-Box Assets
Developers can create custom attribute definitions via REST APIs that apply to any supported asset type, including COBOL assets, Data Assets, Connections, Models, BI Reports, and Table Definitions. This will bring Watson Knowledge Catalog closer to parity with IGC through its support of broader attribute types like Text (supporting hyperlinks, rich text, and globalized strings), Pre-Defined Values (including value and description), Date and Time, and Number (whole or decimal).
Users will be able to re-use previously defined custom attributes (or groupings of custom attributes) to create associations with different assets or governance artifact types. For example, a user can create a set of custom attributes related to KPIs or GIS data that they could then associate with different asset types.
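As a rough illustration of what a reusable custom attribute definition might carry, the Python sketch below models the attribute types named above and serializes a definition to JSON. Every field name here (`attribute_type`, `applies_to`, `values`) and every asset type string is hypothetical and chosen for this example only. It does not reflect the actual Watson Knowledge Catalog REST API schema, so consult the product API reference for the real request format.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List

# Attribute types named in the release notes; the identifiers are hypothetical.
ATTRIBUTE_TYPES = {"text", "predefined_values", "date_time", "number"}


@dataclass
class PredefinedValue:
    value: str
    description: str = ""


@dataclass
class CustomAttributeDefinition:
    name: str
    attribute_type: str
    applies_to: List[str]  # one definition can be reused across asset types
    values: List[PredefinedValue] = field(default_factory=list)

    def __post_init__(self) -> None:
        if self.attribute_type not in ATTRIBUTE_TYPES:
            raise ValueError("unsupported attribute type: %s" % self.attribute_type)
        if not self.applies_to:
            raise ValueError("a definition must apply to at least one asset type")
        if self.values and self.attribute_type != "predefined_values":
            raise ValueError("only predefined_values attributes may list values")

    def to_json(self) -> str:
        """Serialize to a JSON body (hypothetical shape, not the real API schema)."""
        return json.dumps(asdict(self), sort_keys=True)


if __name__ == "__main__":
    kpi_owner = CustomAttributeDefinition(
        name="KPI owner",
        attribute_type="text",
        applies_to=["data_asset", "bi_report"],
    )
    print(kpi_owner.to_json())
```

Listing several asset types in `applies_to` mirrors the reuse described above: the same KPI attribute is defined once and then associated with both data assets and BI reports.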

Improved Asset Status Experience on Metadata Import
We are implementing a new protocol for tracking the status of assets brought in through metadata import. We have moved from our original version, where an "outdated" status is computed using the last updated timestamp, to a new protocol where customers can track imports using the following timestamps: "last imported", "first imported", and "last outdated". We believe this will allow customers to more easily and reliably detect changes within metadata import.
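The Python sketch below shows one way a consumer could use three such timestamps to detect changes since a previous check. The rules in `import_status` are an assumed interpretation written for this example, not the product's documented semantics, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class ImportTimestamps:
    first_imported: datetime
    last_imported: datetime
    last_outdated: Optional[datetime] = None  # None: never flagged as outdated


def import_status(ts: ImportTimestamps, since: datetime) -> str:
    """Classify an asset relative to a checkpoint (assumed rules, for illustration)."""
    # Flagged outdated after the most recent import: the source has moved on.
    if ts.last_outdated is not None and ts.last_outdated > ts.last_imported:
        return "outdated"
    # First seen at or after the checkpoint: a newly imported asset.
    if ts.first_imported >= since:
        return "new"
    # Seen before, but imported again at or after the checkpoint.
    if ts.last_imported >= since:
        return "reimported"
    return "unchanged"


if __name__ == "__main__":
    checkpoint = datetime(2021, 9, 15)
    asset = ImportTimestamps(
        first_imported=datetime(2021, 9, 1),
        last_imported=datetime(2021, 9, 28),
    )
    print(import_status(asset, since=checkpoint))
```

Keeping the three timestamps separate lets a caller tell a brand-new asset apart from a refreshed one, which a single "last updated" value cannot do.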

Additional Integrations for Metadata Import
In 4.0.2, users will be able to import metadata from SAP HANA, Databases for MongoDB, and MongoDB.

Data discovery from Netezza sources
Automated discovery and quick scan jobs can be run on Netezza data sources by using a Generic JDBC platform connection.

Enhanced Options for Connection Synchronization
In 4.0.1, Watson Knowledge Catalog expanded support for bi-directional synchronization for assets from Snowflake, Hive, and Generic JDBC connections. In 4.0.2, WKC will add sync support for Hive Kerberos, but this will be from IGC/XMETA to WKC only.

Bi-Directional Sync between WKC and External Repositories
Before 4.0.2, Glossary Assets in CPD could only publish events to an OMRS Cohort topic. Now the Glossary service can receive events as well so that glossary assets can be bi-directionally synchronized across multiple repositories. For example, a Data Engineer can now configure sync to: (1) receive create/update/delete events for glossary and technical assets from external repositories, and (2) save them as reference copies in WKC.
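The receive side of that flow can be pictured with a small Python sketch that applies incoming create, update, and delete events to a local store of reference copies. The event shape used here (`type`, `guid`, `source`, `payload`) is a simplified, hypothetical stand-in for this example and is not the OMRS event format or a Watson Knowledge Catalog API.

```python
from typing import Dict


def apply_event(store: Dict[str, dict], event: dict) -> None:
    """Apply one inbound event to an in-memory store of reference copies."""
    kind = event["type"]
    guid = event["guid"]
    if kind in ("create", "update"):
        # Save the asset as a read-only reference copy and remember
        # which external repository owns the original.
        store[guid] = {
            **event["payload"],
            "reference_copy": True,
            "home_repository": event["source"],
        }
    elif kind == "delete":
        # Deleting an unknown asset is a no-op so replays stay harmless.
        store.pop(guid, None)
    else:
        raise ValueError("unknown event type: %s" % kind)


if __name__ == "__main__":
    copies: Dict[str, dict] = {}
    apply_event(copies, {
        "type": "create",
        "guid": "term-1",
        "source": "external-repo",
        "payload": {"name": "Customer"},
    })
    print(copies)
```

Marking each saved asset as a reference copy and recording its home repository keeps ownership clear: edits are made in the originating repository and flow back in as update events.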

Support Watson Knowledge Catalog Core on Power Systems
Watson Knowledge Catalog is adding to its integration with Power Systems. These new compatibilities will be useful in bringing Watson Knowledge Catalog on Cloud Pak for Data closer to parity with IBM's legacy governance products.

Expanded Support for Platform Systems
Watson Knowledge Catalog will now support OpenShift version 4.8 and will have Systems Support for Yosemite.