Watson Knowledge Catalog (WKC) - Data Governance and Quality


What's New in Watson Knowledge Catalog for CPD v4.0.3

By Corey Keyser posted Thu November 18, 2021 05:26 PM

  
The Cloud Pak for Data v4.0.3 release brings significant changes to Watson Knowledge Catalog's core functionality within our Business Glossary and Catalog. These changes -- including Custom Category Roles and Catalog-Specific Machine Learning Term Assignment -- come along with the first phases of some exciting new governance features within Watson Knowledge Catalog's Data Policy Service. Read on for more details.


Custom Category Role Definition and Policy Enforcement
WKC admins can define new custom category collaborator roles and tailor the category permissions associated with each custom role. The custom role can then be assigned to users or user groups in the category, who gain the fine-grained permissions associated with the role. These custom roles can also be used in workflow step assignments. Within SaaS, Standard plans can create up to 5 custom roles, while Professional and Enterprise plans can create up to 50 (Lite plans are not supported).


Catalog-Specific Machine Learning for Term Assignment in Quick Scan
Watson Knowledge Catalog now supports multiple, catalog-specific ML models for automatic business term assignment. Rather than using a single, global model for automatically assigning business terms to data, customers can now accelerate time to value and improve the accuracy of their results by using different models for specific data domains. Quick Scan now gives users much greater control over creating, retraining, and transferring the prediction algorithms in WKC. These models are built specifically for projects and catalogs, and they can be saved, transferred, and loaded via API. Training is managed by the WKC Machine Learning API and will be available on both SaaS and CPD on-prem.


Duplicate Asset Handling
Catalog collaborators with the Admin role can specify the default behavior for handling duplicate assets automatically when assets are added or published to the catalog.
There are multiple options for specifying how to handle duplicate assets in a catalog, all of which you can change at any time from the catalog Settings page:
  1. Update original assets - Update the values of the original assets with the values of the new assets. If the new assets have empty values, the corresponding values from the original assets are retained. The privacy setting, asset owner, asset members, and activities of the asset are not affected.
  2. Overwrite original assets - Overwrite all values of the original assets with the values of the new assets, with the exception of privacy setting, asset owner, asset members, and activities of the asset.
  3. Allow duplicates - Add the new assets as duplicates of the original assets. (This is the default behavior.)
  4. Preserve original assets and reject duplicates - Reject the new duplicate assets and preserve the original assets.
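
To make the difference between the four settings concrete, here is a minimal Python sketch that models assets as plain dictionaries. It is an illustration of the rules described above, not WKC code: the function name, the mode strings, and the field names (privacy, owner, members, activities) are all hypothetical, and the sketch assumes the original and the new asset carry the same set of fields.

```python
from copy import deepcopy

# Hypothetical field names for the attributes that, per the list above,
# are never affected by "update" or "overwrite".
PROTECTED_FIELDS = {"privacy", "owner", "members", "activities"}


def is_empty(value):
    """Treat None and empty strings/collections as 'empty values'."""
    return value is None or value == "" or value == [] or value == {}


def handle_duplicate(original, new, mode):
    """Return the assets left in the catalog after `new` arrives as a duplicate of `original`."""
    if mode == "allow":
        # Option 3: keep both assets side by side.
        return [deepcopy(original), deepcopy(new)]
    if mode == "reject":
        # Option 4: the original is preserved and the new asset is dropped.
        return [deepcopy(original)]
    if mode not in ("update", "overwrite"):
        raise ValueError(f"unknown mode: {mode!r}")

    merged = deepcopy(original)
    for key, value in new.items():
        if key in PROTECTED_FIELDS:
            continue  # privacy, owner, members, activities stay as they were
        if mode == "update" and is_empty(value):
            continue  # Option 1: empty new values do not erase original values
        merged[key] = deepcopy(value)  # Option 2 copies empty values too
    return [merged]


original = {"name": "sales", "description": "Q3 sales", "tags": ["finance"], "owner": "alice"}
incoming = {"name": "sales", "description": "", "tags": ["revenue"], "owner": "bob"}

print(handle_duplicate(original, incoming, "update"))
print(handle_duplicate(original, incoming, "overwrite"))
```

In this sketch the only difference between "update" and "overwrite" is what happens to the empty description: "update" keeps the original text, while "overwrite" replaces it with the empty value. In both cases the owner stays unchanged.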


API to Publish an Asset from Project to Catalog with Option to Handle Duplicate Assets
The "publish asset" API has a new duplicate_action option to specify the behavior if the asset being published has a duplicate in the target catalog. 
- 'IGNORE' will ignore the duplicate and create a new asset.
- 'REJECT' will return an error and no asset will be created.
- 'UPDATE' will update the best matched duplicate with the input values according to the predefined rules.  The privacy setting, asset owner, asset members, and activities of the asset are not affected.
- 'REPLACE' will overwrite all values of the existing asset, with the exception of privacy setting, asset owner, asset members, and activities of the asset.
- No value means the duplicate_action specified in catalogs/projects/spaces will be used.
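
The fallback rule in the last bullet can be sketched in a few lines of Python. This is only a model of the behavior described above: the helper names are hypothetical, and this post does not say how duplicate_action is passed to the API (for example as a query parameter or in the request body), so the sketch only shows the option being included or left out.

```python
# The four values listed above for the duplicate_action option.
VALID_ACTIONS = ("IGNORE", "REJECT", "UPDATE", "REPLACE")


def resolve_duplicate_action(requested, container_default):
    """Return the action that applies to a single publish call.

    `requested` is the value sent with the call (None if omitted);
    `container_default` is the setting of the target catalog/project/space.
    """
    if requested is None:
        # No value supplied: the container's own setting is used.
        return container_default
    if requested not in VALID_ACTIONS:
        raise ValueError(f"unknown duplicate_action: {requested!r}")
    return requested


def publish_options(duplicate_action=None):
    """Build the options for a publish call, omitting duplicate_action when unset."""
    options = {}
    if duplicate_action is not None:
        options["duplicate_action"] = duplicate_action
    return options


print(resolve_duplicate_action(None, "UPDATE"))      # falls back to the container setting
print(resolve_duplicate_action("REJECT", "UPDATE"))  # an explicit value wins
print(publish_options("REPLACE"))
```

Leaving the option out is therefore the way to respect whatever the catalog admin configured, while passing an explicit value applies to that one call only.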


Enhancements to Importing and Exporting Governance Artifacts
  • Artifact identifiers are now used to define the relationships between artifacts. This replaces WKC's previous method, which relied on the artifact's context and name to define the relationship. This change ensures that artifacts are identified consistently.
  • A new glossary REST API will allow users to export all governance artifacts to a single ZIP file or import them all at once. 
  • The previously mentioned custom category roles can control who has permissions to import or export governance artifacts using a ZIP file.


Improved Glossary Import Performance and Resiliency
We have improved multiple processes within Watson Knowledge Catalog, which considerably reduces the time required to perform a glossary import.


Broadened Support in Metadata Import
Watson Knowledge Catalog users will now be able to import metadata from both Salesforce and Netezza.


Additional Bi-Directional Synchronization Connections
WKC users will now be able to use Egeria sync for connections and assets from Amazon S3, Microsoft Azure Data Lake Store, Microsoft Azure File Storage, Microsoft Azure Blob Storage, and SAP HANA.


Multitenancy Support
Watson Knowledge Catalog will join the rest of the Cloud Pak for Data platform in supporting multiple mechanisms for achieving service multitenancy. For more information, view our documentation here.


