Watson Knowledge Catalog (WKC) - Data Governance and Quality

Reference Data Management in Watson Knowledge Catalog

By PRAVEEN DEVARAO posted Tue February 16, 2021 12:20 AM

  

Chapter 1: Introduction to Reference Data Management in Watson Knowledge Catalog
approx read time: 10 min

            Reference data is a collection of values typically used to categorize or classify other assets within an organization. These values are static in nature, i.e. they do not change frequently over time. Managing the collection in a global repository standardizes it across the organization, so that all sub-systems follow the same terminology and share the same understanding. Examples of reference data sets include ISO country codes, ICD-10 health codes, and NAICS codes.

 [Image: CPD dashboard]

            In this chapter we will learn about the reference data management capability in Watson Knowledge Catalog [WKC]. WKC is a data catalog tightly integrated with data governance capabilities, providing tools and constructs for a self-service data governance model. WKC helps users quickly discover, curate, categorize and share data assets, data sets, analytical models and their relationships with other members of the organization (https://ibm.biz/BdfE5a).

            To access the reference data management feature, log in to your IBM Cloud Pak for Data instance and, from the left-hand navigation bar, select Reference Data under the Governance section.

            Once you land on the Reference Data page, you will see the list of all published reference data sets and the list of draft reference data sets defined in the system. Initially the lists will be empty; you can create a new reference data set with the button `Add Reference Data set` -> `New Reference Data Set`.

 

            At a minimum, enter a name for the reference data set, select the primary category in which it is to be created, and click Create.

Extras: A category can be thought of as an operating system folder under which you organize the different governance artifacts of WKC. Besides organizing artifacts, you can grant a user or group of users permissions on the category, which implicitly apply to all artifacts within it. https://ibm.biz/BdfrSV

            On creation, you are directed to the newly created set, where you can start adding values. The data set is created in draft state; you can move it to published state once it is ready for consumption by other users of the system.

  

            Extras: By default WKC provides workflow support for all governance artifacts, so you can work on a draft copy of an artifact before making it available to other users. You can configure the workflow so that an artifact goes through a review and approval process before it is published.

            You can add values individually through the edit menu available on the page, or import the list of values from a csv file.

            Each value within the set has 3 primary fields: code, value and description. Code identifies the value uniquely, e.g. the unique ISO code of an Indian state in the Indian-states data set. Value gives a short summary of what the code represents; for a country code, this could be the name of the country. Description captures more detailed information about the reference data value.
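As a rough illustration of the three fields, here is a minimal Python sketch of a reference data set held in memory. The class names `ReferenceValue` and `ReferenceDataSet` are hypothetical and are not part of WKC or any IBM API; rejecting a repeated code is an assumption based only on the statement that the code identifies a value uniquely. The sample entries use ISO 3166-2 codes for two Indian states.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ReferenceValue:
    """One entry of a reference data set (hypothetical model, not a WKC class)."""
    code: str              # identifies the value uniquely within the set
    value: str             # short summary of what the code represents
    description: str = ""  # optional, more detailed information


@dataclass
class ReferenceDataSet:
    """A named collection of reference values, keyed by code."""
    name: str
    values: dict = field(default_factory=dict)

    def add(self, ref: ReferenceValue) -> None:
        # Assumption: a code may appear only once per set.
        if ref.code in self.values:
            raise ValueError(f"duplicate code: {ref.code}")
        self.values[ref.code] = ref


states = ReferenceDataSet("Indian-states")
states.add(ReferenceValue("IN-KA", "Karnataka", "State in southern India"))
states.add(ReferenceValue("IN-MH", "Maharashtra", "State in western India"))
```

Keying the collection by code makes the uniqueness rule explicit and gives direct lookup of a value by its code.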

            If importing from a csv file, select which columns of the csv form the code, value and description of the values in the set, and click Save.

 

            The format of the csv file is as in the image below. The first row in the image is a header and will not be imported into the set.
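Before uploading, it can help to check a file locally. The following Python sketch mimics the column mapping described above: it treats the first row as a header, maps three chosen columns to code, value and description, and rejects empty or repeated codes. The function `load_reference_values`, the column names `code`, `name` and `notes`, and the sample rows are all assumptions made for illustration; this is not WKC's import logic, and the checks WKC actually performs may differ.

```python
import csv
import io


def load_reference_values(csv_text, code_col, value_col, description_col=None):
    """Map chosen csv columns to code/value/description records.

    The header row only supplies column names and is not returned as data.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    seen_codes = set()
    for record in reader:
        code = record[code_col].strip()
        if not code:
            raise ValueError("empty code")
        if code in seen_codes:
            raise ValueError(f"duplicate code: {code}")
        seen_codes.add(code)
        rows.append({
            "code": code,
            "value": record[value_col].strip(),
            "description": record[description_col].strip() if description_col else "",
        })
    return rows


# Hypothetical file content: header row followed by two data rows.
SAMPLE_CSV = """code,name,notes
IN-KA,Karnataka,State in southern India
IN-MH,Maharashtra,State in western India
"""

values = load_reference_values(SAMPLE_CSV, "code", "name", "notes")
```

Passing the column names as arguments mirrors the import dialog, where you choose which csv column plays each role rather than relying on fixed header names.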

  

            After importing from the file or adding values individually, the reference data set will look as in the screenshot below and can be moved through the next workflow states to publish it. In the screenshot below, Send for Approval is invoked to obtain the necessary approvals before publishing.

 

            Since the reference data set is part of the data governance platform, you can further associate individual values or the entire data set [from the Overview tab] with other governance constructs such as business terms [https://ibm.biz/BdfENh], classifications [https://ibm.biz/BdfENV] etc. Similarly, you can assign stewards, set effective dates for the reference data set and assign tags to make it easier to find.

            In this chapter we learnt what reference data is and how to create, modify and publish it on the WKC platform. In the next chapter we will look at Hierarchical Reference Data Sets and Hierarchical values.



Comments

Tue February 16, 2021 10:50 PM

Very concise explanation and walkthrough