Watson Knowledge Catalog (WKC) - Data Governance and Quality

Reference Data Management in Watson Knowledge Catalog – Chapter 2

By PRAVEEN DEVARAO posted Tue February 23, 2021 10:25 AM

  

Chapter 2: Hierarchical Reference data sets and cross walks

 

In the previous chapter [chapter 1] we learnt about Reference Data Management and how to use it in Watson Knowledge Catalog at a foundational level. In this chapter we will explore hierarchies in reference data sets. We will also learn about relationships between values of different reference data sets, known as value mappings or cross walks.

 Hierarchical Reference Data:

Reference data values, as we learnt in the previous chapter, form a list. The values in that list can be further organized hierarchically to give them a clearer structure. For instance, consider the NAICS codes, in which each industry has a generic code that then narrows down to specific types within the industry. In the image below, code 11 represents the Agriculture, Forestry, Fishing and Hunting industry. Code 1111, falling under it, represents a specific sub-industry. In turn, 111110 represents a sub-industry within 1111.

 

All these values are valid NAICS codes and follow a structure to represent the entity which is hierarchical in nature.
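The parent of each code can be read off the code itself, because a NAICS code extends the code of the level above it. Here is a minimal Python sketch of that idea, assuming a set that holds only the three codes mentioned above. The helper name `nearest_parent` is hypothetical and is not part of Watson Knowledge Catalog.

```python
def nearest_parent(code, codes):
    """Return the longest proper prefix of `code` that is also in `codes`, or None."""
    for length in range(len(code) - 1, 0, -1):
        prefix = code[:length]
        if prefix in codes:
            return prefix
    return None


# Only the three codes named in the text; a real NAICS list is far longer.
codes = {"11", "1111", "111110"}

parents = {code: nearest_parent(code, codes) for code in sorted(codes)}
for code, parent in parents.items():
    print(code, "->", parent)
```

Sectors that NAICS publishes as ranges (such as 31-33) do not follow the plain prefix rule and would need extra handling.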

Let’s import these values into Watson Knowledge Catalog and see how they look.

  1. Create a reference data set named NAICS
  2. Import the NAICS values into the set from a CSV file



    After selecting the file to upload, specify the code, value and description columns. A special column to note is the parent column. You can use it to specify which value within the set is the parent of each value, so that the values are represented hierarchically. The image below shows the selection of the parent column while importing the file.
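A file for such an import could be prepared as in the sketch below. The column names are illustrative, since the import dialog lets you choose which column plays which role. Leaving the parent blank for the top-level value is also an assumption. The value names for 1111 and 111110 are taken from the published NAICS list.

```python
import csv
import io

# Illustrative column names; a blank parent marks a top-level value (assumption).
rows = [
    {"code": "11", "value": "Agriculture, Forestry, Fishing and Hunting",
     "description": "", "parent": ""},
    {"code": "1111", "value": "Oilseed and Grain Farming",
     "description": "", "parent": "11"},
    {"code": "111110", "value": "Soybean Farming",
     "description": "", "parent": "1111"},
]

# Write to an in-memory buffer; swap in open("naics.csv", "w", newline="")
# to produce a file for upload.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["code", "value", "description", "parent"])
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Each row names its parent by code, which is what allows the flat file to be shown as a tree after import.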

 

After the values are imported, the reference data set looks as in the image below.

 

As you can see, the values are arranged hierarchically, which makes it easy to navigate the industry information they represent.

Just as values can be structured hierarchically, the reference data sets themselves can be arranged in a hierarchy. To build a hierarchy on a reference data set, open the Set-Level Hierarchy tab and start adding parent and child reference data sets.

The image below shows the Countries reference data set having two child reference data sets, namely Indian states and Currency codes.

 

Reference data set Cross Walks:

While reference data sets can be represented in a hierarchical manner, another interesting and useful representation is to associate values in one reference data set with values in another set. This type of association is called a value mapping or cross walk.

These mappings are useful for representing relationships between values. Typically, they can be used to represent the relationship between reference data values in a hub-and-spoke fashion.

For example, we can map the value INDIA from the countries data set to each of the states in the Indian states reference data set. Likewise, we can map INDIA to the currency code INR from the currencies reference data set.

Let’s try this out in Watson Knowledge Catalog. To associate a value with another, open the Related Values section of the value and add a relation.

 

In the dialog that opens, choose the reference data set from which to associate a value.

  

On the next page, choose the type of relationship to be used.

One-to-One: Choosing this type ensures that a value from one data set is associated with only one value from the other data set. A 1:1 relationship.

One-to-many: This option allows a value to be mapped to multiple values from another reference data set. An m:n relationship between the two data sets.
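The difference between the two relationship types can be sketched in Python by treating a mapping as a list of (source, target) pairs. This models the idea only and does not show how Watson Knowledge Catalog stores or enforces mappings. The function names and the state codes `KA` and `MH` are illustrative assumptions.

```python
from collections import defaultdict


def is_one_to_one(pairs):
    """True if no source value and no target value appears in more than one pair."""
    sources = [source for source, _ in pairs]
    targets = [target for _, target in pairs]
    return len(set(sources)) == len(sources) and len(set(targets)) == len(targets)


def group_by_source(pairs):
    """Collect the targets of each source value, hub-and-spoke style."""
    targets_by_source = defaultdict(list)
    for source, target in pairs:
        targets_by_source[source].append(target)
    return dict(targets_by_source)


# INDIA (hub) mapped to two states (spokes); the state codes are illustrative.
country_to_state = [("INDIA", "KA"), ("INDIA", "MH")]

print(is_one_to_one(country_to_state))
print(group_by_source(country_to_state))
```

A mapping of INDIA to several states needs the one-to-many type, because the same source value appears in more than one pair.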

 

The related values will show up as in the image below.

 

You can also import these associations from a CSV file via the Upload Related Values menu. In the dialog box that opens, choose the appropriate columns and the target reference data set to complete the mappings for the respective values.
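A mapping file for such an upload might look like the output of the sketch below. The layout of one row per mapping with `source_code` and `target_code` columns is an assumption made for illustration, as are the state codes. The upload dialog is where the actual columns and the target reference data set are chosen.

```python
import csv
import io

# Hypothetical layout: one row per mapping, source value first, target value second.
mappings = [("INDIA", "KA"), ("INDIA", "MH")]

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["source_code", "target_code"])  # assumed header names
writer.writerows(mappings)

related_values_csv = buffer.getvalue()
print(related_values_csv)
```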

 

In this chapter, we learnt about hierarchies and cross walks in reference data sets. Along with this, we walked through how to represent them in Watson Knowledge Catalog’s reference data management system.

In the next chapter we will look into custom columns support, which helps represent additional information about a value. We will also learn about API access to the reference data management system.

Comments

Tue July 13, 2021 08:28 AM

Hi @Ariel Cohen

Thanks for reading the post. You can find the data class related article at Chapter 3: Data Classes.

Your observation is right: the data class works off the code column only. Reference data sets are usually looked up via the code column, which is unique, whereas a value can be repeated. Given this characteristic of reference data sets, the data class too works off the unique column, which is the code.

The catch with supporting values too could be that different teams lean towards using different columns [some code and some value] in their actual tables. This could hinder standardization across the board.

I don't see this requirement on the list. You could raise a request with a justification of why a data class based on values is needed. IBM Product Management will triage it.

Regards

Praveen Devarao

Tue July 13, 2021 05:25 AM

Thanks for the very useful article. A blog post for linking a data class to a reference data set would be a great follow up. For example, this only seems to work for codes at the moment. I.e. you cannot create a data class that looks at the values of a reference data set. Will this functionality be added?