Chapter 2: Hierarchical Reference Data Sets and Cross Walks
In the previous chapter (Chapter 1), we learnt about Reference Data Management and how to use it in Watson Knowledge Catalog at a foundational level. In this chapter, we will explore hierarchies in reference data sets. We will also learn about relationships between values of different reference data sets, known as value mappings or cross walks.
Hierarchical Reference Data:
As we learnt in the previous chapter, the values in a reference data set form a list. These values can be further organized hierarchically to give them a clearer structure. For instance, consider the NAICS codes, in which each industry has a generic code that narrows down into specific types within the industry. In the image below, code 11 represents the Agriculture, Forestry, Fishing and Hunting industry. Code 1111, which falls under it, represents a specific sub-industry, and 111110 in turn represents a sub-industry within 1111.
All of these values are valid NAICS codes, and the coding scheme itself is hierarchical: each longer code refines the entity represented by the shorter code above it.
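The prefix structure described above can be sketched in a few lines of Python. This is only an illustration, assuming that the parent of a code is the longest shorter code that is a prefix of it and is present in the set. The function name `find_parent` is hypothetical and is not part of Watson Knowledge Catalog.

```python
def find_parent(code, known_codes):
    """Return the longest proper prefix of `code` that is itself a known code.

    Returns None when no shorter known code is a prefix (a top-level value).
    """
    for length in range(len(code) - 1, 0, -1):
        prefix = code[:length]
        if prefix in known_codes:
            return prefix
    return None


# The three codes mentioned in the text above.
codes = {"11", "1111", "111110"}
parents = {code: find_parent(code, codes) for code in sorted(codes)}

for code, parent in parents.items():
    print(code, "->", parent)
```

Note that NAICS sectors written as ranges, such as 31-33, do not follow a plain prefix rule and would need extra handling in a sketch like this.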
Let’s import these values into Watson Knowledge Catalog and see how they look:
- Create a reference data set named NAICS
- Import the NAICS values into the set from a CSV file
After selecting the file to upload, specify the code, value and description columns. One column deserves special attention: the parent column. Use it to specify which value within the set is the parent of each value, so that the values are represented hierarchically. The image below shows the parent column being selected while importing the file.
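As a rough illustration of a file suited to this import, the following Python sketch writes a small CSV with a parent column. The header names (`code`, `value`, `description`, `parent`) are assumptions made for this example, since the import dialog is where each column's role is chosen. The names for 1111 and 111110 come from the public NAICS listing, and the descriptions simply reuse the NAICS level names.

```python
import csv
import io

# Hypothetical column layout; the parent column holds the code of the parent value.
FIELDS = ["code", "value", "description", "parent"]

rows = [
    {"code": "11", "value": "Agriculture, Forestry, Fishing and Hunting",
     "description": "Sector", "parent": ""},
    {"code": "1111", "value": "Oilseed and Grain Farming",
     "description": "Industry group", "parent": "11"},
    {"code": "111110", "value": "Soybean Farming",
     "description": "National industry", "parent": "1111"},
]


def build_csv(rows):
    """Serialize the rows to CSV text, quoting values that contain commas."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def missing_parents(rows):
    """Return parent codes that are referenced but not defined in the file."""
    known = {row["code"] for row in rows}
    return sorted({row["parent"] for row in rows if row["parent"]} - known)


csv_text = build_csv(rows)
print(csv_text)
print("Missing parents:", missing_parents(rows))
```

Checking for missing parents before uploading is a design choice of this sketch, not a documented requirement of the product.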
Once the values are imported, the reference data set will look as in the image below.
As you can see, the values are arranged hierarchically, which makes it easy to navigate the industry information they represent.
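To make the resulting structure concrete, here is a minimal Python sketch that turns rows with a parent code into an indented outline. It is a plain-text approximation for illustration only and does not reproduce how Watson Knowledge Catalog renders the hierarchy. The function name `render_tree` and the tuple layout are assumptions.

```python
from collections import defaultdict

# (code, value, parent code); None marks a top-level value.
rows = [
    ("11", "Agriculture, Forestry, Fishing and Hunting", None),
    ("1111", "Oilseed and Grain Farming", "11"),
    ("111110", "Soybean Farming", "1111"),
]


def render_tree(rows):
    """Return one indented line per value, with children listed under parents."""
    children = defaultdict(list)
    for code, value, parent in rows:
        children[parent].append((code, value))

    lines = []

    def walk(parent, depth):
        for code, value in sorted(children.get(parent, [])):
            lines.append("  " * depth + f"{code} {value}")
            walk(code, depth + 1)

    walk(None, 0)
    return lines


outline = render_tree(rows)
print("\n".join(outline))
# 11 Agriculture, Forestry, Fishing and Hunting
#   1111 Oilseed and Grain Farming
#     111110 Soybean Farming
```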
Just as values can be structured hierarchically, the reference data sets themselves can be arranged in a hierarchy. To build a hierarchy of reference data sets, open the Set-Level Hierarchy tab and start adding parent and child reference data sets.
The image below shows the Countries reference data set with two child reference data sets, namely Indian states and Currency codes.
Reference Data Set Cross Walks:
While reference data sets can be represented hierarchically, another useful representation is to associate values in one reference data set with values in another. This type of association is called a value mapping or cross walk.
Such mappings are useful for representing relationships between values. Typically, they are used to relate reference data values in a hub-and-spoke model.
For example, we can map the value INDIA from the Countries data set to each of the states in the Indian states reference data set. Likewise, we can map INDIA to the currency code INR from the Currency codes reference data set.
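The hub-and-spoke idea in this example can be sketched as a small Python structure, with INDIA as the hub and the related values as spokes. The state codes (KA, MH, TN) are illustrative picks, the currency code is written as INR (the ISO 4217 code for the Indian rupee), and the function name `related_values` is hypothetical. None of this reflects the product's internal storage.

```python
from collections import defaultdict

# Each cross walk: (source set, source code, target set, target code).
CROSS_WALKS = [
    ("Countries", "INDIA", "Indian states", "KA"),
    ("Countries", "INDIA", "Indian states", "MH"),
    ("Countries", "INDIA", "Indian states", "TN"),
    ("Countries", "INDIA", "Currency codes", "INR"),
]


def related_values(source_set, source_code, cross_walks):
    """Group the target codes related to one source value by target set."""
    related = defaultdict(list)
    for src_set, src_code, tgt_set, tgt_code in cross_walks:
        if (src_set, src_code) == (source_set, source_code):
            related[tgt_set].append(tgt_code)
    return dict(related)


india = related_values("Countries", "INDIA", CROSS_WALKS)
for target_set, target_codes in india.items():
    print(target_set, "->", target_codes)
```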
Let’s try this out in Watson Knowledge Catalog. To associate a value with another, open the Related Values section of the value and add a relation.
In the dialog that opens, choose the reference data set from which to associate a value.
On the next page, choose the type of relationship to use.
One-to-One: Choosing this type ensures that one value from one data set is associated with only one value from the other data set. This is a 1:1 relationship.
One-to-many: This option allows a value to be mapped to multiple values from another reference data set. This is a 1:n relationship between the two data sets.
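The difference between the two types can be illustrated with a small check in Python. This sketch only tests whether a list of (source, target) pairs satisfies the one-to-one rule described above. It assumes nothing about how Watson Knowledge Catalog enforces the rule, and the function name `is_one_to_one` is hypothetical.

```python
from collections import Counter


def is_one_to_one(pairs):
    """Return True if every source and every target appears in exactly one pair."""
    unique_pairs = set(pairs)  # ignore exact duplicates
    source_counts = Counter(source for source, _ in unique_pairs)
    target_counts = Counter(target for _, target in unique_pairs)
    return all(n == 1 for n in source_counts.values()) and all(
        n == 1 for n in target_counts.values()
    )


# One country mapped to one currency fits the one-to-one type.
country_to_currency = [("INDIA", "INR")]

# One country mapped to several states needs the one-to-many type.
country_to_states = [("INDIA", "KA"), ("INDIA", "MH"), ("INDIA", "TN")]

print(is_one_to_one(country_to_currency))  # True
print(is_one_to_one(country_to_states))    # False
```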
The related values will show up as in the image below.
You can also import these associations from a CSV file using the Upload Related Values menu. In the dialog box that opens, choose the appropriate columns and the target reference data set to complete the mappings for the respective values.
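Before uploading such a file, it can help to confirm that every code in it exists in the corresponding data set. The Python sketch below does this for a hypothetical two-column layout (`source_code`, `target_code`). The column names, the sample rows and the function name `find_unknown_codes` are assumptions for illustration, not the format the product requires.

```python
import csv
import io

# Hypothetical related-values file; XX is deliberately not a known state code.
CSV_TEXT = """source_code,target_code
INDIA,KA
INDIA,MH
INDIA,XX
"""


def find_unknown_codes(csv_text, source_codes, target_codes):
    """Return (line number, column, code) for each code missing from its set."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["source_code"] not in source_codes:
            problems.append((line_no, "source_code", row["source_code"]))
        if row["target_code"] not in target_codes:
            problems.append((line_no, "target_code", row["target_code"]))
    return problems


problems = find_unknown_codes(CSV_TEXT, {"INDIA"}, {"KA", "MH", "TN"})
for line_no, column, code in problems:
    print(f"line {line_no}: unknown {column} {code!r}")
```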
In this chapter, we learnt about hierarchies and cross walks in reference data sets. We also walked through how to represent them in Watson Knowledge Catalog’s reference data management system.
In the next chapter, we will look at custom columns, which help represent additional information about a value. We will also learn how to access the reference data management system through its API.