Anatomy of a data asset

By Inge Halilovic posted Mon August 01, 2022 04:48 PM

In Cloud Pak for Data, you can transform data into assets that accumulate meaning and value. A data asset is much more than just a data set!

When you first create a data asset, it contains only basic information: how to access the data, plus the table, schema, and data values.

A data asset accumulates meaning

With the Watson Knowledge Catalog service on Cloud Pak for Data, you can run a curation process to add a layer of metadata to the data asset. During curation, each column is automatically assigned a data class that represents the format of the data. Statistics about the values are compiled. Business terms are automatically assigned to each column to describe the semantic meaning of the data for your organization. You can also add business terms manually. Data quality is analyzed to identify problems. After you finish curation, you publish the asset into a catalog to share it with your organization. In the catalog, all the information added during curation is visible.
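To make the idea of a data class concrete, here is a minimal Python sketch of format-based classification. It is not how Watson Knowledge Catalog implements curation. The class names, the regular expressions, and the 0.8 threshold are all assumptions chosen for illustration.

```python
import re

# Hypothetical data classes: name -> regular expression for a single value.
# These patterns are illustrative only, not the product's definitions.
DATA_CLASSES = {
    "email_address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "us_zip_code": re.compile(r"\d{5}(-\d{4})?"),
}


def assign_data_class(values, threshold=0.8):
    """Return the best-matching class name, or None if no class reaches the threshold."""
    # Ignore missing values so they do not count against a class.
    non_null = [v for v in values if v is not None and v != ""]
    if not non_null:
        return None

    best_name, best_ratio = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        matches = sum(1 for v in non_null if pattern.fullmatch(str(v)))
        ratio = matches / len(non_null)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio

    return best_name if best_ratio >= threshold else None
```

In this sketch, a column is classified by the format of its values and not by its name, which is why a column called CONTACT or EMAIL_ADDRESS could end up with the same class.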

As users find the data asset in the catalog and use it in tools, they create a third layer of meaning, which describes the history of how the asset is used, the lineage of the data, and the relationships between it and other assets.

Here's a data asset in a catalog. You can see the Overview tab and the Information page. This asset is a table in a Db2 Warehouse. Three business terms are assigned to the asset, and it has two relationships with other data assets.

On the Assets tab, you'll see a preview of the data. This tab shows just a sample of the data. When you use this data asset in a tool, for example, to train a model or in a DataStage pipeline, the entire data set will be fetched from the data source and loaded into the tool. Once the tool is finished running, the data is released. The next time you use the data in a tool, it is fetched again, and includes all the updates that were made to the data set in the meantime.

Each column has an eye icon next to it. Click the icon to see more information about that column. For example, the EMAIL_ADDRESS column has two business terms assigned to it. These business terms were assigned automatically during curation. They make it easy to find email addresses in all your data assets, regardless of what the column names are.

On the Profile tab, you'll see information about the values in each column. The quality score reflects whether values match the data type and data class, how many values are missing, how unique the values are, and so on. You can create data quality rules and definitions to suit your data. If you click the eye icon, you can see the details of the quality score. The data class describes the format of the data in the column. Watson Knowledge Catalog has over 150 predefined data classes, but you can create your own as well.
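As a rough illustration of the kinds of checks that feed a quality score, the following Python sketch computes three per-column ratios. The metric names, the formulas, and the `column_quality` function are assumptions made for this example. The product's actual scoring method is not described here.

```python
import re


def column_quality(values, pattern):
    """Compute simple, hypothetical quality ratios for one column.

    values  -- list of raw column values (None or "" counts as missing)
    pattern -- regular expression that a valid value must fully match
    """
    total = len(values)
    if total == 0:
        return {"completeness": 0.0, "validity": 0.0, "uniqueness": 0.0}

    present = [v for v in values if v is not None and v != ""]
    valid = [v for v in present if re.fullmatch(pattern, str(v))]

    return {
        # Share of rows that have a value at all.
        "completeness": len(present) / total,
        # Share of rows whose value matches the expected format.
        "validity": len(valid) / total,
        # Share of rows that carry a distinct, non-missing value.
        "uniqueness": len(set(present)) / total,
    }


emails = ["a@b.com", "a@b.com", None, "oops"]
scores = column_quality(emails, r"[^@\s]+@[^@\s]+\.[^@\s]+")
```

For the sample column, three of four rows have a value, two match the email format, and two distinct values appear, so the ratios come out to 0.75, 0.5, and 0.5.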

In the Activities pane, you can see the history of the data asset. For each activity, you can view the details, such as the previous and updated values of a property.

On the Ratings tab, you can see what the members of the catalog think about the data asset.

And finally, on the Lineage tab, you can see where data came from, how it was transformed, and where it was consumed.

