In Cloud Pak for Data, you can transform data into assets that accumulate meaning and value. A data asset is much more than just a data set!
When you first create a data asset, the asset has only a first layer of basic information: how to access the data, plus the table, schema, and data values.
With the Watson Knowledge Catalog service on Cloud Pak for Data, you can run a curation process to add a layer of metadata to the data asset. During curation, each column is automatically assigned a data class that represents the format of the data. Statistics about the values are compiled. Business terms are automatically assigned to each column to describe the semantic meaning of the data for your organization. You can also add business terms manually. Data quality is analyzed to identify problems. After you finish curation, you publish the asset into a catalog to share it with your organization. In the catalog, all the information added during curation is visible.
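To make the idea of a data class concrete, here is a minimal sketch of format-based class assignment. It is an illustration only, not how Watson Knowledge Catalog classifies columns: the class names, the regular expressions, the `assign_data_class` function, and the 0.8 threshold are all assumptions made for this example.

```python
import re

# Hypothetical data classes for illustration: name -> pattern for the value format.
DATA_CLASSES = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US Zip Code": re.compile(r"\d{5}(-\d{4})?"),
}


def assign_data_class(values, threshold=0.8):
    """Return the class whose pattern matches the largest share of non-empty
    values, or None if no class reaches the (assumed) threshold."""
    non_empty = [v for v in values if v is not None and v != ""]
    if not non_empty:
        return None

    best_name, best_ratio = None, 0.0
    for name, pattern in DATA_CLASSES.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(str(v)))
        ratio = matches / len(non_empty)
        if ratio > best_ratio:
            best_name, best_ratio = name, ratio

    return best_name if best_ratio >= threshold else None
```

Because the class is derived from the values rather than the column name, a column called CONTACT or ADDR1 that holds email addresses would receive the same class as one called EMAIL_ADDRESS in this sketch.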
As users find the data asset in the catalog and use it in tools, they create the third layer of meaning that describes the history of how the asset is used, the lineage of the data, and the relationships between it and other assets.
Here's a data asset in a catalog. You can see the Overview tab and the Information page. This asset is a table in a Db2 Warehouse. It has three business terms assigned to the asset and two relationships with other data assets.
Each column has an eye icon next to it and if you click it, you'll see more information about that column. For example, the EMAIL_ADDRESS column has two business terms assigned to it. These business terms were assigned automatically during curation. These terms make it easy to find email addresses in all your data assets, regardless of what the column names are.
On the Profile tab, you'll see information about the values in each column. The quality score reflects whether values match the data type and data class, how many values are missing, how unique the values are, and so on. You can create data quality rules and definitions to suit your data. If you click the eye icon, you can see the details of the quality score. The data class describes the format of the data in the column. Watson Knowledge Catalog has over 150 predefined data classes, but you can create your own as well.
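The kinds of checks that feed a quality score can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the scoring that Watson Knowledge Catalog performs: the `column_quality` function, the three metric names, and the caller-supplied `is_valid` check are hypothetical.

```python
def column_quality(values, is_valid):
    """Compute three simple quality metrics for one column.

    values   -- list of raw column values (None or "" counts as missing)
    is_valid -- caller-supplied function that returns True when a value
                matches the expected data type or data class
    """
    total = len(values)
    present = [v for v in values if v is not None and v != ""]
    if total == 0 or not present:
        return {"completeness": 0.0, "uniqueness": 0.0, "validity": 0.0}

    return {
        # Share of rows that have a value at all.
        "completeness": len(present) / total,
        # Share of present values that are distinct.
        "uniqueness": len(set(present)) / len(present),
        # Share of present values that pass the format check.
        "validity": sum(1 for v in present if is_valid(v)) / len(present),
    }


# Example: four rows, one missing, one duplicate, one malformed value.
metrics = column_quality(
    ["a@b.com", "a@b.com", "bad", None],
    is_valid=lambda v: "@" in v,
)
```

Keeping the metrics separate, rather than collapsing them into one number, mirrors the way the details behind a score can be inspected individually.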
On the Activities pane, you can see the history of the data asset. For each activity, you can view the details, such as the previous and updated values of a property.
On the Ratings tab, you can see what the members of the catalog think about the data asset.
And finally, on the Lineage tab, you can see where data came from, how it was transformed, and where it was consumed.
Still curious? Sign up for a free trial of the Data privacy and governance use case of the Data fabric solution with Cloud Pak for Data as a Service and run through the Data privacy and governance tutorials.