Data Lifecycle - Integration and Governance

Row-Filtering in IBM watsonx.data (with IBM Knowledge Catalog)

By Lisa Mallahan posted Wed January 22, 2025 09:28 AM

Reviewed by: Bob Neugebauer

Purpose:

In this article, we’ll see how IBM watsonx.data tables can be added to IBM Cloud Pak for Data governed catalogs, and how data protection rules can filter table data to control which rows of a table are visible to which users, in both IBM Cloud Pak for Data and IBM watsonx.data.

Requirements:

  • IBM Cloud Pak for Data 5.0.3+ (with IBM Knowledge Catalog)
  • IBM watsonx.data 2.0.3+

Audience:

This article is intended for users who are familiar with IBM Cloud Pak for Data and IBM Knowledge Catalog and who are interested in integrating them with IBM watsonx.data.

What is row-filtering?

IBM Knowledge Catalog (a component of IBM Cloud Pak for Data) offers three types of data protection:

  • allow-/deny-access, where data protection rules determine who is and is not allowed access to an asset
  • masking, where column values are redacted, substituted, or obfuscated
  • row-filtering, where asset rows are included or excluded for particular users, based on the values in each row

These protections are available to any tables that are included in an IBM Cloud Pak for Data governed catalog.

For example, if we have a table with worldwide sales data in an IBM Cloud Pak for Data governed catalog, we might want to restrict particular rows of that table, allowing access only to certain users. We can create a row-filtering rule that identifies the asset(s), specifies some set of users, and defines which rows should be presented to those users.

In this example, when users who are not included in the ‘European data access’ group preview any table in the ‘worldwide_sales’ schema, any rows that have a ‘region’ value of ‘Europe’ will be excluded from the presentation.

Our data steward user is not in the ‘European data access’ group. This user sees data from the ‘Americas’ region and does not see data from the ‘Europe’ region — and gets a notation in the table header indicating that row-filtering is in effect.

Our data engineer user is included in the access group. This user sees data from both ‘Americas’ and ‘Europe’, with no row-filtering in effect.
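To make the rule’s logic concrete, here is a minimal Python sketch of the row-filtering rule described above. It is a conceptual model only, not IBM Knowledge Catalog’s API or enforcement mechanism: the `RowFilterRule` class, its fields, the sample rows, and the ‘Data stewards’ group name are hypothetical, chosen to mirror the ‘worldwide_sales’ example.

```python
from dataclasses import dataclass


# Hypothetical, simplified model of a row-filtering rule.
# It is NOT the IBM Knowledge Catalog API; it only mirrors the example's logic.
@dataclass(frozen=True)
class RowFilterRule:
    schema: str          # assets covered: any table in this schema
    exempt_group: str    # members of this group see every row
    column: str          # column whose value decides row visibility
    excluded_value: str  # rows with this value are hidden from everyone else

    def apply(self, schema, rows, user_groups):
        """Return the rows a user with `user_groups` may see in a table of `schema`."""
        if schema != self.schema or self.exempt_group in user_groups:
            return list(rows)  # rule does not apply: no filtering
        return [r for r in rows if r.get(self.column) != self.excluded_value]


rule = RowFilterRule(
    schema="worldwide_sales",
    exempt_group="European data access",
    column="region",
    excluded_value="Europe",
)

# Hypothetical sample rows
customer_data = [
    {"customer": "A", "region": "Americas"},
    {"customer": "B", "region": "Europe"},
    {"customer": "C", "region": "Americas"},
]

# Data steward: not in the access group, so Europe rows are excluded.
steward_view = rule.apply("worldwide_sales", customer_data, {"Data stewards"})
# Data engineer: in the access group, so no filtering is applied.
engineer_view = rule.apply("worldwide_sales", customer_data, {"European data access"})

print([r["region"] for r in steward_view])   # ['Americas', 'Americas']
print([r["region"] for r in engineer_view])  # ['Americas', 'Europe', 'Americas']
```

The sketch keeps apart the three things a row-filtering rule specifies: which assets it covers, which users it applies to, and which rows those users see.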

How does this work with watsonx.data?

Allow-/deny-access rules and masking rules have been supported in IBM watsonx.data since the IBM watsonx.data-Knowledge Catalog integration was introduced (in IBM Cloud Pak for Data 4.8.4). Support for row-filtering rules followed in the 5.0.3 timeframe. Before that, IBM watsonx.data did not allow preview of any table that had row-filtering rules applied to it.
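This release history amounts to a simple compatibility rule. The Python sketch below encodes only what this article states. Representing IBM Cloud Pak for Data versions as tuples and rule types as the labels "access", "masking", and "row_filter" are assumptions for illustration, and `preview_behavior` is a hypothetical helper, not an IBM API.

```python
# Hypothetical helper encoding only the release history described in this article.
# Versions are IBM Cloud Pak for Data releases written as (major, minor, patch) tuples,
# which Python compares element by element.
INTEGRATION_RELEASE = (4, 8, 4)  # integration introduced: access and masking rules enforced
ROW_FILTER_RELEASE = (5, 0, 3)   # watsonx.data begins enforcing row-filtering rules


def preview_behavior(cpd_version, rule_types):
    """Return how a watsonx.data preview treats a governed table.

    rule_types is a set of hypothetical labels, e.g. {"access", "masking", "row_filter"}.
    """
    if cpd_version < INTEGRATION_RELEASE:
        return "no integration"
    if "row_filter" in rule_types and cpd_version < ROW_FILTER_RELEASE:
        return "preview blocked"
    return "rules enforced"


print(preview_behavior((5, 0, 0), {"masking", "row_filter"}))  # preview blocked
print(preview_behavior((5, 0, 3), {"masking", "row_filter"}))  # rules enforced
```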

IBM Knowledge Catalog’s data protection is available in IBM watsonx.data through a service integration. To set this up, an IBM watsonx.data Admin user creates the integration on the Access Control -> Integrations page.

This integration applies only to the specified IBM watsonx.data catalogs. Tables within these catalogs are subject to IBM Knowledge Catalog’s data protection only if they are also present in an IBM Knowledge Catalog governed catalog.
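The scoping condition above has two parts, and both must hold. Here is a minimal Python sketch, assuming a table is identified by its catalog, schema, and name. The `TableRef` type, the `ikc_rules_apply` function, and the ‘hive_data’ catalog are hypothetical; the sketch models the behavior, not how IBM watsonx.data implements it.

```python
from typing import NamedTuple


class TableRef(NamedTuple):
    catalog: str  # IBM watsonx.data catalog, e.g. "iceberg_data"
    schema: str
    table: str


def ikc_rules_apply(table, integrated_catalogs, governed_tables):
    """Return True if Knowledge Catalog data protection would govern this table.

    Both conditions must hold: the table's watsonx.data catalog is named in the
    integration, and the table is also present in a governed catalog.
    """
    return table.catalog in integrated_catalogs and table in governed_tables


integrated_catalogs = {"iceberg_data"}
customer_data = TableRef("iceberg_data", "worldwide_sales", "customer_data")
governed_tables = {customer_data}

print(ikc_rules_apply(customer_data, integrated_catalogs, governed_tables))  # True

# A governed table in a catalog the integration does not name (hypothetical):
other = TableRef("hive_data", "worldwide_sales", "customer_data")
print(ikc_rules_apply(other, integrated_catalogs, governed_tables | {other}))  # False
```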

In this example, our IBM watsonx.data table is included in the iceberg_data catalog, which happens to be named in our integration.

The table has also been added to our Worldwide Sales Data governed catalog in IBM Cloud Pak for Data, via an IBM watsonx.data Presto connection.

We’ve already seen row-filtering in IBM Cloud Pak for Data. What does this look like in IBM watsonx.data?

To preview table data in IBM watsonx.data, we navigate to Data Manager -> Browse Data, expand the catalog and schema, and select the table. In the main frame, we click Data Sample.

Users who are not included in the ‘European data access’ group see filtered data (only the table rows that are specific to the Americas region).

Users who are included in the access group see all rows in the table (rows from both the Americas and Europe regions).

We can also view table data by running an SQL query in IBM watsonx.data. To do this, we navigate to the Query Workspace and issue a SELECT statement:

SELECT * FROM "iceberg_data"."worldwide_sales"."customer_data" LIMIT 25

When this is done by a user not in the ‘European data access’ group, the query returns row-filtered data — only rows that show Americas data.

The same query issued by a user in the ‘European data access’ group returns unfiltered data: rows from both the Americas and Europe regions.

In summary…

Row-filtering rules are a versatile and important means of data protection in IBM Cloud Pak for Data, allowing different subsets of data to be visible to different sets of users.

Early versions of the IBM watsonx.data-Knowledge Catalog integration enforced masking and access rules in IBM watsonx.data but did not support row-filtering rules. Starting with IBM Cloud Pak for Data 5.0.3, IBM watsonx.data can also enforce row-filtering, so it now supports all three of IBM Knowledge Catalog’s data protection methods.

Additional resources

For more information on various topics mentioned in this article, see the IBM Cloud Pak for Data documentation.
