Row-Filtering in IBM watsonx.data (with IBM Knowledge Catalog)

By Lisa Mallahan posted Wed January 22, 2025 09:28 AM

Reviewed by: Bob Neugebauer

Purpose:

In this article, we’ll see how IBM watsonx.data tables can be added to IBM Cloud Pak for Data governed catalogs and how we can apply data protection rules to filter the table data, specifying which rows of a table will be visible to various users — both in IBM Cloud Pak for Data and IBM watsonx.data.

Requirements:

IBM Cloud Pak for Data 5.0.3+ (with IBM Knowledge Catalog)
IBM watsonx.data 2.0.3+

Audience:

This article is intended for users who have familiarity with IBM Cloud Pak for Data and IBM Knowledge Catalog, and who are interested in integration with IBM watsonx.data.

What is row-filtering?

IBM Knowledge Catalog (a component of IBM Cloud Pak for Data) offers three types of data protection:

allow-/deny-access, where data protection rules determine who is and is not allowed access to an asset
masking, where column values are redacted, substituted, or obfuscated
row-filtering, where asset rows are included or excluded when viewed by particular users, based on values in a row’s data

These protections are available to any tables that are included in an IBM Cloud Pak for Data governed catalog.

For example, if we have a table with worldwide sales data in an IBM Cloud Pak for Data governed catalog, we might want to restrict particular rows of that table, allowing access only to certain users. We can create a row-filtering rule that identifies the asset(s), specifies some set of users, and defines which rows should be presented to those users.

*Row-filtering rule in IBM Cloud Pak for Data*

In this example, when users who are not included in the ‘European data access’ group preview any table in the ‘worldwide_sales’ schema, any rows that have a ‘region’ value of ‘Europe’ will be excluded from the preview.

Our data steward user is not in the ‘European data access’ group. This user sees data from the ‘Americas’ region and does not see data from the ‘Europe’ region — and gets a notation in the table header indicating that row-filtering is in effect.

*Row-filtered data in IBM Cloud Pak for Data*

Our data engineer user is included in the access group. This user sees data from both ‘Americas’ and ‘Europe’, with no row-filtering in effect.

*Unfiltered data in IBM Cloud Pak for Data*
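The behavior of the example rule can be sketched in a few lines of Python. This is only a model of the outcome described above, not IBM Knowledge Catalog's implementation or API: the `RowFilterRule` class, the `preview` function, and the sample rows are all hypothetical, and the sketch assumes the "row-filtering in effect" notation appears whenever the rule applies to the user.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RowFilterRule:
    """Hypothetical model of the example rule (not an IBM Knowledge Catalog API)."""
    schema: str          # the rule applies to tables in this schema
    exempt_group: str    # users in this group see every row
    column: str          # column inspected in each row
    excluded_value: str  # rows holding this value are hidden from other users


def preview(rule, table_schema, user_groups, rows):
    """Return (visible_rows, filtered) for one user previewing one table."""
    if table_schema != rule.schema or rule.exempt_group in user_groups:
        return list(rows), False
    visible = [row for row in rows if row.get(rule.column) != rule.excluded_value]
    return visible, True  # assumption: the flag is set whenever the rule applies


rule = RowFilterRule("worldwide_sales", "European data access", "region", "Europe")

# Illustrative rows only; the article does not list the table's contents.
rows = [
    {"customer": "A", "region": "Americas"},
    {"customer": "B", "region": "Europe"},
]

steward_rows, steward_filtered = preview(rule, "worldwide_sales", set(), rows)
engineer_rows, engineer_filtered = preview(
    rule, "worldwide_sales", {"European data access"}, rows
)

print([r["region"] for r in steward_rows], steward_filtered)    # ['Americas'] True
print([r["region"] for r in engineer_rows], engineer_filtered)  # ['Americas', 'Europe'] False
```

The data steward (not in the group) gets only the Americas row plus the filtering flag, while the data engineer (in the group) gets both rows and no flag.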

How does this work with watsonx.data?

Allow-/deny-access rules and masking rules have been supported in IBM watsonx.data since the IBM watsonx.data-Knowledge Catalog integration was introduced (in IBM Cloud Pak for Data 4.8.4). Support for row-filtering followed in IBM Cloud Pak for Data 5.0.3. Before that release, IBM watsonx.data did not allow preview of any table that had row-filtering rules applied to it.

IBM Knowledge Catalog’s data protection is available in IBM watsonx.data through a service integration. To set this up, an IBM watsonx.data Admin user creates the integration on the Access Control -> Integrations page.

*Creating an IBM watsonx.data-Knowledge Catalog integration*

This integration will apply only to the specified IBM watsonx.data catalogs. Any tables within these catalogs will be subject to IBM Knowledge Catalog’s data protection — if the tables are also present in an IBM Knowledge Catalog governed catalog.

In this example, our IBM watsonx.data table is included in the iceberg_data catalog, which is named in our integration.

*IBM watsonx.data table in iceberg_data catalog and worldwide_sales schema*

The table has also been added to our Worldwide Sales Data governed catalog in IBM Cloud Pak for Data, via an IBM watsonx.data Presto connection.

*IBM watsonx.data table added to IBM Cloud Pak for Data governed catalog*
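The two conditions described above (the table's IBM watsonx.data catalog is named in the integration, and the table is also present in a governed catalog) can be written as a simple check. This is a minimal sketch for reasoning about which tables are covered; the function name, its parameters, and the `hive_data` catalog are hypothetical and do not correspond to any product API.

```python
def is_subject_to_protection(table_catalog, integration_catalogs, in_governed_catalog):
    """True only when both conditions from the article hold.

    table_catalog        -- watsonx.data catalog that contains the table
    integration_catalogs -- catalogs named in the Knowledge Catalog integration
    in_governed_catalog  -- whether the table was also added to a governed catalog
    """
    return table_catalog in integration_catalogs and in_governed_catalog


integration_catalogs = {"iceberg_data"}

# The article's table: in iceberg_data and added to the governed catalog.
print(is_subject_to_protection("iceberg_data", integration_catalogs, True))   # True

# Same watsonx.data catalog, but the table was never added to a governed catalog.
print(is_subject_to_protection("iceberg_data", integration_catalogs, False))  # False

# Hypothetical catalog that is not named in the integration.
print(is_subject_to_protection("hive_data", integration_catalogs, True))      # False
```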

We’ve already seen row-filtering in IBM Cloud Pak for Data. What does this look like in IBM watsonx.data?

To preview table data in IBM watsonx.data, we navigate to Data Manager -> Browse Data, expand the catalog and schema, and select the table. In the main frame, we click Data Sample.

Users who are not included in the ‘European data access’ group see filtered data (only the table rows that are specific to the Americas region).

*Row-filtered data in IBM watsonx.data’s Data Manager*

Users who are included in the access group see all rows in the table (rows in both the Americas and Europe regions).

*Unfiltered data in IBM watsonx.data’s Data Manager*

Additionally, we can see table data by running an SQL query in IBM watsonx.data. To do this, we navigate to the Query Workspace, and issue a SELECT statement:

SELECT * FROM "iceberg_data"."worldwide_sales"."customer_data" LIMIT 25

When this is done by a user not in the ‘European data access’ group, the query returns row-filtered data — only rows that show Americas data.

*Row-filtered data in IBM watsonx.data’s Query Workspace*

The same query issued by a user in the ‘European data access’ group returns unfiltered data: rows that show data from both the Americas and the Europe regions.

*Unfiltered data in IBM watsonx.data’s Query Workspace*
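One way to picture the two query results is to treat the row-filtering rule as an extra predicate added on behalf of users outside the access group. The sketch below models that with Python's built-in `sqlite3` module. It is an assumption-laden illustration of the observable result only: it does not show how IBM watsonx.data enforces the rule internally, the table contents are invented, and SQLite stands in for the real engine (so the table name is unqualified).

```python
import sqlite3

# In-memory stand-in for the customer_data table; the rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "customer_data" (customer TEXT, region TEXT)')
conn.executemany(
    'INSERT INTO "customer_data" VALUES (?, ?)',
    [("A", "Americas"), ("B", "Europe"), ("C", "Americas")],
)


def run_query(user_groups):
    """Model the rows a user would get back from the article's SELECT statement."""
    sql = 'SELECT * FROM "customer_data"'
    if "European data access" not in user_groups:
        # Hypothetical equivalent of the row-filtering rule for this user.
        sql += " WHERE region <> 'Europe'"
    sql += " LIMIT 25"
    return conn.execute(sql).fetchall()


filtered = run_query(set())
unfiltered = run_query({"European data access"})

print(sorted({region for _, region in filtered}))    # ['Americas']
print(sorted({region for _, region in unfiltered}))  # ['Americas', 'Europe']
```

Both users submit the same statement; only the rows returned differ.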

In summary…

Row-filtering rules are a versatile and important means of data protection in IBM Cloud Pak for Data, allowing different subsets of data to be visible to different sets of users.

Early versions of IBM watsonx.data-Knowledge Catalog integration provided the ability to enforce masking and access rules in IBM watsonx.data, but did not support row-filtering rules. With the introduction of IBM Cloud Pak for Data 5.0.3, IBM watsonx.data is able to enforce row-filtering, now offering complete support for IBM Knowledge Catalog’s data protection methods.

Additional resources

For more information on various topics mentioned in this article, see the IBM Cloud Pak for Data documentation.

Predefined user roles and permissions: https://www.ibm.com/docs/en/software-hub/5.1.x?topic=users-predefined-roles-permissions-in-software-hub
Working with data protection rules: https://www.ibm.com/docs/en/cloud-paks/cp-data/5.1.x?topic=artifacts-data-protection-rules
Working with governed catalogs: https://www.ibm.com/docs/en/cloud-paks/cp-data/5.1.x?topic=governance-catalogs
Integrating IBM watsonx.data and IBM Knowledge Catalog: https://ibmdocs-test.dcs.ibm.com/docs/en/SSDZ38_2.0.x_test?topic=integrations-integrating-knowledge-catalog
Creating an IBM watsonx.data Presto connection in IBM Cloud Pak for Data: https://www.ibm.com/docs/en/cloud-paks/cp-data/5.1.x?topic=connectors-watsonxdata-presto-connection
