Unified Data Access Policy Management and Enforcement with Watson Knowledge Catalog and Guardium Data Protection

By Walid RJAIBI posted Wed October 04, 2023 10:09 AM

Walid Rjaibi

IBM Distinguished Engineer and CTO for Data Security

wrjaibi@ca.ibm.com

Introduction

IBM Watson Knowledge Catalog (WKC) allows users to discover, classify, and curate data, trace its lineage, and manage data protection rules centrally through a single pane of glass. Managing data protection rules centrally across the IT environment reduces the risk of inconsistent data access enforcement, which typically arises when such rules are managed across disparate and unconnected systems. It also reduces the cost of demonstrating compliance, because data protection rules can be easily audited when they are defined in a single place using a single vocabulary. Additionally, users can rely on high-level abstractions such as business terms and data classifications when defining data protection rules, enabling uniform and consistent data access enforcement across the organization.

While managing data protection rules centrally is critical, enforcing them securely and consistently is equally critical. This is exactly where Guardium Data Protection (GDP) helps. Unlike application-centric enforcement, which applies only to accesses made through the application, GDP is a gatekeeper sitting in front of the database. It ensures that the data protection rules are uniformly enforced regardless of whether users access the database through an application or directly through the database system’s own interfaces (e.g., Db2 CLP). Unifying WKC’s data access policy management with GDP’s data access enforcement therefore combines the benefits of both worlds: central management of the data protection rules and secure, consistent enforcement of those rules.

WKC and GDP Integration Architecture

Figure 1 shows a high-level overview of the WKC and GDP integration architecture. When this integration is enabled on the GDP side, the processing steps can be summarized as follows:

1. The user or application submits an SQL statement to the database.

2. The Guardium STAP or E-STAP agent intercepts that SQL statement and requests a verdict from the Sniffer component which resides on a Guardium Collector.

3. The Sniffer component requests a decision from WKC[1], passing on the SQL statement’s context such as the user identity and the tables involved.

4. The WKC Decision Engine processes the request and returns a decision to Sniffer. The decision can be a deny, an allow, a column transformation, or a row transformation. A column transformation is returned when the content of a column must be masked before it can be returned to the user (dynamic data masking). A row transformation is returned when the query result set must be filtered before it can be returned to the user (row level filtering).

5. If the decision returned is a column transformation or a row transformation, the Sniffer component rewrites the SQL statement by injecting appropriate SQL constructs to implement dynamic data masking, row level filtering or both. Also, Sniffer caches the decision returned by WKC so the next time the same SQL statement and its associated context are seen, an actual trip to WKC is not required.

6. The Guardium STAP or E-STAP agent enforces the decision. If the decision was a column transformation or a row transformation, the SQL statement released to the database system is not the original statement that was submitted by the user. Rather, it is the SQL statement rewritten in step 5.

7. The SQL constructs injected in step 5 are executed by the database system itself, thus ensuring that the query result set returned obeys the data protection rules defined in WKC.

Figure 1: WKC and GDP Integration Architecture.
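The decision path in steps 2 through 5, together with the rule that local Guardium rules are evaluated before WKC is consulted, can be sketched in a few lines of Python. This is only an illustration of the control flow described above, not Guardium code: the names Decision, Sniffer, local_rules_allow and ask_wkc are hypothetical, and the cache key (user, SQL text) is a simplifying assumption standing in for the statement's full context.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class Decision:
    """Hypothetical shape of a decision returned by the policy engine."""
    kind: str                               # "allow", "deny", or "transform"
    masked_columns: Tuple[str, ...] = ()    # columns needing dynamic data masking
    row_predicate: Optional[str] = None     # predicate for row level filtering


class Sniffer:
    """Toy model of the verdict flow: local rules, then cache, then WKC."""

    def __init__(self,
                 local_rules_allow: Callable[[str, str], bool],
                 ask_wkc: Callable[[str, str], Decision]) -> None:
        self.local_rules_allow = local_rules_allow
        self.ask_wkc = ask_wkc
        self.cache: Dict[Tuple[str, str], Decision] = {}
        self.wkc_calls = 0  # counts actual trips to the policy engine

    def verdict(self, user: str, sql: str) -> Decision:
        # Local rules are evaluated first; WKC is asked only if they allow access.
        if not self.local_rules_allow(user, sql):
            return Decision("deny")
        # Assumption: the context is reduced to (user, SQL text) for this sketch.
        key = (user, sql)
        if key not in self.cache:
            self.wkc_calls += 1
            self.cache[key] = self.ask_wkc(user, sql)
        return self.cache[key]
```

Submitting the same statement twice for the same user results in a single call to ask_wkc; the second verdict is served from the cache. A local rule that denies everything short-circuits the flow, so WKC is never contacted.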

Enforcing Column and Row Transformations

Figure 2 shows an example of how an SQL statement that is submitted by a user named Joe is rewritten by the Guardium Sniffer component, assuming the following two data protection rules are defined in WKC:

· Data Protection Rule #1: User Joe is not allowed to see the content of the Social Security Number (SSN) column. That content must be masked when he queries that column.

· Data Protection Rule #2: User Joe is not allowed to see the full content of table T1. He can only see the rows for which the value in the column City is equal to ‘Toronto’.

As discussed in step 5 of the previous section, the Sniffer component injects a set of SQL constructs in the original SQL statement to enforce the data access rules defined in WKC. In this example, two SQL constructs are injected. First, a User-Defined Function (UDF) is injected as a wrapper around the SSN column. This UDF implements the desired masking function and must have already been registered in the database. WKC supports 3 types of masking functions that are enforced by GDP:

· Redact: All the characters in the data are replaced with X. For example, 752-721-3120 is replaced with XXXXXXXXXX.

· Substitute: The data is replaced with values that do not match the original format. For example, 752-721-3120 is replaced with 0c63xa8394011.

· Obfuscate: The data is replaced with values that preserve referential integrity and the original data format. For example, 752-721-3120 is replaced by 708-219-6250.
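To make the difference between the three masking types concrete, here is a toy Python sketch. It is not the logic of the UDFs shipped for this integration: the function names, the demo key, and the hashing approach are all assumptions chosen for illustration, and whether separators are kept or dropped by the real Redact function is not modeled here.

```python
import hashlib
import hmac

SECRET = b"demo-key"  # hypothetical key, for illustration only


def redact(value: str) -> str:
    """Replace every character with X (separator handling is an assumption)."""
    return "X" * len(value)


def substitute(value: str) -> str:
    """Replace the value with a token that does not follow the original format."""
    return hashlib.sha256(value.encode()).hexdigest()[:13]


def obfuscate(value: str) -> str:
    """Keep the format (digits stay digits, letters stay letters, separators
    stay in place) and map equal inputs to equal outputs, so that joins on
    the masked column still line up."""
    digest = hmac.new(SECRET, value.encode(), hashlib.sha256).digest()
    out = []
    for i, ch in enumerate(value):
        b = digest[i % len(digest)]
        if ch.isdigit():
            out.append(str(b % 10))
        elif ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr(base + b % 26))
        else:
            out.append(ch)  # separators such as '-' are preserved
    return "".join(out)
```

The point of the sketch is the contrast: substitute produces a value whose shape differs from the input, while obfuscate returns a value of the same shape and is deterministic, which is what preserving referential integrity requires.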

Figure 2: Query Rewrite Illustration.

The second construct injected by the Sniffer component is an SQL predicate. More specifically, the predicate is added to the WHERE clause of the SQL statement to ensure that the row filtering requirement expressed in Data Protection Rule #2 (City = ‘Toronto’) is enforced. Note that if the original SQL statement does not already include a WHERE clause, the Sniffer component automatically injects one.
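The two injections can be illustrated with a deliberately naive Python sketch. It handles only a single-table SELECT with a plain column list and uses a regular expression where a real rewriter would use a proper SQL parser. The function name rewrite, the UDF name MASK_UDF, the column alias added after the UDF call, and the sample query are all assumptions made for this example, not what Guardium actually emits.

```python
import re
from typing import Iterable, Optional

_SELECT = re.compile(
    r"^SELECT\s+(.*?)\s+FROM\s+(.*?)(?:\s+WHERE\s+(.*))?$",
    re.IGNORECASE | re.DOTALL,
)


def rewrite(sql: str,
            masked_columns: Iterable[str],
            row_predicate: Optional[str],
            udf: str = "MASK_UDF") -> str:
    """Wrap masked columns in a UDF call and add a row-filtering predicate."""
    match = _SELECT.match(sql.strip().rstrip(";").strip())
    if match is None:
        raise ValueError("only simple SELECT statements are supported here")
    select_list, source, where = match.groups()

    masked = {c.upper() for c in masked_columns}
    columns = []
    for col in select_list.split(","):
        col = col.strip()
        if col.upper() in masked:
            # Column transformation: the UDF wraps the column reference.
            col = f"{udf}({col}) AS {col}"
        columns.append(col)

    if row_predicate:
        # Row transformation: extend the WHERE clause, or create one.
        where = f"({where}) AND ({row_predicate})" if where else row_predicate

    rewritten = f"SELECT {', '.join(columns)} FROM {source}"
    if where:
        rewritten += f" WHERE {where}"
    return rewritten


print(rewrite("SELECT Name, SSN FROM T1", ["SSN"], "City = 'Toronto'"))
# SELECT Name, MASK_UDF(SSN) AS SSN FROM T1 WHERE City = 'Toronto'
```

When the original statement already has a WHERE clause, the sketch parenthesizes both sides before joining them with AND, so that an OR in the user's own condition cannot widen the row filter.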

The key advantage of enforcing column and row transformations through query rewrite is performance. Database systems have been optimized for decades to process SQL statements efficiently. It therefore makes sense to delegate the actual enforcement of these transformations to the database by rewriting the original query and injecting the appropriate UDFs and predicates.

Deployment Considerations

At the time of writing this paper, the WKC and GDP integration is supported for six databases: Teradata, Oracle, SQL Server, MySQL, PostgreSQL, and Hadoop/Hive.

The dynamic data masking feature requires the appropriate UDFs to be registered in the target database. Details about the UDFs and how to install them for each database can be found in [1].

The databases protected by GDP and WKC must be configured to use the same user domain. For example, if a user ABC logs on to the database, then that user ABC must have the same meaning in WKC, which is where the data protection rules are defined. Sharing the user domain is a key prerequisite of the WKC-Guardium integration.

Conclusion

Managing data protection rules across disparate and unconnected systems increases the risk of inconsistent data access enforcement, as such rules may easily get out of sync. It also increases the cost of demonstrating compliance, as the data protection rules may need to be audited across many disparate systems. WKC solves this problem by enabling users to manage the data protection rules centrally using a single vocabulary. The unification of WKC’s data access policy management with GDP enables users to benefit not only from managing data protection rules centrally, but also from enforcing such rules securely and consistently, whether the database is accessed through an application or directly through the database system’s own interfaces (e.g., Db2 CLP). In today’s zero-trust world, this unification strengthens adherence to zero-trust security while reducing the cost of demonstrating compliance.

References

[1] Guardium-WKC UDF installation Guide: https://supportcontent.ibm.com/support/pages/node/6826047

[2] Data Protection Rules in Watson Knowledge Catalog: https://medium.com/ibm-data-ai/data-protection-rule-dpr-in-watson-knowledge-catalog-8a19c3bc0959

Footnote [1] (step 3): If any local Guardium rules are defined, they are evaluated first. The request to WKC is made only when the local Guardium rules allow access. This allows users to, for example, quickly lock down database access in an emergency.
