

Customizable query as a data product in IBM Data Product Hub

By Alan Zhang posted Wed January 22, 2025 11:17 AM

  

Reviewed by: Jenna Lau Caruso

What is a customizable query as a data product and why is it useful?

IBM Data Product Hub (DPH) is a self-service solution that is used by data-driven enterprises to share data products. In DPH, data producers can publish curated data products to share with data consumers in their community. Data consumers can easily access data products for their business needs.

Data products can contain one or more data or data-related assets. They are curated, packaged, and distributed to be easily accessible and reusable. A customizable query as a data product is useful for sharing a horizontal or vertical slice of a table, for example:

  • Sharing only the selected columns of a table in a data product, excluding columns that are either not useful to consumers or contain sensitive information.
  • Sharing only the selected rows from a table in a data product, tailoring the data shared to a more specific use case.
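
As a rough illustration of the two kinds of slice, the sketch below uses Python's built-in sqlite3 module with an in-memory table. The `customers` table, its columns, and its rows are made up for this example and are not part of DPH; in DPH the query would run against the producer's own data source.

```python
import sqlite3

# Hypothetical customer table, invented purely for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (name TEXT, email TEXT, state TEXT, city TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?, ?)",
    [
        ("Ana", "ana@example.com", "California", "San Jose"),
        ("Bo", "bo@example.com", "Texas", "Dallas"),
        ("Cy", "cy@example.com", "Texas", "Austin"),
    ],
)

# Vertical slice: share only selected columns, leaving out the sensitive one.
vertical = conn.execute(
    "SELECT name, state, city FROM customers ORDER BY name"
).fetchall()

# Horizontal slice: share only the rows that fit a specific use case.
horizontal = conn.execute(
    "SELECT name, email, state, city FROM customers "
    "WHERE state = 'Texas' ORDER BY name"
).fetchall()

print(vertical)
print(horizontal)
```

Both slices here are fixed by the producer: the consumer has no say in which columns or rows are returned.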

While these examples apply to queries in general, DPH adds value with its customizable query capability by letting consumers further tailor the data they receive.

For example, a data product may contain global sales data, but sharing a customizable query lets a consumer request data for a specific region. Suppose I have a table that contains customers from the United States, but I know that my consumer community is usually looking for local data and does not require the full data set. To serve the needs of that community, I can create a dynamic view of the asset with filters on state and city, which allows a consumer to request only the data required for their use case.

This blog demonstrates an end-to-end workflow for DPH. First, we show how a data producer can create a customizable query as a data product from a connection to IBM watsonx.data and publish it. Then we show how data consumers can access the data by subscribing to the product and supplying parameters so that only the specific data they need is delivered.

1. Data producer creates a dynamic view asset:

1.1 Log in to IBM Cloud Pak for Data as a data producer (deng26) and create a CPD project, WXD_DPH_Project. Then perform the following steps in the project.

1.2 Create a parameter set asset, ParameterSet4DPH, with two parameters: State Name (default value: California) and City Name (default value: San Jose).

1.3 Create a connection of type watsonx.data Presto, wxd_conn.

1.4 Create a dynamic view of data asset with a SQL SELECT statement that uses the STATE and CITY parameters in the WHERE clause.
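
The idea behind this step can be sketched in Python as a SELECT whose WHERE clause is driven by parameters with default values. This is only an analogy: it uses sqlite3 named placeholders, not the DPH dynamic view syntax, and the `customers` table, its columns, and the `run_view` helper are hypothetical.

```python
import sqlite3

# Defaults mirror the parameter set from step 1.2.
DEFAULTS = {"state": "California", "city": "San Jose"}

# Named placeholders stand in for the STATE and CITY parameters.
# This is sqlite3 syntax, not the DPH dynamic view syntax.
VIEW_SQL = (
    "SELECT name, state, city FROM customers "
    "WHERE state = :state AND city = :city ORDER BY name"
)


def run_view(conn, **overrides):
    """Run the query, letting caller-supplied values override the defaults."""
    params = {**DEFAULTS, **overrides}
    return conn.execute(VIEW_SQL, params).fetchall()


# Hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT, state TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [
        ("Ana", "California", "San Jose"),
        ("Bo", "Texas", "Dallas"),
        ("Cy", "Texas", "Austin"),
    ],
)

default_rows = run_view(conn)  # falls back to the default values
dallas_rows = run_view(conn, state="Texas", city="Dallas")  # consumer's values
print(default_rows)
print(dallas_rows)
```

Binding the values as parameters, rather than concatenating them into the SQL string, keeps the query text fixed while the filter values vary per request.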

2. Data producer creates a customizable query as a data product

2.1 As the same user (deng26), go to DPH, click “Create data product”, select “Add from project”, select the dynamic view of data asset created previously, then click “Create draft”.

2.2 Complete the data product by choosing: Primary business domain (Sales), Access level (Requires approval), Data contract, etc. Refer to creating a data product for further details.

2.3 Publish the data product.

3. Data consumer discovers and subscribes to the product, then receives the delivery

3.1 As a data consumer (dsc26) using Cloud Pak for Data, go to DPH, scroll down to “Browse by business domain and use case”, then click on the “Sales” tile and find the product needed.

3.2 Subscribe to this product: choose a delivery method, provide the required parameters (e.g., State Name: Texas, City Name: Dallas), agree to the data contract, and so on.

3.3 After the access request is approved, the product is delivered to the user. The user can then access the data either by downloading it or by using the Apache Arrow Flight service in a Python notebook. In this example, the downloaded CSV file, displayed locally, contains only customers from Dallas, Texas.
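
Once the file is downloaded, a quick check can confirm that the delivery matches the parameters supplied at subscription time. The sketch below uses only the Python standard library; the column names `STATE` and `CITY`, the `rows_outside_filter` helper, and the inline sample rows are assumptions, so adjust them to the header of the actual file.

```python
import csv
import io


def rows_outside_filter(csv_file, state, city):
    """Return the rows whose STATE or CITY differs from the requested values."""
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if row["STATE"] != state or row["CITY"] != city
    ]


# Stand-in for the downloaded file (hypothetical rows).
# For a real file, pass open(path, newline="") instead.
sample = io.StringIO(
    "NAME,STATE,CITY\n"
    "Bo,Texas,Dallas\n"
    "Di,Texas,Dallas\n"
)

unexpected = rows_outside_filter(sample, state="Texas", city="Dallas")
print(f"{len(unexpected)} unexpected row(s)")
```

An empty result means every delivered row falls inside the requested state and city.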

Conclusion:

IBM Data Product Hub enables data producers to create data products from customizable queries, which let consumers scope the data to their needs by supplying custom values. This “customizable query as a product” gives data consumers added flexibility to subscribe to a product with custom values and to access only the data required for specific needs.
