As organizations strive to create a data-driven culture, they face challenges in making data accessible, trustworthy, and easy to share. IBM Data Product Hub offers a solution by providing a data marketplace for every ecosystem. In this blog, we’ll explore how a data producer can create data products on IBM Data Product Hub using assets stored in a Lakehouse and governed through a third-party catalogue.
Creating a data product
To create a data product, the data producer first sets up a connection between IBM Data Product Hub and the data store, such as watsonx.data or Snowflake. This is done by adding the connection details on the platform and selecting the assets to package as a data product. Completing this step creates a draft data product.

IBM Data Product Hub screen for setting up a connection to a Snowflake lakehouse.
Extracting Metadata from Third-Party Catalogues
Once a draft data product is created, the user must enrich it with metadata from the catalogue. Assuming the organization uses a third-party catalogue like Informatica, Unity, or Collibra, the data producer must extract metadata for the relevant asset from the catalogue.
Let’s assume the organization is using the Informatica governance catalogue. To find metadata for an asset in Informatica, a user can follow these steps:
- Understand the Informatica REST API: Familiarize yourself with the Informatica REST API to retrieve metadata for the asset.
- Authenticate and authorize with Informatica’s REST API: Use authentication to access the Informatica REST API and retrieve metadata.
- Search the Asset: Utilize the catalogue's API to search for the asset by its name or another identifier, taking advantage of Informatica’s search APIs to retrieve the asset based on specific criteria.
- Automate with a script: A user can automate the process of finding assets and extracting metadata through the Informatica REST API by writing a Python script that uses the requests library.
- Map the additional metadata from the Informatica governance catalogue to Data Product Hub: After extracting the enriched metadata from the third-party catalogue, the data producer can map it to the data product using IBM Data Product Hub’s custom metadata capability. This can be automated by setting up a job that runs a script to perform the mapping.
- Publishing data product: Once the data product is enriched, it can be published on the hub for organization-wide sharing.
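The pseudo-code script below demonstrates a general flow for extracting metadata from the catalogue API. The URLs are placeholders, and the response fields it reads are illustrative rather than Informatica's actual schema.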
import requests
from requests.auth import HTTPBasicAuth

# Define your Informatica credentials (placeholders)
username = "your_username"
password = "your_password"
catalog_url = "Informatica-api-search"  # placeholder for the search endpoint
auth = HTTPBasicAuth(username, password)

# Step 1: Search for the asset by name
search_params = {'q': 'asset_name'}
response = requests.get(catalog_url, params=search_params, auth=auth)
if response.status_code == 200:
    assets = response.json()['items']
    if not assets:
        # Guard against an empty result before indexing into the list
        print("No asset matched the search.")
    else:
        asset_id = assets[0]['id']  # Get the first asset ID
        # Step 2: Retrieve metadata for the asset
        metadata_url = f"Informatica API get metadata link/{asset_id}"  # placeholder
        metadata_response = requests.get(metadata_url, auth=auth)
        if metadata_response.status_code == 200:
            metadata = metadata_response.json()
            print("Asset Metadata:", metadata)
        else:
            print("Failed to retrieve metadata:", metadata_response.text)
else:
    print("Failed to search for asset:", response.text)
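The mapping step from the list above can be sketched in the same spirit. This is a minimal illustration, not IBM Data Product Hub's or Informatica's actual schema: the catalogue field names (`owner`, `classification`, `description`), the target field labels, and the helper `to_custom_metadata` are all hypothetical. The sketch only prepares the values; writing them to the data product would go through the hub's custom metadata capability, which is not shown here.

```python
from typing import Any, Dict, Mapping

# Hypothetical mapping: catalogue field name -> custom metadata label on the data product.
# Replace both sides with the fields your catalogue and your hub configuration actually use.
FIELD_MAP: Dict[str, str] = {
    "owner": "Data owner",
    "classification": "Classification",
    "description": "Business description",
}


def to_custom_metadata(
    catalog_metadata: Mapping[str, Any],
    field_map: Mapping[str, str] = FIELD_MAP,
) -> Dict[str, str]:
    """Rename the mapped catalogue fields and skip missing or empty values."""
    custom: Dict[str, str] = {}
    for source_key, target_key in field_map.items():
        value = catalog_metadata.get(source_key)
        if value is None or value == "" or value == []:
            continue  # nothing to carry over for this field
        if isinstance(value, (list, tuple)):
            # Flatten multi-valued fields into one comma-separated string
            value = ", ".join(str(item) for item in value)
        custom[target_key] = str(value)
    return custom


if __name__ == "__main__":
    # Sample payload standing in for the JSON returned by the catalogue
    sample = {
        "owner": "finance-team",
        "classification": ["PII", "Confidential"],
        "description": "",
        "lastScanned": "2024-01-01",
    }
    print(to_custom_metadata(sample))
```

Keeping the mapping in a plain dictionary means a scheduled job can reuse the same function for every asset, and adding a field is a one-line change.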
This approach lets users create data products quickly from any catalogue, which keeps IBM Data Product Hub technology agnostic. Enriching data products with governed metadata fosters greater confidence in the quality and governance of the data, ultimately leading to improved business outcomes. To learn more about the custom metadata capability, refer to our previous blog.
If your data teams are struggling with data accessibility, quality issues, or slow reporting cycles, register for our webinar series where we will discuss data sharing best practices to help you unlock the full potential of your data. You can also take a free trial of Data Product Hub or book a live demo with our experts to supercharge your data-driven outcomes.