Simplify data sharing across 3rd party catalog ecosystems with IBM Data Product Hub

By Aman Varma posted Tue November 26, 2024 09:07 AM

  

Organizations often struggle to share data across departments because each department uses its own data governance tools and standards. This fragmentation leads to inconsistencies in data formats, quality, metadata, and security policies, making it difficult to integrate or share information smoothly. In addition, because each tool uses its own metadata definitions, it becomes difficult to share the necessary information across domains. As a result, data silos emerge, causing inefficiencies, delays in decision-making, and increased risk of non-compliance. Aligning governance practices and adopting interoperable tools can help overcome these challenges, fostering better collaboration and data-driven insights across the organization.

A single data marketplace can help resolve these challenges by providing a unified platform for data sharing and discovery across an organization. It lets different departments publish their data products on a single platform, ensuring consistency in formats, quality, and delivery standards. This eliminates data silos and facilitates seamless collaboration, enabling teams to access trusted data without the friction of disparate tools. Additionally, a well-governed marketplace enforces compliance and data privacy policies, reducing risks while enhancing transparency. Ultimately, it fosters a more efficient, data-driven environment, supporting better decision-making and innovation. Let us explore a data sharing solution from IBM that can help share data products seamlessly across disparate data tools.

IBM’s approach to streamlining data sharing

IBM Data Product Hub enables data producers to create, manage, and curate data products with key attributes such as business domain, access level, delivery methods, recommended usage, and data contracts (terms and conditions). The data contract feature ensures that data is shared with consumers in a governed and transparent way. By packaging assets as reusable data products, data producers no longer need to repeatedly fulfill similar data requests, resulting in improved efficiency. This approach also accelerates access to high-quality data for consumers, reducing the wait time for needed data by eliminating the need for the consumer to find the owner of the data.
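To make the attribute list above concrete, here is a minimal Python sketch of a data product record. It is illustrative only: the class names, field names, and sample values are hypothetical and are not the Data Product Hub API or its schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class DataContract:
    """Terms and conditions attached to a data product (hypothetical shape)."""
    terms: str


@dataclass
class DataProduct:
    """Illustrative record holding the attributes named in the text."""
    name: str
    business_domain: str
    access_level: str
    delivery_methods: list[str] = field(default_factory=list)
    recommended_usage: str = ""
    contract: Optional[DataContract] = None

    def is_governed(self) -> bool:
        # In this sketch, a product counts as governed once a contract is attached.
        return self.contract is not None


# Sample values are made up for illustration.
product = DataProduct(
    name="Customer churn scores",
    business_domain="Marketing",
    access_level="restricted",
    delivery_methods=["download"],
    recommended_usage="Retention campaign targeting",
    contract=DataContract(terms="Internal use only"),
)
print(product.is_governed())
```

The point of the sketch is that the contract travels with the product, so a consumer sees the terms at the same place where they find the data.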


IBM Data Product Hub’s ability to support IBM and third-party ecosystems

IBM Data Product Hub is agnostic to the underlying data management technology, which means that data producers can package and share data products whose assets are stored in IBM or third-party ecosystems (Snowflake, Databricks, Informatica, Collibra, etc.). A data provider can let IBM Data Product Hub sit on top of their existing architecture and start sharing data products across the organization with ease. Before we dive deeper into how Data Product Hub enables data product creation from third-party catalogs and helps overcome the pain point of sharing the differing metadata definitions used by these third-party tools, let’s briefly look at how Data Product Hub describes its data product metadata model:

  • IBM Data Product Hub defines the structure of the data assets in a data product through a defined set of metadata. When pulling data directly from a source, Data Product Hub pulls the technical metadata for each column, as shown in the screenshot (column name, data type). When pulling from a catalog (currently IBM Knowledge Catalog), IBM Data Product Hub additionally pulls business metadata, including quality score, user-friendly descriptions, business terms, and data classes. For example, when Data Product Hub pulls in metadata from Snowflake, the platform would pull in information such as column name, quality score, data type, description, business terms, and data class. Please note that Data Product Hub pulls in this information if the asset is enriched and the metadata from the source catalog or data store is mapped to the Data Product Hub metadata definition. This is done so that the metadata is displayed on the hub in a business-consumable manner.

    • Image: the data asset metadata structure defined in Data Product Hub

  • If the data producer wants to pull over additional metadata about a table that does not exist in our data_asset model, they need to extend our data product part using the custom metadata capability. This is a one-time step that an admin performs on Data Product Hub for the respective data source (IBM or third-party ecosystem).
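The metadata model described in the list above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Data Product Hub code: the `ColumnMetadata` class, the `FIELD_MAP` mapping, the `map_column` helper, and the source field names are all hypothetical. The sketch also routes unmapped source fields into a `custom` dictionary automatically, whereas in the product the custom metadata extension is set up by an admin.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class ColumnMetadata:
    """Per-column metadata named in the text (hypothetical shape)."""
    column_name: str                      # technical metadata
    data_type: str                        # technical metadata
    quality_score: Optional[float] = None # business metadata from here on
    description: Optional[str] = None
    business_terms: list[str] = field(default_factory=list)
    data_class: Optional[str] = None
    custom: dict[str, Any] = field(default_factory=dict)


# Hypothetical mapping from a source's field names to the model's field names.
FIELD_MAP = {
    "name": "column_name",
    "type": "data_type",
    "comment": "description",
    "dq_score": "quality_score",
}


def map_column(source: dict[str, Any], field_map: dict[str, str]) -> ColumnMetadata:
    """Map one source column record onto the model.

    Fields without a mapping are kept under `custom` instead of being dropped.
    """
    mapped: dict[str, Any] = {}
    custom: dict[str, Any] = {}
    for key, value in source.items():
        target = field_map.get(key)
        if target is None:
            custom[key] = value
        else:
            mapped[target] = value
    return ColumnMetadata(custom=custom, **mapped)


# Made-up source record for illustration.
source_column = {
    "name": "EMAIL",
    "type": "VARCHAR",
    "comment": "Customer email address",
    "retention_days": 365,
}
column = map_column(source_column, FIELD_MAP)
print(column)
```

In this sketch, a field that the source does not provide (here, the quality score) simply stays empty, which mirrors the note above that business metadata appears only when the asset is enriched and mapped.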

To summarize, we have looked at the metadata model used by Data Product Hub and the challenges of sharing data assets across different domains and departments, especially when each department uses a distinct technology stack such as a data catalog or lakehouse. In my next blog, I will dive deeper with an example of how IBM Data Product Hub can support third-party tools like data catalogs, enabling the creation and sharing of data products across the organization in a governed and efficient manner.

Are your data teams struggling with data accessibility, quality issues, or slow reporting cycles? Register for our webinar series. Want to gain hands-on experience with the powerful features of Data Product Hub? Take a free trial today or talk to an IBM expert to learn how you can supercharge your data-driven outcomes.
