Data Governance - Knowledge Catalog

Strengthen your data foundation with reusable data products using Data Product Hub and IBM watsonx.data

By Aman Varma posted 26 days ago

  

In a world where data is everywhere, the real challenge isn’t collecting it – it’s making it useful. Organizations today need more than just access to data. They need high-quality, reusable, and trusted data products that are easy to discover, govern, and consume. That’s exactly what IBM delivers through Data Product Hub, included as a key component of IBM watsonx.data intelligence. IBM watsonx.data lakehouse and Data Product Hub work together to transform fragmented data assets into strategic products that power analytics, AI/ML use cases, and business decisions at scale.

The backbone: IBM watsonx.data with medallion architecture

While Data Product Hub is where data products live and thrive, IBM watsonx.data lakehouse is where they’re born and refined. Built on a hybrid, open lakehouse architecture, IBM watsonx.data enables you to store and query data across multiple formats and environments – without the complexity of moving it. The offering streamlines the full lifecycle of data for AI, bringing together integration, governance, and management across all data types and environments – from on-premises to hybrid and multi-cloud. On top of it, an organization can implement a medallion architecture, a layered approach that structures data in three tiers:

  • Bronze zone: Raw, ingested data – great for exploration and data science. 

  • Silver zone: Cleaned and enriched data – ready for analytics and operational reporting. 

  • Gold zone: Business-ready, trusted data – ideal for strategic decision-making. 

This architecture helps organizations incrementally improve data quality, so the further up the stack you go, the more valuable and reusable the data becomes. 
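To make the three tiers concrete, here is a minimal sketch in plain Python. It is not how watsonx.data implements the zones; the records, field names, and the `to_silver` and `to_gold` helpers are all made up for illustration. It only shows the idea of raw data being cleaned and then aggregated as it moves up the stack.

```python
# Hypothetical raw records, standing in for data ingested into the bronze zone.
BRONZE = [
    {"order_id": "1", "region": " emea ", "amount": "120.50"},
    {"order_id": "2", "region": "amer", "amount": "80"},
    {"order_id": "2", "region": "amer", "amount": "80"},            # duplicate row
    {"order_id": "3", "region": "EMEA", "amount": "not-a-number"},  # invalid amount
]


def to_silver(bronze_rows):
    """Bronze -> silver: validate, deduplicate, and normalize raw rows."""
    seen_ids = set()
    silver_rows = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows that fail basic validation
        order_id = int(row["order_id"])
        if order_id in seen_ids:
            continue  # drop duplicate orders
        seen_ids.add(order_id)
        silver_rows.append({
            "order_id": order_id,
            "region": row["region"].strip().upper(),
            "amount": amount,
        })
    return silver_rows


def to_gold(silver_rows):
    """Silver -> gold: aggregate cleaned rows into a business-ready summary."""
    totals = {}
    for row in silver_rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals


silver = to_silver(BRONZE)
gold = to_gold(silver)
print(silver)
print(gold)
```

Each step only reads from the tier below it, which is what lets quality improve incrementally from bronze to gold.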

Turning medallion zone assets into data products

Now here’s where things get exciting. Each layer of the medallion architecture feeds directly into Data Product Hub – giving users a central place to find and use data at the level of quality they need. A data producer can package assets from any of the zones and turn them into a data product. A data consumer can see which zone a data product belongs to through the custom metadata definition capability, alongside the product’s business and technical metadata. With this visibility into the data product’s maturity, the consumer can make an informed decision about whether to subscribe to it.

  • A data engineer might grab a bronze zone data product for raw experimentation. 

  • An analyst could rely on a silver zone data product for clean, consistent reporting. 

  • A business leader can trust a gold zone data product for making informed business decisions. 
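The personas above can be sketched as a simple lookup over zone-tagged products. This is an illustrative model only: the `DataProduct` class, the `zone` field, and the catalog entries are hypothetical and are not the Data Product Hub API. It shows how a zone recorded as custom metadata lets a consumer filter by maturity.

```python
from dataclasses import dataclass

# Assumed ordering of the zones, from least to most refined.
ZONE_RANK = {"bronze": 0, "silver": 1, "gold": 2}


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical stand-in for a catalog entry with zone metadata."""
    name: str
    zone: str  # custom metadata: "bronze", "silver", or "gold"
    description: str = ""


CATALOG = [
    DataProduct("raw_clickstream", "bronze", "Unprocessed web events"),
    DataProduct("cleaned_orders", "silver", "Validated, deduplicated orders"),
    DataProduct("sales_summary", "gold", "Revenue by region"),
]


def find_by_zone(catalog, zone):
    """Return the names of products published from exactly this zone."""
    return [p.name for p in catalog if p.zone == zone]


def find_at_least(catalog, min_zone):
    """Return the names of products at or above a minimum maturity level."""
    threshold = ZONE_RANK[min_zone]
    return [p.name for p in catalog if ZONE_RANK[p.zone] >= threshold]


engineer_picks = find_by_zone(CATALOG, "bronze")
analyst_picks = find_at_least(CATALOG, "silver")
print(engineer_picks)
print(analyst_picks)
```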

And because data products created in Data Product Hub come with rich metadata such as key features, a data contract, recommended usage, and lineage information, users can understand exactly what they’re using – and trust it. The data contract attached to a data product gives data producers and data consumers visibility into the terms of use and the service level agreement associated with that data product. This enables transparent sharing of data products between the two personas. In addition, a data producer can deliver the data product to a data consumer through customizable delivery methods, which are described in the following section.

Delivery methods 

  • Direct access to watsonx.data asset: This delivery method gives the data consumer direct read access to the assets packaged as a data product in the watsonx.data lakehouse. Because the data doesn’t move, security is enhanced during data product delivery. The data product can be accessed through Presto engine connectors, based on the data consumer’s use case. In our upcoming release, we will expand the set of engines that a data consumer can use to access the data product. Data consumers who want to perform BI analysis with tools such as Power BI or Tableau can use this delivery method to connect the data product to the tool of their choice over JDBC/ODBC.

    • Fit-for-purpose engine: The flexibility to choose a preferred engine leads to efficient use of the watsonx.data lakehouse, because data consumers can use the engine that fits their use case.

      • watsonx.data Presto engine: A data consumer can use the Presto engine to directly access the data product they want to analyze.

  • Delivery of data product in consumer-owned watsonx.data lakehouse: With this delivery method, a data consumer can get the data product delivered as a table in their own lakehouse. It enables the data consumer to pull a snapshot of data that is hosted outside of their lakehouse. It also supports the implementation of medallion architecture, because the consumer can select the zone for data product delivery.

Now that we have seen how Data Product Hub and the watsonx.data lakehouse work together from data product creation to data product delivery, let’s look at reusable data products and the benefits they bring to customers in the next section.
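The practical difference between the two delivery methods is whether the consumer reads the data in place or receives a point-in-time copy. The sketch below illustrates that difference using Python’s built-in `sqlite3` module as a stand-in for both lakehouses, purely so it runs anywhere. In a real deployment, access would go through the Presto engine over JDBC/ODBC. The table names, the `direct_access` and `deliver_snapshot` helpers, and the zone-prefixed target name are all assumptions made for illustration.

```python
import sqlite3

ALLOWED_ZONES = {"bronze", "silver", "gold"}

# Stand-in for the producer's lakehouse (SQLite here, not watsonx.data).
producer = sqlite3.connect(":memory:")
producer.execute("CREATE TABLE sales (region TEXT, amount REAL)")
producer.executemany(
    "INSERT INTO sales VALUES (?, ?)", [("EMEA", 120.5), ("AMER", 80.0)]
)

# Stand-in for the consumer-owned lakehouse.
consumer = sqlite3.connect(":memory:")


def direct_access(conn, table):
    """Direct access: query the data where it lives; nothing is copied.

    Table names are trusted constants in this sketch, so string
    formatting is acceptable here; never do this with user input.
    """
    query = f"SELECT region, amount FROM {table} ORDER BY region"
    return conn.execute(query).fetchall()


def deliver_snapshot(src, dst, table, zone):
    """Snapshot delivery: copy current rows into a table in the chosen zone."""
    if zone not in ALLOWED_ZONES:
        raise ValueError(f"unknown zone: {zone}")
    target = f"{zone}_{table}"  # hypothetical naming convention
    rows = src.execute(f"SELECT region, amount FROM {table}").fetchall()
    dst.execute(f"CREATE TABLE {target} (region TEXT, amount REAL)")
    dst.executemany(f"INSERT INTO {target} VALUES (?, ?)", rows)
    return target


target = deliver_snapshot(producer, consumer, "sales", "silver")

# The producer's data changes after the snapshot was delivered.
producer.execute("INSERT INTO sales VALUES ('APAC', 45.0)")

live_rows = direct_access(producer, "sales")      # sees the new row
snapshot_rows = direct_access(consumer, target)   # frozen at delivery time
print(len(live_rows), len(snapshot_rows))
```

Direct access always reflects the producer’s current data, while the delivered table stays as it was at delivery time until the consumer pulls a new snapshot.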

Why reusable data products are a game changer:

High-quality, reusable data products solve some of the biggest data challenges organizations face today:

  • Trusted by default

    • Data products have data contracts that define the rules and policies for how a data product should be shared on the platform.

  • Built once, used often 

    • Teams can reuse data products across departments and domains, reducing duplication and boosting consistency. Each data product follows a lifecycle, so users can create a new version whenever its underlying data assets change.

  • Easy to find 

    • Thanks to powerful discovery, semantic search capability and metadata tagging, users can find what they need quickly. 

  • Monitoring capability 

    • Data product usage can be monitored using the Data Product Hub monitoring dashboard. Because the dashboard provides visibility into the most used data products, data producers can calculate the ROI of their data products and manage the lakehouse more efficiently.

Together, Data Product Hub and watsonx.data make the watsonx.data lakehouse self-serve and accessible to both business and technical users – while enabling organizations to manage usage efficiently and giving data consumers the flexibility to access data products via either the Presto or the Spark engine.
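As a rough illustration of the monitoring idea, the sketch below ranks data products by usage from a list of access events. The event format, product names, and the `usage_report` helper are hypothetical, and this is not how the Data Product Hub dashboard is implemented. It only shows the kind of "most used" view that such monitoring makes possible.

```python
from collections import Counter

# Hypothetical access events: (data product name, consumer).
ACCESS_LOG = [
    ("sales_summary", "alice"),
    ("sales_summary", "bob"),
    ("raw_clickstream", "carol"),
    ("sales_summary", "alice"),
    ("customer_360", "bob"),
    ("customer_360", "dave"),
]


def usage_report(log):
    """Return (product, access count, distinct consumers), most used first."""
    access_counts = Counter(product for product, _ in log)
    consumers = {}
    for product, user in log:
        consumers.setdefault(product, set()).add(user)
    return [
        (product, count, len(consumers[product]))
        for product, count in access_counts.most_common()
    ]


report = usage_report(ACCESS_LOG)
for product, count, distinct_users in report:
    print(f"{product}: {count} accesses by {distinct_users} consumers")
```

A producer could use a ranking like this to decide which data products justify further investment and which are candidates for retirement.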

Conclusion 

When you combine the medallion architecture of the watsonx.data lakehouse with the curated, business-ready interface of Data Product Hub, you unlock a new level of data usability across the enterprise. This approach transforms how organizations:

  • Deliver insights 

  • Build AI models 

  • Collaborate across teams 

  • And most importantly – trust their data 

In short – less wrangling, more doing. If you’re on a journey to modernize your data strategy, it might be time to stop thinking in tables – and start thinking in data products. To learn more about Data Product Hub or watsonx.data, access our on-demand webinar here.
