Special thanks to @Aman Varma for the collaboration.
Entity resolution (ER) is a foundational element in modern master data management (MDM) practices, leveraging advanced machine learning techniques to accurately identify and link entities within an organization's data. Successful application of entity resolution can help organizations obtain clean, consistent, and accurate data, which provides tremendous business value, such as:
- Consistent Customer Experiences: By achieving a comprehensive customer 360° view, organizations can deliver personalized experiences, fostering customer loyalty which ultimately drives revenue growth.
- Efficient Supply Chain Management: ER enhances supply chain management by accurately linking suppliers, products, and transactions, leading to better inventory management, improved supplier relationships, and reduced error risks.
- Regulatory Compliance: ER ensures regulatory compliance by accurately identifying and linking records under regulatory scrutiny, helping organizations avoid fines, reputational damage, and legal liabilities.
However, despite the ability to curate a golden 360° view of their entities, many companies struggle to provide the right data to the right people within the organization.
From Data as a Product, to Master Data as a Product
As enterprise data continues to grow exponentially, organizational silos grow with it. The prevalence of such data silos leads to inefficiencies and inconsistencies, as different teams and departments rely on different data sources for decision-making.
Data as a product is a holistic mindset that views data as an integral part of an organization's value chain, encompassing everything from data production to consumption. It emphasizes the importance of data quality, consistency, and interoperability across various systems and processes, regardless of team or department. This approach encourages the development of standardized data models, APIs, and data services that enable seamless data sharing and integration across systems. By adopting a data as a product mindset, organizations can break down data silos by treating data as interoperable and reusable assets.
A data marketplace is a critical tool for enabling a data as a product mindset, because it facilitates the discovery, procurement, and distribution of disparate data assets and other data products. By bringing together data producers and consumers in a single platform, a data marketplace helps organizations create a cohesive and comprehensive data strategy that spans all data sources and domains. With a data marketplace, organizations can democratize their data, making it accessible and usable by anyone who needs it. This leads not just to greater data accuracy and reliability, but also to greater collaboration, innovation, and agility within organizations. According to TDWI, 71% of enterprises use or plan to use cloud data marketplaces to sell data sets, pre-trained AI/ML models, and other data products (1).
Finally, building on top of the concept of data as a product, master data as a product specifically focuses on managing high-value, shared entities, such as customer, product, and supplier. Entity resolution techniques are employed to curate and enrich master data products, ensuring their accuracy, completeness, and consistency before they are distributed for consumption. In order for organizations to take advantage of master data as a product, entity resolution must become an integral step within the data production lifecycle, without introducing friction in the user experience.
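To make the idea of entity resolution concrete, here is a minimal sketch in Python. It is an illustration only, not the matching algorithm used by IBM Match 360: the records, the `similarity` scoring rule, and the `0.85` threshold are all assumptions chosen for the example. It scores every pair of records, treats pairs above the threshold as the same entity, and groups them with a small union-find.

```python
from difflib import SequenceMatcher
from itertools import combinations

# Toy customer records from hypothetical source systems (made-up data).
records = [
    {"id": "crm-1", "name": "Jonathan Smith", "email": "jon.smith@example.com"},
    {"id": "web-7", "name": "Jon Smith", "email": "jon.smith@example.com"},
    {"id": "crm-2", "name": "Maria Garcia", "email": "m.garcia@example.com"},
    {"id": "erp-4", "name": "Maria García", "email": "maria.garcia@example.com"},
    {"id": "crm-3", "name": "Wei Chen", "email": "wei.chen@example.com"},
]


def similarity(a, b):
    """Score two records in [0, 1]; an exact email match counts as certain."""
    if a["email"].lower() == b["email"].lower():
        return 1.0
    # Otherwise fall back to fuzzy string similarity on the name.
    return SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()


def resolve(records, threshold=0.85):
    """Group records whose pairwise score meets the threshold (union-find)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        # Walk up to the cluster root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(records, 2):
        if similarity(a, b) >= threshold:
            parent[find(a["id"])] = find(b["id"])

    clusters = {}
    for r in records:
        clusters.setdefault(find(r["id"]), []).append(r["id"])
    return sorted(sorted(c) for c in clusters.values())


entities = resolve(records)
print(entities)
# [['crm-1', 'web-7'], ['crm-2', 'erp-4'], ['crm-3']]
```

Five source records collapse into three entities. A production matcher would compare many more attributes, avoid scoring every pair, and let data stewards tune the threshold, but the shape of the problem is the same.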
The IBM Approach
In IBM Cloud Pak for Data (CP4D), components like IBM Match 360 and IBM Data Product Hub work together to empower organizations to achieve master data as a product. The example below demonstrates how the two components integrate.
Data Production
- IBM Match 360 consolidates data from disparate sources to establish a single, trusted, 360-degree view of an organization’s entities, in this case customers, powered by a tunable and trainable intelligent matching algorithm. This data is highly valuable to all areas of the business, so let’s package it as a product for consumption.
- IBM Data Product Hub allows data producers to create, manage, and curate data products with key attributes such as business domain, access level, delivery methods, recommended usage, and data contract (terms and conditions). The out-of-the-box Match 360 connector enables data producers to package entity data natively as a data product on Data Product Hub. The data contract feature ensures that data from Match 360 is shared with data consumers in a governed and transparent manner. Once packaged as data products, entities from Match 360 become reusable, eliminating the need for data producers to repeatedly address similar data requests. This enhances efficiency for data producers and also accelerates access to high-quality Match 360 data for data consumers, who no longer need to wait days to obtain the data they need.
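As a rough mental model of the attributes listed above, a data product can be pictured as a small record whose data contract must be accepted before anyone subscribes. The sketch below is hypothetical: the `DataProduct` class, its field names, and the `subscribe` method are invented for illustration and are not the Data Product Hub API.

```python
from dataclasses import dataclass, field


@dataclass
class DataProduct:
    """Hypothetical model of a data product; not the Data Product Hub schema."""
    name: str
    business_domain: str
    access_level: str
    delivery_methods: list
    recommended_usage: str
    data_contract: str  # terms and conditions shown to consumers
    subscribers: set = field(default_factory=set)

    def subscribe(self, consumer, accepts_contract):
        # Governed sharing: no subscription without accepting the contract.
        if not accepts_contract:
            raise PermissionError(f"{consumer} must accept the data contract")
        self.subscribers.add(consumer)


customer_360 = DataProduct(
    name="Customer 360 entities",
    business_domain="Customer",
    access_level="restricted",
    delivery_methods=["live access with Flight Service"],
    recommended_usage="Customer analytics and personalization",
    data_contract="Internal use only; no redistribution.",
)
customer_360.subscribe("marketing-analytics", accepts_contract=True)
```

Because the producer defines the product and its contract once, every later request is served by a subscription rather than a new one-off extract.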
Data Consumption
- On IBM Data Product Hub, data consumers can quickly search and browse data products by business domain, review data contracts and recommended usage, and subscribe to products.
- Finally, the data product is delivered to the data consumer using the "live access with Flight Service" method, which uses the Apache Arrow Flight protocol. This approach allows consumers to read data directly from Match 360 without copying it to a different location. By avoiding data movement, this method provides both more secure and faster access to Match 360 entity assets. As of today, this is the primary consumption method for Match 360 data products on Data Product Hub. Consumers can also conduct advanced analytics directly on CP4D, for example using a Jupyter notebook.
Learn more
The seamless integration of IBM Match 360 and IBM Data Product Hub supports the concept of master data as a product, enhancing data democratization and maximizing data value.
(1) Source: TDWI Research Q1’2023 (185 respondents)