Cloud Pak for Data


Questions for AMA: Data Fabric


  • 1.  Questions for AMA: Data Fabric

    Posted Fri February 11, 2022 12:37 PM
    Edited by System Test Fri January 20, 2023 04:42 PM
    We'd like to answer your questions about Data Fabric. 
    We've arranged for experts from across IBM to answer your questions right here in this forum thread on Feb 24 at 2pm Eastern/11am Pacific for a whole hour of AMA (Ask Me Anything).  Our topic is Data Fabric, so if you have questions, please start posting them as a response to this post.

    Here are some ideas for topics:
    • Creating a catalog of data products and re-usable assets to accelerate analytics and data science projects
    • Enabling global data sharing and access while enforcing country-specific data and compliance policies
    • Augmenting the single view of the customer with AI- and ML-driven insight that enables smarter customer interactions
    • Enforcing fairness, quality, and explainability in models
    Our experts will hop on the Cloud Pak for Data Community discussion forum on Feb 24 at 2pm Eastern/11am Pacific and start answering your questions right here in this thread. 

    To learn more, or to get this AMA on your calendar, go to the AMA Data Fabric event page. This event will take place entirely in the discussion forum, so there is no meeting to join.  If you can't be online during the hour, don't worry; you can post your questions in advance and read the responses later.  

    Shannon Rouiller
    Content strategist, Cloud Pak for Data

  • 2.  RE: Questions for AMA: Data Fabric

    Posted Tue February 15, 2022 02:30 PM
    Hi Shannon,

    Are Graph Database and Graph applications included in the IBM strategy for Cloud Pak for Data? 

    Kind regards

    Joe Dreyer

  • 3.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:08 PM
    Yes, they are. We already use this technology today, for example in single view use cases.

    Oliver Claude

  • 4.  RE: Questions for AMA: Data Fabric

    This message was posted by a user wishing to remain anonymous
    Posted Thu February 17, 2022 12:05 PM
    This post was removed

  • 5.  RE: Questions for AMA: Data Fabric

    This message was posted by a user wishing to remain anonymous
    Posted Fri February 18, 2022 12:22 PM
    This post was removed

  • 6.  RE: Questions for AMA: Data Fabric

    Posted Fri February 18, 2022 12:25 PM
    A data fabric is an architecture and set of data services that provide consistent capabilities across hybrid multicloud environments. It is a powerful architecture that standardizes data management practices such as ETL, curation, and publishing across clouds, on premises, and edge devices.

    Our approach to IBM Trustworthy AI is rooted in 3 pillars:
    • Trust in Data - Data scientists require governed, self-service access to data of appropriate quality and relevance that can be scanned for bias, enabling them to accelerate analytics and ML lifecycles using our suite of collaborative analysis tools.
    • Trust in Models - A data science platform that not only provides automated MLOps, but infuses it with trust from model building throughout its lifecycle to production
    • Trust in Process - AI Governance with automation at each stage of the AI Lifecycle via integrations with Factsheets and OpenPages

    John Chaves Chaves

  • 7.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:57 PM
    A data fabric is an architecture designed to understand, integrate, govern, and access data in a landscape where data will be distributed across multiple systems (e.g., applications, data warehouses, data lakes, etc.). The data fabric is a next generation way of "managing" data, by introducing the concept of a knowledge graph, enriched with semantics, and the notion of active metadata - which is a way to accelerate the building and operating of data pipelines, data virtualization, data engineering, data and AI governance, customer 360, etc., leveraging AI / ML. A data fabric is different from merely stitching together disparate tools in a point-to-point fashion, and requires capabilities to work in an integrated fashion.

    Oliver Claude

  • 8.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:58 PM
    Hello Kelley,

    see the answer on question 11) below.



    Martin Oberhofer

  • 9.  RE: Questions for AMA: Data Fabric

    Posted Fri February 18, 2022 06:14 AM
    Is the Data Fabric a platform? How can it help me use my data more effectively?

    Engr. Chinwe Vivian Ononiwu
    Assistant Manager ICT: Data Engineering, Strategic Intelligence Unit, ISDMA STOG
    Federal Inland Revenue Service
    Abuja Nigeria

  • 10.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:50 PM
    Edited by System Test Fri January 20, 2023 04:12 PM

    Data Fabric is an architectural approach (like Service Oriented Architecture), but you need some technology to implement the design.

    You can approach this task in two ways:

    1. Buy a bunch of commercial tools or get some Open Source tools and stitch them together (brittle and expensive, even with Open Source)
    2. Buy an integrated "solution" for the core capabilities of the data fabric (this is the IBM approach)

    We believe that the integration is a non-trivial effort and integrating parts built with the intent to be integrated in the first place makes the success of the effort more likely.

    Dejan Glozic

  • 11.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:55 PM
    Hello Chinwe,

    let me give you a couple of examples. I spoke with the CDO of a major European bank, and he told me that his data scientists only do data science on Thursdays and Fridays. When I asked why, the CDO said that from Monday to Wednesday they search for the right data assets (among millions of them) and then need to understand them with regard to data quality and format, wrangle the data, etc. On top of that, different departments build similar ML models, which is redundant. And then there are regulations for fair use of AI in Europe that require audit trails on who trained the ML model, which data asset it was trained on, etc.

    If you had deployed a Data Fabric, these pain points would be gone because:
    1) You would have a self-service Data and AI Model Marketplace: Data Scientists can shop for what they need using semantic search and find it immediately. Any data asset found is curated, with total visibility into data quality, data privacy needs, etc. A checkout allows them to provision the data asset into their analytics environment without involving the IT team (and still with deep enforcement, ensuring that access, location, etc. policies are complied with automatically).
    2) While the data scientists train and deploy models, this e2e lifecycle is industrialized with a Data / AI pipeline in which all the metadata required for the AI model factsheet is automatically collected and made available for every model, satisfying all AI model audit needs.

    1) and 2) by themselves save a huge number of labor hours through the automation built into the Data Fabric infrastructure.

    Kind regards,


    Martin Oberhofer

  • 12.  RE: Questions for AMA: Data Fabric

    Posted Sun February 20, 2022 02:40 PM
    Are data fabric and data mesh the same?

    Thank you.

    Jennifer Smith Gray

  • 13.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:33 PM
    Data fabric and data mesh are concepts that originated from different viewpoints, but they are complementary.

    Data mesh is a concept for having distributed owners responsible for a domain and related data products and allowing people to subscribe and use these products. This concept is based on the principle of not having to consolidate data into a central place in order to perform data engineering and governance.

    Data fabric is a concept for having an abstraction layer that cuts across data silos / data platforms, and provides a set of capabilities to understand, govern, integrate, and access data.

    Data mesh and data fabric are complementary.

    Oliver Claude

  • 14.  RE: Questions for AMA: Data Fabric

    Posted Mon February 21, 2022 04:27 AM
    Hi experts, 

    Does data fabric replace data lakes? 
    And why would you use data fabric instead of data lakes?


    Polya Markova

  • 15.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:28 PM
    Hi Polya,  a Data Lake often means moving data, e.g. from transactional systems, to a Data Lake store.  This centralized approach often led to duplicate data and sometimes one ended up with multiple data lakes or data puddles.  How do you create an inventory of all of them or query across the data stores?  This is where a Data Fabric can help.  Data Fabrics give you an ability to create an inventory of all your data and query it with security and governance - avoiding the move of data or extra copies of data.

    David Lebutsch

  • 16.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:41 PM
    Data fabric doesn't replace a data lake. In fact, a data lake can be a "node" in the data fabric.

    In a data fabric, you can have data lakes, data warehouses, etc. and they can be on prem or in the cloud.

    The data fabric is a layer of abstraction that sits on top of all of these.

    Oliver Claude

  • 17.  RE: Questions for AMA: Data Fabric

    Posted Mon February 21, 2022 02:58 PM
    I understand Data Fabric is a concept. What are the key technology components that form Data Fabric? Could you share some use cases and architectures for Data Fabric?

    Hema Jagadeeshan

  • 18.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:19 PM
    A typical use case is accessing or querying data across multiple distributed data sources, with security and governance, without moving the data.  It starts with a data catalog that has governance policies defined: for example, what data needs to be masked or anonymized, where the data is located, and who can do what with the data.  The Data Fabric understands those policies and data locations and enforces them close to the location, before the data leaves the data planes.
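
    As a rough illustration of the idea (not IBM's implementation), policy enforcement at the source can be sketched as below. The `Policy` class, the `enforce` function, the role names, and the sample record are all hypothetical and only show the shape of "check who may read, mask what must be masked, then release the data".

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Hypothetical governance policy attached to a cataloged data asset."""
    allowed_roles: set[str]
    masked_columns: set[str] = field(default_factory=set)


def enforce(rows: list[dict], policy: Policy, role: str) -> list[dict]:
    """Apply the policy before any row leaves the data source."""
    if role not in policy.allowed_roles:
        raise PermissionError(f"role {role!r} may not read this asset")
    # Redact masked columns; pass every other column through unchanged.
    return [
        {col: ("****" if col in policy.masked_columns else val)
         for col, val in row.items()}
        for row in rows
    ]


# Illustrative data only.
customers = [{"name": "Ada", "ssn": "123-45-6789", "country": "DE"}]
policy = Policy(allowed_roles={"analyst"}, masked_columns={"ssn"})
masked = enforce(customers, policy, role="analyst")
```

    The point of the sketch is that the caller never sees the unmasked rows: the check and the redaction happen in one place, next to the data.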

    David Lebutsch

  • 19.  RE: Questions for AMA: Data Fabric

    Posted Tue February 22, 2022 01:00 PM

    What's different about a data fabric vs. traditional approaches to integrate, govern, and access data?


    Trish Smith, MBA, BMath, Mom
    Content Developer
    Ottawa ON

  • 20.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:45 PM
    Hello Trish,

    this is an excellent question. The key differences for Data Fabric compared to traditional approaches are:
    1) Self-service Data and AI Model Marketplace: Business users are empowered to find and work with data across the enterprise. The ability to move data, etc. needs to be simplified to the point that business users can do it with a couple of mouse-clicks without a dependency on the IT department.
    2) To allow 1) critical roles such as Chief Information Security Officers (CISO), Chief Data Officers (CDOs), Chief Privacy Officers (CPO), etc. need to be able to still sleep well at night. This is only possible if the Data Fabric solution automatically enforces end-to-end aspects of access, privacy, data placement, retention, etc.
    3) A Data Fabric learns and advises users proactively: if you make a change in a data model, an intelligent Data Fabric gives you immediate feedback on the downstream applications, e.g. ETL jobs, that the change would break. Or if you are a Data Steward resolving data quality issues, once you have resolved similar tasks a certain number of times, a machine learning model in the background should be able to discover the resolution pattern and handle future, similar tasks automatically by predicting the action.
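
    The downstream-impact feedback described in 3) can be pictured as a walk over a lineage graph. The following is a minimal sketch under stated assumptions: the `LINEAGE` dictionary and the asset names are invented for illustration, and a real fabric would read lineage from its metadata catalog rather than from a hard-coded map.

```python
from collections import deque

# Hypothetical lineage graph: each asset maps to the assets that read from it.
LINEAGE = {
    "crm.customer": ["etl.load_customer"],
    "etl.load_customer": ["dwh.dim_customer"],
    "dwh.dim_customer": ["report.revenue_by_segment", "ml.churn_model"],
}


def downstream(asset: str, lineage: dict[str, list[str]]) -> list[str]:
    """Return every asset reachable from `asset`, in breadth-first order."""
    seen, order, queue = {asset}, [], deque([asset])
    while queue:
        for child in lineage.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


# Everything that could break if the model of crm.customer changes.
impacted = downstream("crm.customer", LINEAGE)
```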

    These are just a couple of examples of some of the fundamental ideas - there are many more. But I hope this gives you a first idea that a Data Fabric is quite different from old approaches of data management.

    Kind regards,


    Martin Oberhofer

  • 21.  RE: Questions for AMA: Data Fabric

    Posted Wed February 23, 2022 12:35 PM
    Hi Shannon:

    I know the steps to implement a traditional Data Warehouse project, but I would like to know the steps to implement a Data Fabric project. Do we have a template to follow? Right now we are implementing a Planning Analytics solution with CP4D, and at the same time we are implementing a Data Fabric solution, but I'm not clear on how these 2 things can be planned as one integrated solution.

    Victor Jimenez

    DataStage Consultant

  • 22.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:36 PM
    Hi Victor,

    a Data Fabric is deployed incrementally, with the capabilities deployed based on use case priorities. So in your example, you seem to have deployed DataStage and a data warehouse solution. You can expand from there by bringing Watson Knowledge Catalog into the mix, capturing the metadata of your DWH and of your DataStage jobs. Once that's done, you get the following additional value points:
    1) Ability to see the impact of model changes on sources and / or the data warehouse using lineage
    2) Ability to see the data quality of your sources - if it degrades too much, you might want to stop that source shipping data to your data warehouse until the data quality in the source is fixed
    3) Ability to protect sensitive data in that ecosystem
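
    To make point 2) concrete, here is a small sketch of a quality gate. It assumes a very simple quality measure (the share of required fields that are filled) and a hypothetical threshold of 0.9; the function names and the sample batch are illustrative, and real data quality scoring covers many more dimensions.

```python
def completeness(rows: list[dict], required: list[str]) -> float:
    """Share of required fields that are present and non-empty."""
    if not rows or not required:
        return 0.0
    filled = sum(
        1 for row in rows for col in required
        if row.get(col) not in (None, "")
    )
    return filled / (len(rows) * len(required))


def should_ship(rows: list[dict], required: list[str], threshold: float = 0.9) -> bool:
    """Hold the batch back when quality falls below the (assumed) threshold."""
    return completeness(rows, required) >= threshold


# Illustrative batch: one of the four required values is empty.
batch = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": ""},
]
score = completeness(batch, ["id", "email"])
ship = should_ship(batch, ["id", "email"])
```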

    Let's assume your next business pain point is redundant and inconsistent master data that prevents the reports in your data warehouse from being meaningful (e.g. revenue by product category by customer segment, etc.). In this case, based on your previously deployed Watson Knowledge Catalog, you can identify which sources provide master data records.
    With that insight you would deploy the IBM Match 360 with Watson capability, which allows you to deduplicate the master data across the various sources. You would then re-use the previously deployed DataStage capability to feed the customer 360 entity data from Match 360 into your warehouse, improving your analytical insights.
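
    For readers new to master data deduplication, the toy sketch below shows the basic idea only. It is not how Match 360 works: it matches records exactly on a normalized name and email, whereas real matching engines typically use fuzzy or probabilistic comparison. All names and records are made up.

```python
def match_key(record: dict) -> tuple[str, str]:
    """Normalize the fields used for matching (an illustrative rule only)."""
    name = " ".join(record["name"].lower().split())
    email = record["email"].strip().lower()
    return name, email


def deduplicate(records: list[dict]) -> list[dict]:
    """Merge records that share a match key, remembering their sources."""
    merged: dict[tuple[str, str], dict] = {}
    for rec in records:
        key = match_key(rec)
        entity = merged.setdefault(
            key, {"name": key[0], "email": key[1], "sources": []}
        )
        entity["sources"].append(rec["source"])
    return list(merged.values())


# The first two records describe the same person in two source systems.
records = [
    {"source": "crm", "name": "Ada  Lovelace", "email": "ADA@example.com"},
    {"source": "billing", "name": "ada lovelace", "email": "ada@example.com "},
    {"source": "crm", "name": "Alan Turing", "email": "alan@example.com"},
]
entities = deduplicate(records)
```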

    The bottom line is: there is no single silver-bullet roadmap for the order in which you deploy Data Fabric capabilities. You select the capabilities based on business priorities, roll out capabilities as needed, and re-use capabilities from previous projects.

    Kind regards,


    Martin Oberhofer

  • 23.  RE: Questions for AMA: Data Fabric

    Posted Wed February 23, 2022 02:00 PM
    Can a data fabric be used with data in motion / realtime streaming data? Many demos seem to focus on extracts or databases. For example, collecting sensor data from devices at a hospital.

    Can a data fabric (CP4D) be an easy way for clients to consolidate data from internal and external sources with minimal programming/development? For example, I want to call a REST API hourly to get the current weather and store that data somewhere.

    Vincent Tran

  • 24.  RE: Questions for AMA: Data Fabric

    Posted Fri February 25, 2022 12:09 PM
    Yes, a data fabric architecture would support what you are describing, whether it is data in motion or consolidating internal / external data. A data fabric should have a range of capabilities to support any style of integration (ETL, event-driven, real-time data, data virtualization, etc.). As far as "minimal" programming is concerned, it's hard to say what would count as "minimal" even though APIs are available - this would require a deeper discussion.

    Oliver Claude

  • 25.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 01:08 AM
    Do graph technologies play a role in the Data Fabric offering?

    Joe Dreyer

  • 26.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:04 PM
    Graph technologies can be and are used for semantic search, lineage, C360 and other use cases for Data Fabric.

    Martin Oberhofer

  • 27.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 11:17 AM
    Edited by System Test Fri January 20, 2023 04:40 PM
    Some of the capabilities (services) in the Data Fabric solutions are not available in all regions. What is the effect of regional availability of services on a Data Fabric deployment?

    Barbara Schramm
    Content Designer Cloud Pak for Data as a Service

  • 28.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:13 PM
    Data gravity is important to consider in any data-centric discussion (including Data Fabrics).  So availability of services in a region should be a key consideration to any solution.

    Trent Gray Donald
    Distinguished Engineer
    Ottawa ON

  • 29.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 12:03 PM
    Are there any opensource data fabrics or opensource stacks that provide data fabric functionality?

    Vincent Tran

  • 30.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:05 PM
    Hello Vincent,

    There are many open source components which can be used for Data Fabric use cases, e.g. Lucene technology as an element in semantic search, or ontology reasoning libraries for query expansion in semantic search. These are just a few point examples; there are many more.

    Kind regards,


    Martin Oberhofer

  • 31.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:12 PM
    IBM Data Fabric leverages many open source projects such as or .  As with all open source, integration and assembly are required to fulfil a purpose.

    David Lebutsch

  • 32.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 12:29 PM
    Does CP4D provide users the ability to label their data - and for administrators to manage users specifically brought in to label datasets?

     Does CP4D manage row level security where you want to limit access to only certain rows of a dataset (based on conditions/rules)?

    Does CP4D provide a way to view audit history. Who did what with what data? Who saw what?

    Vincent Tran

  • 33.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:07 PM
    Hello Vincent,

    related to the first question:
    Yes, with the enterprise data governance catalog users can label data assets in different ways, e.g. via tagging or via business term assignment (either manual or with ML automation) to mention just 2 examples. 

    related to your second question:
    Yes, the policy enforcement engine is being enhanced for row level enforcement.

    related to your third question:
    There are some audit trails available, e.g. with the AI Factsheet you can see who trained the model, which asset it was trained on, etc. If you execute bulk operations (e.g. bulk data movement with DataStage) you can see the job execution time, etc. as well.

    Kind regards,


    Martin Oberhofer

  • 34.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 01:44 PM
    Hi CP4D team, a number of key questions on Data Fabric:

    1. What is the implementation service required behind setting Data Fabric for an organization? I can't fathom this to be a plug-and-play solution.

    2. If we leverage data virtualization in Data Fabric, wouldn't we be pushing the OLAP analytics computing burden back onto the data source systems, especially the transactional OLTP systems? What do we say about this issue with regard to Data Fabric, or the data virtualization aspect behind it?

    3. There are a lot of "automations" mentioned in the marketing material. Can we hear something a bit deeper about how data governance automation is achieved?

    Tian Cai

  • 35.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:16 PM
    Hello Tian,

    related to your first question:
    Data Fabric is an enterprise information architecture approach to manage data assets at scale with governance and lifecycle support. The business value is democratization of data with self-service features, industrialization features for AI for the e2e lifecycle of AI models, etc. 
    On the technical side, this requires an integrated set of capabilities and depending on use case you would deploy a first set of capabilities you need and grow from there. For example, you can start with Watson Knowledge Catalog and Watson Studio for an industrialized, trustworthy AI use case.

    related to your second question:
    Data virtualization can be used for exploration, where a business analyst or Data Scientist can get a first impression of the insights that can be derived across data sources. If in-depth analytics are needed on a very large operational data system with high concurrent throughput, then to avoid contention with the operational workload you might need to move the data to a lakehouse. However, where it's possible to use data virtualization, the cost of redundant data copies can be avoided.

    related to your third question:
    One example is metadata import at scale: the metadata bulk import capabilities allow you to ingest large volumes of metadata with a few mouse clicks, reducing manual labor hours significantly.
    A second example is machine-learning-infused term assignment, which can be used to assign business terms across a very large number of data assets. Based on confidence thresholds you can decide when the term is auto-assigned vs. when you want a human review and approval step.
    Another example is the automated creation of AI Factsheets, which show you who trained the model, on which data it was trained, key quality metrics of the model, where it is deployed, etc. This metadata is captured automatically across the entire AI model lifecycle.
    And these are just a few examples.
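
    The threshold-based routing for term assignment mentioned above could look roughly like the sketch below. The threshold values (0.9 and 0.6), the three outcomes, and the sample suggestions are assumptions made for illustration; the "discard" tier in particular is not something described in this thread.

```python
def route_suggestion(
    confidence: float,
    auto_assign_at: float = 0.9,
    review_at: float = 0.6,
) -> str:
    """Decide what happens to one ML-suggested business term."""
    if confidence >= auto_assign_at:
        return "auto-assign"   # confident enough to assign without a human
    if confidence >= review_at:
        return "review"        # send to a data steward for approval
    return "discard"           # assumed extra tier: too uncertain to suggest


# Hypothetical suggestions: (column, suggested term, model confidence).
suggestions = [
    ("cust_email", "Email Address", 0.97),
    ("dob", "Date of Birth", 0.72),
    ("col_17", "Postal Code", 0.31),
]
decisions = {col: route_suggestion(conf) for col, _term, conf in suggestions}
```

    Raising `auto_assign_at` trades automation for safety: fewer terms are assigned without review, and more land in the steward's queue.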

    Kind regards,


    Martin Oberhofer

  • 36.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:32 PM
    Hi Tian, adding to what Martin said on your second question:  Data Virtualization does quite a bit of 'smart' caching, which can reduce the load on transactional systems.  If caching doesn't yield the outcomes required, replication or change data capture techniques are frequently used to create a nearline copy of the data for analytical queries.
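
    To show why caching takes load off the source, here is a minimal time-to-live cache sketch. It is a generic illustration, not a description of how the Data Virtualization service caches: the `CachedSource` class, its `ttl` parameter, and the stand-in query function are all hypothetical.

```python
import time


class CachedSource:
    """Wrap a query function so repeated queries within `ttl` seconds hit the cache."""

    def __init__(self, run_query, ttl: float = 300.0, clock=time.monotonic):
        self._run_query = run_query
        self._ttl = ttl
        self._clock = clock
        self._cache = {}       # sql text -> (expires_at, result)
        self.source_hits = 0   # how often the real source was queried

    def query(self, sql: str):
        now = self._clock()
        entry = self._cache.get(sql)
        if entry is not None and entry[0] > now:
            return entry[1]    # still fresh: the source is not touched
        self.source_hits += 1
        result = self._run_query(sql)
        self._cache[sql] = (now + self._ttl, result)
        return result


# Stand-in for a transactional system; a real one would execute the SQL.
source = CachedSource(lambda sql: [("row for", sql)])
source.query("SELECT 1")
source.query("SELECT 1")   # served from the cache
```

    After the two calls above, `source.source_hits` is 1. The trade-off is staleness: results can be up to `ttl` seconds old, which is why replication or change data capture is the next step when that is not acceptable.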

    David Lebutsch

  • 37.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 01:44 PM
    Can I assume that IBM's Cloud Pak for Data - Data Fabric tools mainly consist of two functionalities:
    1. Data virtualization technology (powered by, well, data virtualization) - for integrating disparate data silos;
    2. Data cataloguing technology (powered by Knowledge Studio) - for looking up data sources?

    Tian Cai

  • 38.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:23 PM
    Hello Tian,

    that assumption is wrong. IBM Data Fabric on Cloud Pak for Data has many more capabilities, e.g.
    • IBM Match 360 with Watson: This capability allows you to seamlessly determine a C360 view of your customer data, which can be consumed for downstream analytics like next best offer, social influencer scoring, customer churn prediction and many more.
    • IBM DataStage & Replication: This is a Hybrid Multi-Cloud data integration and movement capability providing you with batch, replication and real-time data movement & transformation capabilities.
    • The data ingestion & persistency layer for Data Fabric has many different types of options available, ranging from Cloud Object Storage to relational databases, NoSQL databases, open source databases, etc.
    • Watson Studio / Watson ML / Watson OpenScale / Watson Discovery: This is a family of AI related capabilities allowing Data Scientists to build solutions for structured and unstructured data, optimizing business processes with the infusion of AI and managing the model life cycle end to end.
    • Watson Assistant: This is a customer care solution allowing the seamless and quick deployment of chatbot solutions.

    And this is just the tip of the iceberg. There are many more.

    Kind regards,


    Martin Oberhofer

  • 39.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 01:49 PM
    What AI is embedded in the CP4D Data Fabric system? For example would it automatically detect personal info like email address, social security number, and credit card numbers and mask them by default?

    What would be some other AI capabilities?

    Tian Cai

  • 40.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:26 PM
    Hello Tian,

    the CP4D Data Fabric solution has capabilities to detect PII / PHI / SPI information and classify data assets accordingly. These functions are available through Watson Knowledge Catalog. In addition, there is a rich set of masking functions available (e.g. format preserving encryption, etc.) to protect the values in such fields if required (e.g. when the user trying to access them does not have the privilege to do so). Furthermore, whenever a data asset is flagged as containing PII, with OpenPages the Chief Privacy Officer has an automated solution to produce the privacy compliance reports needed for reporting to the regulatory authorities that enforce privacy laws like GDPR, CCPA, etc.
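
    As a simplified picture of "detect, classify, then mask", the sketch below uses plain regular expressions. This is an assumption-laden toy, not the Watson Knowledge Catalog classifier: the patterns are deliberately naive, the labels are invented, and the masking merely replaces characters while keeping the layout (it is not format preserving encryption).

```python
import re

# Naive illustrative patterns; production classifiers are far more robust.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "credit_card": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}


def classify(value: str) -> set[str]:
    """Return the labels of all PII patterns found in the value."""
    return {label for label, pattern in PII_PATTERNS.items() if pattern.search(value)}


def mask(value: str) -> str:
    """Replace letters and digits in each match, keeping punctuation and length."""
    for pattern in PII_PATTERNS.values():
        value = pattern.sub(
            lambda m: re.sub(r"[A-Za-z0-9]", "X", m.group()), value
        )
    return value


text = "Contact ada@example.com, SSN 123-45-6789"
labels = classify(text)
masked = mask(text)
```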

    Kind regards,


    Martin Oberhofer

  • 41.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:57 PM
    Edited by System Test Fri January 20, 2023 04:25 PM
    I am new to Data Fabric. Interested in getting involved and contributing. Looking for means to do so.

    Reading a bit about this topic, it seems to be heavily dependent on governance among other data integration and processing technologies.
    With WKC, the connection platform, Data Virtualization, Refinery, etc. currently offered in CP4D, what services are missing to implement a Data Fabric solution for our customers? Or are all the required services there and we only need to package them properly?


    Hisham Ghanem
    Vienna VA

  • 42.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 02:58 PM
    Hi Hisham,  it is about the integration and the packaging but also additional capabilities to enforce governed access to data in or near the data source. offers a good opensource architectural illustration,  look at the Data Access Modules.

    David Lebutsch

  • 43.  RE: Questions for AMA: Data Fabric

    Posted Thu February 24, 2022 04:03 PM
    Dear Panel,

    At the conceptual level, there is a discussion about data centric vs data driven approaches.

    Data centric refers to "an architecture where data is the primary and permanent asset, and applications come and go. In the data centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone."

    Data driven is about "building tools, abilities, and, most crucially, a culture that acts on data."

    Does Data Fabric lead to a data centric vision and if yes, how does it do it?



  • 44.  RE: Questions for AMA: Data Fabric

    Posted Mon February 28, 2022 10:04 PM

    I'm not sure if you can completely decouple data from "applications" in the sense that data would have to have a context that gives it meaning. Now the question is whether the context maintains a level of independence if the actual application (e.g., CRM application, Analytics dashboard, etc.) "comes and goes." That's a tough question to answer because the data was created by the application in a particular context which isn't fully encapsulated in the metadata. Even if it could be, it would represent a point-in-time state, and that state could only be changed by external business logic or human engineering.

    That being said, data could be an asset in the sense that a data set could be created and assigned an "identity" and start a life outside the application that created it so to speak. For example, 3rd-party data like D&B, a curated customer list, etc. could be data "products" that have inherent value to be re-used in new contexts in conjunction with new "applications."

    So from that perspective, the data fabric needs to support both data-centric and data-driven concepts. For example, the data fabric enables data-centricity with the data catalog component which provides an inventory of data assets, but it also enables a data-driven culture through the use of these data assets in the "applications" that derive value from the data.

    Oliver Claude

  • 45.  RE: Questions for AMA: Data Fabric

    Posted Mon February 28, 2022 10:47 PM
    Thank you Oliver. Since data is created by an application in a particular context, the level of application independence in a data architecture could come from a common industry data model or, as you suggested, from data products that have inherent value to be re-used in new contexts in conjunction with new "applications." In the data fabric sense, augmented knowledge, a unified view of metadata and master data, seems to be the place for this.
