Cloud Pak for Data

Questions for AMA: Data Fabric

1. Questions for AMA: Data Fabric
Shannon Rouiller
Posted Fri February 11, 2022 12:37 PM
Edited by System Admin Fri January 20, 2023 04:42 PM

We'd like to answer your questions about Data Fabric.

We've arranged for experts from across IBM to answer your questions right here in this forum thread on Feb 24 at 2pm Eastern/11am Pacific for a whole hour of AMA (Ask Me Anything). Our topic is Data Fabric, so if you have questions, please start posting them as a response to this post.

Here are some ideas for topics:

Creating a catalog of data products and re-usable assets to accelerate analytics and data science projects

Enabling global data sharing and access while enforcing country-specific data and compliance policies

Augmenting the single view of the customer with AI- and ML-driven insight that enables smarter customer interactions

Enforcing fairness, quality, and explainability in models

Our experts will hop on the Cloud Pak for Data Community discussion forum on Feb 24 at 2pm Eastern/11am Pacific and start answering your questions right here in this thread.

To learn more, or to get this AMA on your calendar, go to the AMA Data Fabric event page. This event will take place entirely in the discussion forum, so there is no meeting to join. If you can't be online during the hour, don't worry; you can post your questions in advance and read the responses later.

------------------------------
Shannon Rouiller
Content strategist, Cloud Pak for Data
------------------------------
2. RE: Questions for AMA: Data Fabric
Joe Dreyer
Posted Tue February 15, 2022 02:30 PM

Hi Shannon

Are Graph Database and Graph applications included in the IBM strategy for Cloud Pak for Data?

Kind regards
Joe

------------------------------
Joe Dreyer
------------------------------

3. RE: Questions for AMA: Data Fabric
Oliver Claude
Posted Thu February 24, 2022 02:08 PM

Yes, they are. Today we already make use of this technology in single view use cases, for example.

------------------------------
Oliver Claude
------------------------------

4. RE: Questions for AMA: Data Fabric
This message was posted by a user wishing to remain anonymous
Posted Thu February 17, 2022 12:05 PM

This post was removed
5. RE: Questions for AMA: Data Fabric
This message was posted by a user wishing to remain anonymous
Posted Fri February 18, 2022 12:22 PM

This post was removed
6. RE: Questions for AMA: Data Fabric
John Chaves Chaves
Posted Fri February 18, 2022 12:25 PM

A data fabric is an architecture and set of data services that provide consistent capabilities across hybrid multicloud environments. It is a powerful architecture that standardizes data management, ETL, data standardization, curation, and publishing of data across cloud(s), on premises, and edge devices.

Our approach to IBM Trustworthy AI is rooted in three pillars:
• Trust in Data - Data Scientists require governed, self-service access to data of appropriate quality and relevance that can be scanned for bias, enabling them to accelerate analytics and ML lifecycles using our suite of collaborative analysis tools.
• Trust in Models - A data science platform that not only provides automated MLOps, but infuses it with trust throughout the model lifecycle, from model building to production.
• Trust in Process - AI Governance with automation at each stage of the AI lifecycle via integrations with Factsheets and OpenPages.

------------------------------
John Chaves Chaves
------------------------------

7. RE: Questions for AMA: Data Fabric
Oliver Claude
Posted Thu February 24, 2022 02:57 PM

A data fabric is an architecture designed to understand, integrate, govern, and access data in a landscape where data will be distributed across multiple systems (e.g., applications, data warehouses, data lakes, etc.). The data fabric is a next generation way of "managing" data, by introducing the concept of a knowledge graph, enriched with semantics, and the notion of active metadata - which is a way to accelerate the building and operating of data pipelines, data virtualization, data engineering, data and AI governance, customer 360, etc., leveraging AI / ML. A data fabric is different from merely stitching together disparate tools in a point-to-point fashion, and requires capabilities to work in an integrated fashion.

------------------------------
Oliver Claude
------------------------------

8. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:58 PM

Hello Kelley,

see the answer on question 11) below.

Thanks.

Martin

------------------------------
Martin Oberhofer
------------------------------

9. RE: Questions for AMA: Data Fabric
CHINWE VIVIAN ONONIWU
Posted Fri February 18, 2022 06:14 AM

Is the Data Fabric a platform? How can it help me use my data more effectively?

------------------------------
Engr. Chinwe Vivian Ononiwu
Assistant Manager ICT: Data Engineering, Strategic Intelligence Unit, ISDMA STOG
Federal Inland Revenue Service
Abuja Nigeria
------------------------------

10. RE: Questions for AMA: Data Fabric
Dejan Glozic
Posted Thu February 24, 2022 02:50 PM
Edited by System Admin Fri January 20, 2023 04:12 PM

Data Fabric is an architectural approach (like Service Oriented Architecture), but you need some technology to implement the design.

You can approach this task in two ways:

Buy a bunch of commercial tools or get some Open Source tools and stitch them together (brittle and expensive, even with Open Source)

Buy an integrated "solution" for the core capabilities of the data fabric (this is the IBM approach)

We believe that integration is a non-trivial effort, and that integrating parts that were built with the intent to be integrated in the first place makes success more likely.

------------------------------
Dejan Glozic
------------------------------

11. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:55 PM

Hello Chinwe,

let me give you a couple of examples. I spoke with the CDO of a major European bank, who told me that his data scientists only do data science on Thursdays and Fridays. When I asked why, the CDO explained that from Monday to Wednesday they search for the right data assets (among millions of them), and then they need to understand them in terms of data quality and format, wrangle them, etc. On top of that, different departments build similar ML models, which is redundant. And then there are regulations for the fair use of AI in Europe, where you need audit trails showing who trained the ML model, which data asset it was trained on, etc.

If you had deployed a Data Fabric, these pain points would be gone because:
1) You would have a self-service Data and AI Model Marketplace: Data Scientists can shop for what they need, using semantic search to find it immediately. Every data asset found is curated, with full visibility into data quality, data privacy needs, etc. A checkout allows them to provision the data asset into their analytics environment without involving the IT team, while deep enforcement still automatically ensures compliance with access, location, and other policies.
2) While the data scientists train and deploy models, this end-to-end lifecycle is industrialized with Data / AI pipelines in which all the metadata required for the AI model factsheet is collected automatically and made available for every model, satisfying all AI model audit needs.

1) and 2) alone save a huge number of labor hours thanks to the automation built into the Data Fabric infrastructure.

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

12. RE: Questions for AMA: Data Fabric
Jennifer Smith Gray
Posted Sun February 20, 2022 02:40 PM

Are data fabric and data mesh the same?

Thank you.

------------------------------
Jennifer Smith Gray
IBM
Toronto
------------------------------

13. RE: Questions for AMA: Data Fabric
Oliver Claude
Posted Thu February 24, 2022 02:33 PM

Data fabric and data mesh are concepts that originated from different viewpoints, but they are complementary.

Data mesh is a concept for having distributed owners responsible for a domain and related data products and allowing people to subscribe and use these products. This concept is based on the principle of not having to consolidate data into a central place in order to perform data engineering and governance.

Data fabric is a concept for having an abstraction layer that cuts across data silos / data platforms, and provides a set of capabilities to understand, govern, integrate, and access data.

Data mesh and data fabric are complementary.

Data mesh requires the capabilities of a data fabric, but it is driven by a specific set of principles. The capabilities of a data fabric can be used to implement a data mesh, but a data fabric is driven by a broader set of principles.

Data mesh:

focused on having domain owners and data product owners

assumes data ownership will be distributed to these owners vs. trying to consolidate and manage the data in a data lake

assumes more of a pub / sub model where data consumers can find data products in a data mesh "catalog" and subscribe to the data products

Data fabric:

allows BOTH distributed data domain / data product owners AND centralized ownership and governance, but it doesn't force you to do one or the other

allows BOTH data to be decentralized and / or centralized

allows for a range of consumption models and assets ("finished" data product, "raw" data sets, features, etc.) and mechanisms (pub / sub, data virtualization, ETL / ELT, streams, etc.)

From an IBM perspective, even though we call our solution "data fabric" it can be used to implement a data mesh concept and a data fabric concept.

------------------------------
Oliver Claude
------------------------------

14. RE: Questions for AMA: Data Fabric
Polya Markova
Posted Mon February 21, 2022 04:27 AM

Hi experts,

Does data fabric replace data lakes?
And why would you use data fabric instead of data lakes?

Thanks!

------------------------------
Polya Markova
------------------------------

15. RE: Questions for AMA: Data Fabric
David Lebutsch
Posted Thu February 24, 2022 02:28 PM

Hi Polya, a Data Lake often means moving data, e.g. from transactional systems, into a Data Lake store. This centralized approach often led to duplicate data, and sometimes one ended up with multiple data lakes or data puddles. How do you create an inventory of all of them, or query across the data stores? This is where a Data Fabric can help. Data Fabrics give you the ability to create an inventory of all your data and query it with security and governance, avoiding moving the data or making extra copies of it.
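To make "query across the data stores without copying" concrete, here is a toy analogy using Python's built-in sqlite3 module: two separately attached databases (hypothetical `crm` and `sales` stores) are joined in a single query, and neither table is copied into the other. This only illustrates the idea; it is not how IBM Data Virtualization is implemented.

```python
import sqlite3


def build_demo() -> sqlite3.Connection:
    """Create one connection with two separately attached (hypothetical) stores."""
    conn = sqlite3.connect(":memory:")
    # Each ':memory:' attachment is an independent database; nothing is shared or copied.
    conn.execute("ATTACH DATABASE ':memory:' AS crm")
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("CREATE TABLE crm.customers (id INTEGER PRIMARY KEY, segment TEXT)")
    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.executemany("INSERT INTO crm.customers VALUES (?, ?)",
                     [(1, "retail"), (2, "corporate")])
    conn.executemany("INSERT INTO sales.orders VALUES (?, ?)",
                     [(1, 10.0), (1, 5.0), (2, 100.0)])
    return conn


def revenue_by_segment(conn: sqlite3.Connection) -> list:
    # A single query spans both stores; rows are read in place, not replicated.
    return conn.execute(
        """
        SELECT c.segment, SUM(o.amount)
        FROM crm.customers AS c
        JOIN sales.orders AS o ON o.customer_id = c.id
        GROUP BY c.segment
        ORDER BY c.segment
        """
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_segment(build_demo()))  # [('corporate', 100.0), ('retail', 15.0)]
```

A real fabric spans different engines and locations and adds the governance layer on top; the point here is only that the consumer issues one query while the data stays where it is.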

------------------------------
David Lebutsch
------------------------------

16. RE: Questions for AMA: Data Fabric
Oliver Claude
Posted Thu February 24, 2022 02:41 PM

Data fabric doesn't replace a data lake. In fact, a data lake can be a "node" in the data fabric.

In a data fabric, you can have data lakes, data warehouses, etc. and they can be on prem or in the cloud.

The data fabric is a layer of abstraction that sits on top of all of these.

------------------------------
Oliver Claude
------------------------------

17. RE: Questions for AMA: Data Fabric
Hema Jagadeeshan
Posted Mon February 21, 2022 02:58 PM

I understand Data Fabric is a concept. What are the key technology components that form Data Fabric? Could you share some use cases and architectures for Data Fabric?

------------------------------
Hema Jagadeeshan
------------------------------

18. RE: Questions for AMA: Data Fabric
David Lebutsch
Posted Thu February 24, 2022 02:19 PM

A typical use case is accessing or querying data across multiple distributed data sources, with security and governance, without moving the data. It starts with a data catalog that has governance policies defined, for example which data needs to be masked or anonymized, where the data is located, and who can do what with the data. The Data Fabric understands those policies and data locations and enforces them close to the source, before the data leaves the data planes.
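The policy flow described above can be sketched in Python under simple assumptions: the catalog has already classified columns (a hypothetical `column_classes` map), each `Policy` names an action for a data class, and the check runs where the data lives, before rows are handed to the consumer. All names here are illustrative, not a CP4D API.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, FrozenSet, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Policy:
    classification: str                          # data class from the catalog, e.g. "email"
    action: str                                  # "mask", "anonymize" or "deny"
    exempt_roles: FrozenSet[str] = frozenset()   # roles allowed to see raw values


def _transform(action: str, value: str) -> Optional[str]:
    if action == "mask":
        return "*" * len(value)
    if action == "anonymize":
        # Deterministic pseudonym so joins still line up; illustrative, not security-grade.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return None  # "deny" (or any unknown action): drop the value, failing closed


def enforce_at_source(rows: List[Dict[str, str]], column_classes: Dict[str, str],
                      policies: Iterable[Policy], user_roles: Set[str],
                      user_region: str, allowed_regions: Set[str]) -> List[Dict[str, str]]:
    """Apply catalog policies to rows before they leave the data source."""
    if user_region not in allowed_regions:
        raise PermissionError(f"data may not be served to region {user_region!r}")
    by_class = {p.classification: p for p in policies}
    result = []
    for row in rows:
        safe = {}
        for column, value in row.items():
            policy = by_class.get(column_classes.get(column))
            if policy is None or policy.exempt_roles & user_roles:
                safe[column] = value  # unclassified column, or user is exempt
                continue
            transformed = _transform(policy.action, value)
            if transformed is not None:
                safe[column] = transformed
        result.append(safe)
    return result
```

For example, an analyst in an allowed region would receive a masked email and no SSN column at all, while a data steward exempted by the email policy would see the raw email.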

------------------------------
David Lebutsch
------------------------------

19. RE: Questions for AMA: Data Fabric
Trish Smith
Posted Tue February 22, 2022 01:00 PM

What's different about a data fabric vs. traditional approaches to integrate, govern, and access data?

Thanks,
Trish

------------------------------
[Trish] [Smith] [MBA, BMath, Mom]
[Content Developer]
[IBM]
[Ottawa] [ON]
[613-356-5435]
------------------------------

20. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:45 PM

Hello Trish,

this is an excellent question. The key differences for Data Fabric compared to traditional approaches are:
1) Self-service Data and AI Model Marketplace: Business users are empowered to find and work with data across the enterprise. The ability to move data, etc. needs to be simplified to the point that business users can do it with a couple of mouse-clicks without a dependency on the IT department.
2) To allow 1), critical roles such as Chief Information Security Officers (CISOs), Chief Data Officers (CDOs), Chief Privacy Officers (CPOs), etc. still need to be able to sleep well at night. This is only possible if the Data Fabric solution automatically enforces end-to-end aspects of access, privacy, data placement, retention, etc.
3) A Data Fabric learns and advises users proactively: If you make a change in a data model, an intelligent Data Fabric gives you immediate feedback on downstream applications, e.g. ETL jobs that would break if you made the model change. Or, if you are a Data Steward resolving data quality issues, once you have done this a certain number of times for similar tasks, a machine learning model in the background should be able to discover the resolution pattern and handle future, similar tasks automatically by predicting the action.
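The downstream-impact feedback in point 3) boils down to a traversal of lineage metadata. A minimal sketch, assuming lineage is available as a hypothetical adjacency map from each asset to the assets that consume it:

```python
from collections import deque
from typing import Dict, List


def downstream_impact(lineage: Dict[str, List[str]], changed: str) -> List[str]:
    """Breadth-first walk of a lineage graph, returning every asset downstream of `changed`."""
    seen = {changed}
    queue = deque([changed])
    impacted: List[str] = []
    while queue:
        node = queue.popleft()
        for consumer in lineage.get(node, []):
            if consumer not in seen:  # `seen` also protects against cycles
                seen.add(consumer)
                impacted.append(consumer)
                queue.append(consumer)
    return impacted


# Hypothetical lineage: source table -> ETL job -> warehouse table -> reports / exports
LINEAGE = {
    "crm.customer": ["etl.load_dwh_customer"],
    "etl.load_dwh_customer": ["dwh.dim_customer"],
    "dwh.dim_customer": ["report.revenue_by_segment", "etl.export_marketing"],
}
```

Filtering the result to assets whose names start with `etl.` would give the list of ETL jobs to warn the user about before the model change is applied.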

These are just a couple of examples of some of the fundamental ideas - there are many more. But I hope this gives you a first idea that a Data Fabric is quite different from old approaches of data management.

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

21. RE: Questions for AMA: Data Fabric
VICTOR JIMENEZ SANCHEZ
Posted Wed February 23, 2022 12:35 PM

Hi Shannon:

I know the steps to implement a traditional Data Warehouse project, but I would like to know the steps to implement a Data Fabric project. Do we have a template to follow? Right now we are implementing a Planning Analytics solution with CP4D, and at the same time we are implementing a Data Fabric solution, but I'm not clear how these two things can be planned as one integrated solution.

Victor Jimenez

------------------------------
VICTOR JIMENEZ SANCHEZ
DataStage Consultant
IBM
------------------------------

22. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:36 PM

Hi Victor,

a Data Fabric is deployed incrementally, with the capabilities deployed based on use case priorities. So in your example, you seem to have deployed DataStage and a data warehouse solution. You can expand from there by bringing Watson Knowledge Catalog into the mix, capturing the metadata of your DWH and the metadata of your DataStage jobs. Once that's done, you get the following additional value points:
1) Ability to see the impact of model changes on sources and / or the data warehouse using lineage
2) Ability to see the data quality from your sources - if they degrade too much, you might want to stop that source shipping data to your data warehouse until the data quality in the source is fixed
3) Ability to protect sensitive data in that ecosystem
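Point 2) can be pictured as a simple gate in front of the warehouse load. In this sketch the validation rules, the pass-rate score, and the 0.95 threshold are all assumptions chosen for illustration:

```python
from typing import Any, Callable, Dict, List

Rule = Callable[[Any], bool]


def quality_score(records: List[dict], rules: Dict[str, Rule]) -> float:
    """Fraction of (record, field rule) checks that pass, from 0.0 to 1.0."""
    if not records or not rules:
        return 0.0  # nothing to assess: treat as failing, to stay on the safe side
    passed = sum(bool(rule(rec.get(field))) for rec in records for field, rule in rules.items())
    return passed / (len(records) * len(rules))


def should_load(records: List[dict], rules: Dict[str, Rule], threshold: float = 0.95) -> bool:
    """Gate a source batch: only ship it to the warehouse if quality meets the threshold."""
    return quality_score(records, rules) >= threshold


# Hypothetical rules for a customer feed
RULES: Dict[str, Rule] = {
    "id": lambda v: v is not None,
    "email": lambda v: isinstance(v, str) and "@" in v,
}
```

A batch that fails the gate would be held back (and the source owner notified) until the data quality in the source is fixed.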

Let's assume your next business pain point is redundant and inconsistent master data preventing the reports in your data warehouse from being meaningful (e.g. revenue by product category by customer segment, etc.). In this case, based on your previously deployed Watson Knowledge Catalog, you can identify which sources provide master data records.
With that insight, you would deploy the IBM Match 360 with Watson capability, which allows you to deduplicate the master data across the various sources. You would then re-use the previously deployed DataStage capability to feed the customer 360 entity data from Match 360 into your warehouse, improving your analytical insights.
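Deduplication across sources is, at its core, entity matching. The sketch below uses a deliberately simple deterministic rule (same normalized email, or same normalized name plus ZIP code) with union-find to group records into entities. Match 360 uses its own, more sophisticated matching, so treat the rule and the field names as hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List


def _norm(text: str) -> str:
    return re.sub(r"[^a-z0-9]", "", text.lower())


def match_records(records: List[dict]) -> List[List[str]]:
    """Group source records that refer to the same real-world entity."""
    parent: Dict[str, str] = {r["id"]: r["id"] for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        parent[find(a)] = find(b)

    first_seen: Dict[tuple, str] = {}
    for r in records:
        keys = [("email", r["email"].strip().lower())] if r.get("email") else []
        keys.append(("name_zip", _norm(r["name"]), r["zip"]))
        for key in keys:
            if key in first_seen:
                union(r["id"], first_seen[key])  # shares a match key with an earlier record
            else:
                first_seen[key] = r["id"]

    groups: Dict[str, List[str]] = defaultdict(list)
    for r in records:
        groups[find(r["id"])].append(r["id"])
    return sorted(groups.values())
```

Because union-find merges transitively, a record that matches one member of a group on email and another on name joins the same entity, which is what a golden customer 360 record needs.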

The bottom line is: there is no single silver-bullet roadmap for the order in which you deploy Data Fabric capabilities. You select capabilities based on business priorities and roll them out as needed, re-using capabilities from previous projects.

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

23. RE: Questions for AMA: Data Fabric
Vincent Tran
Posted Wed February 23, 2022 02:00 PM

Can a data fabric be used with data in motion / realtime streaming data? Many demos seem to focus on extracts or databases. For example, collecting sensor data from devices at a hospital.

Can a data fabric (CP4D) be an easy way for clients to consolidate data from internal and external sources with minimal programming/development? For example, I want to call a REST API hourly to get the current weather and store that data somewhere.

------------------------------
Vincent Tran
------------------------------

24. RE: Questions for AMA: Data Fabric
Oliver Claude
Posted Fri February 25, 2022 12:09 PM

Yes, a data fabric architecture would support what you are describing, whether it is data in motion or consolidating internal / external data. A data fabric should have a range of capabilities to support any style of integration (ETL, event-driven, real-time data, data virtualization, etc.). As far as "minimal" programming is concerned, it's hard to say what would be "minimal" even though APIs are available - this would require a deeper discussion.
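For a sense of scale, the hourly weather example from the question needs only a few lines of plain Python outside any platform. The endpoint URL is a placeholder and the SQLite table is an assumption; this is not a CP4D API, and a production job would normally use a scheduler rather than a sleep loop.

```python
import json
import sqlite3
import time
import urllib.request
from typing import Callable, Optional

# Placeholder endpoint (hypothetical); substitute a real weather API and its auth.
WEATHER_URL = "https://api.example.com/weather/current?city=Toronto"


def fetch_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as response:
        return json.load(response)


def store_observation(conn: sqlite3.Connection, payload: dict, fetched_at: float) -> None:
    conn.execute("CREATE TABLE IF NOT EXISTS weather (fetched_at REAL, payload TEXT)")
    conn.execute("INSERT INTO weather VALUES (?, ?)", (fetched_at, json.dumps(payload)))
    conn.commit()


def poll(conn: sqlite3.Connection, fetch: Callable[[str], dict] = fetch_json,
         url: str = WEATHER_URL, interval_s: float = 3600.0,
         iterations: Optional[int] = None) -> None:
    """Fetch and store once per interval; `iterations=None` runs until interrupted."""
    done = 0
    while iterations is None or done < iterations:
        store_observation(conn, fetch(url), time.time())
        done += 1
        if iterations is None or done < iterations:
            time.sleep(interval_s)
```

Injecting `fetch` keeps the loop testable without network access, and storing the raw JSON payload defers schema decisions to whoever consumes the data later.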

------------------------------
Oliver Claude
------------------------------

25. RE: Questions for AMA: Data Fabric
Joe Dreyer
Posted Thu February 24, 2022 01:08 AM

Do graph technologies play a role in the Data Fabric offering?

------------------------------
Joe Dreyer
------------------------------

26. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:04 PM

Graph technologies can be and are used for semantic search, lineage, C360 and other use cases for Data Fabric.

------------------------------
Martin Oberhofer
------------------------------

27. RE: Questions for AMA: Data Fabric
Barbara Schramm
Posted Thu February 24, 2022 11:17 AM
Edited by System Admin Fri January 20, 2023 04:40 PM

Some of the capabilities (services) in the Data Fabric solutions are not available in all regions. What is the effect of regional availability of services on a Data Fabric deployment?

------------------------------
Barbara Schramm
Content Designer Cloud Pak for Data as a Service
------------------------------

28. RE: Questions for AMA: Data Fabric
Trent Gray Donald
Posted Thu February 24, 2022 02:13 PM

Data gravity is important to consider in any data-centric discussion (including Data Fabrics). So availability of services in a region should be a key consideration to any solution.

------------------------------
Trent Gray Donald
Distinguished Engineer
IBM
Ottawa ON
------------------------------

29. RE: Questions for AMA: Data Fabric
Vincent Tran
Posted Thu February 24, 2022 12:03 PM

Are there any opensource data fabrics or opensource stacks that provide data fabric functionality?

------------------------------
Vincent Tran
------------------------------

30. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:05 PM

Hello Vincent,

There are many open source components that can be used for Data Fabric use cases, e.g. Lucene as an element of semantic search, or ontology reasoning libraries for query expansion in semantic search. And these are just a few examples; there are many more.
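As a rough illustration of ontology-driven query expansion, the sketch below expands query terms with a tiny hand-written ontology (hypothetical) and looks them up in an inverted index, the same basic structure Lucene is built on, although Lucene adds text analysis, relevance scoring, and much more.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set

# Hypothetical mini-ontology: business concept -> related terms used in asset descriptions
ONTOLOGY: Dict[str, Set[str]] = {
    "customer": {"client", "buyer"},
    "revenue": {"sales", "turnover"},
}


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(assets: Dict[str, str]) -> Dict[str, Set[str]]:
    """Inverted index: token -> ids of assets whose description contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for asset_id, description in assets.items():
        for token in tokenize(description):
            index[token].add(asset_id)
    return index


def expand(terms: Iterable[str]) -> Set[str]:
    expanded = set(terms)
    for term in list(expanded):
        expanded |= ONTOLOGY.get(term, set())
    return expanded


def search(index: Dict[str, Set[str]], query: str) -> List[str]:
    """Rank assets by how many expanded query terms they contain."""
    scores: Counter = Counter()
    for term in expand(tokenize(query)):
        for asset_id in index.get(term, ()):
            scores[asset_id] += 1
    return sorted(scores, key=lambda asset_id: (-scores[asset_id], asset_id))
```

With descriptions that say "client" and "sales", a query for "customer revenue" still finds the right assets, even though neither query word appears in them literally.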

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

31. RE: Questions for AMA: Data Fabric
David Lebutsch
Posted Thu February 24, 2022 02:12 PM

IBM Data Fabric leverages many open source projects, such as https://fybrik.io/ or https://arrow.apache.org/. As with all open source, integration and assembly are required to fulfil a purpose.

------------------------------
David Lebutsch
------------------------------

32. RE: Questions for AMA: Data Fabric
Vincent Tran
Posted Thu February 24, 2022 12:29 PM

Does CP4D provide users the ability to label their data - and for administrators to manage users specifically brought in to label datasets?

Does CP4D manage row level security where you want to limit access to only certain rows of a dataset (based on conditions/rules)?

Does CP4D provide a way to view audit history? Who did what with what data? Who saw what?

------------------------------
Vincent Tran
------------------------------

33. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:07 PM

Hello Vincent,

related to the first question:
Yes, with the enterprise data governance catalog users can label data assets in different ways, e.g. via tagging or via business term assignment (either manual or with ML automation) to mention just 2 examples.

related to your second question:
Yes, the policy enforcement engine is being enhanced for row level enforcement.
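Since the row-level enforcement mentioned above was still being enhanced at the time, the following is only a generic sketch of row-level filtering, not the CP4D feature: hypothetical `RowRule`s grant each role a predicate, and rows are denied by default when no rule applies.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class RowRule:
    """Hypothetical rule: holders of `role` may see rows for which `predicate` is true."""
    role: str
    predicate: Callable[[dict, dict], bool]


def filter_rows(rows: List[dict], user: dict, rules: List[RowRule]) -> List[dict]:
    applicable = [rule for rule in rules if rule.role in user["roles"]]
    if not applicable:
        return []  # deny by default: no rule grants this user any rows
    # A row is visible if any rule that applies to the user allows it.
    return [row for row in rows if any(rule.predicate(row, user) for rule in applicable)]


ROW_RULES = [
    RowRule("regional_analyst", lambda row, user: row["region"] == user["region"]),
    RowRule("auditor", lambda row, user: True),
]
```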

related to your third question:
There are some audit trails available, e.g. with the AI Factsheet you can see who trained the model, which asset it was trained on, etc. If you execute bulk operations (e.g. bulk data movement with DataStage), you can see the job execution time, etc. as well.

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

34. RE: Questions for AMA: Data Fabric
Tian Cai
Posted Thu February 24, 2022 01:44 PM

Hi CP4D team, a number of key questions on Data Fabric:

1. What is the implementation service required behind setting Data Fabric for an organization? I can't fathom this to be a plug-and-play solution.

2. If we leverage data virtualization in Data Fabric, wouldn't we be pushing the OLAP analytics computing burden back onto the data source systems, especially the transactional OLTP systems? What do we say about this issue regarding Data Fabric, or the data virtualization aspect behind it?

3. There are a lot of "automations" mentioned in the marketing material. Can we hear something a bit deeper about how data governance automation is achieved?

------------------------------
Tian Cai
------------------------------

35. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:16 PM

Hello Tian,

related to your first question:

Data Fabric is an enterprise information architecture approach to manage data assets at scale with governance and lifecycle support. The business value is democratization of data with self-service features, industrialization features for AI for the e2e lifecycle of AI models, etc.

On the technical side, this requires an integrated set of capabilities and depending on use case you would deploy a first set of capabilities you need and grow from there. For example, you can start with Watson Knowledge Catalog and Watson Studio for an industrialized, trustworthy AI use case.

related to your second question:
Data virtualization can be used for exploration, where a business analyst or Data Scientist can get a first impression of the insights that can be derived across data sources. If in-depth analytics on a very large operational data system with high-concurrency throughput are needed, then to avoid contention with the operational workload, you might need to move the data to a lakehouse for in-depth analytics. However, where it's possible to use data virtualization, the cost of redundant data copies can be avoided.

related to your third question:
One example is metadata import at scale using the metadata bulk import capabilities allowing you to ingest large volumes of metadata with a few mouse clicks reducing manual labor hours significantly.
Another example is machine-learning-infused term assignment, which can be used to assign business terms across a very large number of data assets. Based on confidence thresholds, you can decide when a term is auto-assigned vs. when you want a human review and approval step.
A third example is the automated creation of AI Factsheets, which show you who trained the model, which data it was trained on, key quality metrics of the model, where it is deployed, etc. This metadata is captured automatically across the entire AI model lifecycle.
And these are just a few examples.
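The confidence-threshold idea in the term-assignment example can be sketched as follows. A token-overlap (Jaccard) score stands in for the trained model's confidence, and the 0.8 / 0.4 thresholds are assumptions:

```python
import re
from typing import Dict, List, Optional, Tuple


def _tokens(text: str) -> set:
    return set(re.findall(r"[a-z]+", text.lower()))


def confidence(column: str, term: str) -> float:
    # Jaccard overlap of name tokens: a stand-in for a trained classifier's confidence.
    a, b = _tokens(column), _tokens(term)
    return len(a & b) / len(a | b) if (a | b) else 0.0


def assign_terms(columns: List[str], terms: List[str], auto: float = 0.8,
                 review: float = 0.4) -> Dict[str, Tuple[str, Optional[str], float]]:
    """Auto-assign confident matches, route borderline ones to a human, skip the rest."""
    decisions = {}
    for column in columns:
        best = max(terms, key=lambda term: confidence(column, term))
        score = confidence(column, best)
        if score >= auto:
            decisions[column] = ("auto-assigned", best, score)
        elif score >= review:
            decisions[column] = ("needs review", best, score)  # route to a data steward
        else:
            decisions[column] = ("unassigned", None, score)
    return decisions
```

Raising `auto` trades labor savings for precision: fewer terms are assigned without a human, but those that are assigned are more likely to be right.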

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

36. RE: Questions for AMA: Data Fabric
David Lebutsch
Posted Thu February 24, 2022 02:32 PM

Hi Tian, adding to what Martin said on your second question: Data Virtualization does quite a bit of 'smart' caching, which can reduce the load on transactional systems. If caching doesn't yield the required outcomes, replication or change data capture techniques are frequently used to create a nearline copy of the data for analytical queries.
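The effect of caching on source load can be illustrated with a plain time-to-live cache in front of a query function. Data Virtualization's "smart" caching is more involved; this sketch, with its hypothetical `run_query` callable and 60-second default TTL, only shows why repeated queries stop reaching the transactional system:

```python
import time
from typing import Callable, Dict, List, Tuple


class TTLQueryCache:
    """Serve repeated queries from memory for `ttl_s` seconds to spare the source system."""

    def __init__(self, run_query: Callable[[str], List], ttl_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._run_query = run_query
        self._ttl_s = ttl_s
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List]] = {}
        self.source_hits = 0  # how many queries actually reached the source

    def query(self, sql: str) -> List:
        now = self._clock()
        cached = self._entries.get(sql)
        if cached is not None and now - cached[0] < self._ttl_s:
            return cached[1]  # fresh enough: no load on the source
        self.source_hits += 1
        result = self._run_query(sql)
        self._entries[sql] = (now, result)
        return result
```

The TTL bounds how stale the answers can be; when that staleness is unacceptable, the replication or change data capture approach mentioned above is the usual alternative.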

------------------------------
David Lebutsch
------------------------------

37. RE: Questions for AMA: Data Fabric
Tian Cai
Posted Thu February 24, 2022 01:44 PM

Can I assume that IBM's Cloud Pak for Data - Data Fabric tools mainly consist of two functionalities:
1. Data virtualization technology (powered by, well, data virtualization) - for integrating disparate data silos;
2. Data cataloguing technology (powered by Knowledge Studio) - for looking up data sources?

------------------------------
Tian Cai
------------------------------

38. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:23 PM

Hello Tian,

that assumption is wrong. IBM Data Fabric on Cloud Pak for Data has many more capabilities, e.g.:

IBM Match 360 with Watson: This capability allows you to seamlessly determine a C360 view of your customer data, which can be consumed for downstream analytics like next best offer, social influencer scoring, customer churn prediction, and many more.

IBM Data Stage & Replication: This is a Hybrid Multi-Cloud data integration and movement capability providing you batch, replication and real-time data movement & transformation capabilities.

The data ingestion & persistency layer for Data Fabric has many different types of options available, ranging from Cloud Object Storage to relational databases, NoSQL databases, open source databases, etc.

Watson Studio / Watson ML / Watson OpenScale / Watson Discovery: This is a family of AI-related capabilities allowing Data Scientists to build solutions for structured and unstructured data, optimizing business processes with the infusion of AI and managing the model lifecycle end to end.

Watson Assistant: This is a customer care solution allowing the seamless and quick deployment of chatbot solutions.

And this is just the tip of the iceberg. There are many more.

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

39. RE: Questions for AMA: Data Fabric
Tian Cai
Posted Thu February 24, 2022 01:49 PM

What AI is embedded in the CP4D Data Fabric system? For example would it automatically detect personal info like email address, social security number, and credit card numbers and mask them by default?

What would be some other AI capabilities?

------------------------------
Tian Cai
------------------------------

40. RE: Questions for AMA: Data Fabric
Martin Oberhofer
Posted Thu February 24, 2022 02:26 PM

Hello Tian,

the CP4D Data Fabric solution has capabilities to detect PII / PHI / SPI information and classify data assets accordingly. These functions are available through Watson Knowledge Catalog. In addition, a rich set of masking functions (e.g. format-preserving encryption) is available to protect the values in such fields if required (e.g. when the user trying to access them lacks the privilege to do so). Furthermore, with Watson OpenPages, whenever a data asset is flagged as containing PII, the Chief Privacy Officer has an automated way to produce the privacy compliance reports needed for regulatory authorities enforcing privacy laws like GDPR, CCPA, etc.
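A minimal sketch of pattern-based PII detection and masking, assuming simplified regular expressions plus a Luhn checksum can stand in for the catalog's data classes. Real classification uses richer rules and ML, and the partial masking below is not format-preserving encryption:

```python
import re
from typing import Iterable, Optional

# Simplified patterns (assumptions), checked in this order.
PATTERNS = {
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[a-z]{2,}$", re.IGNORECASE),
    "us_ssn": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
    "credit_card": re.compile(r"^\d{13,19}$"),
}


def luhn_ok(digits: str) -> bool:
    """Luhn checksum used by payment card numbers."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def _matches(label: str, value: str) -> bool:
    if label == "credit_card":
        compact = re.sub(r"[ -]", "", value)  # allow "4111 1111 ..." or "4111-1111-..."
        return bool(PATTERNS[label].match(compact)) and luhn_ok(compact)
    return bool(PATTERNS[label].match(value))


def classify_column(values: Iterable[str], min_share: float = 0.8) -> Optional[str]:
    """Label a column if at least `min_share` of its non-empty values fit one pattern."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return None
    for label in PATTERNS:
        if sum(_matches(label, v) for v in cleaned) / len(cleaned) >= min_share:
            return label
    return None


def mask(value: str) -> str:
    """Keep the last four characters visible; fully mask very short values."""
    if len(value) <= 4:
        return "*" * len(value)
    return "*" * (len(value) - 4) + value[-4:]
```

Classifying by the share of matching values, rather than by a single value, keeps one stray entry from flipping a whole column's classification.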

Kind regards,

Martin

------------------------------
Martin Oberhofer
------------------------------

41. RE: Questions for AMA: Data Fabric
Hisham Ghanem
Posted Thu February 24, 2022 02:57 PM
Edited by System Admin Fri January 20, 2023 04:25 PM

I am new to Data Fabric. Interested in getting involved and contributing. Looking for means to do so.

From what I have read about this topic, it seems to depend heavily on governance, along with other data integration and processing technologies.
With WKC, the connection platform, Data Virtualization, Refinery, etc. currently offered in CP4D, what services are still missing to implement a Data Fabric solution for our customers? Or are all the required services already there, so that they only need to be packaged properly?

Thanks

------------------------------
Hisham Ghanem
Architect
IBM
Vienna VA
571-4216350
------------------------------

42. RE: Questions for AMA: Data Fabric

David Lebutsch
Posted Thu February 24, 2022 02:58 PM

Hi Hisham, it is about the integration and the packaging, but also about additional capabilities to enforce governed access to data in or near the data source. https://fybrik.io/v0.6/concepts/architecture/ offers a good open-source architectural illustration; look at the Data Access Modules.

------------------------------
David Lebutsch
------------------------------

43. RE: Questions for AMA: Data Fabric

KEITH DOAN
Posted Thu February 24, 2022 04:03 PM

Dear Panel,

At the conceptual level, there is a discussion about data-centric vs. data-driven approaches.

Data centric refers to "an architecture where data is the primary and permanent asset, and applications come and go. In the data centric architecture, the data model precedes the implementation of any given application and will be around and valid long after it is gone."

Data driven is about "building tools, abilities, and, most crucially, a culture that acts on data."
(https://tdan.com/the-data-centric-revolution-data-centric-vs-data-driven/20288).

Does Data Fabric lead to a data-centric vision, and if so, how?

Thanks.

------------------------------
KEITH DOAN
------------------------------

44. RE: Questions for AMA: Data Fabric

Oliver Claude
Posted Mon February 28, 2022 10:04 PM

I'm not sure you can completely decouple data from "applications," in the sense that data needs a context that gives it meaning. The question, then, is whether that context keeps some independence if the actual application (e.g., a CRM application or an analytics dashboard) "comes and goes." That's a tough question to answer, because the data was created by the application in a particular context that isn't fully encapsulated in the metadata. Even if it could be, the metadata would represent a point-in-time state, and that state could only be changed by external business logic or human engineering.

That being said, data could be an asset in the sense that a data set could be created, assigned an "identity," and start a life, so to speak, outside the application that created it. For example, third-party data such as D&B or a curated customer list could be data "products" with inherent value that can be reused in new contexts in conjunction with new "applications."

So from that perspective, the data fabric needs to support both data-centric and data-driven concepts. For example, the data fabric enables data-centricity through the data catalog component, which provides an inventory of data assets, but it also enables a data-driven culture through the use of these data assets in the "applications" that derive value from the data.

------------------------------
Oliver Claude
------------------------------

45. RE: Questions for AMA: Data Fabric

KEITH DOAN
Posted Mon February 28, 2022 10:47 PM

Thank you, Olivier. As data is created by an application in a particular context, the application independence of a data architecture could come from a common industry data model or, as you suggested, from data products with inherent value that can be reused in new contexts in conjunction with new "applications." In the data fabric sense, augmented knowledge (a unified view of metadata and master data) seems to be the place for this.

------------------------------
KEITH DOAN
------------------------------

Original Message

Cloud Pak for Data

Cloud Pak for Data

Questions for AMA: Data Fabric

Shannon RouillerFri February 11, 2022 12:37 PM

Joe DreyerTue February 15, 2022 02:30 PM

Oliver ClaudeThu February 24, 2022 02:08 PM

Anonymous MemberThu February 17, 2022 12:05 PM

Anonymous MemberFri February 18, 2022 12:22 PM

John Chaves ChavesFri February 18, 2022 12:25 PM

Oliver ClaudeThu February 24, 2022 02:57 PM

Martin OberhoferThu February 24, 2022 02:58 PM

CHINWE VIVIAN ONONIWUFri February 18, 2022 06:14 AM

Dejan GlozicThu February 24, 2022 02:50 PM

Martin OberhoferThu February 24, 2022 02:55 PM

Jennifer Smith GraySun February 20, 2022 02:40 PM

Oliver ClaudeThu February 24, 2022 02:33 PM

Polya MarkovaMon February 21, 2022 04:27 AM

David LebutschThu February 24, 2022 02:28 PM

Oliver ClaudeThu February 24, 2022 02:41 PM

Hema JagadeeshanMon February 21, 2022 02:58 PM

David LebutschThu February 24, 2022 02:19 PM

Trish SmithTue February 22, 2022 01:00 PM

Martin OberhoferThu February 24, 2022 02:45 PM

VICTOR JIMENEZ SANCHEZWed February 23, 2022 12:35 PM

Martin OberhoferThu February 24, 2022 02:36 PM

Vincent TranWed February 23, 2022 02:00 PM

Oliver ClaudeFri February 25, 2022 12:09 PM

Joe DreyerThu February 24, 2022 01:08 AM

Martin OberhoferThu February 24, 2022 02:04 PM

Barbara SchrammThu February 24, 2022 11:17 AM

Trent Gray DonaldThu February 24, 2022 02:13 PM

Vincent TranThu February 24, 2022 12:03 PM

Martin OberhoferThu February 24, 2022 02:05 PM

David LebutschThu February 24, 2022 02:12 PM

Vincent TranThu February 24, 2022 12:29 PM

Martin OberhoferThu February 24, 2022 02:07 PM

Tian CaiThu February 24, 2022 01:44 PM

Martin OberhoferThu February 24, 2022 02:16 PM

David LebutschThu February 24, 2022 02:32 PM

Tian CaiThu February 24, 2022 01:44 PM

Martin OberhoferThu February 24, 2022 02:23 PM

Tian CaiThu February 24, 2022 01:49 PM

Martin OberhoferThu February 24, 2022 02:26 PM

Hisham GhanemThu February 24, 2022 02:57 PM

David LebutschThu February 24, 2022 02:58 PM

KEITH DOANThu February 24, 2022 04:03 PM

Oliver ClaudeMon February 28, 2022 10:04 PM

KEITH DOANMon February 28, 2022 10:47 PM

1. Questions for AMA: Data Fabric

2. RE: Questions for AMA: Data Fabric

3. RE: Questions for AMA: Data Fabric

4. RE: Questions for AMA: Data Fabric

5. RE: Questions for AMA: Data Fabric

6. RE: Questions for AMA: Data Fabric

7. RE: Questions for AMA: Data Fabric

8. RE: Questions for AMA: Data Fabric

9. RE: Questions for AMA: Data Fabric

10. RE: Questions for AMA: Data Fabric

11. RE: Questions for AMA: Data Fabric

12. RE: Questions for AMA: Data Fabric

13. RE: Questions for AMA: Data Fabric

14. RE: Questions for AMA: Data Fabric

15. RE: Questions for AMA: Data Fabric

16. RE: Questions for AMA: Data Fabric

17. RE: Questions for AMA: Data Fabric

18. RE: Questions for AMA: Data Fabric

19. RE: Questions for AMA: Data Fabric

20. RE: Questions for AMA: Data Fabric

21. RE: Questions for AMA: Data Fabric

22. RE: Questions for AMA: Data Fabric

23. RE: Questions for AMA: Data Fabric

24. RE: Questions for AMA: Data Fabric

25. RE: Questions for AMA: Data Fabric

26. RE: Questions for AMA: Data Fabric

27. RE: Questions for AMA: Data Fabric

28. RE: Questions for AMA: Data Fabric

29. RE: Questions for AMA: Data Fabric

30. RE: Questions for AMA: Data Fabric

31. RE: Questions for AMA: Data Fabric

32. RE: Questions for AMA: Data Fabric

Additional
Resources