Technical Demo: IBM Cloud Pak for Data and Cloudera DataFlow, Better Together

By Robert Stanich posted Mon May 23, 2022 11:09 AM


IBM and Cloudera announced today the next step in their strategic partnership to bring advanced data and AI solutions to more organizations with deeper integration between Cloudera DataFlow (CDF) and IBM Cloud Pak for Data. For additional information on these integrations, please see my previous blog post here.

CDF is a scalable, real-time streaming data service that ingests, curates, and analyzes data for key insights and immediately actionable intelligence. It addresses the following challenges:

  • Processing real-time data streaming at high volume and high scale
  • Tracking data provenance and lineage of streaming data
  • Managing and monitoring edge applications and streaming sources
  • Gaining real-time insights and actionable intelligence from streaming data

In this video demo, Cloudera Software Engineer Andrew Lim walks us through a streaming data scenario centered on a real-time credit risk decision. NiFi, a component of CDF, monitors a flow of incoming customer data and calls out to a machine learning model running on IBM's Watson Machine Learning (WML) service to receive a credit risk prediction. That model was previously trained with IBM Watson Studio on historical data held in a repository such as Hadoop. At the end of the sequence, Andrew writes the enriched customer record, complete with the credit risk prediction, to Kafka.
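The enrichment step in this scenario can be sketched outside NiFi to make the data shapes concrete. The Python below is an illustrative sketch, not the flow used in the demo: it assumes the `input_data` / `predictions` JSON layout of WML online scoring, and the field names (`loan_amount`, `loan_duration`, `credit_risk_prediction`, and so on) are hypothetical stand-ins for whatever the real model expects.

```python
import json


def build_scoring_payload(record, fields):
    """Shape one customer record into the input_data layout assumed for WML scoring."""
    return {
        "input_data": [
            {"fields": list(fields), "values": [[record[name] for name in fields]]}
        ]
    }


def enrich_record(record, scoring_response):
    """Return a copy of the record with the first prediction row merged in."""
    prediction = scoring_response["predictions"][0]
    row = dict(zip(prediction["fields"], prediction["values"][0]))
    enriched = dict(record)
    # Hypothetical output keys; the demo's actual record schema is not shown in the post.
    enriched["credit_risk_prediction"] = row.get("prediction")
    enriched["credit_risk_probability"] = row.get("probability")
    return enriched


def to_kafka_value(record):
    """Serialize the enriched record as UTF-8 JSON bytes, a common Kafka message value format."""
    return json.dumps(record, sort_keys=True).encode("utf-8")
```

In a flow like the one described, the payload would be posted to the model's scoring endpoint, and the bytes from `to_kafka_value` would be handed to whichever Kafka producer is in use.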

In our demo, we are running Cloudera DataFlow for the Public Cloud on Amazon Web Services, while the customer data, which simulates a stream flowing through a credit risk application, is read from IBM Cloud. The Watson Machine Learning service also runs on IBM Cloud, and both use the IAM service to authenticate. This arrangement was chosen for the convenience of the teams that built the demo, but it is a good example of how real-world organizations can take advantage of multiple clouds simultaneously.
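For readers unfamiliar with IAM authentication on IBM Cloud, the following is a minimal sketch of exchanging an API key for a bearer token using only the Python standard library. It assumes the public IBM Cloud IAM token endpoint and the API-key grant type; the function names are illustrative, and the demo itself performs this step inside NiFi rather than in Python.

```python
import json
import urllib.parse
import urllib.request

# Public IBM Cloud IAM token endpoint (assumed; private or dedicated environments may differ).
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_iam_token_request(api_key):
    """Build the form-encoded POST that exchanges an API key for an access token."""
    body = urllib.parse.urlencode(
        {
            "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
            "apikey": api_key,
        }
    ).encode("utf-8")
    headers = {
        "Content-Type": "application/x-www-form-urlencoded",
        "Accept": "application/json",
    }
    return urllib.request.Request(IAM_TOKEN_URL, data=body, headers=headers, method="POST")


def fetch_iam_token(api_key, timeout=30):
    """Send the request and return the access_token field (requires network access)."""
    with urllib.request.urlopen(build_iam_token_request(api_key), timeout=timeout) as resp:
        return json.load(resp)["access_token"]


def bearer_headers(token):
    """Headers to attach to later JSON calls, such as a model scoring request."""
    return {"Authorization": f"Bearer {token}", "Content-Type": "application/json"}
```

The same token can then be reused for every service call that accepts IAM bearer tokens until it expires, which is why a single authentication step can serve both the data source and the scoring service.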

IBM helps organizations simplify the management of their cloud data ecosystems. Through an OEM agreement, Cloudera is one of the top partners giving businesses a one-stop shop at IBM.

To learn more, visit our pages on the IBM-Cloudera partnership and Cloudera Data Platform with IBM, where you can also book an expert consultation. You can also read more about Cloudera DataFlow and IBM Cloud Pak for Data.

For more details, please visit IBM Cloud Pak for Data, IBM Data Fabric, and Cloudera Data Platform or join the Cloud Pak for Data Community.

CLICK HERE for a roundup of recent news about the IBM and Cloudera relationship.