watsonx.data

What Lies Beneath the Open Data Lakehouse

By SAMI SALKOSUO posted yesterday

  

An open data lakehouse is a data management architecture built on open-source technologies. It combines the flexibility of a data lake with the structure and performance of a data warehouse.

IBM watsonx.data is more than the architecture itself. It is a real-world, enterprise-grade implementation of a hybrid and open data lakehouse.

Because it is open, watsonx.data can be combined with Snowflake and AWS S3 to form a truly hybrid setup, one that delivers a tangible benefit.

In this article, I’ll walk through how IBM watsonx.data components can be used together with Snowflake and AWS S3, and demonstrate how to configure them to work in harmony.

Hybrid Open Data Lakehouse

The image below shows the components of IBM watsonx.data. In this article and demo, the components highlighted are those that make up our hybrid open data lakehouse solution.

[Image: watsonx.data components, with the hybrid solution components highlighted.]

These components together form the foundation of a hybrid, interoperable data platform:

  • 3rd Party Engines – Snowflake is integrated as the query engine.
  • Unified Metadata – The Iceberg REST API is used to access watsonx.data’s metadata service.
  • Open-Source Data Formats – Apache Iceberg and Apache Parquet ensure interoperability between different query engines.
  • AWS S3 – The underlying storage for all data.
  • Hybrid Infrastructure – Spanning multiple clouds and environments (see below).

These elements make the hybrid open data lakehouse not just an idea, but a reality.
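
To make the Unified Metadata item above more concrete, here is a minimal Python sketch of how an engine addresses a table through the Iceberg REST catalog API. The path layout (`/v1/{prefix}/namespaces/{namespace}/tables/{table}`) follows the open Iceberg REST specification. The host, prefix, namespace, and table names are hypothetical placeholders and do not come from the demo, and the function `load_table_url` is an illustrative helper, not part of any library.

```python
from typing import Sequence
from urllib.parse import quote

# The REST spec joins multi-level namespaces with the 0x1F unit separator.
NAMESPACE_SEPARATOR = "\x1f"


def load_table_url(base_uri: str, namespace: Sequence[str], table: str, prefix: str = "") -> str:
    """Build the URL of the Iceberg REST 'load table' endpoint (illustrative helper)."""
    parts = [base_uri.rstrip("/"), "v1"]
    if prefix:
        # The optional prefix is handed out by the catalog's config endpoint.
        parts.append(quote(prefix))
    parts += [
        "namespaces",
        quote(NAMESPACE_SEPARATOR.join(namespace), safe=""),
        "tables",
        quote(table, safe=""),
    ]
    return "/".join(parts)


# Hypothetical endpoint, prefix, namespace, and table (assumptions, not from the article).
url = load_table_url(
    "https://catalog.example.com/iceberg", ["demo"], "customers", prefix="iceberg_data"
)
print(url)
```

A GET request to such a URL returns the table's current metadata, which is what lets a third-party engine locate the Parquet files in object storage without going through the engine that wrote them.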

Demo Scenario

Let’s imagine a use case where Snowflake serves as the query engine for users and applications, while IBM watsonx.data handles data ingestion, transformation, and writes to AWS S3 storage.

In the scenario diagram, Apache Iceberg, the open table format, plays a central role. Iceberg lets watsonx.data write and transform data stored in AWS S3 while Snowflake reads the same data through the shared table metadata.

This is what interoperability looks like in practice.
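
The idea can be illustrated with a toy model in Python. This is a deliberately simplified sketch, not how Iceberg is implemented: real Iceberg also tracks schemas, manifests, and partition specs, and commits by atomically swapping a metadata pointer. The class names, table name, and object key below are all hypothetical. The point it shows is that the writer and the reader never talk to each other; they only share a catalog entry that points at immutable data files.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Snapshot:
    """One immutable version of a table: an id plus the data files it contains."""
    snapshot_id: int
    data_files: Tuple[str, ...]


@dataclass
class Catalog:
    """Toy stand-in for the shared metadata service: table name -> snapshot history."""
    tables: Dict[str, List[Snapshot]] = field(default_factory=dict)

    def create_table(self, name: str) -> None:
        # A new table starts with an empty snapshot.
        self.tables[name] = [Snapshot(0, ())]

    def commit_append(self, name: str, new_file: str) -> Snapshot:
        # Appending never rewrites old files; it publishes a new snapshot.
        current = self.tables[name][-1]
        snapshot = Snapshot(current.snapshot_id + 1, current.data_files + (new_file,))
        self.tables[name].append(snapshot)
        return snapshot

    def current_snapshot(self, name: str) -> Snapshot:
        return self.tables[name][-1]


catalog = Catalog()

# "Writer engine" (the role watsonx.data plays in the scenario).
catalog.create_table("demo.customers")
catalog.commit_append(
    "demo.customers", "s3://example-bucket/demo/customers/data/00000.parquet"
)

# "Reader engine" (the role Snowflake plays): it only asks the catalog what to read.
files_to_scan = catalog.current_snapshot("demo.customers").data_files
print(files_to_scan)
```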

Truly a Hybrid Solution

In this demo, hybrid isn’t just a buzzword—it’s a fact. The environment spans multiple locations and clouds:

  • Snowflake runs in Azure (Netherlands).
  • AWS S3 resides in Germany.
  • IBM watsonx.data operates in a private cloud (United Kingdom).

Demo Video

In this (Finnish-language) demo video, I show how to integrate IBM watsonx.data and Snowflake using Apache Iceberg and AWS S3.

In the video, you’ll see:

  • Introduction to the demo scenario (00:04–00:32)
  • Configuration: watsonx.data and AWS S3 (00:33–01:09)
  • Configuration: Snowflake and AWS S3, with integration to watsonx.data catalog (01:10–01:29)
  • Creating an Iceberg table in watsonx.data (01:30–01:45)
  • Creating an Iceberg table in Snowflake (01:46–02:00)
  • Writing a row in watsonx.data (02:01–02:08)
  • Reading the same row in Snowflake (02:09–02:24)
  • Recap: watsonx.data components in action (02:25–02:54)
  • Recap: watsonx.data = Open Data Lakehouse (02:55–03:01)

The video shows how IBM watsonx.data brings the open data lakehouse to life, with open-source Apache Iceberg as the enabler for integration between watsonx.data and Snowflake.
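
For readers who cannot follow the video, the Snowflake-side steps could look roughly like the statements rendered by the Python sketch below. This is an assumption-laden outline, not a transcript of the demo: the statement types (`CREATE EXTERNAL VOLUME`, `CREATE CATALOG INTEGRATION` with `CATALOG_SOURCE = ICEBERG_REST`, `CREATE ICEBERG TABLE`, `ALTER ICEBERG TABLE ... REFRESH`) are standard Snowflake SQL, but every object name, URL, ARN, and the bearer-token authentication choice are hypothetical placeholders. The exact options used in the video may differ.

```python
# Hypothetical placeholder values; none of these come from the article.
SETTINGS = {
    "volume": "wxd_s3_volume",
    "bucket_url": "s3://example-bucket/lakehouse/",
    "role_arn": "arn:aws:iam::123456789012:role/example-snowflake-role",
    "catalog_integration": "wxd_rest_catalog",
    "catalog_uri": "https://watsonx-data.example.com/iceberg",
    "namespace": "demo",
    "table": "customers",
    "token": "<bearer-token>",  # keep real tokens out of source files
}

TEMPLATES = [
    # 1. Give Snowflake read access to the bucket that watsonx.data writes to.
    """CREATE EXTERNAL VOLUME {volume}
  STORAGE_LOCATIONS = ((
    NAME = '{volume}_location'
    STORAGE_PROVIDER = 'S3'
    STORAGE_BASE_URL = '{bucket_url}'
    STORAGE_AWS_ROLE_ARN = '{role_arn}'
  ))
  ALLOW_WRITES = FALSE""",
    # 2. Point Snowflake at the external Iceberg REST catalog.
    """CREATE CATALOG INTEGRATION {catalog_integration}
  CATALOG_SOURCE = ICEBERG_REST
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '{namespace}'
  REST_CONFIG = (CATALOG_URI = '{catalog_uri}')
  REST_AUTHENTICATION = (TYPE = BEARER BEARER_TOKEN = '{token}')
  ENABLED = TRUE""",
    # 3. Register the externally managed table in Snowflake.
    """CREATE ICEBERG TABLE {table}
  EXTERNAL_VOLUME = '{volume}'
  CATALOG = '{catalog_integration}'
  CATALOG_TABLE_NAME = '{table}'""",
    # 4. Pick up the latest snapshot after a write on the watsonx.data side.
    "ALTER ICEBERG TABLE {table} REFRESH",
    # 5. Read the row that was written through watsonx.data.
    "SELECT * FROM {table}",
]


def render_statements(settings: dict) -> list:
    """Fill the SQL templates with the given settings, in execution order."""
    return [template.format(**settings) for template in TEMPLATES]


for statement in render_statements(SETTINGS):
    print(statement + ";\n")
```

The sketch only prints the statements, so they can be reviewed and adapted before being run in a Snowflake worksheet. The external volume is created read-only here because, in the scenario, all writes go through watsonx.data.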

A Tangible Benefit

Looking again at the scenario overview, the benefit of this hybrid solution with IBM watsonx.data, Snowflake, and AWS S3 is so clear that it doesn’t need to be said. (Hint: smart architectures tend to pay off.)

Originally published at https://www.linkedin.com.

