Support for Unity Catalog API & Iceberg REST Catalog API in IBM watsonx.data

By MRUDULA MADIRAJU

In this blog post, we will look at how the Unity Catalog API and the Iceberg REST Catalog API are supported in IBM watsonx.data.

The Unity Catalog API is Databricks' data governance solution for access control, auditing, lineage tracking, and data discovery. Databricks' move to open-source Unity Catalog has been welcomed in the industry for bringing the benefits of openness, flexibility, and interoperability.

Apache Iceberg’s REST Catalog API offers RESTful endpoints to manage tables, schemas, and related metadata. It can be used for remote catalog operations in a distributed data environment.

How are these open-source catalog APIs supported in watsonx.data?

The Metadata Service (MDS) within watsonx.data implements selected APIs from the Unity Catalog Open API specification and Iceberg REST Catalog Open API specification.

This integration enables external systems and applications to interact with watsonx.data's metadata repository, providing consistent metadata management and facilitating data operations across distributed data systems.

Iceberg REST API in watsonx.data

While MDS will continue to support the Apache Hive Metastore API through the org.apache.iceberg.hive.HiveCatalog class in Apache Iceberg, you can now take advantage of the org.apache.iceberg.rest.RESTCatalog class to leverage the new capabilities. 

So functionally, what exactly does the REST Catalog give you that the HiveCatalog does not? Out of the 24 APIs in the 1.6.1 REST Catalog spec, the most interesting are the following:

/v1/{prefix}/namespaces/{namespace}/register - Register a table using an existing metadata location
/v1/{prefix}/namespaces/{namespace}/tables - Create a table
/v1/{prefix}/namespaces/{namespace}/tables/{table} - Commit updates to a table
/v1/{prefix}/namespaces/{namespace}/tables/{table} - Get the metadata of a table
and additionally:
/v1/{prefix}/transactions/commit - Commit updates to multiple tables in an atomic transaction

NOTE: The commit APIs, in particular the atomic multi-table commit, open up possibilities that were not available with the HiveCatalog interface.
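
As a minimal sketch (not an official client), these endpoints can be called directly over HTTP. The base URL, catalog prefix, namespace, table, metadata location, and token below are placeholders to be replaced with values from your watsonx.data instance:

```python
import requests

# Placeholder values -- substitute your MDS endpoint, catalog prefix, and credentials.
BASE = "https://<mds-host>/mds/iceberg/v1"
PREFIX = "<catalog_name>"
HEADERS = {
    "Authorization": "Bearer <token>",
    "Content-Type": "application/json",
}

# Register a table from an existing Iceberg metadata file (no data rewrite involved).
resp = requests.post(
    f"{BASE}/{PREFIX}/namespaces/my_schema/register",
    headers=HEADERS,
    json={
        "name": "my_table",
        "metadata-location": "s3://<bucket>/<path>/metadata/00000-abc.metadata.json",
    },
)
resp.raise_for_status()

# Fetch the current metadata of the table that was just registered.
table = requests.get(
    f"{BASE}/{PREFIX}/namespaces/my_schema/tables/my_table",
    headers=HEADERS,
).json()
print(table["metadata-location"])
```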


To explore how to use the MDS Iceberg REST Catalog implementation in watsonx.data for seamless table management and operations, see the following blogs:

·      Iceberg Rest API on watsonx.data: Java Client PART-1

·      Iceberg Rest API on watsonx.data: Java Client PART-2

·      Iceberg Rest API on watsonx.data: Java Client PART-3 — Appending Data Files

Interoperability with the ecosystem - watsonx.data Iceberg REST API

In this section, let us see how the Iceberg REST API implementation in watsonx.data interoperates with different applications in the ecosystem.

Connecting to watsonx.data from Apache Spark

Connecting Spark through the Iceberg REST Catalog via the HTTPS interface for metadata operations is similar to connecting Spark through the Hive Metastore (HMS) interface via the Thrift protocol. Service providers of this REST catalog implementation have the flexibility to decide how the metadata layer is implemented, be it a database, HMS, or a custom implementation.

If you have an Apache Spark system outside of watsonx.data, such as IBM Analytics Engine or any other Spark environment, you can execute your Spark applications against metadata in watsonx.data. For metadata, Spark connects through the Iceberg REST Catalog using bearer or basic authentication.
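
Below is a minimal PySpark sketch of such a connection, assuming bearer-token authentication. The catalog name wxd, the MDS URI, the token, and the Iceberg runtime version are placeholders that may differ in your environment:

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("wxd-iceberg-rest")
    # Iceberg Spark runtime matching your Spark/Scala versions (placeholder version).
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a Spark catalog named "wxd" backed by the watsonx.data REST catalog.
    .config("spark.sql.catalog.wxd", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.wxd.type", "rest")
    .config("spark.sql.catalog.wxd.uri", "https://<mds-host>/mds/iceberg")
    # Bearer authentication; basic authentication is also supported by the service.
    .config("spark.sql.catalog.wxd.token", "<bearer-token>")
    # Object-storage credentials for the data files are configured separately (not shown).
    .getOrCreate()
)

spark.sql("SHOW NAMESPACES IN wxd").show()
spark.sql("SELECT * FROM wxd.my_schema.my_table LIMIT 10").show()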



For more details, see Connecting Apache Spark to watsonx.data using the Iceberg REST API.

Connecting to Snowflake Open Catalog from watsonx.data Spark

To execute Spark applications from watsonx.data against Iceberg tables in Snowflake Open Catalog, you must configure watsonx.data Spark to connect to the Iceberg REST Catalog in Snowflake.
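
As an illustrative sketch only, the Spark configuration for such a connection could look like the following. The endpoint URL, catalog name, OAuth client credentials, and scope are placeholders based on the generic Iceberg REST catalog properties; the exact values come from your Snowflake Open Catalog setup:

```python
# Placeholder Spark configuration for a watsonx.data Spark application that reads
# Iceberg tables governed by Snowflake Open Catalog.
snowflake_conf = {
    "spark.sql.catalog.sfopen": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.sfopen.type": "rest",
    "spark.sql.catalog.sfopen.uri":
        "https://<account>.snowflakecomputing.com/polaris/api/catalog",
    "spark.sql.catalog.sfopen.warehouse": "<open_catalog_name>",
    # OAuth2 client-credentials flow against the Open Catalog service.
    "spark.sql.catalog.sfopen.credential": "<client_id>:<client_secret>",
    "spark.sql.catalog.sfopen.scope": "PRINCIPAL_ROLE:ALL",
}
```

These properties are then supplied through the configuration of the Spark application you run on watsonx.data, for example in the conf section of the application submission.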


For more details, see Connect to Snowflake Open Catalog from watsonx.data Spark.

Querying Iceberg data from watsonx.data tables in Snowflake

To query watsonx.data Iceberg tables from Snowflake, you can use the Iceberg REST API in watsonx.data to integrate with Snowflake and bring the watsonx.data tables into the Snowflake catalog as "external" tables. For more details, see Sync watsonx.data managed Iceberg tables in Snowflake using the MDS Iceberg Open Specification REST implementation – SaaS.


Joining watsonx.data tables and Databricks tables from a notebook in Databricks

You can join a Delta table created in Databricks with an Iceberg table in watsonx.data using Databricks compute and a PySpark notebook, as in the sketch below. For more details, see Accessing Iceberg tables managed by watsonx.data using Databricks Spark.
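
As a rough sketch, assuming the cluster's Spark configuration already defines an Iceberg REST catalog named wxd that points at watsonx.data (see the earlier configuration sketch), with hypothetical table and column names:

```python
# Runs in a Databricks PySpark notebook, where `spark` is predefined.
delta_df = spark.table("main.sales.orders")        # Delta table in Databricks
iceberg_df = spark.table("wxd.sales.customers")    # Iceberg table in watsonx.data

# Join the two tables across catalogs and aggregate (illustrative column names).
joined = delta_df.join(iceberg_df, on="customer_id", how="inner")
joined.groupBy("country").count().show()
```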


Interoperability with the ecosystem - watsonx.data Unity Catalog API

In this section, let us see how the Unity Catalog API implementation in watsonx.data interoperates with different applications in the ecosystem.

Working with Apache Spark

The Unity Catalog API support in watsonx.data provides the ability to execute Spark applications from any external Spark environment.
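
A minimal configuration sketch using the open-source Unity Catalog Spark connector is shown below; the connector coordinates, endpoint path, catalog name, and token are assumptions to be adapted to your watsonx.data instance:

```python
# Placeholder configuration for an external Spark environment talking to
# watsonx.data over the Unity Catalog open APIs.
uc_conf = {
    # Open-source Unity Catalog Spark connector (coordinates/version are assumptions).
    "spark.jars.packages": "io.unitycatalog:unitycatalog-spark_2.12:0.2.0",
    # The Spark catalog name should match a catalog registered in watsonx.data.
    "spark.sql.catalog.my_catalog": "io.unitycatalog.spark.UCSingleCatalog",
    "spark.sql.catalog.my_catalog.uri": "https://<wxd-host>/mds/unity",
    "spark.sql.catalog.my_catalog.token": "<bearer-token>",
}
```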

For more details, see IBM watsonx.data integration with Unity Catalog simplified.

Working with Databricks

You can connect to Databricks from the Spark engine in watsonx.data and execute Spark SQL applications.
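
As a hedged sketch, reusing the same open-source Unity Catalog Spark connector as above: the workspace URL, personal access token, and catalog/schema/table names are placeholders, and the catalog name configured in Spark should match the Unity Catalog catalog in Databricks (here, main):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("wxd-to-databricks-uc")
    # Connector coordinates/version are an assumption; adjust to your environment.
    .config("spark.jars.packages", "io.unitycatalog:unitycatalog-spark_2.12:0.2.0")
    # Map the "main" catalog in the Databricks workspace into Spark.
    .config("spark.sql.catalog.main", "io.unitycatalog.spark.UCSingleCatalog")
    .config("spark.sql.catalog.main.uri",
            "https://<workspace-host>/api/2.1/unity-catalog")
    .config("spark.sql.catalog.main.token", "<databricks-pat>")
    .getOrCreate()
)

spark.sql("SELECT * FROM main.sales.orders LIMIT 10").show()
```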


For more details along with a practical example, see Connecting Databricks from watsonx.data Spark engine using Unity Catalog open APIs.

Conclusion: The Beginning

The new API features implemented in watsonx.data are only the beginning of the story of promoting open standards and interoperability in the ecosystem. As the space evolves with more features and functionality, the case for openness will only get stronger.

Acknowledging contributions from colleagues: Hemant, Anurag, Althaf, Anjali, Shivangi, and Dixon. Thanks to Gopi and Kulki for their guidance and support.


#watsonx.data