watsonx.data

watsonx.data

Put your data to work, wherever it resides, with the hybrid, open data lakehouse for AI and analytics

 View Only

Sync watsonx.data-managed Iceberg Tables in Snowflake Using the Iceberg Open Specification REST Implementation in MDS - SaaS

By Hemant Marve posted 29 days ago

  

A read-only external catalog can be created in Snowflake and integrated with the Iceberg REST Catalog implementation based on the open specification. The Iceberg REST Catalog implementation in watsonx.data Metadata Service (MDS) enables syncing Iceberg tables from watsonx.data to Snowflake. 

A time interval to refresh the synced tables with new data & metadata can also be provisioned. This will help keep the data and metadata updated at required intervals.

Complete the following steps to sync watsonx.data managed Iceberg Tables in Snowflake:

Pre-requisites:

·      Set up watsonx.data

·      Set up Snowflake

 The following image provides an illustration of the syncing of an Iceberg table from watsonx.data with Snowflake


Image 1: Illustration of syncing of an Iceberg table from watsonx.data with Snowflake

1.        Create an Iceberg table in watsonx.data. You can use the Presto engine or create a table through data ingestion or Spark also.


Image 2: Query workspace in watsonx.data

2.        In Snowflake, create an external volume point to the bucket in the watsonx.data Iceberg catalog.

CREATE OR REPLACE EXTERNAL VOLUME <my_external_volume>
   STORAGE_LOCATIONS =
      (
         (
            NAME = '<bucket-name>'
            STORAGE_PROVIDER = 'S3COMPAT'
            STORAGE_BASE_URL = 'S3COMPAT://<bucket-name>/'
            CREDENTIALS = (
                AWS_KEY_ID = '<access-key>'
                AWS_SECRET_KEY = '<secret-key>'
                )
            STORAGE_ENDPOINT='<endpoint>'
         )
      );

The following image shows an example of the command:


Image 3: Example of the external volume in Snowflake connected with the bucket in watsonx.data

3.        In Snowflake, create a read-only external catalog integration with the MDS Iceberg REST Catalog. For more details, you can check out the blog Set up AWS gateway as proxy for MDS-REST endpoint URL.

CREATE OR REPLACE CATALOG INTEGRATION <external-catalog-integration-name>
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  CATALOG_NAMESPACE = '<wxd schema or namespace>'
  REST_CONFIG = (
    CATALOG_URI = 'https://<domain>/mds/iceberg'
    WAREHOUSE = '<wxd-catalog>'
  )
  REST_AUTHENTICATION = (
    TYPE = BEARER
    BEARER_TOKEN = '<wxd-bearer-token>'
  )
  ENABLED = TRUE
  REFRESH_INTERVAL_SECONDS = 60;

The following image shows an example of the integration command:


Image 3: Example of catalog integration with watsonx.data iceberg_data_catalog

4.        Sync a watsonx.data Iceberg table by using the external catalog integration.

CREATE OR REPLACE ICEBERG TABLE <snowflake-side-table-name>
  EXTERNAL_VOLUME = '<my_external_volume>'
  CATALOG = '<external-catalog-integration-name>'
  CATALOG_TABLE_NAME = '<wxd-iceberg-table-name>'
  CATALOG_NAMESPACE = '<wxd namespace or schema>'
  AUTO_REFRESH = TRUE;

Set AUTO_REFRESH to FALSE if you don't want the tables to be automatically synced after the specified REFRESH_INTERVAL_SECONDS.

The following image shows an example of the syncing:


Image 4: Example of table syncing through catalog integration

5.        Test the Iceberg table sync.

SELECT * FROM <snowflake-side-table-name>;

The following image shows the test results


Image 5: Iceberg table synced in snowflake from WXD

6.        Update the Iceberg table in watsonx.data. For example, insert new data or change metadata.

7.        In Snowflake, check if the data and metadata are refreshed automatically.

After the update on the WXD side, Metadata and data in Snowflake side table automatically refreshed after a specified REFRESH_INTERVAL_SECONDS.

The following image shows that data and metadata in Snowflake are refreshed automatically after the specified REFRESH_INTERVAL_SECONDS


Image 6: Automatic data refresh in Snowflake


#watsonx.data
0 comments
8 views

Permalink