The interoperability between Databricks and watsonx.data, powered by the Spark engine, enables seamless Spark-based data access, metadata synchronization, and the application of Databricks governance policies. With this integration, organizations using Databricks can extend their governance framework to data stored in watsonx.data, ensuring consistent policy enforcement across platforms.
For external data access, Unity Catalog must be enabled. In addition, to allow external engines to access data in a metastore, a metastore admin must enable external data access for that metastore. This option is disabled by default to prevent unauthorized external access.
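As a rough illustration, the metastore-level switch can also be flipped through the Unity Catalog REST API instead of the Catalog Explorer UI. The sketch below is a minimal example, assuming the external_access_enabled field of the metastore update endpoint and placeholder values for the workspace URL, PAT, and metastore ID; confirm the field name against the current Databricks API documentation.

import requests

# Minimal sketch (assumption): enable external data access on a metastore via the
# Unity Catalog metastores endpoint. All <...> values are placeholders.
resp = requests.patch(
    "https://<Databricks Workspace URL>/api/2.1/unity-catalog/metastores/<metastore-id>",
    headers={"Authorization": "Bearer <Databricks Workspace PAT>"},
    json={"external_access_enabled": True},
)
resp.raise_for_status()
print(resp.json())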
Access from watsonx.data
The watsonx.data Spark engine retrieves data from a Databricks Unity Catalog (UC) metastore using a Databricks personal access token. UC provides temporary credentials and URLs that enable data retrieval and query execution.

Image 1: Access between watsonx.data and Unity Catalog
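As an illustration of the temporary-credential flow described above, the sketch below requests short-lived credentials for a single table directly from the UC REST API. It is a minimal, hedged example: the temporary-table-credentials endpoint, the READ operation, and the <table-uuid> placeholder are assumptions to be verified against the Databricks documentation; the watsonx.data Spark engine performs an equivalent exchange internally.

import requests

# Illustrative sketch: ask Unity Catalog for temporary credentials for one table.
# <Databricks Workspace URL>, <Databricks Workspace PAT>, and <table-uuid> are placeholders.
response = requests.post(
    "https://<Databricks Workspace URL>/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": "Bearer <Databricks Workspace PAT>"},
    json={"table_id": "<table-uuid>", "operation": "READ"},
)
response.raise_for_status()
# The response carries short-lived storage credentials and URLs that an external
# engine can use to read the table's files directly.
print(response.json())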
Provisioning watsonx.data native Spark engine
- Log in to watsonx.data console.
- From the navigation menu, select Infrastructure manager.
- To provision an engine, click Add component and select Add engine.
- Specify the storage volume to be used as Engine home, which stores the Spark events and logs generated while running Spark applications.
Setting up watsonx.data Spark lab
- Install a desktop version of Visual Studio Code.
- Install watsonx.data extension from VS Code Marketplace.
- Ensure that you have a public-private SSH key pair to establish an SSH connection with the Spark lab.
- Install the extension Remote - SSH from Visual Studio Code marketplace.
- Create a Spark lab.
- To create a new Spark lab, click the + icon. The Create Spark Lab window opens. Specify your public SSH key and the public SSH keys of the users to whom you want to grant access to the Spark lab. Specify each public SSH key on a new line.
- Click Create. Click Refresh to see the Spark lab in the left window. This is the dedicated Spark cluster for application development.
- Open the Spark lab to access its file system and terminal and start working with it.
- In the Explorer window, you can view the file system, upload files, and view logs.
Accessing Databricks UC from the watsonx.data Spark lab
The following code accesses UC using PySpark. Create a Python file in any of the listed folders and add this code.
Prerequisites to access the catalog from an external engine
- The catalog must be created after the Unity Catalog metastore is enabled.
- Enable External data access for the metastore.
- Enable EXTERNAL USE SCHEMA for the catalog or schema (see the sketch after this list).
- A personal access token (PAT) for the Databricks workspace.
- Access keys for the storage account where the UC is configured.
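The EXTERNAL USE SCHEMA privilege from the list above is granted on the Databricks side, not from watsonx.data, and is separate from the PySpark application that follows. The sketch below shows one hedged way to issue the grant programmatically through the SQL Statement Execution API; the <warehouse-id> and <principal> placeholders are assumptions, and the same GRANT statement can simply be run from a Databricks notebook or the SQL editor instead.

import requests

# Minimal sketch: grant EXTERNAL USE SCHEMA on a schema so an external engine can access it.
# <Databricks Workspace URL>, <Databricks Workspace PAT>, <warehouse-id>, <UC Catalog>,
# <schema>, and <principal> are placeholders.
grant_sql = "GRANT EXTERNAL USE SCHEMA ON SCHEMA <UC Catalog>.<schema> TO `<principal>`"

resp = requests.post(
    "https://<Databricks Workspace URL>/api/2.0/sql/statements",
    headers={"Authorization": "Bearer <Databricks Workspace PAT>"},
    json={"warehouse_id": "<warehouse-id>", "statement": grant_sql, "wait_timeout": "30s"},
)
resp.raise_for_status()
print(resp.json().get("status"))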
from pyspark.sql import SparkSession
import os

def init_spark():
    spark = SparkSession.builder.appName("data-test") \
        .config("spark.jars.packages", "org.apache.hadoop:hadoop-aws:3.3.4,org.apache.hadoop:hadoop-common:3.3.4,io.delta:delta-spark_2.12:3.2.1,io.unitycatalog:unitycatalog-spark_2.12:0.2.0") \
        .config("spark.sql.catalog.spark_catalog", "io.unitycatalog.spark.UCSingleCatalog") \
        .config("spark.sql.catalog.<UC Catalog>", "io.unitycatalog.spark.UCSingleCatalog") \
        .config("spark.hadoop.fs.s3.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem") \
        .config("spark.sql.catalog.<UC Catalog>.uri", "https://<Databricks Workspace URL>/api/2.1/unity-catalog") \
        .config("spark.sql.catalog.<UC Catalog>.token", "<Databricks Workspace PAT>") \
        .config("spark.sql.defaultCatalog", "<UC Catalog>") \
        .config("fs.azure.account.key.<storage-account>.dfs.core.windows.net", "<UC storage account access key>") \
        .getOrCreate()
    return spark

def create_database(spark, catalog, databasename):
    spark.sql(f"create database if not exists {catalog}.{databasename}")

def list_databases(spark, catalog):
    spark.sql("SHOW SCHEMAS").show()

def basic_iceberg_table_operations(spark, catalog, databasename):
    # Demonstration: create a basic table, insert some data, and then query the table.
    # CREATE TABLE is currently restricted and needs Databricks support to enable it.
    # print("creating table")
    # spark.sql(f"create table if not exists {catalog}.{databasename}.testTable(id INTEGER, name VARCHAR(10), age INTEGER)").show()
    print("table created")
    # ALTER TABLE is currently not supported in UC.
    # spark.sql(f"ALTER TABLE {catalog}.{databasename}.testTable ADD COLUMNS (salary DECIMAL(10, 2))").show()
    print("table altered")
    # The insert assumes testTable already exists with the columns id, name, age, and salary.
    spark.sql(f"insert into {catalog}.{databasename}.testTable values(1,'Alan',23,3400.00),(2,'Ben',30,5500.00),(3,'Chen',35,6500.00)")
    print("data inserted")
    spark.sql(f"select * from {catalog}.{databasename}.testTable").show()

def clean_database(spark, catalog, databasename):
    # Clean up the demo database.
    spark.sql(f'drop table if exists {catalog}.{databasename}.testTable purge')
    spark.sql(f'drop database if exists {catalog}.{databasename} cascade')

def view_data(spark):
    # Read an existing table in the UC catalog (replace with a table available in your environment).
    spark.sql('select * from ams_test.test.employees').show()

def main():
    try:
        spark = init_spark()
        list_databases(spark, "data_poc")
        view_data(spark)
        create_database(spark, "data_poc", "dischema")
        list_databases(spark, "data_poc")
        basic_iceberg_table_operations(spark, "data_poc", "test")
    finally:
        # Clean up the demo database.
        clean_database(spark, "data_poc", "dischema")
        spark.stop()

if __name__ == '__main__':
    main()
- <UC Catalog> — The Databricks catalog to which you have access to run DDL/DML statements.
- <Databricks Workspace URL> — The URL used to access the Databricks UC workspace.
- <Databricks Workspace PAT> — The Databricks workspace personal access token (PAT), which is used to authenticate the user to the Databricks platform.
- <storage-account> — The cloud storage account name.
- <UC storage account access key> — The cloud storage account access key.
The following additional parameters must be added to the configuration to connect to AWS S3 buckets.
.config("spark.hadoop.fs.s3a.bucket.<Databricks Bucket>.endpoint", <AWS Object Store URL>) \
.config("spark.hadoop.fs.s3a.bucket.<Databricks Bucket>.access.key", "<AWS Access Key>") \
.config("spark.hadoop.fs.s3a.bucket.<Databricks Bucket>.secret.key", "<AWS Secret Key>") \
NOTE: Certain SQL statements, such as CREATE TABLE, may not function due to the limitations and restrictions of the Unity Catalog Spark JAR. Additionally, when the EXTERNAL_USE_SCHEMA permission is granted for access from an external engine, other permissions, such as SELECT, are not validated.
Reference
Unity Catalog API & Iceberg REST Catalog API in watsonx.data