Data Integration

Announcing the watsonx.data integration Python SDK 1.0.0 Tech Preview Release

By John Wen posted 28 days ago

We are excited to announce the tech preview release of the watsonx.data integration Python SDK 1.0.0. This release brings together batch and streaming data integration capabilities into a single Python SDK, making it easier than ever to build, manage, and automate your pipelines programmatically.

With this release, data engineers and developers can work across batch flows (DataStage) and streaming flows (StreamSets), all without leaving Python.

What’s New in Batch (DataStage)

Flow building and management

You can now create, update, and organize batch data flows entirely in Python. This provides programmatic control over your pipelines, making it easier to automate and scale data integration workflows.

Example: Create a flow

from ibm_watsonx_data_integration.common.auth import IAMAuthenticator
from ibm_watsonx_data_integration import Platform
from ibm_watsonx_data_integration.services.datastage import *

auth = IAMAuthenticator(api_key='API_key')  # Replace with your API key
platform = Platform(auth, base_api_url='https://api.ca-tor.dai.cloud.ibm.com')
project = platform.projects.get(name='My project')  # Replace with the name of your working project


# Flow
flow = project.create_flow(
    name="RowGenPeek",
    environment=None,
    flow_type="datastage"
)

# Stages
row_generator = flow.add_stage("Row Generator", "Row_Generator")
row_generator.configuration.runtime_column_propagation = False

peek = flow.add_stage("Peek", "Peek")
peek.configuration.runtime_column_propagation = False

# Graph
link_1 = row_generator.connect_output_to(peek)
link_1.name = "Link_1"

row_generator_schema = link_1.create_schema()
row_generator_schema.add_field("VARCHAR", "COLUMN_1").length(100)

project.update_flow(flow)
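The example above places an API key placeholder directly in the source file. A minimal sketch of reading the key from the environment instead, so that it stays out of version control, could look like the following. The helper `load_api_key` and the variable name `WATSONX_API_KEY` are hypothetical names chosen for illustration and are not part of the SDK.

```python
import os


def load_api_key(env_var: str = "WATSONX_API_KEY") -> str:
    """Return the API key stored in the given environment variable.

    The variable name is an assumption made for this sketch; use whatever
    name your own environment already defines.
    """
    api_key = os.environ.get(env_var, "").strip()
    if not api_key:
        # Fail early with a clear message instead of sending an empty key.
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return api_key
```

The returned string can then be passed as `api_key=` to `IAMAuthenticator` in place of the literal value.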

Job support

Manage the lifecycle of your batch jobs directly in Python: create jobs, start runs, and check their state programmatically.

Example: Start and monitor a job

# Run this section after the flow creation code above.
row_gen_peek_job = project.create_job(name="RowGenPeek_job", flow=flow)
job_start = row_gen_peek_job.start(name="RowGenPeek_job_run", description="")

print(f"Job name: {job_start.job_name}") 
print(f"Job state: {job_start.state}")
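The snippet above prints the job state once, right after the run starts. To wait for a run to finish, a generic polling loop is one option. The sketch below is deliberately independent of the SDK: `get_state` is a callable you supply that returns the current state as a string, and the post does not show how to refresh a run's state, so that part is left to you. The helper name `wait_for_state` and the state names in the usage note are assumptions.

```python
import time
from typing import Callable, Iterable


def wait_for_state(
    get_state: Callable[[], str],
    terminal_states: Iterable[str],
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    terminal = {s.lower() for s in terminal_states}
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = str(get_state())
        if state.lower() in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still in state {state!r} after {timeout_seconds} seconds")
        time.sleep(poll_seconds)
```

A call might look like `wait_for_state(fetch_state, ["Completed", "Failed"])`, where `fetch_state` is your own function that re-reads the run's state and the two state names are placeholders for whatever values the service actually reports.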

Python SDK flow generator (Batch)

Accelerate development by generating Python SDK code from existing flows. This helps teams learn faster and bootstrap new pipelines with minimal effort.

Example: Generate Python code from a flow

from ibm_watsonx_data_integration.services.datastage.codegen import PythonGenerator

code_gen = PythonGenerator()
code_gen.configuration.mode = "file_per_flow"
code_gen.generate(input_path="RowGenPeek.zip", output_path="generated_code")
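After generating code, a quick syntax check of the output can catch problems before anyone tries to run it. The sketch below uses only the standard library and assumes that the generator writes `.py` files somewhere under the output directory, which the post does not state explicitly. The helper name `find_syntax_errors` is hypothetical.

```python
import ast
from pathlib import Path


def find_syntax_errors(output_dir: str) -> dict[str, str]:
    """Parse every .py file under output_dir and return {path: error message}."""
    problems = {}
    for path in sorted(Path(output_dir).rglob("*.py")):
        try:
            # ast.parse only checks syntax; it does not execute the file.
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            problems[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return problems
```

Calling `find_syntax_errors("generated_code")` returns an empty dictionary when every file parses cleanly.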

What’s New in Streaming (StreamSets)

Validate streaming flows

Validate your streaming flows with the SDK to confirm that their configuration is correct before deployment.

Example: Validate existing flows

# `project` is the project object retrieved in the batch example above.
streaming_flow = project.flows.get(name="Streaming fraud detection")
errors = streaming_flow.validate()

print(errors)
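In an automated pipeline you may want to stop when validation reports problems rather than only print them. The sketch below assumes that a validation result with no issues is falsy (for example `None` or an empty list). The post does not document the return type of `validate()`, so adjust the check to match what the SDK actually returns. The helper name `require_valid` is hypothetical.

```python
def require_valid(errors, flow_name: str) -> None:
    """Raise ValueError if a validation result reports any issue.

    Assumption for this sketch: a result with no issues is falsy
    (None or an empty collection).
    """
    if errors:
        raise ValueError(f"Flow {flow_name!r} failed validation: {errors}")
```

With the variables from the example above, the call would be `require_valid(errors, "Streaming fraud detection")`.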

Python SDK flow generator (Streaming)

Like batch, the streaming side now supports automatic Python code generation from existing flows, accelerating development and automation.

Example: Generate Python code for a streaming flow

from ibm_watsonx_data_integration.codegen import PythonGenerator

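# `streaming_flow` is the flow retrieved in the validation example;
# `auth` is the authenticator created in the batch example.
generator = PythonGenerator(
    source=streaming_flow,
    destination="Desktop/streaming_flow_template.py",  # Make sure this path exists on your machine
    auth=auth,
    base_api_url="https://api.ca-tor.dai.cloud.ibm.com"
)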
generator.save()

Why this matters

With the unified Python SDK, developers now have:

A single consistent interface for batch and streaming pipelines
Full programmatic control of flow creation, validation, and job execution
Automation capabilities through flow-to-code generation

This tech preview is the first step in unifying how you work with data integration across watsonx.data integration.

Try It Today

Join the beta today for early access to demo videos and guidance on getting started with watsonx.data integration. To install the package, run the following command in your terminal. Documentation will be added in the coming days.

pip install ibm-watsonx-data-integration
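To confirm that the installation succeeded, you can look up the installed distribution with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and the distribution name is taken from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "ibm-watsonx-data-integration") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

Calling `installed_version()` returns a version string once the package is installed and `None` otherwise.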


Permalink

https://community.ibm.com/community/user/blogs/john-wen/2025/09/30/announcing-the-watsonxdata-integration-unified-pyt
