We are excited to announce the tech preview release of the watsonx.data integration Python SDK 1.0.0. This release brings together batch and streaming data integration capabilities into a single Python SDK, making it easier than ever to build, manage, and automate your pipelines programmatically.
With this release, data engineers and developers can work across batch flows (DataStage) and streaming flows (StreamSets) without leaving Python.
What’s New in Batch (DataStage)
Flow building and management
You can now create, update, and organize batch data flows entirely in Python. This provides programmatic control over your pipelines, making it easier to automate and scale data integration workflows.
Example: Create and run a flow
from ibm_watsonx_data_integration.common.auth import IAMAuthenticator
from ibm_watsonx_data_integration import Platform
from ibm_watsonx_data_integration.services.datastage import *
auth = IAMAuthenticator(api_key='API_key') ## ADD API KEY HERE
platform = Platform(auth, base_api_url='https://api.ca-tor.dai.cloud.ibm.com')
project = platform.projects.get(name = 'My project') ## CHANGE PROJECT NAME TO WORKING PROJECT NAME
# Flow
flow = project.create_flow(
    name="RowGenPeek",
    environment=None,
    flow_type="datastage"
)
# Stages
row_generator = flow.add_stage("Row Generator", "Row_Generator")
row_generator.configuration.runtime_column_propagation = False
peek = flow.add_stage("Peek", "Peek")
peek.configuration.runtime_column_propagation = False
# Graph
link_1 = row_generator.connect_output_to(peek)
link_1.name = "Link_1"
row_generator_schema = link_1.create_schema()
row_generator_schema.add_field("VARCHAR", "COLUMN_1").length(100)
project.update_flow(flow)
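The example above places the API key directly in the script. A minimal sketch of reading it from an environment variable instead follows. The helper `load_api_key` and the variable name `WATSONX_API_KEY` are hypothetical choices for illustration and are not part of the SDK.

```python
import os


def load_api_key(var_name: str = "WATSONX_API_KEY") -> str:
    """Return the API key stored in an environment variable.

    The variable name is a hypothetical convention for this sketch,
    not something the SDK defines.
    """
    api_key = os.environ.get(var_name, "").strip()
    if not api_key:
        raise RuntimeError(f"Set the {var_name} environment variable before running.")
    return api_key
```

The returned value could then be passed as `IAMAuthenticator(api_key=load_api_key())`, mirroring the authenticator call in the example above.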
Job support
Manage the lifecycle of your batch jobs directly in Python. Start, monitor, and manage jobs programmatically.
Example: Start and monitor a job
# Run this section after the flow creation code above.
row_gen_peek_job = project.create_job(name="RowGenPeek_job", flow=flow)
job_start = row_gen_peek_job.start(name="RowGenPeek_job_run", description="")
print(f"Job name: {job_start.job_name}")
print(f"Job state: {job_start.state}")
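The snippet above prints the job state once, right after the start call. Monitoring usually means polling until the run finishes, which could be sketched as below. This is a generic helper, not SDK code: `wait_for_terminal_state` is a hypothetical name, the terminal state names are assumptions, and how a job run's state is refreshed in the SDK is not shown here, so the caller supplies a `get_state` function. A fake state sequence stands in for a real job run.

```python
import time
from typing import Callable, Iterable


def wait_for_terminal_state(
    get_state: Callable[[], str],
    terminal_states: Iterable[str] = ("Completed", "Failed", "Canceled"),  # assumed names
    poll_seconds: float = 5.0,
    timeout_seconds: float = 600.0,
) -> str:
    """Poll get_state() until it returns a terminal state or the timeout expires."""
    terminal = {s.lower() for s in terminal_states}
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = str(get_state())
        if state.lower() in terminal:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Job still in state {state!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Stand-in for a real job run: yields a fixed sequence of states.
fake_states = iter(["Queued", "Running", "Completed"])
final_state = wait_for_terminal_state(lambda: next(fake_states), poll_seconds=0.0)
print(final_state)  # prints "Completed"
```

Passing a callable keeps the helper independent of any particular SDK object; the timeout guards against a run that never reaches a terminal state.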
Python SDK flow generator (Batch)
Accelerate development by generating Python SDK code from existing flows. This helps teams learn faster and bootstrap new pipelines with minimal effort.
Example: Generate Python code from a flow
from ibm_watsonx_data_integration.services.datastage.codegen import PythonGenerator
code_gen = PythonGenerator()
code_gen.configuration.mode = "file_per_flow"
code_gen.generate(input_path="RowGenPeek.zip", output_path="generated_code")
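After generating code, a quick sanity check is to confirm that every generated file parses as valid Python. The sketch below uses only the standard library and assumes the generator writes `.py` files somewhere under the output directory; `find_syntax_errors` is a hypothetical helper, not part of the SDK.

```python
import ast
from pathlib import Path


def find_syntax_errors(output_dir: str) -> dict[str, str]:
    """Parse every .py file under output_dir and collect syntax errors by path."""
    problems: dict[str, str] = {}
    for path in sorted(Path(output_dir).rglob("*.py")):
        try:
            ast.parse(path.read_text(encoding="utf-8"), filename=str(path))
        except SyntaxError as exc:
            problems[str(path)] = f"line {exc.lineno}: {exc.msg}"
    return problems
```

Calling `find_syntax_errors("generated_code")` returns an empty dictionary when every file parses. It checks syntax only, not whether the generated flow runs.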
What’s New in Streaming (StreamSets)
Validate streaming flows
Validate your streaming flows with the SDK to confirm that configurations are correct before deployment.
Example: Validate existing flows
streaming_flow = project.flows.get(name = "Streaming fraud detection")
errors = streaming_flow.validate()
print(errors)
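In an automated pipeline, printing the validation result is often less useful than stopping when something is wrong. The sketch below shows one way to do that, assuming `validate()` returns a list-like collection that is empty (or `None`) when the flow is valid; that return shape is an assumption, and `raise_if_invalid` is a hypothetical helper rather than an SDK function.

```python
from typing import Any, Iterable, Optional


def raise_if_invalid(flow_name: str, errors: Optional[Iterable[Any]]) -> None:
    """Raise ValueError when a validation result contains any entries."""
    found = list(errors or [])
    if found:
        details = "\n".join(f"  - {error}" for error in found)
        raise ValueError(f"Flow {flow_name!r} failed validation:\n{details}")
```

With the example above, this could be called as `raise_if_invalid("Streaming fraud detection", errors)` before any deployment step.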
Python SDK flow generator (Streaming)
Like batch, the streaming side now supports automatic Python code generation from existing flows, accelerating development and automation.
Example: Generate Python code for a streaming flow
from ibm_watsonx_data_integration.codegen import PythonGenerator
generator = PythonGenerator(
    source=streaming_flow,  # the streaming flow retrieved in the validation example above
    destination="Desktop/streaming_flow_template.py",  ## Make sure path is set correctly
    auth=auth,
    base_api_url="https://api.ca-tor.dai.cloud.ibm.com"
)
generator.save()
Why this matters
With the unified Python SDK, developers now have:
- A single consistent interface for batch and streaming pipelines
- Full programmatic control of flow creation, validation, and job execution
- Automation capabilities through flow-to-code generation
This tech preview is the first step in unifying how you work with data integration across watsonx.data integration.
Try It Today
Join the beta today to get early access to demo videos and guidance on getting started with watsonx.data integration. To install the package, run the following command in your terminal. Documentation will be added in the coming days.
pip install ibm-watsonx-data-integration
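To confirm that the install succeeded, the installed version can be read from package metadata using only the standard library. This is a small sketch; `installed_version` is a hypothetical helper name, and the printed value depends on which release is installed.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "ibm-watsonx-data-integration") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version string if the package is installed, otherwise None.
print(installed_version())
```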