Welcome to the blog post series on data streaming with watsonx.data integration. In this series, we will introduce data streaming on watsonx.data integration and walk through how to customize the streaming capability for specific business logic or unique integration needs.
In this post, we walk through the steps to set up the prerequisites for data streaming on watsonx.data integration.
Overview
Let us start with a quick note on watsonx.data integration and the streaming capability available on the platform. The watsonx.data integration platform provides an intuitive, low-code data integration solution that processes data seamlessly in streaming or batch mode. End users can build data pipelines quickly while still defining how the data flows from source to target systems. StreamSets is the IBM product that brings streaming capabilities to the watsonx.data integration suite. IBM StreamSets enables users to create and manage smart streaming data pipelines through an intuitive graphical interface, facilitating seamless data integration across hybrid and multi-cloud environments.
Setting up streaming on the watsonx.data integration suite
Prerequisites
- A valid watsonx.data integration instance (create your free instance here)
- A valid API key for watsonx.data integration
- A container management platform such as Docker or Podman
Set up
If you are new to watsonx.data integration, you need to create a project before you can work with streaming data use cases.
Creating a Project
Once you have access to watsonx.data integration, create a project by selecting the "Projects" option in the navigation bar, as shown below.

You can then create the project from the Create a Project page, as shown below.

Once the project is ready, you can manage it from the project landing page, as shown below.

Next, select the nine-dot selector in the top-right corner of the page and choose Data Fabric from the drop-down menu.

To access the available tools, select the Manage tab in the project.

Creating a StreamSets environment
Creating a StreamSets environment is the first step in building a streaming flow. To create one, select StreamSets under Tools in the left pane.

You can manage StreamSets environments from the landing page, as shown below. Select the New environment option to proceed.

You will be prompted to provide the details for the environment on the New Environment page.
Key attributes to note in the New environment window are:
- Data Collector engine version, stage libraries, and external resources are the three key attributes to focus on when you deploy custom stages and run flows.
- The selected Data Collector engine version determines which features are available to you.

You can select stage libraries to meet your needs; the installed stage libraries determine the sources, processors, targets, and executors you can use in flows.

External resources enable you to deploy assets to the StreamSets Data Collector engine. We will use this feature to deploy custom stages.

Once you have made the required changes, save the new environment. Saving it generates the engine run command, as shown in the screen below.

To run the command that creates the engine, the host where you run it must have a container management platform such as Docker or Podman installed. For detailed instructions, see Prerequisites. Copy the generated command, and then run it by following the detailed steps in Running the engine command.
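Before running the copied command, it can help to confirm that a container runtime is actually available on the host. The following Python sketch is not part of watsonx.data integration or StreamSets; `find_container_runtime` is a hypothetical helper name used only for illustration. It uses the standard library's `shutil.which` to check whether a `docker` or `podman` executable is on the `PATH`. It does not verify that the Docker daemon or Podman service is running, and it does not execute the engine command itself.

```python
import shutil
from typing import Optional, Sequence

# Container runtimes named in the prerequisites. Checking Docker before
# Podman is an arbitrary choice for this sketch.
CANDIDATE_RUNTIMES: Sequence[str] = ("docker", "podman")


def find_container_runtime(candidates: Sequence[str] = CANDIDATE_RUNTIMES) -> Optional[str]:
    """Return the first candidate executable found on PATH, or None if none is found."""
    for name in candidates:
        # shutil.which returns the full path to the executable, or None.
        if shutil.which(name) is not None:
            return name
    return None


if __name__ == "__main__":
    runtime = find_container_runtime()
    if runtime is None:
        print("No Docker or Podman executable found on PATH; "
              "install one before running the engine command.")
    else:
        print(f"Found container runtime: {runtime}")
```

If the script reports that no runtime was found, install Docker or Podman before running the generated command. For a stronger check, you could run `docker info` or `podman info` and inspect the exit code, because those commands fail when the runtime is installed but not usable.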