Global Data Lifecycle - Integration and Governance

AI-Powered DataStage is here: Build ETL pipelines with Natural Language

By Shreya Sisodia posted 2 days ago

  

TL;DR: AI tools and technologies are reshaping every industry for users of all technical skill levels. With the GA of the data integration Assistant - now available on DataStage SaaS - new and experienced users alike can leverage a GenAI-powered assistant to integrate data more easily than ever, with higher confidence and trust. Build ETL pipelines using natural language, create executable functions from business intent, and intelligently search documentation, all via the Assistant today. Keep reading to learn more…

AI-powered technologies are used daily by hundreds of millions of people across the globe, with tools like ChatGPT, Claude, Gemini, Cursor, and more quickly becoming household names. Whether it’s writing a final paper or building out an entire app through natural language, no task seems as daunting when you have a launching pad ready to act as your own personal assistant. The impact AI has had across almost every domain and industry is clear: we can now immediately tap into an entire knowledge base and boost productivity.

Despite strides in AI, some areas remain relatively untapped. One of these underserved areas is the world of data integration. Because the majority of data-related work flows through data engineering, data engineering teams are overwhelmed. Wrangling data between sources and targets, onboarding to a new data management tool, and connecting to different applications with different authentication methods, all while trying to keep up with the latest innovations, creates large barriers to productivity. For lines of business (LOBs) and non-technical business users who don’t have expertise in managing code-heavy integration tooling, authoring and maintaining pipelines is even more impractical.

IBM’s data integration portfolio has been delivering solutions for most of these problems for the past 20 years. We’ve consistently been a trusted leader in the data integration space - bringing users the latest innovations across ETL/ELT/TETL, partnering with the largest enterprises to modernize them to hybrid cloud architectures, and even helping process unstructured data for downstream AI processing. Our mission has always been to help our users integrate and transform data from anywhere, to anywhere, as easily as possible. Today, we make this mission even easier to accomplish. We’re so excited to bridge this last gap and bring the latest AI innovations into the data integration space with the launch of the data integration Assistant. Now available on DataStage, the Assistant empowers users of all skill levels to reduce onboarding time, efficiently navigate the UI, and easily build robust data pipelines. Keep reading to see all that the Assistant can do for you today and learn about what we have coming down the road with watsonx.data integration!

Main Features of the data integration Assistant: 

Build robust data pipelines entirely through natural language

Users can leverage the Assistant today to build multi-stage data pipelines all through natural language. Select Build a flow to get started and input your pipeline intent in a few sentences. Built using watsonx Assistant, the Assistant will then leverage the underlying Granite 3.1 model to predict the most relevant connectors, transformation stages, and properties, and then populate them onto the design canvas. The Assistant automatically outputs a pipeline skeleton, so users don’t have to spend time exploring the hundreds of different connectors and stages available to them and can instead author pipelines faster. Whether it’s connecting to data sources, applying operations such as joins and duplicate removal, or loading into targets such as watsonx.data, 90% of the available out-of-the-box stages are supported by the Assistant today, with the remaining stages in progress.
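To make the idea of a pipeline skeleton concrete, here is a minimal conceptual sketch in plain Python of the kind of flow described above: read from sources, join, remove duplicates, and load into a target. It is not the DataStage API and not what the Assistant generates; every name in it (`join_on`, `remove_duplicates`, `run_pipeline`, the sample records) is hypothetical and chosen only for illustration.

```python
# Conceptual sketch only: plain Python, NOT the DataStage API.
# All function names and sample records are hypothetical.

def join_on(left, right, key):
    """Inner-join two lists of dicts on a shared key."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def remove_duplicates(rows, key):
    """Keep the first row seen for each value of `key`."""
    seen = set()
    unique = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(row)
    return unique


def run_pipeline(customers, orders):
    """Source -> join -> remove duplicates -> target (a list stands in for the target)."""
    joined = join_on(orders, customers, "customer_id")
    return remove_duplicates(joined, "order_id")


customers = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Lin"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 10, "customer_id": 1, "total": 25.0},  # duplicate record
    {"order_id": 11, "customer_id": 2, "total": 40.0},
]

target = run_pipeline(customers, orders)
for row in target:
    print(row)
```

Each function here loosely corresponds to one stage on a design canvas; the point of the skeleton is that the stages and their order are laid out first, and the properties are filled in afterwards.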

Users can modify pipelines at any point in their flow and even employ the Assistant to automatically generate relevant stage names. Now, you don’t have to manually rename each stage based on the transformations occurring; instead, click the AI button, sit back, and let the Assistant do the self-documenting for you. The data integration Assistant empowers anyone, not just a data engineer with 10 years of experience, to intelligently integrate applications and data using words.

Transform business intent into executable functions

Our team built the Assistant for both new users trying to learn the product for the first time and existing users already familiar with the canvas and its stages. The Transformer stage is an extremely powerful, and loved, stage that spans hundreds of different built-in functions (from financial, to mathematical, logical, date time, and more) as well as custom transformations that can be applied to your data. Although powerful, this stage can be very intimidating when trying to harness all of its capabilities, especially as a non-technical user. With the Assistant, users can now interact with the Transformer stage more easily than ever.

By selecting Create Transformer expression, users can detail their business intent in natural language and then have the Assistant generate an appropriate expression. Syntax validation is additionally integrated directly within the Transformer stage to ensure that expressions, generated by the Assistant or users, adhere to proper syntax requirements. With this capability, users can immediately get a functioning Transformer expression simply by describing their desired end result.  

For users working with existing ETL flows, especially those with complex Transformer stages, it can be extremely time-consuming trying to decipher what a coworker built and what exactly their Transformer expressions are doing to your data. With Explain Transformer expression, the Assistant removes this guesswork, instantly translating existing Transformer expressions into digestible explanations.

Intelligently search documentation

The Assistant intelligently searches documentation to help answer questions on usage and setup; users can even receive clear examples for how to leverage certain stages and connectors, including the command line (cpdctl). Now, instead of sifting through pages of documentation or switching between countless tabs, select Ask a DataStage question, give your Ctrl-F key a rest, and have the Assistant do the searching for you exactly when you need it.

***

Just as ChatGPT and Claude aim to make your everyday tasks easier, the data integration Assistant brings this same power to your data integration tasks. Users can access the Assistant today on DataStage SaaS and immediately start building ETL pipelines faster, with greater confidence. With native natural language support, data pipelines spanning disparate sources and targets are no longer daunting to design; complex stages like the Transformer stage are more accessible than ever with self-documentation and automatic expression generation; and for all remaining questions, the Assistant can intelligently parse through product documentation.


The innovations don’t stop there - the IBM data integration team is constantly delivering new features and looks to support capabilities like AI-generated flow annotations, text-to-SQL conversion, automatic issue remediation, and agentic tools for the data engineer later in the year. All of this, and more, will be available in our new unified platform, watsonx.data integration, so you can seamlessly mix and match between different data integration patterns with AI at your side for every step. Get started with a free DataStage trial today and discover what the Assistant can do for you. Happy exploring!
