TL;DR: IBM DataStage is now available as a Service (aaS) on AWS, enabling clients to use next-generation DataStage capabilities while maximizing their existing AWS investments and gaining the advantages of a cloud-native deployment. This launch marks a key milestone for clients on their modern data integration journey and lays the groundwork for IBM’s newest offering, watsonx.data integration. Together, the availability of DataStage aaS on AWS and the general availability (GA) of watsonx.data integration reflect IBM’s continued investment in advancing data integration and meeting the evolving needs of today’s enterprises.
As businesses race to scale generative AI, many encounter the same roadblock: their data isn’t ready. AI demands accurate, unified, and trusted data, but that’s hard to deliver when data is fragmented across clouds, applications, and systems. The explosion of data volume in recent years has made this fragmentation worse, as data proliferates on premises and across clouds, applications, and locations, often with compromised quality. Without a scalable way to move, cleanse, and transform that data, organizations can’t break down silos or prepare it for AI use.
The shift toward cloud-based infrastructure and services is accelerating, fueled by benefits such as elastic scalability, faster time to market, and lower infrastructure costs. According to a 2024 Gartner report, global end-user spending on public cloud services was forecast to total $679 billion in 2024 and is projected to exceed $1 trillion in 2027. To address data that is siloed and fragmented across the enterprise ecosystem and the cloud, DataStage offers multicloud deployment options designed to ensure data transparency and quality across hybrid cloud environments.
These trends underscore the growing need for modern data integration solutions that can be deployed in the cloud.
IBM’s approach:
DataStage, a core component of IBM’s Data Integration portfolio, is a next-generation ETL/ELT/TETL solution, modernized for today’s enterprise needs with enhanced scalability, performance, and deployment flexibility across cloud and on-premises environments. Next-generation DataStage provides clients with the most performant solution on the market to connect, transform, and deliver trusted data across multicloud and hybrid cloud environments for downstream use cases. Users can load data using out-of-the-box native connectors and build pipelines in whichever way suits them: through an intuitive low-code/no-code designer, an AI-powered flow assistant that generates pipelines from natural language, or a Python SDK for a code-first data integration experience. All pipelines run on DataStage’s parallel engine, which delivers scalability and best-in-class performance.
DataStage as a Service on AWS:
Today, we carry DataStage’s tradition of innovation and our deep commitment to clients forward by launching DataStage as a Service on AWS. This release enables enterprises to maximize their AWS cloud investments while taking advantage of a modernized DataStage experience. With batch integration provisioned on AWS, customers can expect several improvements to their integration pipelines: lower latency and faster response times when data is processed close to where it resides, cost savings from eliminating unnecessary data transfers, and easier compliance with data residency and regional regulations.
With this launch, clients can use DataStage aaS on AWS in two ways: as a native, fully managed service (serverless, with your service up and running in seconds), or as “Anywhere,” with a control plane on AWS and a self-managed, containerized, and scalable data plane that can reside behind a client’s firewall or in their virtual private cloud (VPC). Users of DataStage aaS on AWS also gain the strengths of existing DataStage capabilities, such as high-code authoring through the Python SDK, automatic parallel partitioning, hundreds of prebuilt connectors and stages, and SQL pushdown.
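To make “automatic parallel partitioning” more concrete, here is a minimal, generic Python sketch of hash partitioning, the general technique of splitting rows by a key so that partitions can be processed in parallel. It is not DataStage’s engine or its Python SDK. The function names and the `customer`/`amount` columns are hypothetical, and a thread pool stands in for the separate processes or nodes a real parallel engine would use.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def hash_partition(rows, key, num_partitions):
    """Assign each row to a partition using a stable hash of its key column."""
    partitions = [[] for _ in range(num_partitions)]
    for row in rows:
        # crc32 is deterministic across runs, unlike the built-in hash() on str
        idx = zlib.crc32(str(row[key]).encode("utf-8")) % num_partitions
        partitions[idx].append(row)
    return partitions


def aggregate_partition(rows):
    """Sum the hypothetical 'amount' column per customer within one partition."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


def parallel_totals(rows, num_partitions=4):
    partitions = hash_partition(rows, "customer", num_partitions)
    # Rows sharing a key always land in the same partition, so each partition's
    # totals are already final and the partial results can simply be merged.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        results = pool.map(aggregate_partition, partitions)
    merged = {}
    for partial in results:
        merged.update(partial)
    return merged


rows = [
    {"customer": "a", "amount": 10.0},
    {"customer": "b", "amount": 5.0},
    {"customer": "a", "amount": 2.5},
    {"customer": "c", "amount": 1.0},
]
print(sorted(parallel_totals(rows).items()))  # [('a', 12.5), ('b', 5.0), ('c', 1.0)]
```

The key-based split is what makes the merge step trivial: no two partitions ever hold rows for the same customer, so no cross-partition reconciliation is needed.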
Benefits of DataStage aaS on AWS:
- Modernized integration, built for the cloud era: Harness the power of the latest generation of IBM DataStage, now delivered as a fully managed, cloud-native service on AWS. It combines the trusted performance of DataStage with the elasticity, simplicity, and scalability of serverless cloud computing.
- Fully managed experience for scalability: Eliminate the burden of infrastructure management and scale quickly. Serverless DataStage on AWS is a fully managed service that handles installation, maintenance, and updates automatically, so teams can focus on building pipelines, not managing platform architecture.
- Native AWS integration: Easily connect to your existing AWS ecosystem, whether Amazon S3, Amazon Redshift, Amazon RDS, or other services, and capitalize on those investments. This lets you consolidate data integration, storage, and compute within a single cloud environment for a simpler architecture.
Benefits of DataStage aaS Anywhere on AWS:
- Execute pipelines wherever data resides: Design pipelines in a fully managed AWS environment and execute them either within AWS or outside it, in any geography, data center, or on-premises environment. This hybrid model is made possible by DataStage’s containerized remote engines, which decouple design time from runtime: the control plane remains in AWS, while the data plane, where integration jobs are executed, can be colocated with the data. Because less data moves, organizations can minimize latency, ingress/egress costs, and security risks. This is the best of both worlds: fully managed, cloud-native development with execution flexibility tailored to your data landscape.
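The split between a design-time control plane and a runtime data plane can be illustrated with a small, hypothetical Python model. Nothing here reflects DataStage’s actual remote engine API: `ControlPlane`, `RemoteEngine`, `JobDesign`, and the location strings are illustrative assumptions. The sketch shows only the routing idea, where the control plane keeps job designs and sends each run to an engine colocated with the job’s source data, falling back to a managed engine otherwise.

```python
from dataclasses import dataclass, field


@dataclass
class JobDesign:
    """Hypothetical job definition created at design time in the control plane."""
    name: str
    source_location: str  # where the job's input data resides


@dataclass
class RemoteEngine:
    """Hypothetical data-plane engine that executes jobs where it is deployed."""
    name: str
    location: str
    executed: list = field(default_factory=list)

    def run(self, job):
        # In a real deployment, rows would be read and written locally here,
        # so the data itself never travels back to the control plane.
        self.executed.append(job.name)
        return f"{job.name} ran on {self.name} ({self.location})"


class ControlPlane:
    """Hypothetical design-time service that routes job runs to engines."""

    def __init__(self, default_engine):
        self.default_engine = default_engine
        self.engines = {}

    def register_engine(self, engine):
        self.engines[engine.location] = engine

    def dispatch(self, job):
        # Prefer an engine colocated with the data; otherwise use the default.
        engine = self.engines.get(job.source_location, self.default_engine)
        return engine.run(job)


managed = RemoteEngine("managed-aws", "aws-us-east-1")
on_prem = RemoteEngine("dc-frankfurt", "on-prem-frankfurt")
cp = ControlPlane(default_engine=managed)
cp.register_engine(on_prem)

print(cp.dispatch(JobDesign("load_orders", "on-prem-frankfurt")))
print(cp.dispatch(JobDesign("load_clicks", "aws-us-east-1")))
```

Only job metadata and status cross the boundary in this model; each engine touches only the data at its own location, which mirrors the reduced-data-movement benefit described above.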
DataStage today, watsonx.data integration tomorrow:
For decades, ETL/ELT/TETL solutions have been critical for building high-quality data integration pipelines at enterprise scale. But as data types, volumes, and user demands grow, organizations are running into limitations common across traditional integration tools.
To address these challenges, the IBM Data Integration team has launched watsonx.data integration, designed to tackle the core issues data teams face today, including unstructured data integration, frequent pipeline rework, tool sprawl, and a widening skills gap. It provides a unified control plane for building reusable pipelines across all integration styles (batch, real-time streaming, and replication) and all data types, underpinned by built-in data observability. With no-, low-, and pro-code authoring options, users of all skill levels can build pipelines, eliminating the need for multiple specialized tools and enabling teams to choose the best-fit integration style for each use case.
watsonx.data integration brings together the powerful capabilities of IBM’s industry-leading integration portfolio, including DataStage, StreamSets, Databand, and Data Replication, along with new innovations such as unstructured data pipeline support, in a single, unified offering. By combining DataStage’s most valued capabilities with new capabilities for observability, real-time streaming, and unstructured data integration for AI use cases, watsonx.data integration offers DataStage users a clear path forward in their data integration modernization journey while retaining the experience and processing power of existing functionality.
Users can prepare for watsonx.data integration by getting started today with the batch ETL/ELT capabilities delivered by DataStage as a Service (aaS) on AWS. With this launch, organizations can build on DataStage as the foundation for watsonx.data integration on AWS and incorporate new integration capabilities as they roll out, capitalizing on modernized batch processing while maximizing their existing AWS investments.
DataStage as a Service on AWS brings together the best of both worlds: a starting point for a modern, unified integration experience and flexible deployment options that support the multicloud and hybrid cloud requirements the market demands. With this launch, IBM takes another step to meet clients where they are while building the foundation to modernize their data portfolios for the AI-driven enterprise.
Review the public announcement on the IBM documentation website for more information about product terms and conditions.