The DataStage-aaS Anywhere architecture is split into two core components: design time and runtime. The design-time portion, also referred to as the control plane, is where users interact with the DataStage application and the rest of IBM Cloud Pak for Data as a Service (IBM’s platform for data and analytics tools). Within the control plane, users build their DataStage flows on the low-code/no-code drag-and-drop canvas, drawing on 100+ pre-built connectors and transformations. They can also create projects, set up deployment spaces, import DataStage assets, and access administrative tools.
Once a DataStage flow is built and ready for execution, it runs on the data plane, the DataStage runtime. The data plane hosts the market-leading, highly scalable parallel engine that executes all DataStage jobs. In short, the control plane is where users design their ETL/ELT pipelines, and the data plane is where the completed pipelines are executed. To frame it even more simply, the control plane is the “mind”, where decisions are made and actions are planned, and the data plane is the “body”, which carries out the instructions the mind has communicated.
At its core, the DataStage architecture is completely microservices-based, which makes it possible to separate the control plane from the data plane. DataStage-aaS Anywhere leverages this separation and lets users install the data plane as a remote engine within their own virtual private cloud (VPC). The control plane remains on IBM Cloud, where users log in and build their DataStage flows, but the data plane can now run in a location of the user’s choice. The remote engine is delivered as a container that can run on any container management platform or natively on any cloud container service. With this remote data plane deployment, users can optimize execution of their data pipelines, keep sensitive data behind firewalls, and ensure seamless access to their hybrid cloud and AI workloads anywhere, anytime.
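To make the split concrete, here is a minimal conceptual sketch in Python. It is not DataStage code and does not use any IBM API: the class names (`ControlPlane`, `RemoteEngine`), the stage names, and the `customer-vpc` location are all hypothetical, chosen only to illustrate that the flow definition is produced in one place while the rows are processed in another.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Row = Dict[str, object]
Stage = Callable[[List[Row]], List[Row]]


@dataclass
class Flow:
    """A designed pipeline: a name plus an ordered list of stage names (no data)."""
    name: str
    stages: List[str] = field(default_factory=list)


class ControlPlane:
    """Hypothetical design-time component: produces flow definitions only."""

    def design_flow(self, name: str, stages: List[str]) -> Flow:
        return Flow(name=name, stages=list(stages))


class RemoteEngine:
    """Hypothetical runtime component: runs next to the data and executes flows."""

    def __init__(self, location: str, stage_library: Dict[str, Stage]) -> None:
        self.location = location
        self.stage_library = stage_library

    def run(self, flow: Flow, rows: List[Row]) -> List[Row]:
        # Apply each stage in order; the rows never leave this engine.
        for stage_name in flow.stages:
            rows = self.stage_library[stage_name](rows)
        return rows


def drop_empty_names(rows: List[Row]) -> List[Row]:
    """Keep only rows whose 'name' field is non-empty."""
    return [r for r in rows if r.get("name")]


def uppercase_names(rows: List[Row]) -> List[Row]:
    """Return copies of the rows with 'name' upper-cased."""
    return [{**r, "name": str(r["name"]).upper()} for r in rows]


# Design time: only the flow definition is created here.
control = ControlPlane()
flow = control.design_flow("clean_customers", ["drop_empty_names", "uppercase_names"])

# Runtime: the engine sits in the (assumed) customer VPC and processes the data.
engine = RemoteEngine(
    location="customer-vpc",
    stage_library={
        "drop_empty_names": drop_empty_names,
        "uppercase_names": uppercase_names,
    },
)
result = engine.run(flow, [{"name": "ada"}, {"name": ""}])
```

In this sketch the only thing that crosses from design time to runtime is the `Flow` object, which mirrors the idea that pipeline designs live in the control plane while the data itself stays wherever the engine runs.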
DataStage-aaS Anywhere User Journey
Now let’s follow a sample journey of a user spinning up a remote engine to execute their DataStage data pipelines on their local machine for their on-premises data.
First, the developer logs in to IBM Cloud and selects their project. An administrator has already spun up a remote engine using our simple startup script and tied the project to that DataStage remote runtime environment, so the developer can begin building their DataStage flow right away. After opening the DataStage canvas, the developer can leverage 100+ pre-built connectors and stages to quickly design the flow. Because remote execution has already been configured for this project, the developer can also access extended functionality, such as custom code components (enabled through stages such as External Source/Target, Build/Wrapped/Custom Stages, function libraries, and the Java Integration Stage).