Remote data planes for Cloud Pak for Data v5.0: Orchestrate Workload Placement

By Malcolm Singh posted 10 days ago

  

Authors:
Product Management: Malcolm Singh 
Development Engineering: Jun Feng Liu, Ramiya Venkatachalam, Ryan Pham, Ritesh Gupta 
Platform Engineering: Lakshmi Palaniappan

Introduction

In today's business world, companies have offices and teams that span the world. While the internet and the cloud bring these teams virtually closer together, their Data & AI platforms are located in select areas and managed separately. If a workload created in one location needs to run in a different location, additional management is required to move it. Moving a workload is desirable when data must be processed and analyzed efficiently, either closer to the data or on higher-capacity infrastructure.

Companies are seeking to build a Data & AI platform that also spans the globe: a platform that can be managed as one compute infrastructure, both on-premises and in the cloud. This is now possible with the Cloud Pak for Data platform using Remote Data Planes, a new framework available in Cloud Pak for Data version 5.0. Remote Data Planes extends the Cloud Pak for Data platform beyond the traditional single OpenShift cluster to span multiple clusters, both on-premises and in the cloud, as one instance for a hybrid experience. The framework removes the restriction of running workloads in just one location: users can choose to run workloads closer to the data source in a different location, all under one instance. Users can now create a workload definition on the Cloud Pak for Data control plane and deploy it to a remote data plane. This innovation brings the processing capabilities to the data for data gravity and, at the same time, meets data regulations and compliance requirements for data sovereignty, which reduces the need to transfer data.

This innovation brings several benefits to the Cloud Pak for Data platform:

  • One instance: consolidates multiple Cloud Pak for Data instances into one instance
  • Data gravity: moves processing capabilities closer to the data source
  • Data sovereignty: processes data where it is collected and protected under national laws
  • Organize compute resource usage: organizes resources logically based on data processing requirements, line of business, or special devices
  • Optimize resource management: runs on-premises and scales out to the cloud

Use case scenarios:

  1. Users can now run workloads in different geographical locations without operating the full Cloud Pak for Data software as a separate instance in each location. The remote data plane concept greatly reduces the hardware requirements for Cloud Pak for Data, since users can consolidate multiple Cloud Pak for Data instances into one. The Cloud Pak for Data remote agent running in these locations is lightweight and lean.
  2. Remote data planes bring the processing capabilities to the data for data gravity and, at the same time, meet data regulations and compliance requirements for data sovereignty, which reduces the need to transfer data. Users can now choose the appropriate location and send the workload there, instead of moving the data.
  3. Administrators can organize resources into logical groups, such as lines of business or departments. The organization can also group clusters that have similar hardware properties into the same data plane, for example, a GPU cluster for GenAI.
  4. Remote data planes allow users to distribute workloads based on priority. The Cloud Pak for Data administrator can assign a priority to every cluster in the data plane. Any workload going into the data plane fills the higher-priority cluster first and then the lower-priority cluster. This capability unlocks the cloud bursting scenario: Cloud Pak for Data administrators can have workloads fill the on-premises cluster first and then go to the cloud cluster.
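The priority-based placement in scenario 4 can be illustrated with a small Python sketch. This is not the Cloud Pak for Data scheduler or any product API: the Cluster class, the numeric priorities (larger means higher), and the notion of capacity measured in abstract "slots" are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cluster:
    """Hypothetical stand-in for one cluster assigned to a data plane."""
    name: str
    priority: int   # assumption: a larger number means higher priority
    capacity: int   # assumption: capacity counted in abstract workload slots
    used: int = 0

    def has_room(self, slots: int) -> bool:
        return self.used + slots <= self.capacity


def place(workload_slots: int, clusters: List[Cluster]) -> Optional[str]:
    """Put the workload on the highest-priority cluster that still has room."""
    for cluster in sorted(clusters, key=lambda c: c.priority, reverse=True):
        if cluster.has_room(workload_slots):
            cluster.used += workload_slots
            return cluster.name
    return None  # no cluster in the data plane can take the workload


# Cloud bursting: the on-premises cluster is preferred, the cloud takes overflow.
data_plane = [
    Cluster("cloud", priority=1, capacity=10),
    Cluster("on-prem", priority=10, capacity=4),
]
placements = [place(2, data_plane) for _ in range(4)]
print(placements)  # ['on-prem', 'on-prem', 'cloud', 'cloud']
```

The first two workloads fill the on-premises cluster; once it has no room left, later workloads spill over to the cloud cluster, which is the bursting behavior described above.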

Solution Overview

The Cloud Pak for Data remote data plane framework comprises three key concepts:

  • Physical Locations
  • Data Planes
  • Cloud Pak for Data Control Plane

Physical Locations are OpenShift cluster namespaces running a lightweight remote agent. A single remote OpenShift cluster can host multiple physical locations.

A Data Plane is a logical grouping of physical locations. A physical location can be assigned to more than one data plane, and multiple physical locations can be assigned to one data plane.

Cloud Pak for Data Control Plane manages and monitors the data planes and physical locations in one central location as the ‘hub’ for one instance.

The following diagram illustrates these concepts with an example of physical locations in different geographical regions, with four physical locations used by three data planes. The primary OpenShift cluster with the Cloud Pak for Data control plane (hub) is in the us-west region, and the remote OpenShift clusters, each of which hosts two remote physical locations, are in the europe-central and asia-east regions. The physical locations are logically grouped under three data planes. Data plane 1 and data plane 2 contain physical locations spanning multiple clusters, and data plane 3 contains physical locations belonging to the same cluster.
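The many-to-many relationship between physical locations and data planes can be modeled with plain Python dictionaries. This is only a sketch of the concepts: the location names (eu-loc-1 and so on), the data plane names, and the exact assignment of locations to data planes are hypothetical, chosen to match the description above rather than copied from the product or the diagram.

```python
from typing import List, Set

# Hypothetical names: physical location -> remote OpenShift cluster hosting it.
location_cluster = {
    "eu-loc-1": "europe-central",
    "eu-loc-2": "europe-central",
    "asia-loc-1": "asia-east",
    "asia-loc-2": "asia-east",
}

# Assumed grouping: data planes 1 and 2 span both clusters, data plane 3
# stays within one cluster. A location may appear in more than one data plane.
data_planes = {
    "data-plane-1": {"eu-loc-1", "asia-loc-1"},
    "data-plane-2": {"eu-loc-2", "asia-loc-2"},
    "data-plane-3": {"asia-loc-1", "asia-loc-2"},
}


def clusters_spanned(plane: str) -> Set[str]:
    """Return the clusters that host the physical locations of a data plane."""
    return {location_cluster[loc] for loc in data_planes[plane]}


def planes_for(location: str) -> List[str]:
    """Return every data plane a physical location is assigned to."""
    return sorted(p for p, locs in data_planes.items() if location in locs)


for plane in sorted(data_planes):
    print(plane, sorted(clusters_spanned(plane)))
print(planes_for("asia-loc-1"))
```

In this sketch, asia-loc-1 belongs to two data planes at once, which is the case the Data Plane definition explicitly allows.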

A Closer Look at Physical Locations and Data Planes

The setup and configuration starts with creating the physical locations on the remote OpenShift cluster and then defining the remote data plane using those physical locations.

Physical Location

The setup of a physical location is handled using command-line tools that create and register the physical location. The first step is to install the Cloud Pak for Data management agent in a namespace at the physical location. This is a lightweight agent with a smaller footprint than a complete Cloud Pak for Data instance. The second step is to register the physical location to establish mutual trust between the physical location and the Cloud Pak for Data control plane. For more details and instructions, please refer to Setting up the remote physical location for IBM Cloud Pak for Data in the documentation.

Data Plane

After the remote physical locations are set up, the Cloud Pak for Data platform administrator can proceed to create the data planes using the Cloud Pak for Data control plane console. One or more remote physical locations can be assigned to a data plane. On the Configurations and Settings page, the administrator can view the registered physical locations and manage data planes in the respective tabs.

Data planes tab: Lists all the data planes created for the Cloud Pak for Data control plane. For example, the cpdedge2 data plane has 1 physical location assigned to it and will be used for medium-size workloads. The user can click the New data plane button to create a new data plane and assign physical locations to it.

DataStage Service using Remote Data Planes

The DataStage service will be the first service to use remote data planes in Cloud Pak for Data v5.0. When the administrator provisions a new DataStage service instance, they can choose to use a data plane defined in the Cloud Pak for Data instance. The DataStage user can then decide where to run the PX Runtime based on the DataStage service instance. This brings the DataStage job closer to the data source at a remote location.

Service Instances page: Lists the service instances that the user has access to, including remote service instances. For example, cpdedge-inst2 is a remote service instance.

Conclusion

Remote Data Planes extends the Cloud Pak for Data platform beyond the traditional single OpenShift cluster to span multiple clusters, both on-premises and in the cloud, as one instance for a hybrid experience. This innovation brings the workload closer to the data source for data gravity and data sovereignty. The DataStage service is the first adopter of this framework in Cloud Pak for Data v5.0, with more services to onboard in future releases.

Learn about IBM Cloud Pak for Data v5.0 and Remote Data Planes

IBM Cloud Pak for Data Product Page

IBM Cloud Pak for Data Overview

Remote Data Planes

Cloud Pak for Data v5.0 Platform - What’s New blog

IBM unveils Cloud Pak for Data 5.0

