Introducing Cloud Pak for Data v5.1

By Malcolm Singh posted Wed December 11, 2024 04:34 PM

  

Authors:

Sachin Prasad, Product Management, Program Director

Malcolm Singh, Technical Product Manager

We are excited to announce the general availability of Cloud Pak for Data version 5.1, our 16th feature release and a major milestone in our journey. Built on the robust foundation of Cloud Pak for Data version 5.0, version 5.1 introduces enhanced capabilities for deploying and managing a wide range of IBM's Data and AI containerized software products. Since IBM first envisioned a transformative Data and AI platform in 2018, Cloud Pak for Data has achieved remarkable growth, serving thousands of customers, providing more than 60 integrated services, and earning a dedicated user base that values its unified experience.

With the Cloud Pak for Data v5.1 release, IBM introduces a game-changing concept that marks a significant change in how organizations deploy, manage, and scale containerized Data and AI workloads. Whether in hybrid or on-premises environments, Cloud Pak for Data addresses it all, tackling the ever-evolving demands of enterprise IT by providing unparalleled flexibility, modularity, and multi-tenancy solutions.

Key Features and Enhancements

Cloud Pak for Data has been a cornerstone of IBM's data and AI strategy, supporting thousands of deployments globally and helping organizations streamline their data operations. However, as businesses increasingly demanded more flexible, modular, and brand-agnostic solutions, it became clear that a more adaptable platform was needed, one that can support a multitude of personas and commercial product offerings. The platform also needed to scale up and provide better observability and control for single or mass deployments. Here are a few key features released as part of Cloud Pak for Data v5.1:

Control Center

Control Center is an advanced management dashboard offering a unified view across multiple Cloud Pak for Data clusters. It delivers real-time insights into platform health, resource utilization, and service deployments. With the Control Center, managing multiple clusters becomes seamless: admins can manage accounts, assign instances to accounts, define resource allocations such as CPU and memory for accounts, and monitor the overall health of instances across accounts.

Please note that Control Center is an optional installation in Cloud Pak for Data v5.1, but it is highly recommended to enhance the administration experience for customers who have more than a couple of installations of Cloud Pak for Data and/or watsonx.

Administration Console

The Administration Console is a thorough interface for monitoring and managing Cloud Pak for Data instances. This console is geared towards administrators and shows up as a new Perspective. This perspective serves as a one-stop shop for all monitoring activities, such as resource quotas, user access, real-time alerts, and overall observability.

Accounts for Multi-Tenancy

The Accounts feature is a first step toward true multi-tenancy. In its current form, it provides basic grouping of one or more Cloud Pak for Data installations, giving administrators a way to monitor and manage Cloud Pak for Data instances as a group. In the future, this feature will be enhanced to provide true cloud-like multi-tenancy, with resource and service isolation across business units as well as resource metering and charge-back per account. This will enable better governance, scalability, and cost-effective infrastructure utilization.

For more information check out the blog.

Remote Data Planes - Elastic Workload Placement for Spark Jobs

Remote Data Planes were introduced in version 5.0, extending the platform beyond the traditional single OpenShift cluster to span multiple clusters, both on-premises and in the cloud, as one instance for a hybrid experience. In version 5.1, this feature now supports the platform's Analytics Engine powered by Apache Spark. Expanding on the use of this framework, Spark jobs can now run on defined remote data planes. These jobs can be dynamically scheduled at deployment time to an associated physical location based on available compute resources. This provides elastic use of the compute resources defined for the instance, which can be on-premises or in the cloud in different locations.
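To make the idea of placement based on available compute more concrete, here is a minimal sketch in Python. It is not the platform's scheduler or any Cloud Pak for Data API: the `DataPlane` and `SparkJob` classes, the `place_job` function, and the "most headroom wins" rule are all hypothetical and only illustrate how a job could be matched to a location that has enough free CPU and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataPlane:
    """Hypothetical view of a remote data plane and its free capacity."""
    name: str
    location: str
    free_cpu: float        # free vCPUs
    free_memory_gb: float  # free memory in GB


@dataclass
class SparkJob:
    """Hypothetical resource request for a Spark job."""
    name: str
    cpu: float
    memory_gb: float


def place_job(job: SparkJob, planes: List[DataPlane]) -> Optional[DataPlane]:
    """Pick a data plane that can fit the job, or None if none can.

    Assumed policy (not from the product): among the planes with enough
    free CPU and memory, choose the one with the most capacity left over.
    """
    candidates = [
        p for p in planes
        if p.free_cpu >= job.cpu and p.free_memory_gb >= job.memory_gb
    ]
    if not candidates:
        return None
    chosen = max(
        candidates,
        key=lambda p: (p.free_cpu - job.cpu, p.free_memory_gb - job.memory_gb),
    )
    # Reserve the capacity so later placements see the updated state.
    chosen.free_cpu -= job.cpu
    chosen.free_memory_gb -= job.memory_gb
    return chosen
```

With one on-premises plane and one cloud plane in the list, a job that is too large for the on-premises plane lands on the cloud plane, and a job that fits nowhere returns `None` so the caller can queue or reject it.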

To learn more about remote data planes, check out the documentation and the introductory blog. For more information about remote data planes and Spark, check out the blog.

Nutanix and Cloud Pak for Data

IBM continues to review infrastructures to provide additional options to deploy Cloud Pak for Data. This includes hardware and storage options, both on-premises and in the cloud, to help reduce the total cost of ownership. For version 5.1, Cloud Pak for Data has now been certified with Nutanix Hyperconverged Infrastructure (HCI) and Nutanix storage. Nutanix HCI simplifies the deployment of hardware and storage with OpenShift to run Cloud Pak for Data, leveraging a robust infrastructure for agility, reliability, and performance while optimizing overall costs. Together with Cloud Pak for Data, it provides an end-to-end solution for running cloud-native Data Fabric and AI workloads to drive innovation in today's competitive data and AI business landscape.

Read here for more information about Nutanix.

Git Integration for Install and Upgrade Using ArgoCD

The ArgoCD integration technical preview introduces an ArgoCD GitOps-based installer for select services. Leveraging ArgoCD's declarative and automated application management, the CI/CD pipeline ensures efficiency, consistency, and scalability in deploying various Cloud Pak for Data services.

Key benefits include:

  • Simplified Deployment: Automated installation and upgrade of the Cloud Pak for Data platform and select services. The plan is to expand this feature in the future to cover more services.
  • Centralized Control: ArgoCD provides a unified dashboard to manage and view service deployments and their real-time statuses.
  • Improved Governance: Facilitates compliance and auditing through GitOps' immutable versioning and traceability.

Check out the git repo for more information.

Backup and Restore Improvements - Phase I

In enterprise environments, a recovery strategy is essential to ensure business continuity in the event of an unplanned outage or corruption of the environment. For Cloud Pak for Data, improvements are being delivered to strengthen the backup and restore framework, providing a more reliable backup and restore process and simplifying operations. These will be released in phases, with phase I in version 5.1 delivering single command-line tool orchestration using the OADP utility. The improved process includes a new single command to create an online or offline backup and a similar new single command to restore from the backup. There are also additional checks, including a separate backup validation check. More improvements will follow in the monthly refreshes.

To learn more please refer to the documentation.

Progress Indicators

Introducing Progress Indicators, which provide real-time visibility into operation status and progress for install/upgrade and shutdown/restart operations. Customers can now track the progress of ongoing operations and receive formatted messages to handle failures. This feature offers a more transparent and efficient experience for managing custom resources across multiple services.

Automatic Storage Validation

Storage performance issues have generally been ignored by administrators and install practitioners, potentially leading to system inefficiencies and bottlenecks. In Cloud Pak for Data v5.1, we are introducing Automatic Storage Validation during installs and upgrades. Customers can now ensure their systems meet storage performance requirements right from the start. This feature integrates seamlessly into the Cloud Pak for Data pre-install process with the cpd-cli manage setup-instance command. By identifying and addressing potential storage issues early, users start with a sound system and a reduced risk of degraded performance over time.

File Extension Validation

Cloud Pak for Data v5.1 introduces a standardized method of file upload validation to improve security across all of the 50+ services that run on top of the Cloud Pak for Data platform. The IBM-nginx gateway now verifies uploaded files against an approved MIME-type list using the Linux file command. Non-compliant uploads are blocked and logged, ensuring consistent, platform-wide file validation with minimal impact on service teams.
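The general pattern of checking an upload's detected content type against an allowlist can be sketched in a few lines of Python. This is an illustration only, not the gateway's implementation: the `APPROVED_MIME_TYPES` set, the small magic-byte table (a stand-in for the Linux file command), and the function names are all assumptions made for this example.

```python
import logging

logger = logging.getLogger("upload_validation")

# Hypothetical allowlist; the platform's actual approved list is not shown here.
APPROVED_MIME_TYPES = {"application/pdf", "image/png", "text/plain"}

# Tiny magic-byte table standing in for the Linux `file` command.
MAGIC_SIGNATURES = [
    (b"%PDF-", "application/pdf"),
    (b"\x89PNG\r\n\x1a\n", "image/png"),
    (b"PK\x03\x04", "application/zip"),
    (b"\x7fELF", "application/x-executable"),
]


def detect_mime_type(content: bytes) -> str:
    """Guess a MIME type from the content itself, not from the file name."""
    for signature, mime_type in MAGIC_SIGNATURES:
        if content.startswith(signature):
            return mime_type
    try:
        content.decode("utf-8")
        return "text/plain"
    except UnicodeDecodeError:
        return "application/octet-stream"


def validate_upload(filename: str, content: bytes) -> bool:
    """Return True if the upload is allowed; block and log it otherwise."""
    mime_type = detect_mime_type(content)
    if mime_type in APPROVED_MIME_TYPES:
        return True
    logger.warning(
        "Blocked upload %s: detected type %s is not approved", filename, mime_type
    )
    return False
```

Because the check looks at the content rather than the extension, a binary renamed to `report.pdf` is still rejected, which is the main reason to validate on detected type instead of trusting the file name.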

Connectivity

Equally significant to the listed updates are the connectivity options available in Cloud Pak for Data 5.1. There are 100+ connectors available with support for various formats. Cloud Pak for Data 5.1 makes it easier for you to find connectors through grouping of connectors by data source type. It also allows you to create new connectors with pre-filled values from existing connectors.

The new connectors available in Cloud Pak for Data 5.1 are IBM watsonx.data Milvus, Denodo, Elastic Cloud, and Azure Synapse Analytics. Some existing connectors also gain enhancements, including proxy support and table format support for Iceberg and Delta Lake.

Unlocking the Power of Data with IBM Cloud Pak for Data

In an era where data-driven decision-making defines competitive advantage, IBM Cloud Pak for Data delivers the tools organizations need to unlock the full potential of their data. At the core of Cloud Pak for Data lies a modular, scalable architecture with built-in elasticity, delivering a future-proof technology stack that is highly optimized for cost and efficiency.

The platform can also plug and play a plethora of Data and AI services (60+) across the Cloud Pak for Data, Data Fabric, and watsonx offerings. Investing in such a platform not only gives access to most of IBM's Data and AI innovations but also allows for a predictable experience across IT, business, and operations.

With support for a huge selection of on-premises and cloud storage options and form factors, Cloud Pak for Data empowers organizations to start small, where their business demands are, and then scale efficiently while maintaining optimal performance. The recently launched Remote Data Plane, with its burst-to-cloud feature, lets you start on-premises and scale or spread workloads across the cloud on demand.

Summary

Cloud Pak for Data v5.1 is a true display of IBM's commitment to empowering organizations, not only through the platform's existing capabilities but also with cutting-edge tools for data and AI innovation. By introducing transformative features like Control Center, Accounts, and GitOps-based deployment, the release lays a solid foundation for enterprise scalability, security, and efficiency. Whether you're leveraging multi-tenancy capabilities, optimizing resource management, or just future-proofing your IT infrastructure, Cloud Pak for Data v5.1 provides a cohesive platform to address the complexities of modern data-driven enterprises. We're excited to see how our customers will benefit from these innovations to unlock new possibilities and drive meaningful outcomes.

To learn more about the new features and functionality, check out the What's New section in the documentation.
