Deliver trusted data with Cloud Pak for Data v4.7, now available
Cloud Pak for Data v4.7 is now generally available, delivering a new range of enhancements and features:
- Reduced footprint: a new scale config, small_mincpureq, allows close to zero CPU reservations so that idle pods can release CPUs
- Single node OpenShift for Cloud Pak for Data Express: full cluster deployment in a single node
- Disruption-free upgrades: keep your environments online during upgrades
- Parallel installs and upgrades: speed up installations by 20-50%
- New connectors: query lakehouse data with Presto, and create an accurate and up-to-date repository of product and service information with IBM Product Master
This is our first bi-annual feature release in 2023. Continuous enhancements like these make Cloud Pak for Data a best-in-class enterprise-grade infrastructure, enabling organizations to establish a data foundation to support a broad range of workflows.
Recently, IBM released watsonx, an AI platform built to help organizations scale generative AI. Generally available in July, the integration between Cloud Pak for Data and watsonx helps clients overcome the two primary barriers to successful AI implementations: data accessibility, and data volume and complexity. In addition to delivering trusted data for AI workflows, Cloud Pak for Data enables clients to break down data silos, keep data private and secure, and empower data consumers of any skill level.
Originally launched in 2018, IBM Cloud Pak for Data enables enterprises to establish a foundation to streamline data management, foster a data-driven organization, and drive AI innovation. Over the last year, Cloud Pak for Data has been leveraged by clients to establish an architecture that delivers data integration, data governance, and data observability capabilities. Our award-winning data fabric capabilities are also critical in delivering trusted data, an integral component of successful AI workflows.
Our continuous effort to deliver excellence, with strides in resource optimization, enterprise readiness, and robust performance, ensures that Cloud Pak for Data is well positioned to support all data- and AI-related facets of an enterprise, such as accelerating AI implementations, supporting self-service data consumption, moving data flexibly, monitoring data, and connecting to and storing all types of governed, quality data.
Here is what to expect from Cloud Pak for Data v4.7:
Cloud Pak for Data Express
IBM’s new Cloud Pak for Data Express offerings let you kickstart your AI and Data Fabric journey by addressing specific needs quickly. With Cloud Pak for Data Express, you can start small and grow at your own pace. We now offer three pre-built, pre-sized offerings: Data Governance Express, Data Science & MLOps Express, and ELT Pushdown Express, each designed to address a current data fabric need. Express offerings are now available on the AWS and Azure Marketplaces! Set up deployment easily and finish within four hours.
In order to support our Express offerings, Cloud Pak for Data has taken significant steps to reduce overall resource requirements in v4.7, focusing on vCPU & Memory. By addressing these areas, Cloud Pak for Data aims to provide customers with a more cost-effective and competitive solution, both on-premises and in the cloud.
Deploy a full cluster on just a single node! OpenShift (v4.9 and newer) now supports a full cluster deployment in a single node. This fully supported topology joins the three-node cluster and remote worker topologies to offer three options to meet more customer requirements in more edge environments. Single node OpenShift offers both control and worker node capabilities in a single server, which allows for the smallest possible overhead for running workloads and for cost-effective development and testing. This feature is currently only offered with Cloud Pak for Data Express offerings. This whitepaper explains the steps to deploy a full OpenShift single node cluster on AWS. Learn more about Single Node OpenShift here.
Deployment Updates (Installs & Upgrades)
Disruption-Free Upgrades: Keep your environments online during upgrades. Cloud Pak for Data now remains functional throughout the upgrade process, with zero or minimal interruption. This feature will be available starting with upgrades from v4.7 to future monthly releases.
Parallel Installs & Upgrades: Install/upgrade multiple Cloud Pak for Data components simultaneously for faster deployments and upgrades. Installations are faster by 20-50%.
With the private topology, customers can now simplify the deployment and management of multiple Cloud Pak for Data instances on the same OpenShift cluster. Shared cluster components are installed once and shared by all tenants, reducing the need for a separate installation for each tenant. Customers can also deploy different releases of Cloud Pak for Data on the same cluster to maximize cluster resources. The private topology replaces the express and specialized installation topologies; environments that are upgraded to 4.7 will be migrated to the private topology. Learn more about architecture-supported private namespace configurations here.
OpenShift 4.12 Support: With the release of Cloud Pak for Data v4.7, we now support OpenShift 4.12. Red Hat OpenShift Container Platform provides developers and IT organizations with a hybrid cloud application platform for deploying both new and existing applications on secure, scalable resources with minimal configuration and management overhead.
Scheduler driven cluster node balancing: The Cloud Pak for Data Scheduler intelligently considers the capacity of each worker node during pod deployment, as opposed to the traditional kube scheduler that only selects another worker node if the current node lacks capacity. Read this blog to learn more about improving cluster balancing with the Cloud Pak for Data scheduler.
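The difference between the two placement strategies described above can be illustrated with a small simulation. This is only a sketch of the contrast as this post describes it, not the actual logic of the Cloud Pak for Data scheduler or the Kubernetes scheduler. The node names, the 8 vCPU capacities, and the function names `place_first_fit`, `place_balanced`, and `simulate` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: float        # allocatable vCPU (hypothetical value)
    cpu_requested: float = 0.0

    @property
    def cpu_free(self) -> float:
        return self.cpu_capacity - self.cpu_requested


def place_first_fit(nodes, cpu_request):
    """Use the first node with room; move on only when that node is full."""
    for node in nodes:
        if node.cpu_free >= cpu_request:
            node.cpu_requested += cpu_request
            return node.name
    raise RuntimeError("no node has enough free CPU")


def place_balanced(nodes, cpu_request):
    """Consider every node's capacity and pick the one with the most free CPU."""
    candidates = [n for n in nodes if n.cpu_free >= cpu_request]
    if not candidates:
        raise RuntimeError("no node has enough free CPU")
    best = max(candidates, key=lambda n: n.cpu_free)
    best.cpu_requested += cpu_request
    return best.name


def simulate(place, pod_requests):
    """Place each pod on a fresh three-node cluster and report requested CPU per node."""
    nodes = [Node("worker-1", 8.0), Node("worker-2", 8.0), Node("worker-3", 8.0)]
    for request in pod_requests:
        place(nodes, request)
    return {n.name: n.cpu_requested for n in nodes}


if __name__ == "__main__":
    pods = [2.0] * 6  # six pods, each requesting 2 vCPU (hypothetical workload)
    print("first fit:", simulate(place_first_fit, pods))
    print("balanced: ", simulate(place_balanced, pods))
```

With six 2 vCPU pods, the first-fit strategy fills worker-1 completely and leaves worker-3 empty, while the capacity-aware strategy places 4 vCPU of requests on each worker.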
Security and Compliance Updates
FISMA: At IBM, we are dedicated to achieving compliance with the Federal Information Security Management Act (FISMA). To that end, we have identified three key requirements: CIS, FIPS, and Auditing. We now have 92% compliance on FISMA table stakes for base AI services. Refer to the documentation for more information regarding which services support FISMA requirements.
STIG: To comply with STIG requirements and improve our enterprise readiness, we have introduced the capability to limit the number of login sessions per user. Admin user sessions are also logged off after 10 minutes of idle time.
Improved Accessibility: At IBM, we prioritize accessibility for all users. Our commitment to compliance with Section 508 of the Rehabilitation Act of 1973 ensures equal access to information and technology. We now have 8 services reporting full accessibility compliance, with partially compliant services taking strides to reach full compliance.
With Cloud Pak for Data v4.7, experience seamless scalability and cost optimization with automatic scaling and flexible service control. 22 services now support automatic scaling using the standard HPA engine and 36 services support service restart through SSR, where the user can choose to shut down services when they are not needed and restart on demand. Refer to the HPA & SSR Documentation for the full list of supported services.
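Assuming the "standard HPA engine" here is the Kubernetes Horizontal Pod Autoscaler, its documented scaling rule is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that rule for illustration only. The function name `desired_replicas` and the replica bounds are hypothetical, and the values that individual Cloud Pak for Data services actually use are not stated in this post.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Apply the Kubernetes HPA scaling rule to a single metric.

    min_replicas and max_replicas are hypothetical bounds for this example.
    """
    ratio = current_metric / target_metric
    # Within the tolerance band, the autoscaler leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured replica bounds.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # Two replicas averaging 90% CPU against a 60% target scale out to three.
    print(desired_replicas(2, current_metric=90, target_metric=60))
    # Four replicas averaging 30% CPU against a 60% target scale in to two.
    print(desired_replicas(4, current_metric=30, target_metric=60))
```

Service restart through SSR is a separate, manual control: HPA changes the replica count in response to load, whereas SSR lets the user shut a service down entirely and bring it back on demand.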
Cloud Pak for Data v4.7 now supports AWS FSx for NetApp, a fully managed service that provides a high-performance file storage solution optimized for workloads that require shared access to data. AWS FSx allows for three times more throughput than EFS storage and reduces costs by 46%.
Accessing data sources is an important component of Cloud Pak for Data, which requires providing connectors and enhancements throughout the delivery cycle. Cloud Pak for Data supports over 80 connectors and various formats, in addition to Generic JDBC. With this vast array of supported connectors, the focus has been on enhancing existing connectors, for example with additional authentication support. However, two new connectors were introduced: Presto and IBM Product Master. Presto is a fast and reliable SQL engine for Data Analytics and the Open Lakehouse. Product Master is a trusted product information management (PIM) system with collaborative master data management (MDM) capabilities.
As always, Cloud Pak for Data can be deployed anywhere on a Red Hat OpenShift Container Platform cluster. The cluster can be on any cloud or behind a firewall for on-prem deployments, as long as it meets the prerequisites defined in the documentation.
For managed hyperscaler OpenShift clusters, IBM continues to refresh the Marketplace and Catalog Services offerings on IBM Cloud, AWS, and Azure. Currently, Cloud Pak for Data 4.6.x is available, including Cloud Pak for Data Express Parts and Data Fabric Offerings. These will all be refreshed after Cloud Pak for Data v4.7 is released in early 3Q23. The refresh will also include a revamp of the Marketplace and Catalog Services, with consolidated tiles for easier navigation and streamlined deployments.
Cloud Pak for Data encompasses the capabilities of 40+ IBM and partner services, a set that has continually expanded. Our services have delivered a range of new features that are available in Cloud Pak for Data v4.7. For a deeper dive, check out What’s New in the Cloud Pak for Data documentation, or check out these blogs: