
Deliver trusted data with Cloud Pak for Data v4.7, now available!

By SACHIN PRASAD posted Wed June 28, 2023 06:01 PM




To meet our clients’ needs for a trusted data foundation and high-quality data, we are thrilled to announce that version 4.7 of IBM’s Cloud Pak for Data is generally available, delivering a new range of enhancements and features:

  • Build on the cloud of your choice: IBM Cloud Pak for Data Express offerings are now available on AWS & Azure Marketplace 

  • Reduced footprint: a new scale config, small_mincpureq, allows close to zero CPU reservations so that idle pods release CPUs

  • Single node OpenShift for Cloud Pak for Data Express: full cluster deployment in a single node 

  • Disruption-free upgrades: keep your environments online during upgrades

  • Parallel Installs & Upgrades: speed up installations by 20-50% 

  • New connectors: query lakehouse data with Presto and create an accurate and up-to-date repository of product and service information with IBM Product Master 

  • Security and compliance updates and more 

This is our first bi-annual feature release in 2023. Continuous enhancements like these make Cloud Pak for Data a best-in-class enterprise-grade infrastructure, enabling organizations to establish a data foundation to support a broad range of workflows.  

Recently, IBM released watsonx, an AI platform built to help organizations scale generative AI. Generally available in July, the integration between Cloud Pak for Data and watsonx helps clients overcome the two primary barriers for successful AI implementations: data accessibility and data volume and complexity. In addition to delivering trusted data for AI workflows, Cloud Pak for Data enables clients to break down data silos, keep data private and secure, and empower data consumers of any skill level. 

Originally launched in 2018, IBM Cloud Pak for Data enables enterprises to establish a foundation to streamline data management, foster a data-driven organization, and drive AI innovation. Over the last year, Cloud Pak for Data has been leveraged by clients to establish an architecture that delivers data integration, data governance, and data observability capabilities. Our award-winning data fabric capabilities are also critical in delivering trusted data, an integral component of successful AI workflows.  

Our continuous effort to deliver excellence, with strides in resource optimization, enterprise readiness, and robust performance, ensures that Cloud Pak for Data is well positioned to support all data- and AI-related facets of an enterprise, such as accelerating AI implementations, supporting self-service data consumption, enabling flexible data movement, monitoring data, and connecting to and storing all types of governed, quality data.

Here is what to expect from Cloud Pak for Data v4.7: 


Cloud Pak for Data Express 

IBM’s new Cloud Pak for Data Express offerings let you kickstart your AI and Data Fabric journey by addressing specific needs quickly. With Cloud Pak for Data Express, you can start small and grow at your own pace. We now offer three pre-built, pre-sized offerings: Data Governance Express, Data Science & MLOps Express, and ELT Pushdown Express, each designed to address a current data fabric need. Express offerings are now available on the AWS and Azure Marketplaces! Set up deployment easily and finish within four hours.

By choosing Cloud Pak for Data Express, you will be the first to benefit from extensive innovations aimed at reducing the total cost of ownership. These innovations are designed to cater to customers who want to focus on specific data and AI use cases. In the 4.7 release, Express customers will experience a reduced number of nodes and a smaller footprint, making the solution more efficient and cost-effective. Additionally, we now offer support for Single Node OpenShift, enabling even greater flexibility. Furthermore, AWS is offering additional discounts to jumpstart these initiatives, providing even more value to our customers. 

Take advantage of the Cloud Pak for Data Express offerings to accelerate your AI and Data Fabric journey. Start with a focused approach, address your specific needs, and benefit from the latest innovations and cost reductions.  


Reduced Footprint 

In order to support our Express offerings, Cloud Pak for Data has taken significant steps to reduce overall resource requirements in v4.7, focusing on vCPU & Memory. By addressing these areas, Cloud Pak for Data aims to provide customers with a more cost-effective and competitive solution, both on-premises and in the cloud. 

This release introduces a new scale config, small_mincpureq, which enables near-zero CPU reservations so that idle pods release CPUs and make them available to other workloads. This represents a significant step forward in maximizing resource utilization within Cloud Pak for Data and helps customers reduce their investments.
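The effect of near-zero CPU reservations can be illustrated with a little arithmetic. The sketch below is not IBM's implementation: it assumes only the general Kubernetes behavior that pods are scheduled against their CPU requests rather than their actual usage, and every pod name and figure in it is hypothetical.

```python
# Illustrative only: pod names and CPU figures are hypothetical, not taken
# from Cloud Pak for Data. Kubernetes schedules against CPU *requests*, so
# lowering the requests of idle pods frees schedulable capacity on a node.

def schedulable_cpu(node_millicores, pod_requests):
    """Return the CPU (in millicores) still available for new pods."""
    return node_millicores - sum(pod_requests.values())

NODE_CPU = 16_000  # a 16-vCPU worker node, expressed in millicores

# Conventional sizing: each idle pod still reserves a sizeable request.
default_requests = {"svc-a": 1_000, "svc-b": 2_000, "svc-c": 500}

# Near-zero sizing: the same idle pods reserve only a token amount.
min_requests = {name: 10 for name in default_requests}

free_default = schedulable_cpu(NODE_CPU, default_requests)
free_min = schedulable_cpu(NODE_CPU, min_requests)

print(f"free with default requests:   {free_default}m")
print(f"free with near-zero requests: {free_min}m")
```

In this toy case the three idle pods hold back 3,500 millicores under conventional sizing but only 30 millicores with token requests, which is the capacity other workloads could then claim.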


Single Node OpenShift 

Deploy a full cluster on just a single node! OpenShift (v4.9 and newer) now supports a full cluster deployment in a single node. This fully supported topology joins the three-node cluster and remote worker topologies, giving three options to meet more customer requirements in more edge environments. Single node OpenShift offers both control and worker node capabilities in a single server, which allows for the smallest possible overhead for running workloads and for cost-effective development and testing. This feature is currently offered only with Cloud Pak for Data Express offerings. This whitepaper explains the steps to deploy a full single node OpenShift cluster on AWS. Learn more about Single Node OpenShift here.


Deployment Updates (Installs & Upgrades)  

Disruption-Free Upgrades: Keep your environments online during upgrades. Cloud Pak for Data now remains functional throughout the upgrade process, with zero or minimal interruptions. This feature will be available starting with upgrades from v4.7 to future monthly releases.

Parallel Installs & Upgrades: Install or upgrade multiple Cloud Pak for Data components simultaneously for faster deployments and upgrades. Installations are 20-50% faster.



Private topology  

With private topology, customers can now simplify the deployment and management of multiple Cloud Pak for Data instances on the same OpenShift cluster. Shared cluster components are installed once and shared by all tenants, reducing the need for multiple installations for each tenant. Customers can now deploy different releases of Cloud Pak for Data on the same cluster to maximize cluster resources. Private topology replaces the express and specialized installation topologies; environments upgraded to 4.7 will be migrated to private topology. Learn more about architecture-supported private namespace configurations here.


Operational Updates  

OpenShift 4.12 Support: With the release of Cloud Pak for Data 4.7, we now support OpenShift 4.12. Red Hat OpenShift Container Platform provides developers and IT organizations with a hybrid cloud application platform for deploying both new and existing applications on secure, scalable resources with minimal configuration and management overhead.

Scheduler driven cluster node balancing: The Cloud Pak for Data Scheduler intelligently considers the capacity of each worker node during pod deployment, as opposed to the traditional kube scheduler that only selects another worker node if the current node lacks capacity. Read this blog to learn more about improving cluster balancing with the Cloud Pak for Data scheduler. 
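The difference between the two placement behaviors described above can be sketched in a few lines. This is a toy model only: it is not the Cloud Pak for Data scheduler's algorithm, it follows this article's simplified description of the alternative, and the node names, capacities, and pod sizes are hypothetical.

```python
# Illustrative only: this is not the Cloud Pak for Data scheduler's algorithm.
# It contrasts the two placement behaviours described in the text using
# hypothetical node names, capacities (in vCPUs), and pod sizes.

def place_fill_first(nodes, pods):
    """Keep using the current node until a pod no longer fits, then move on."""
    free = dict(nodes)
    names = list(free)
    placement = {}
    current = 0
    for pod, cpu in pods:
        while current < len(names) and free[names[current]] < cpu:
            current += 1
        if current == len(names):
            raise RuntimeError(f"no capacity for {pod}")
        free[names[current]] -= cpu
        placement[pod] = names[current]
    return placement, free

def place_capacity_aware(nodes, pods):
    """Place each pod on the node that currently has the most free capacity."""
    free = dict(nodes)
    placement = {}
    for pod, cpu in pods:
        best = max(free, key=free.get)
        if free[best] < cpu:
            raise RuntimeError(f"no capacity for {pod}")
        free[best] -= cpu
        placement[pod] = best
    return placement, free

nodes = {"worker-1": 8, "worker-2": 8, "worker-3": 8}
pods = [(f"pod-{i}", 2) for i in range(6)]

_, free_fill = place_fill_first(nodes, pods)
_, free_aware = place_capacity_aware(nodes, pods)

print("fill-first free capacity:    ", free_fill)
print("capacity-aware free capacity:", free_aware)
```

With six 2-vCPU pods, the fill-first strategy leaves one node saturated and another untouched, while the capacity-aware strategy leaves each node with the same headroom.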


Security and Compliance Updates  

FISMA: At IBM, we are dedicated to achieving compliance with the Federal Information Security Management Act (FISMA), and for this we have identified three key requirements: CIS, FIPS, and auditing. We now have 92% compliance on FISMA table stakes for base AI services. Refer to the documentation for more information regarding which services support FISMA requirements.

STIG: To comply with STIG requirements and improve our enterprise readiness, we have introduced the capability to limit the number of login sessions per user. Admin user sessions are also logged off once they exceed a 10-minute idle period.
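As a rough picture of how a per-user session cap and an idle timeout interact, consider the minimal sketch below. It is not Cloud Pak for Data code: the class, method, and parameter names are hypothetical, the cap of two sessions is an arbitrary example value, and only the 10-minute idle limit comes from the text.

```python
# Illustrative only: class and parameter names are hypothetical and do not
# reflect Cloud Pak for Data internals. The 10-minute idle limit comes from
# the text; the per-user session cap of 2 is an arbitrary example value.

IDLE_LIMIT_SECONDS = 10 * 60

class SessionRegistry:
    def __init__(self, max_sessions_per_user, idle_limit=IDLE_LIMIT_SECONDS):
        self.max_sessions = max_sessions_per_user
        self.idle_limit = idle_limit
        self._last_seen = {}  # (user, session_id) -> last activity timestamp

    def _expire(self, now):
        """Drop sessions whose idle time exceeds the limit."""
        self._last_seen = {
            key: seen for key, seen in self._last_seen.items()
            if now - seen <= self.idle_limit
        }

    def login(self, user, session_id, now):
        """Register a session; refuse it if the user is already at the cap."""
        self._expire(now)
        active = sum(1 for (u, _) in self._last_seen if u == user)
        if active >= self.max_sessions:
            return False
        self._last_seen[(user, session_id)] = now
        return True

    def is_active(self, user, session_id, now):
        """Report whether a session is still registered after expiry checks."""
        self._expire(now)
        return (user, session_id) in self._last_seen

registry = SessionRegistry(max_sessions_per_user=2)
first = registry.login("admin", "s1", now=0)
second = registry.login("admin", "s2", now=0)
third = registry.login("admin", "s3", now=0)               # over the cap
still_there = registry.is_active("admin", "s1", now=600)   # exactly at the limit
timed_out = not registry.is_active("admin", "s1", now=601) # limit exceeded
```

Timestamps are passed in explicitly so the behavior is deterministic; a real system would read a clock and refresh the timestamp on each user action.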

Improved Accessibility: At IBM, we prioritize accessibility for all users. Our commitment to compliance with Section 508 of the Rehabilitation Act of 1973 ensures equal access to information and technology. We now have 8 services reporting full accessibility compliance, with partially compliant services taking strides to reach full compliance.

Elasticity & Scalability 

With Cloud Pak for Data v4.7, experience seamless scalability and cost optimization with automatic scaling and flexible service control. 22 services now support automatic scaling using the standard HPA engine, and 36 services support service restart through SSR, where the user can choose to shut down services when they are not needed and restart them on demand. Refer to the HPA & SSR Documentation for the full list of supported services.
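For readers unfamiliar with HPA, the core of the standard Kubernetes Horizontal Pod Autoscaler is a simple ratio: desired replicas = ceil(current replicas × current metric / target metric), with no change when the ratio is within a tolerance of 1.0 (0.1 by default). The sketch below reproduces that calculation; the function name, replica bounds, and sample metric values are assumptions for illustration and are not specific to any Cloud Pak for Data service.

```python
import math

# Illustrative only: a sketch of the replica calculation documented for the
# Kubernetes Horizontal Pod Autoscaler. The function name, bounds, and sample
# metric values are hypothetical.

def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Return the replica count the HPA formula would ask for."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured minimum and maximum.
    return max(min_replicas, min(max_replicas, desired))

# Average CPU utilisation of 90% against a 50% target: scale up.
scale_up = desired_replicas(2, current_metric=90, target_metric=50)

# 52% against a 50% target is within tolerance: no change.
steady = desired_replicas(2, current_metric=52, target_metric=50)

# 10% against a 50% target: scale down, but never below min_replicas.
scale_down = desired_replicas(4, current_metric=10, target_metric=50)

print(scale_up, steady, scale_down)
```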



Cloud Pak for Data v4.7 now supports Amazon FSx for NetApp ONTAP, a fully managed service that provides high-performance file storage optimized for workloads that require shared access to data. FSx allows for three times more throughput than EFS storage and reduces costs by 46%.


Common Connectivity 

Accessing data sources is an important component of Cloud Pak for Data, which requires providing connectors and enhancements throughout the delivery cycle. Cloud Pak for Data supports over 80 connectors and various formats, and additional sources can be reached through Generic JDBC. With this vast array of supported connectors, the focus has been on enhancing existing connectors, for example with additional authentication support. However, two new connectors were introduced: Presto and IBM Product Master. Presto is a fast and reliable SQL engine for Data Analytics and the Open Lakehouse. Product Master is a trusted product information management (PIM) system with collaborative master data management (MDM) capabilities.


Multicloud Updates 

As always, Cloud Pak for Data can be deployed anywhere on a Red Hat OpenShift Container Platform cluster. The cluster can be on any cloud or behind a firewall for on-prem deployments, as long as it meets the prerequisites defined in the documentation.

For managed hyperscaler OpenShift clusters, IBM continues to refresh the Marketplace and Catalog Services offerings on IBM Cloud, AWS, and Azure. Currently, Cloud Pak for Data 4.6.x is available, including Cloud Pak for Data Express Parts and Data Fabric Offerings. These will all be refreshed after Cloud Pak for Data v4.7 is released in early 3Q23 and will also include a revamp of the Marketplace and Catalog Services, with consolidated tiles and streamlined deployments for easier navigation and deployment.


What's new from Services 

Cloud Pak for Data encompasses the capabilities of 40+ IBM and partner services, a set that has continually expanded. Our services have delivered a range of new features that are available in Cloud Pak for Data v4.7. For a deeper dive, check out What’s New in the Cloud Pak for Data documentation, or check out these blogs: