What's new in Cloud Pak for Data 4.0

By Polya Markova posted Mon July 19, 2021 04:17 PM



IBM Cloud Pak for Data 4.0.0 is available now, and it opens a new chapter for intelligent automation of diverse data landscapes. With a single distributed query across your disparate data, you can run distributed and virtualized queries 53% [1] faster than the industry standard. Furthermore, AutoAI automatic code generation increases trust and transparency and ensures zero vendor lock-in.


Simpler installation  

In 4.0, the Cloud Pak for Data control plane and services are installed using operators, which simplify the process of upgrading, scaling, and rolling back software on Red Hat OpenShift Container Platform. 


Platform enhancements 

  • Adoption of the IBM Cloud Pak® foundational services.  
  • Monitoring of event and alert information. 
  • Monitoring of resource use by service, service instances, environments, and pods. 
  • Permissions enhancements:
      • New permissions for creating and managing projects and deployment spaces.
      • Updated catalog permissions to separate the creation of catalogs from the management of catalogs.
      • Updated administration permissions to provide more granular control.
  • Enhanced connections interface. 
  • Shared credentials available for platform connections. 
  • Support for Windows Samba Shares for Volume Management. 
  • CLI commands to gather diagnostic information and to import and manage Cloud Pak for Data users. 


Security enhancements 

  • Fewer custom security context constraints - many services can use the default restricted security context constraint (SCC). 
  • Integration with Identity and Access Management Service (IAM Service) - use multiple identity providers for authentication and single sign-on across multiple IBM Cloud Pak installations.
  • Namespace Scope Operator - use in tandem with the ownNamespace operator group to improve security in shared clusters.  


New services that contribute to the AI and data governance capabilities

  • With IBM Match 360 with Watson, data engineers can generate a customizable data model when they add a new data source to IBM Match 360, and business users can search, explore, and analyze master data entities.
  • IBM Product Master helps your business automate the ingestion and governance of product information. It provides trusted product management information and collaborative master data management capabilities.


Some highlights for the services 

  • Analytics Engine Powered by Apache Spark offers a new version of the Spark jobs REST API, which supports additional spark-submit options. 
  • Data Refinery has a new Spark environment for running Data Refinery flow jobs. 
  • Data Virtualization now supports additional data sources such as Denodo, Snowflake, and SAP HANA, and includes a range of other improvements.
  • Db2 now adds support for multiple HADR standbys, support for tethered projects, and a new backup and restore method. 
  • Db2 Big SQL now connects to Hadoop clusters on Cloudera Data Platform (CDP) Private Cloud Base 7.1.6. Furthermore, it adds support for several features in the CREATE TABLE and ALTER TABLE statements. You can also bypass temporary directories when you insert data into tables in object stores. 
  • Db2 Data Gate can update the credentials for the Db2 for z/OS® source database seamlessly. It also adds query routing support (beta). 
  • Db2 Data Management Console has improved alerts and notifications. It now supports job management and scheduling plus additional KPIs for reports. 
  • Db2 Warehouse now supports multiple standby databases for the High Availability Disaster Recovery (HADR) feature. Additionally, you can provision Db2 Warehouse instances into a tethered project. A snapshot backup and restore is available for both Kubernetes resources and storage data. 
  • Decision Optimization comes with a new Decision Optimization runtime. The improvements also include support for C# models, new features in the Modeling Assistant, support for audit logging, and CPLEX V.20.1. 
  • You can now provision the MongoDB Ops Manager separately from individual MongoDB databases. 
  • OpenPages now supports external databases and tethered projects. It also integrates with the user authentication and management features in Cloud Pak for Data.
  • SPSS® Modeler comes with a changed environment size, interactive tree builder, and added nodes for Sim Eval and Streaming TCM. You can now load R and Python libraries to use with the extension nodes.  
  • Watson Knowledge Catalog extends its commitment to secure and transparent governance by integrating with platform auditing. Platform user groups can now be used as collaborators in categories and catalogs, as well as in data protection rules. It adds support for new connection types. It provides an improved search across the platform and several enhancements in data discovery and data quality.  
  • Watson Machine Learning includes support for an expanded set of popular frameworks and software specifications for building and deploying machine learning models. It features new AutoAI training, a new AutoAI time series, and new ways for you to tune your experiments. 
  • Watson OpenScale adds support for Db2 as a data source, along with other new capabilities, in batch environments. You can now use a remote instance of Watson Machine Learning.
  • Watson Studio includes a new version of JupyterLab and support for new connection types. You can now access assets with the ibm-watson-studio-lib library. 


You can learn more in the What's new topic in the Cloud Pak for Data documentation. 



[1] Based on internal testing