Authors: Sachin Prasad and Malcolm Singh
Introducing Cloud Pak for Data version 4.8
The IBM Cloud Pak for Data team is excited to announce the general availability of Cloud Pak for Data (CPD) version 4.8. Since the general availability of version 4.0 in June 2021, this marks the fourth feature release in the series. This release includes an array of features tied together by an emphasis on serviceability: enhancing the management and maintenance of the CPD platform and its services. These serviceability improvements provide more information about the platform to proactively ensure a stable environment for CPD production workloads.
Beyond the core focus on serviceability, CPD v4.8 is also geared to bolster your data fabric journey. The latest deployment features are refined for efficiency, and additional storage options help ensure your data fabric can adapt to organizational needs. In addition to these features, CPD continues to expand its enterprise readiness capabilities with disaster recovery options and extended security solutions.
Cloud Pak for Data Platform Release Notes
IBM Cloud Pak for Data already provides a robust, extensible, plug-and-play monitoring framework. This framework allows customers and lab services to build their own custom monitors to suit their requirements. IBM Cloud Pak for Data v4.8 builds on this framework and now provides ready-to-use, out-of-the-box monitors for multiple services. These monitors:
- are designed to provide immediate, actionable insights with no configuration required, significantly reducing the administrative burden and accelerating the time to value
- will assess service health and identify any potential problems by generating informative alerts
v4.8 also lays the foundation and framework for cluster-level monitoring (in addition to namespace-level monitoring) when the right set of permissions is provided. One use of this framework is that admins now get notifications if the cluster is about to run out of storage, so they can take corrective action. The enhanced diagnostic log collection further contributes to this by providing more essential platform and service information for deeper troubleshooting.
User Activity Monitoring:
User Activity Monitoring provides administrators with tangible insights about user actions on the platform. Because it is lightweight, it integrates with the existing framework and infrastructure with little resource usage. It is an invaluable asset for customers aiming to satisfy compliance and regulatory demands, offering comprehensive logs that detail every action within the platform. The availability of historical logs also equips system administrators to assess the impact of specific user actions and provides traceability of user interactions. Learn more about Monitoring User Activity.
Enhanced Diagnostic Log Collection:
The diagnostic log collection process now collects more essential information up front, which helps accelerate troubleshooting and reduce the time to resolution. A comprehensive logging mechanism lets administrators collect detailed diagnostic information quickly and efficiently. Although data collection now covers a broader and more critical set of information, the cpd-cli diag command keeps log retrieval swift and straightforward.
Kubernetes Resource Description:
Customers can now learn more about the underlying microservices that power Cloud Pak for Data through additional information that accompanies each Kubernetes pod. This detailed metadata for Kubernetes resources improves understanding of the pods, so basic troubleshooting during failures can be carried out without the need for support. Accessible via the cpd-cli get-k8s-details command, the enhanced pod descriptions provide in-depth insights into service relationships and dependencies between pods. With the implementation of label standardization, customers benefit from consistent monitoring and reporting across various Kubernetes objects, which is essential for operational efficiency.
Install and Upgrade Health Checks:
Install and Upgrade Health Checks act as a health advisor for your data platform. They provide a straightforward way to inspect the platform's health, check its stability, and proactively identify potential issues. These checks are particularly useful during system upgrades, acting as a preventative measure as well as post-upgrade assurance. The cpd-cli health command gives customers quick, on-demand health checks and confidence in the stability of the Cloud Pak for Data cluster. In addition, frequently scheduled health checks can help ensure that CPD continues to perform optimally by detecting and addressing performance bottlenecks.
FISMA Tablestakes and Federal Mandates:
Over the past two years, the Cloud Pak for Data team has made tremendous security improvements, primarily around FISMA table stakes, which included major requirements for FIPS encryption and CIS compliance. With v4.8, that work is now complete. The next phase of work will include AWS GovCloud certification for some services and IPv6 certification, which is going to become a hard requirement for federal deployments.
Least Privilege RBAC for Install and Upgrades:
Least privilege Role-Based Access Control (RBAC) for install and upgrade simplifies how deployment permissions are divided between cluster administrators and Cloud Pak for Data administrators in large enterprises. It does this by clearly defining who can do what, ensuring that CPD administrators have just enough access to perform their tasks without excess permissions that could pose security risks. This feature gives teams essential information on the minimum access required to set up the platform, which not only enhances security but also helps with logistics when preparing for deployment. By offering this information before the actual installation, it prevents potential security risks and aligns with enterprise security measures. Customers can benefit from a more efficient and compliant operation, safeguarding their data and aiding timely project execution. Learn more about cpd-cli show-minium-rbac.
Attribute-based Access Control (ABAC) Enhancements:
Attribute-Based Access Control (ABAC) allows for flexible and dynamic access decisions based on a user's attributes, which can change from session to session—like their location or the device they are using. This means that access rights can be automatically adjusted, for example, to comply with regional data regulations that may restrict access to certain information when an employee is traveling. The platform now supports the use of any attribute from an Identity Provider in ABAC rules and even allows for custom attributes to be part of the access control logic. This flexibility in defining access controls translates into a more secure, compliant, and adaptable environment, providing businesses with the means to safeguard sensitive data while being versatile given the nature of modern work scenarios.
Read more about this feature in this blog.
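To make the idea of attribute-based decisions concrete, here is a minimal, generic sketch of how a rule set might be evaluated against the attributes of a session. It is an illustration only, not the Cloud Pak for Data implementation: the Rule class, the is_allowed function, and the attribute names and values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """One hypothetical ABAC rule: an attribute must take an allowed value."""
    attribute: str       # attribute name, e.g. as supplied by an identity provider
    allowed: frozenset   # values that satisfy the rule


def is_allowed(user_attributes, rules):
    """Grant access only when every rule is satisfied by the session's attributes.

    A missing attribute never satisfies a rule, so access is denied by default.
    """
    return all(user_attributes.get(rule.attribute) in rule.allowed for rule in rules)


# Hypothetical rules: finance staff, connecting from Germany or France.
rules = [
    Rule("department", frozenset({"finance"})),
    Rule("location", frozenset({"DE", "FR"})),
]

in_office = {"department": "finance", "location": "DE"}
traveling = {"department": "finance", "location": "US"}

print(is_allowed(in_office, rules))
print(is_allowed(traveling, rules))
```

Because the decision is recomputed from the current attributes, the same user can be granted access in one session and denied in the next without any change to their role assignments.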
Pluggable Vault Integration & SDK:
The new Vault Bridge SDK facilitates building access for Cloud Pak for Data to additional external vaults. This integration allows users to access credentials effortlessly, streamlining the process of securing data source connections using the SDK. Examples using the SDK include external vaults such as IBM Cloud Secrets Manager, AWS Secrets Manager, and Azure Key Vault. For users, this means enhanced security protocols with simplified credential management, which is crucial for protecting sensitive information. The flexibility to integrate with multiple vaults also offers versatility in managing access to data across different cloud environments, adding a valuable layer of data governance. Delve deeper on how to use the Vault Bridge SDK.
Custom Authentication Service:
Cloud Pak for Data Platform can be configured to authenticate users against a custom authentication service. This provides an alternative method for user authentication, complementing the existing system and giving users the ability to authenticate via the generated token from the custom authentication service. The main benefit is the customization to fit diverse enterprise needs, offering a more tailored and secure authentication process. Explore more on Custom Authentication Service.
API Authorization Keys:
The ZenAPIKey streamlines how developers access services and APIs. Now with all of the services supporting this key, developers can use a single token for authentication across all services, eliminating the complexity of managing multiple keys and tokens. This simplification means developers can focus more on building and less on navigating authentication protocols, enhancing productivity. Learn how to create API Authorization Keys.
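As a rough sketch of how a single key might be sent with every request, the helper below builds an Authorization header. It assumes the header takes the form "ZenApiKey" followed by the base64 encoding of "username:api_key"; that format, the function name, and the sample credentials are assumptions for illustration and should be checked against the product documentation for your release.

```python
import base64


def zen_api_key_header(username, api_key):
    """Build an Authorization header from a platform username and API key.

    Assumed format (verify against the documentation):
    "ZenApiKey " + base64("<username>:<api_key>")
    """
    raw = f"{username}:{api_key}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"ZenApiKey {token}"}


# Hypothetical credentials, for illustration only.
headers = zen_api_key_header("alice", "example-api-key")
print(headers["Authorization"].split(" ")[0])
```

The resulting dictionary can be passed as the headers of any HTTP client call, so the same value can be reused across services instead of juggling per-service tokens.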
Parallel install runs the installation of multiple services in parallel to provide faster installs and upgrades. The underlying mechanism ensures that the dependencies of a service are installed before the dependent service is installed or updated. Internal tests show 20% to 50% faster installs with this approach.
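The dependency-first ordering described above can be illustrated with a small topological-sort sketch. This is not how the CPD installer is implemented; it only shows the general technique of grouping work into waves that can run concurrently. The install_waves function and the service names are hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def install_waves(dependencies):
    """Group services into waves that could be installed in parallel.

    `dependencies` maps each service to the services it depends on.
    Every service in a wave has all of its dependencies in earlier waves.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything installable right now
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock dependents
    return waves


# Hypothetical services and dependencies, for illustration only.
deps = {
    "platform": [],
    "common_core": ["platform"],
    "studio": ["common_core"],
    "catalog": ["common_core"],
    "database": ["platform"],
}

for number, wave in enumerate(install_waves(deps), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this example the five services collapse into three waves, which is where the time saving over a strictly sequential install would come from.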
Cloud Pak for Data Scheduler Enhancements:
Scheduler enhancements in IBM Cloud Pak for Data 4.8 provide insights into job queue positions, allow for workload-specific timeout settings, and add advanced job priority management. These features will help customers efficiently manage their resources and ensure timely execution of critical tasks. As a result, businesses can also optimize operations and better align with performance objectives.
Minimizing resource usage is one of the main requests from our Cloud Pak for Data customers, who want to reduce their overall costs, especially on hyperscaler environments. The IBM team has explored and delivered improvements in several areas to help reduce the footprint of Cloud Pak for Data. These included the introduction of a new t-shirt size, initially available for Cloud Pak for Data Express Parts. The t-shirt size, small_mincpureq, reduces the overall reserved CPU resources and in turn increases the availability of CPU resources for active services. This t-shirt size is now available for Cloud Pak for Data base services and cartridges. Using it will help reduce the overall usage of cloud infrastructure and cluster topology for entry-level configurations.
AI on Power is accelerated with IBM Cloud Pak for Data 4.8. IBM Power is engineered for enterprise AI with leading reliability, security, sustainability and performance optimization. Cloud Pak for Data 4.8 now comes with expanded support on IBM Power for the following services:
- Data management services: DMC, Db2, Db2 Warehouse (existing support)
- Watson Studio with Data Refinery services
- Decision Optimization
- AutoAI for model creation and management
- Spark for analytics
IBM Cloud Pak for Data 4.8 now comes with a bonus for users: IBM Storage Fusion Essentials is bundled at no extra cost. Once the CPD v4.8 license is obtained, access is provided to a world class file system that can handle up to 12TB of data per cluster. This makes starting with CPD easier since there's no need to buy a separate storage license. If a business grows and needs more than 12TB, or wants extra features like backup and disaster recovery, they can easily upgrade to the Fusion Advanced version. This addition ensures that CPD is offering a strong starting point for any company's data management needs, with room to grow in the future.
Turbonomic for Cloud Pak for Data:
The integration of IBM Turbonomic with Cloud Pak for Data represents a significant leap forward in AI-driven resource optimization. With CPD v4.8, Turbonomic is offering a special evaluation license, providing customers with an enhanced method for infrastructure management. This integration delivers a wide array of benefits, including detailed performance monitoring and analysis, which allows businesses to gain a holistic understanding of their resource utilization and to uncover opportunities for efficiency gains. Customers can expect to receive tailor-made recommendations that align with their unique operational goals, ensuring that their infrastructure is not only compliant with business policies but also optimized for cost-effectiveness and performance. The evaluation edition acts as a strategic decision-support mechanism, offering actionable insights without immediate automation, allowing for a controlled, incremental approach to resource management. This careful calibration of resources helps to minimize waste and improve overall system responsiveness.
For more information and instructions on using the free evaluation edition:
IBM Turbonomics Install instructions, licensing and support
IBM Turbonomic Community
IBM Turbonomics Online documentation
Introducing Turbonomic for Cloud Pak for Data v4.8
Using IBM Turbonomic for Monitoring Cloud Pak for Data — Part 1
Using IBM Turbonomic for Monitoring Cloud Pak for Data — Part 2
As always, Cloud Pak for Data can be deployed anywhere on a Red Hat OpenShift Container Platform cluster. The cluster can be on any cloud or behind a firewall for on-prem deployments, just as long as the cluster meets the prerequisites defined in the documentation.
Cloud Pak for Data can also be deployed using a streamlined process on Managed OpenShift offerings for IBM Cloud, AWS, and Azure. This process includes both base services and cartridges under one tile for easier navigation that leads to smoother 'one-click' deployments. New to version 4.8 is the addition of the following Cloud Pak for Data service, which is available on IBM Cloud ROKS, ROSA, and ARO:
Watson Speech Services
These Managed OpenShift offerings for Cloud Pak for Data on IBM Cloud, AWS, and Azure will be refreshed to 4.8, which will include Cloud Pak for Data Express Parts.
Azure NetApp Files
In addition to this refresh, Cloud Pak for Data v4.8 now supports Azure NetApp Files, a fully managed service that provides a high-performance file storage solution. This support provides an additional storage option on Azure that is optimized for workloads that require shared access to data. It follows the existing support for FSx for NetApp, with more support for NetApp to come in future releases.
Disaster Recovery Solutions:
Disaster Recovery (DR) is a vital process to establish for enterprises to maintain core functionality in the event of catastrophic failures on-premise or outages in the cloud. A Disaster Recovery solution is composed of IT technologies and integrated processes to minimize business disruptions due to events ranging from data center failures to natural disasters, including cyberattacks and emergencies. Cloud Pak for Data first supported a Disaster Recovery solution using IBM Storage Fusion. This has now expanded to include NetApp and Portworx.
NetApp Trident ONTAP
NetApp Trident ONTAP can now be used in a Disaster Recovery solution for Cloud Pak for Data. This solution is built using asynchronous online backup and restore, where the restore takes place on a separate target cluster usually located in a different data center or cloud location. The process is controlled and maintained using NetApp Astra Control Center (ACC), which offers stateful Kubernetes workloads a rich set of application-aware data management services powered by NetApp’s trusted data protection technology. To learn more about how to use NetApp ACC to build a DR solution for CPD check here.
Portworx can also now be used in a Disaster Recovery solution for Cloud Pak for Data. This is an asynchronous disaster recovery solution built upon the Portworx Disaster Recovery framework using data replication. The data replication consists of scheduled migrations to a separate destination cluster. Failover to the destination cluster can be initiated in the event of a failure at the primary source cluster. For further details on how Portworx can be used in a DR solution for CPD check here.
Accessing data sources is an important component of Cloud Pak for Data, which requires providing connectors and enhancements. CPD supports over 80 connectors and various formats, in addition to generic JDBC. With this vast array of supported connectors, the focus has been on expanding support for additional authentication options. These enhancements followed the introduction of the watsonx.data connector; watsonx.data is IBM's open, hybrid and governed data lakehouse. In addition to these enhancements and the new connector, a new feature was introduced to provide flexibility by defining reusable connectors: Common Connectors.
Customers can now define their own reusable connector to any JDBC data source. Knowledgeable users can define custom connectors and make them available for end users to easily create connections within CPD. This may be to access a data source that CPD does not provide a connector to, or to create a variation of an available CPD connector with unique properties and configurations. With custom connectors, customers will have greater flexibility and control of their data source connectivity.
Cloud Pak for Data v4.8 Services Highlights
IBM Cloud Pak for Data includes numerous services that customers can use to extend the functionality of Cloud Pak for Data. Some services are included in the purchase of Cloud Pak for Data; other services are separately priced.
A few key service updates include:
New dashboard visualizations with Cognos Dashboards.
IBM Knowledge Catalog has enhanced customizations through custom properties for user groups to identify additional stakeholders on assets.
Folder support for DataStage and impact analysis for assets.
To learn more about the feature set of each individual service, please visit the What’s New section in the Cloud Pak for Data documentation.
Try Cloud Pak for Data for free
Introducing Turbonomic for Cloud Pak for Data v4.8
Read additional thoughts from Sachin on Medium
Watch Cloud Pak for Data v4.8 videos