Cloud Pak for Data Upgrade Best Practice
Cloud Pak for Data is evolving fast. New features and fix packs are delivered with each new version, and every Cloud Pak for Data release has its own support lifecycle. Customers therefore upgrade to get new features, enhancements, and fix packs, or to stay within IBM support. Given the importance of upgrades, we'd like to share upgrade best practices by summarizing the knowledge, experience, and lessons learned by the IBM support team.
Must-know about CPD upgrade
Because a CPD upgrade affects production use, it should be treated as critically important. A Cloud Pak for Data upgrade is not only about technical implementation. It is recommended to run the upgrade as a project, with a project plan ready before the upgrade starts.
There are three key phases in a Cloud Pak for Data upgrade: pre-upgrade, upgrade implementation, and post-upgrade. I'll walk through each of them in detail later. In this article, let's start with the upgrade plan, which covers the basics of all three phases.
An upgrade plan matters
The upgrade plan can be divided into three parts: pre-upgrade, upgrade implementation, and post-upgrade.
1. Pre-upgrade
1) Decide the target Cloud Pak for Data and OpenShift versions
Different customers may have different preferences. Some always want the latest Cloud Pak for Data version, while others may be more conservative: rather than the latest major release, they prefer to upgrade to a prior major release that they consider more stable.
Once the target Cloud Pak for Data version is decided, the OpenShift version and the storage can be decided accordingly.
2) Take backup as part of the upgrade
The purpose of the backup is to ensure that the production cluster can be restored in case of a critical upgrade failure. The backup of the cluster comprises the OpenShift part and the Cloud Pak for Data part.
There are some technical differences between the backup and restore service of CPD 3.5 and that of CPD 4.0. Depending on your Cloud Pak for Data version, you can implement the backup with the corresponding backup and restore service.
Note:
The backup supported by CPD 3.5 and 4.0 is still an offline backup.
3) Evaluate and decide the time window
There will be a service outage during the Cloud Pak for Data upgrade, so a time window is required.
The time window mostly depends on the services installed on your Cloud Pak for Data cluster. Because production use will be affected, it is recommended to discuss the time window and the timing of the upgrade with the end users, so that they are informed and can prepare for it. They should stop the environment runtimes (Jupyter Notebooks, JupyterLab, RStudio, etc.) and scheduled jobs before the upgrade, which helps lower the risk of the upgrade.
4) Get the required images and libraries ready
This is not only about downloading the images and libraries for the upgrade but also about validating them.
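One way to validate downloaded files is to compare their checksums against known-good values. The following is a minimal sketch in Python, assuming you have the expected SHA-256 digests from a trusted source. The manifest format (a dictionary of file name to digest) and the function names are hypothetical and not an IBM-provided format. Container images mirrored into a private registry would need a different check.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_invalid_files(expected: dict[str, str], directory: Path) -> list[str]:
    """Return names of files that are missing or whose digest does not match.

    `expected` maps a file name to its expected SHA-256 hex digest
    (a hypothetical manifest assumed for this sketch).
    """
    invalid = []
    for name, expected_digest in sorted(expected.items()):
        candidate = directory / name
        if not candidate.is_file() or sha256_of(candidate) != expected_digest.lower():
            invalid.append(name)
    return invalid
```

An empty result from `find_invalid_files` means every listed file is present and matches its expected digest. Any name returned should be downloaded again before the upgrade window.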
5) Work out the runbook for the upgrade
The runbook may cover the upgrade of OpenShift, storage (e.g. Portworx, ODF), and Cloud Pak for Data. You can work out the runbook by following the IBM documentation, and it is recommended that you validate it in a test cluster.
6) Get the resources and necessary support from the customer's IT team
The IT team should be ready to resolve any infrastructure-related issues that occur during the upgrade, including network and hardware issues. An emergency contact list is recommended.
7) Sort out the list of technical limitations or known issues for the upgrade
This can be done by checking the IBM documentation for each service to be upgraded.
2. Upgrade implementation
1) Pre-check before the upgrade
Capture the cluster state and make sure the cluster is healthy before the upgrade.
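As an illustration of what such a pre-check could look like, here is a minimal Python sketch that inspects the JSON output of `oc get nodes -o json` and `oc get pods -o json`. The helper names are assumptions made for this example, and the check is deliberately simple: it reports only nodes without a true Ready condition and pods whose phase is neither Running nor Succeeded. A real pre-check would also look at operators, storage, and the Cloud Pak for Data services themselves.

```python
import json
import subprocess


def oc_get_json(*args: str) -> dict:
    """Run `oc get <args> -o json` and parse the result (needs a logged-in oc CLI)."""
    result = subprocess.run(
        ["oc", "get", *args, "-o", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)


def not_ready_nodes(node_list: dict) -> list[str]:
    """Return names of nodes whose Ready condition is not "True"."""
    names = []
    for node in node_list.get("items", []):
        conditions = node.get("status", {}).get("conditions", [])
        is_ready = any(
            cond.get("type") == "Ready" and cond.get("status") == "True"
            for cond in conditions
        )
        if not is_ready:
            names.append(node["metadata"]["name"])
    return names


def unhealthy_pods(pod_list: dict) -> list[str]:
    """Return "namespace/name" for pods that are neither Running nor Succeeded."""
    healthy_phases = {"Running", "Succeeded"}
    names = []
    for pod in pod_list.get("items", []):
        phase = pod.get("status", {}).get("phase", "Unknown")
        if phase not in healthy_phases:
            meta = pod["metadata"]
            names.append(f'{meta["namespace"]}/{meta["name"]}')
    return names
```

For example, `not_ready_nodes(oc_get_json("nodes"))` and `unhealthy_pods(oc_get_json("pods", "-n", "my-cpd-namespace"))` could be run and their output saved as part of the captured cluster state. The namespace name here is a placeholder. Note that a pod in the Running phase can still have containers that are not ready, so this sketch is a first filter rather than a complete health check.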
2) Implement the upgrade following the runbook prepared in the pre-upgrade phase
Record the commands and their results during the upgrade. After each service upgrade, make sure the service is healthy before you proceed with the upgrade of another service.
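Recording commands and results can be done by hand, but a small wrapper makes it harder to forget. The sketch below, written in Python as one possible approach, runs a command and appends the command line, its output, and its exit code to a log file with a UTC timestamp. The function name and log layout are assumptions made for this example.

```python
import datetime
import shlex
import subprocess
from pathlib import Path


def run_and_record(command: list[str], log_path: Path) -> int:
    """Run a command, append what was run and what it returned to a log file.

    Returns the exit code so the caller can stop the runbook on failure.
    """
    started = datetime.datetime.now(datetime.timezone.utc).isoformat(timespec="seconds")
    result = subprocess.run(command, capture_output=True, text=True)
    with log_path.open("a", encoding="utf-8") as log:
        log.write(f"[{started}] $ {shlex.join(command)}\n")
        log.write(result.stdout)
        log.write(result.stderr)
        log.write(f"[exit code {result.returncode}]\n\n")
    return result.returncode
```

A call such as `run_and_record(["oc", "get", "clusterversion"], Path("upgrade.log"))` would append one entry to `upgrade.log`. A non-zero return value is a signal to pause and investigate before moving on to the next runbook step.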
Be cautious about rollback. Rollback during the upgrade from 3.5 to 4.0.X is not supported. Even for the upgrade from 3.0.1 to 3.5, rollback is not supported for some services, e.g. WML.
Reach out to IBM Support for help and assistance if needed.
Restore should be the last resort.
3. Post-upgrade
1) Validation
Capture the cluster state and make sure the cluster is healthy. Re-enable the routes and CronJobs that were suspended in the pre-upgrade phase. Some sanity tests by the end users are also recommended.
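To help confirm that no scheduled job is left suspended, the following Python sketch lists the CronJobs whose `spec.suspend` field is true in the output of `oc get cronjobs -o json`, and builds the `oc patch` command that would resume one of them. The function names are assumptions made for this example. Some CronJobs may have been suspended on purpose before the upgrade, so the result should be compared against the list recorded in the pre-upgrade phase rather than resumed blindly.

```python
import json


def suspended_cronjobs(cronjob_list: dict) -> list[str]:
    """Return "namespace/name" for every CronJob with spec.suspend set to true."""
    names = []
    for job in cronjob_list.get("items", []):
        if job.get("spec", {}).get("suspend") is True:
            meta = job["metadata"]
            names.append(f'{meta["namespace"]}/{meta["name"]}')
    return names


def resume_command(namespace: str, name: str) -> list[str]:
    """Build (but do not run) the oc command that sets spec.suspend to false."""
    patch = json.dumps({"spec": {"suspend": False}})
    return ["oc", "patch", "cronjob", name, "-n", namespace,
            "--type", "merge", "-p", patch]
```

The input for `suspended_cronjobs` can be obtained by parsing the output of `oc get cronjobs -n <namespace> -o json` with `json.loads`. Keeping `resume_command` separate from execution lets you review the exact commands before running them.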
2) Migration
When upgrading to another major version, some migration effort may be required. For example, the versions of the environment runtimes and machine learning frameworks supported by Cloud Pak for Data change between CPD 3.5 and 4.0. Because of this, there will be incompatibility issues, and data scientists may have to make code changes to their notebooks.
3) An enablement workshop about the new version for the end users
There are platform architecture and design changes when upgrading to a new major version. An enablement workshop can help the end users get used to the new version in a shorter time.
In this article, we gave an overview of the Cloud Pak for Data upgrade and of the upgrade plan, which covers the basics of the pre-upgrade, upgrade implementation, and post-upgrade phases. I'll walk through the tasks involved in each of these three phases in detail in later articles.