Tips for implementing your Cloud Pak for Data upgrade
After you complete the tasks described in Pre-upgrade tasks for CPD upgrade, the next step is to implement your Cloud Pak for Data upgrade. In this article, I'd like to share some tips about the upgrade implementation.
1. Pre-check before the upgrade
Capture the cluster state and make sure the cluster is healthy before the upgrade.
This step is critical to a successful upgrade. Make sure the following conditions are met before you begin.
1) Check the OpenShift cluster status
a) Make sure the cluster operators are healthy
Run the following command.
oc get co
All the cluster operators should report AVAILABLE as True, and PROGRESSING and DEGRADED as False.
Example:
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.6.52    True        False         False      27m
cloud-credential                           4.6.52    True        False         False      2d2h
cluster-autoscaler                         4.6.52    True        False         False      2d2h
config-operator                            4.6.52    True        False         False      2d2h
console                                    4.6.52    True        False         False      12h
csi-snapshot-controller                    4.6.52    True        False         False      57m
dns                                        4.6.52    True        False         False      14h
etcd                                       4.6.52    True        False         False      2d2h
image-registry                             4.6.52    True        False         False      20h
ingress                                    4.6.52    True        False         False      2d2h
insights                                   4.6.52    True        False         False      2d2h
kube-apiserver                             4.6.52    True        False         False      2d2h
kube-controller-manager                    4.6.52    True        False         False      2d2h
kube-scheduler                             4.6.52    True        False         False      2d2h
kube-storage-version-migrator              4.6.52    True        False         False      172m
machine-api                                4.6.52    True        False         False      2d2h
machine-approver                           4.6.52    True        False         False      2d2h
machine-config                             4.6.52    True        False         False      148m
marketplace                                4.6.52    True        False         False      159m
monitoring                                 4.6.52    True        False         False      154m
network                                    4.6.52    True        False         False      2d2h
node-tuning                                4.6.52    True        False         False      2d2h
openshift-apiserver                        4.6.52    True        False         False      27m
openshift-controller-manager               4.6.52    True        False         False      2d2h
openshift-samples                          4.6.52    True        False         False      2d2h
operator-lifecycle-manager                 4.6.52    True        False         False      2d2h
operator-lifecycle-manager-catalog         4.6.52    True        False         False      2d2h
operator-lifecycle-manager-packageserver   4.6.52    True        False         False      158m
service-ca                                 4.6.52    True        False         False      2d2h
storage                                    4.6.52    True        False         False      2d2h
b) Make sure all the nodes are in Ready status
Run the following command.
oc get nodes
All the nodes should be in Ready status.
Example:
NAME                            STATUS   ROLES    AGE    VERSION
master0.jhwocp4652.cp.xxx.com   Ready    master   2d2h   v1.19.16+3d19195
master1.jhwocp4652.cp.xxx.com   Ready    master   2d2h   v1.19.16+3d19195
master2.jhwocp4652.cp.xxx.com   Ready    master   2d2h   v1.19.16+3d19195
worker0.jhwocp4652.cp.xxx.com   Ready    worker   2d2h   v1.19.16+3d19195
worker1.jhwocp4652.cp.xxx.com   Ready    worker   2d2h   v1.19.16+3d19195
worker2.jhwocp4652.cp.xxx.com   Ready    worker   2d2h   v1.19.16+3d19195
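The same pattern works for the node check. A minimal sketch (the function name is my own) that flags any node whose STATUS column is not exactly Ready, which also catches cordoned nodes such as Ready,SchedulingDisabled:

```shell
# Print any node that is not in plain Ready status.
# Expects the output of: oc get nodes --no-headers
check_nodes() {
  awk '$2 != "Ready" { print $1 }'
}

# Usage: oc get nodes --no-headers | check_nodes
```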
c) Make sure all the machine config pools are healthy. All the pools should report UPDATED as True, UPDATING and DEGRADED as False, and a DEGRADEDMACHINECOUNT of 0.
Run the following command.
oc get mcp
Example:
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-dd5251204366d7a3c25261ce8bc5c9fb   True      False      False      3              3                   3                     0                      2d3h
worker   rendered-worker-b2b008ac8cae4a98839cbfe309007fea   True      False      False      3              3                   3                     0                      2d3h
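And the machine config pool check can be scripted the same way. A sketch (the function name is my own) that flags any pool that is not fully updated, is degraded, or has degraded machines:

```shell
# Print any machine config pool that is not fully updated and healthy.
# Expects the output of: oc get mcp --no-headers
# Columns: NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT
#          READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
check_mcp() {
  awk '$3 != "True" || $5 != "False" || $9 != "0" { print $1 }'
}

# Usage: oc get mcp --no-headers | check_mcp
```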
2) Check the Cloud Pak for Data status
If you are upgrading from Cloud Pak for Data 3.5 to 4.0, you can run the following command.
cpd-cli status -n your-cpd-project
Make sure that Lite and all the installed services are in Ready status.
In addition, you can run the following command to find any pods that are not healthy.
oc get po --no-headers --all-namespaces -o wide| grep -Ev '([[:digit:]])/\1.*R' | grep -v 'Completed'
Sometimes an unhealthy pod does not mean the cluster itself is unhealthy; for example, a failed job can leave a pod behind in Error status. But you should identify why these pods are unhealthy, and only ignore them once you have confirmed that they have no impact on the upgrade.
3) Check Image registry
As of Cloud Pak for Data 4.0, a private image registry is recommended, and it is required in air-gapped environments. The private image registry is important because it hosts all the images that your Cloud Pak for Data 4.x services need in order to run. So I strongly recommend that you check the image registry status and get an overview of the images in it.
Run the following command for logging into your private image registry server.
podman login --username $PRIVATE_REGISTRY_USER --password $PRIVATE_REGISTRY_PASSWORD $PRIVATE_REGISTRY --tls-verify=false
If the login succeeds, your private image registry is up and running.
Run the following command to list the image repositories in the registry.
curl -k -u ${PRIVATE_REGISTRY_USER}:${PRIVATE_REGISTRY_PASSWORD} https://${PRIVATE_REGISTRY}/v2/_catalog?n=6000 | jq .
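If you want to confirm that a specific repository made it into the registry, you can filter the catalog output. The helper below is a simple grep-based sketch; the function name and the repository name in the usage example are placeholders of my own.

```shell
# Check whether a given repository name appears in the /v2/_catalog JSON.
# Pipe in the catalog output; pass the repository name as the argument.
repo_present() {
  if grep -q "\"$1\""; then echo "present"; else echo "missing"; fi
}

# Usage (placeholder repository name):
# curl -k -s -u ${PRIVATE_REGISTRY_USER}:${PRIVATE_REGISTRY_PASSWORD} \
#   https://${PRIVATE_REGISTRY}/v2/_catalog?n=6000 | repo_present cpd/some-service-image
```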
2. Implement the upgrade following the runbook prepared in the pre-upgrade phase
In section 8) of the article Pre-upgrade tasks for CPD upgrade, a validated and well-prepared upgrade runbook is recommended as one of the pre-upgrade tasks. By following this runbook, you can implement the upgrade with less risk and effort. The following tips may also be helpful.
1) Temporarily disable the route to the Cloud Pak for Data cluster
This helps prevent end users from accessing the cluster during the upgrade.
First, back up your Cloud Pak for Data route with the following command. Note: replace your-cpd-route and your-cpd-project with your own values.
oc get route your-cpd-route -o yaml -n your-cpd-project > your-cpd-route-backup.yaml
Then you can delete the route. After the upgrade completes, you can restore it from the backup file with oc apply -f your-cpd-route-backup.yaml -n your-cpd-project.
2) For the upgrade from 3.0.1 to 3.5, you are required to apply the latest patch for Lite and to upgrade SPSS (if installed) to 3.0.2.
3) Uninstall the services that are deprecated in the target upgrade version
Please refer to section 3) of the article Pre-upgrade tasks for CPD upgrade.
4) Stopping the environment runtimes and cron jobs is recommended
As mentioned in section 5) Evaluate and decide the time window of the article Pre-upgrade tasks for CPD upgrade, end users are advised to stop their own environment runtimes and scheduled jobs before the upgrade. Even so, it is best to take the following actions to make sure that environment runtimes and scheduled jobs are actually stopped or suspended.
First, list the active environment runtimes.
for mydeploy in $(oc get deploy -l created-by=spawner --no-headers| awk '{print $1}') ; do echo $mydeploy; done
Stop the active environment runtimes
for mydeploy in $(oc get deploy -l created-by=spawner --no-headers| awk '{print $1}') ; do oc delete deployment $mydeploy; done
Suspend cron jobs before the upgrade.
oc get cronjobs -n your-cpd-project | grep False| grep -v spark | cut -d' ' -f 1 | xargs oc patch cronjobs -p '{"spec" : {"suspend" : true }}'
Note that you must re-enable the cron jobs after the upgrade is done.
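To make the re-enabling step less error-prone, you can mirror the suspend filter when resuming. The helper below is a sketch (the function name is my own): it reads the output of oc get cronjobs and prints the suspended, non-Spark jobs so they can be piped to oc patch. One caveat: it cannot distinguish jobs you suspended from jobs that were already suspended before the upgrade.

```shell
# Print suspended cron jobs (excluding spark jobs) from `oc get cronjobs` output.
# Columns: NAME SCHEDULE SUSPEND ACTIVE LAST-SCHEDULE AGE
list_suspended_cronjobs() {
  awk 'NR > 1 && $3 == "True" && $1 !~ /spark/ { print $1 }'
}

# Usage: oc get cronjobs -n your-cpd-project | list_suspended_cronjobs \
#   | xargs oc patch cronjobs -n your-cpd-project -p '{"spec" : {"suspend" : false }}'
```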
5) OpenShift upgrade
For the OpenShift upgrade, I assume your cluster is already on OCP 4.x. If your cluster is still on 3.11, a migration rather than an upgrade is required, because OpenShift does not support an in-place upgrade from 3.11 to 4.x. I'll cover the migration in detail in a separate article later.
a) OpenShift doesn't support skipping minor versions during an upgrade; you must upgrade sequentially, one minor version at a time.
For example, your current OpenShift version is 4.5.X and you plan to upgrade it to 4.8. Your upgrade path would be 4.5.X -> 4.6.Y -> 4.7.Z -> 4.8.N.
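The sketch below makes the sequencing concrete by printing the minor versions you would pass through; it is purely illustrative (the function name is my own) and does not run any upgrade.

```shell
# Print the sequential OCP upgrade path between two minor versions.
# Purely illustrative: it only prints the hops, it does not upgrade anything.
upgrade_path() {
  local cur=$1 target=$2 path=""
  for m in $(seq "$cur" "$target"); do
    path="$path 4.$m"
  done
  echo "${path# }"
}

# Example: upgrade_path 5 8  →  4.5 4.6 4.7 4.8
```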
b) When upgrading OpenShift, subscribe to the EUS channel to get Extended Update Support if it's available.
For example, OCP 4.6 has reached its end of support, but the Extended Update Support for OCP 4.6 (4.6 EUS) is still available. If you want to stay on OCP 4.6 with support, you'll have to switch your OpenShift cluster to the eus-4.6 channel.
For more information, refer to the OpenShift lifecycle policy.
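One way to switch channels from the command line is to patch the ClusterVersion resource (the web console offers the same choice). A sketch, assuming cluster-admin access and the eus-4.6 channel:

```shell
# Point the cluster at the EUS channel (example: eus-4.6),
# then list the updates that are now available on that channel.
oc patch clusterversion version --type merge -p '{"spec":{"channel":"eus-4.6"}}'
oc adm upgrade
```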
c) Validate the OpenShift cluster status
Make sure the cluster operators are healthy
oc get co
Make sure all the nodes are in Ready status
oc get nodes
Make sure all the machine config pools are healthy
oc get mcp
6) Storage upgrade
When upgrading from Cloud Pak for Data 3.5 to 4.0, if you are running an unsupported version of Red Hat® OpenShift® Container Storage or Portworx, you must upgrade your storage before you upgrade to IBM® Cloud Pak for Data Version 4.0. For information about supported versions of shared persistent storage, see Storage requirements.
7) Cloud Pak for Data platform and services upgrade
Different services may have different prerequisites and procedures for the upgrade from 3.5 to 4.0. For example, some services (e.g. Data Virtualization) require that you create the Db2U operator subscription manually, while others do not. And for some Watson services (e.g. Watson Discovery, Watson Assistant), you may have to enable the License Service of the IBM Cloud Pak foundational services.
There may also be service instances provisioned for particular services, such as Spark, Data Virtualization, or Db2 Warehouse instances. For these services, apart from upgrading the services themselves, you also need to upgrade the service instances accordingly.
Record the commands and the corresponding results during the upgrade. After each service upgrade completes, make sure that service is healthy before you proceed to the upgrade of the next one.
8) Troubleshooting
If a Cloud Pak for Data service upgrade fails, be cautious about rolling back during troubleshooting. Rollback during the upgrade from 3.5 to 4.0.x is not supported. Even for the upgrade from 3.0.1 to 3.5, rollback is not supported for some services, e.g. WML.
Reach out to IBM Support with a support ticket for help and assistance if needed.
Restoring from a backup should be the last resort.
Summary
In this article, I introduced some tips about the upgrade implementation. Most of them come from lessons learned and experience we have accumulated. I hope it's helpful! In my next article, I'll introduce some tips about the post-upgrade tasks.
#CloudPakforDataGroup