Must-gather for Cloud Pak for Data upgrade

By Hong Wei Jia posted Tue March 29, 2022 05:59 AM

  

The purpose of this questionnaire is to give a comprehensive overview of the Cloud Pak for Data environment before the upgrade. It covers the Cloud Pak for Data services, the OpenShift cluster, storage, and infrastructure. This information is critical for building a comprehensive upgrade plan and securing a successful outcome.

Note
Some commands are provided below to help you collect the information. To run them, you must be logged in to the OpenShift cluster as a user with the cluster administrator role.

1. Current CPD version

2. Target CPD version

3. Current OCP version

Check the OpenShift cluster version with the oc client:
#oc get clusterversion
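If only the version string is needed for the questionnaire, a sketch like the following may help. It assumes the ClusterVersion object has the default name "version" (the usual case on OCP 4.x). The function names ocp_minor and current_ocp_version are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reduce a full version such as 4.10.47
# to its minor stream (4.10), which is usually what matters
# when checking upgrade compatibility.
ocp_minor() {
  printf '%s\n' "$1" | cut -d. -f1,2
}

# Print the version the cluster is currently reconciled to.
# Assumption: the ClusterVersion object is named "version".
current_ocp_version() {
  oc get clusterversion version -o jsonpath='{.status.desired.version}'
}

# Usage (requires a cluster administrator login):
#   ocp_minor "$(current_ocp_version)"
```

Recording both the full version and the minor stream makes it easier to compare the current and target OCP versions in items 3 and 4.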

4. Target OCP version

5. Current storage type and version (ODF/OCS, Spectrum Scale, Portworx, NFS, etc)

1) If the storage is ODF (renamed from OCS), get more details of the ODF storage with the command:

#oc describe cephcluster ocs-storagecluster-cephcluster -n openshift-storage

2) If the storage is Spectrum Scale, is the Spectrum Scale Container Native (with CSI driver) used? If yes, what's the version of the Spectrum Scale storage cluster (remote cluster)?

3) If the storage is Portworx, get more details of the Portworx storage with the commands:

#PX_POD=$(kubectl get pods -l name=portworx -n kube-system -o jsonpath='{.items[0].metadata.name}')

#kubectl exec $PX_POD -n kube-system -- /opt/pwx/bin/pxctl status

Note:
A separate Portworx Enterprise license is required starting with CPD 4.5.

4) If the storage is NFS, which version of the NFS protocol is in use?
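One possible way to find the NFS protocol version is to look at the vers= mount option on a worker node that currently hosts a pod with an NFS-backed volume. The sketch below is an assumption about how this could be done, not an official procedure. The function name nfs_versions is hypothetical, and it expects input in the /proc/mounts format.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read /proc/mounts-style lines on stdin and
# print each NFS mount point together with its vers= mount option.
# Fields in /proc/mounts: device, mount point, fs type, options, ...
nfs_versions() {
  awk '$3 ~ /^nfs4?$/ {
         n = split($4, opts, ",")
         for (i = 1; i <= n; i++)
           if (opts[i] ~ /^vers=/) print $2, opts[i]
       }'
}

# Usage (requires a cluster administrator login; replace the placeholder):
#   oc debug node/<worker-node> -- chroot /host cat /proc/mounts | nfs_versions
```

NFS mounts appear only on nodes where a pod that uses an NFS volume is scheduled, so an empty result on one node does not mean NFS is unused.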

5. Target storage type and version

6. The installation method of the existing CPD cluster (applicable only to CPD 4.X)
Was the existing CPD cluster installed with the Express installation or the Specialized installation?

7. Number of CPD instances
How many CPD instances are sharing the same OCP cluster?

8. List of CPD services (and the version of each) installed in each CPD instance’s namespace

1) If the current CPD version is 3.5.X, get a list of all current CPD services and their versions with the command below.

#./cpd-cli status -n <your-cpd-namespace>
Note: Replace <your-cpd-namespace> with the namespace of your CPD instance when running this command.

2) If the current CPD version is 4.0.X/4.5.X/4.6.X, get a list of all current CPD services and their versions with the command below. Run it in the project (namespace) where the CPD instance is installed.

#for i in $(oc get crd | grep cpd.ibm.com | awk '{ print $1 }'); do oc get $i | grep -v "NAME" ; oc get $i $(oc get $i | grep -v "NAME" | awk '{ print $1 }') -o jsonpath="{.spec.scaleConfig}{'\n'}" ; oc get $i $(oc get $i | grep -v "NAME" | awk '{ print $1 }') -o jsonpath="{.status}{'\n'}";echo "---------$i------------" ;done
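The one-liner above can be hard to read and relies on the current project. A more readable sketch of the same idea, with the namespace passed explicitly, might look like the following. The function name list_cpd_services is hypothetical. The queried fields (.spec.scaleConfig and .status) are the same ones the original command prints.

```shell
#!/usr/bin/env bash

# Hypothetical helper: for every CRD in the cpd.ibm.com API groups,
# print each custom resource in the given namespace with its
# scaleConfig and status, one resource per line.
list_cpd_services() {
  local ns="$1" crd
  for crd in $(oc get crd -o name | grep 'cpd\.ibm\.com' | cut -d/ -f2); do
    echo "--------- ${crd} ---------"
    oc get "$crd" -n "$ns" \
      -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.scaleConfig}{"\t"}{.status}{"\n"}{end}'
  done
}

# Usage (requires a cluster administrator login; replace the placeholder):
#   list_cpd_services <your-cpd-namespace>
```

Iterating over .items also handles a CRD with more than one custom resource in the namespace, which the one-liner passes to a single oc get call.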
 

9. Is CPD System installed? If yes, please run "ap version" and share the output. CPDS may need to be upgraded first if it is running at an incompatible version with the target CPD version.

10. Is this an in-place upgrade, or an upgrade across 2 separate clusters with one acting as the staging environment?

11. Collect Cloud Pak for Data usage information

Production upgrades usually have to be completed within a fixed time window. Prioritizing the services to be upgraded helps reduce the risk of upgrade failure. The following example table gives an overview of Cloud Pak for Data usage. Based on this table, you can upgrade the high-importance services first and then proceed with the remaining services.

Service name                 Used (Y/N)   Size (Default/Small/Medium/Large)   Level of importance (High/Medium/Low)
WS                           Y            Default                             High
WML                          Y            Default                             High
WKC                          Y            Large                               High
Jupyter Notebooks for GPU    N            N/A                                 N/A
R Studio                     Y            Default                             High
Spark                        N            N/A                                 N/A
SPSS Modeler                 Y            Default                             Medium
Data Virtualization          N            N/A                                 N/A
Db2Warehouse                 N            N/A                                 N/A
Data Management Console      N            N/A                                 N/A
Cognos Dashboard             N            N/A                                 N/A
Watson Discovery             N            N/A                                 N/A
Watson Assistant             N            N/A                                 N/A
……
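As an illustration of how the table can drive prioritization, the sketch below orders the services that are in use by their level of importance. It assumes the table has been saved as pipe-separated rows (service|used|size|importance), which is a format chosen here for the example. The function name upgrade_order is hypothetical. The ordering reflects importance only and does not account for dependencies between services.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read "service|used|size|importance" rows on stdin
# and print the services that are in use, High importance first, then
# Medium, then Low. Input order is preserved within each level.
upgrade_order() {
  awk -F'|' '$2 == "Y" {
         rank = ($4 == "High") ? 1 : ($4 == "Medium") ? 2 : 3
         print rank "|" NR "|" $1
       }' | sort -t'|' -k1,1n -k2,2n | cut -d'|' -f3
}

# Usage:
#   printf '%s\n' 'WS|Y|Default|High' 'SPSS Modeler|Y|Default|Medium' | upgrade_order
```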

    

 

 


12. Number of OpenShift nodes

Collect information about the number of master nodes and worker nodes in your OpenShift cluster.

#oc get nodes --show-labels 
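To turn the node list into counts per role, a small sketch such as the following could be used. It assumes the default column layout of oc get nodes (NAME, STATUS, ROLES, AGE, VERSION). The function name count_node_roles is hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read "oc get nodes --no-headers" output on stdin
# and print how many nodes carry each role. A node with several roles
# (for example "master,worker") is counted once per role.
count_node_roles() {
  awk '{
         n = split($3, roles, ",")
         for (i = 1; i <= n; i++) count[roles[i]]++
       }
       END { for (r in count) print r, count[r] }' | sort
}

# Usage (requires a cluster administrator login):
#   oc get nodes --no-headers | count_node_roles
```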

13. Node configuration (CPU, memory, etc.)

Collect information about the hardware configuration (number of CPUs and memory capacity) of each worker node in your current OpenShift cluster.

#oc describe nodes
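The output of oc describe nodes is long. If only CPU and memory capacity are needed, a more compact sketch is shown below. It assumes the worker nodes carry the standard node-role.kubernetes.io/worker label. The function names worker_capacity and ki_to_gib are hypothetical.

```shell
#!/usr/bin/env bash

# Hypothetical helper: list CPU and memory capacity per worker node.
# Assumption: workers carry the label node-role.kubernetes.io/worker.
worker_capacity() {
  oc get nodes -l node-role.kubernetes.io/worker \
    -o custom-columns='NAME:.metadata.name,CPU:.status.capacity.cpu,MEMORY:.status.capacity.memory'
}

# Hypothetical helper: convert a Kubernetes memory quantity given
# in Ki (for example 16777216Ki) to GiB with one decimal place.
ki_to_gib() {
  awk -v v="${1%Ki}" 'BEGIN { printf "%.1f\n", v / 1048576 }'
}

# Usage (requires a cluster administrator login):
#   worker_capacity
#   ki_to_gib 16777216Ki
```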

Collect the disk space information of a worker node (assuming all worker nodes have the same configuration):
#export workernode=<the name of the worker node 1>

#oc debug node/${workernode} -- lsblk

14. The location of your Cloud Pak for Data installation (on-premises, which cloud, etc.)

15. Operating system of the bastion node

The bastion node is also known as the client workstation, on which you run the oc client to operate the OpenShift cluster.
RHEL V8 or equivalent is recommended.


16. Private container registry
Is there a private container registry available for use by Cloud Pak for Data 4.X?
Using a private container registry is strongly recommended. If you already have one, please indicate the amount of free space available; 500 GB is recommended.
  
Note: You cannot use the integrated OpenShift Container Platform registry. It does not support multi-architecture images and is not compliant with the Docker Image Manifest Version 2, Schema 2.

17. Internet access
Does the OpenShift cluster have internet access? If not, is there a proxy for accessing white-listed sites?

18. Type of environment(s) (Dev/Test, Staging or Production)

19. Any application that uses CPD services?

If you have any questions about the must-gather information above, you can engage IBM Support by opening a CSP (Salesforce) case, using either the CSP Console or IBM Support.

20. Target starting date of upgrade?

Note:

Please share relevant information if applicable.

  • Are any RSI patches or hotfixes applied in the cluster? If yes, please share the details.
  • If Db2 Warehouse or Db2 OLTP is installed, are any dedicated worker nodes used?
  • If the Data Virtualization service (renamed Watson Query as of CPD 4.6) is deployed, does it use an external JDBC driver?
  • Are any custom images used? If yes, please share the details.



Comments

Mon August 28, 2023 09:39 AM

@VLADIMIR KIM

Thanks for the suggestion! Updated accordingly.

Fri July 28, 2023 10:29 PM

It would be good to include this question as well:

Are there any RSI patches or Hotfixes applied for any of the CP4D services? (They must be removed prior to the upgrade.)