File and Object Storage

How to install the NVIDIA GPU Operator on OpenShift 4.5

By GERO SCHMIDT posted Wed May 12, 2021 06:34 AM

  

IBM Elastic Storage System (ESS) and IBM Spectrum Scale Cloud Native Storage Access (CNSA) offer industry-leading enterprise storage solutions for OpenShift and big data analytics workloads.

To take full advantage of the scalability and performance of IBM Spectrum Scale as a storage provider, and of GPU-accelerated worker nodes for your AI (Artificial Intelligence) and DL (Deep Learning) workflows, you also need to properly enable GPU support in your OpenShift cluster.

In this blog post we will briefly guide the reader through the necessary steps to enable the NVIDIA GPU operator in OpenShift 4.5 using the OpenShift web console and the Red Hat OperatorHub catalog. We will also share some hints for troubleshooting common issues that you may encounter along the way.

The basic steps for installing the NVIDIA GPU operator are outlined in OpenShift on NVIDIA GPU Accelerated Clusters, but additional steps might be required, especially for Proof of Concept (PoC) environments on bare metal OpenShift deployments (i.e. user-provisioned infrastructure) with temporary Red Hat licenses.

Make sure you are logged in to your OpenShift cluster as a cluster-wide admin to perform the next steps.

Step 1: Acquire required Red Hat subscriptions

Before you apply the NVIDIA GPU operator you need to make sure that the appropriate Red Hat subscriptions and entitlements for OpenShift are properly enabled. The UBI-based driver pods of the NVIDIA GPU operator require these Red Hat subscription entitlements so that additional UBI packages can be installed.

Hint #1: An installation of the NVIDIA GPU driver will fail on bare metal OpenShift clusters with CoreOS worker nodes (i.e. on user-provisioned infrastructure) without these entitlements. In that case you will see the following errors for the NVIDIA GPU operator driver pods in the default gpu-operator-resources namespace (project):

# oc get pods
NAME                                       READY   STATUS             RESTARTS   AGE
nvidia-container-toolkit-daemonset-866sx   0/1     Init:0/1           0          16h
nvidia-container-toolkit-daemonset-psb29   0/1     Init:0/1           0          16h
nvidia-container-toolkit-daemonset-vcsjg   0/1     Init:0/1           0          16h
nvidia-driver-daemonset-27r2c              0/1     CrashLoopBackOff   180        16h
nvidia-driver-daemonset-6vqqr              0/1     CrashLoopBackOff   181        16h
nvidia-driver-daemonset-969m2              0/1     CrashLoopBackOff   181        16h

# oc logs nvidia-driver-daemonset-27r2c | grep -i error
Error: Unable to find a match: elfutils-libelf-devel.x86_64

For Proof of Concepts (PoCs) you may obtain a 60-day trial license from https://www.redhat.com/en/technologies/cloud-computing/openshift/try-it and assign it to a system in your Red Hat account (https://access.redhat.com/management/systems) as shown below.

Once you have a valid subscription attached to your system on the Red Hat Customer Portal you can download the certificate package with the required pem files using the Download Certificates button. Depending on your subscriptions you will find a directory tree like the one below after unpacking the zip archive:

# tree export/
export/
├── content_access_certificates
│   └── 6796462098343571066.pem
├── entitlement_certificates
│   ├── 1531295647393732686.pem
│   ├── 1747164807728364228.pem
│   └── 6221824843115472774.pem
└── meta.json

The certificate package above contains trial licenses for OpenShift as well as for RHEL8 (Try Red Hat Enterprise Linux Server). The license needed here for the proper entitlement of the NVIDIA GPU operator is the one for OpenShift: 60 Day Product Trial of Red Hat OpenShift Container Platform, Self-Supported (32 Cores).

You need to pick the correct pem file to configure a cluster-wide entitlement on your OpenShift cluster so that the GPU operator can access the required UBI packages to build the GPU driver images.

Hint #2: Should you pick the wrong pem file (for example, the one for a RHEL8 subscription or the one from the content_access_certificates directory), you may overcome the elfutils-libelf-devel.x86_64 issue described in Hint #1 above, but you may still lack access to the repositories that provide the matching RHCOS kernel devel and header packages, as shown in the error messages below:

# oc get pods -o wide
NAME                                       READY   STATUS             RESTARTS   AGE   IP            NODE                      
nvidia-container-toolkit-daemonset-ldrbs   0/1     Init:0/1           0          13h   10.129.2.4    worker03.ocp4.scale.com
nvidia-container-toolkit-daemonset-vggj8   0/1     Init:0/1           0          13h   10.128.2.2    worker02.ocp4.scale.com
nvidia-container-toolkit-daemonset-vp459   0/1     Init:0/1           0          13h   10.131.0.24   worker01.ocp4.scale.com
nvidia-driver-daemonset-56lw7              0/1     CrashLoopBackOff   131        13h   10.129.2.3    worker03.ocp4.scale.com
nvidia-driver-daemonset-v9bnf              0/1     CrashLoopBackOff   133        13h   10.131.0.23   worker01.ocp4.scale.com
nvidia-driver-daemonset-zwx94              0/1     CrashLoopBackOff   133        13h   10.128.2.4    worker02.ocp4.scale.com

# oc logs nvidia-driver-daemonset-56lw7 | grep -i error
Error: No matching repo to modify: rhocp-4.5-for-rhel-8-x86_64-rpms.
Error: No matching repo to modify: rhel-8-for-x86_64-baseos-eus-rpms.
Error: Unable to find a match: kernel-headers-4.18.0-193.41.1.el8_2.x86_64 kernel-devel-4.18.0-193.41.1.el8_2.x86_64

In our example the OpenShift v4.5.30 cluster is running with RHCOS kernel version 4.18.0-193.41.1.el8_2.x86_64:

# oc describe nodes | grep 'kernel-version.full'
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.41.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.41.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.41.1.el8_2.x86_64
                    feature.node.kubernetes.io/kernel-version.full=4.18.0-193.41.1.el8_2.x86_64

and the NVIDIA operator cannot access the required Red Hat repositories (rhocp-4.5-for-rhel-8-x86_64-rpms, rhel-8-for-x86_64-baseos-eus-rpms) to find the correct match for the required kernel header and devel packages.

You can use the rct cat-cert command on RHEL systems to ensure that you pick the right pem file by validating that it contains the required repositories like rhocp-4.5-for-rhel-8-x86_64-rpms and rhel-8-for-x86_64-baseos-eus-rpms:

# rct cat-cert export/entitlement_certificates/1747164807728364228.pem | grep rhel-8-for-x86_64-baseos-eus-rpms
        Label: rhel-8-for-x86_64-baseos-eus-rpms
# rct cat-cert export/entitlement_certificates/1747164807728364228.pem | grep rhocp-4.5-for-rhel-8-x86_64-rpms
        Label: rhocp-4.5-for-rhel-8-x86_64-rpms
# rct cat-cert export/entitlement_certificates/1747164807728364228.pem | grep SKU
        SKU: SER0419
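Instead of inspecting each pem file by hand, a small loop can scan all of them at once. This is a sketch (the rct tool from subscription-manager must be installed, and the paths and repo names are taken from the output above); it simply reports which pem files carry both required repo labels:

```shell
# Sketch: scan every entitlement pem for the repo labels the GPU driver
# build needs. Skips silently if no pem files or no rct tool are present.
required="rhocp-4.5-for-rhel-8-x86_64-rpms rhel-8-for-x86_64-baseos-eus-rpms"
for pem in export/entitlement_certificates/*.pem; do
  [ -f "$pem" ] || continue            # no pem files found
  command -v rct >/dev/null || break   # rct not installed
  labels=$(rct cat-cert "$pem")
  ok=yes
  for repo in $required; do
    echo "$labels" | grep -q "Label: $repo" || ok=no
  done
  echo "$pem: required repos present: $ok"
done
```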

We will pick the export/entitlement_certificates/1747164807728364228.pem file which is linked to the OpenShift subscription - here, the 60 Day Product Trial of Red Hat OpenShift Container Platform, Self-Supported (32 Cores) subscription (SER0419).

Step 2: Apply cluster-wide entitlement for Red Hat subscriptions

In our OpenShift environment with user-provisioned infrastructure and a bare metal OpenShift deployment we add the cluster-wide entitlement for the Red Hat subscriptions through a Kubernetes secret by following the first step in Installing GPU Operator via Helm from the official NVIDIA documentation.

First we obtain the machine config YAML template for cluster-wide entitlements on OpenShift as described in How to use entitled image builds to build DriverContainers with UBI on OpenShift:

# wget https://raw.githubusercontent.com/openshift-psap/blog-artifacts/master/how-to-use-entitled-builds-with-ubi/0003-cluster-wide-machineconfigs.yaml.template

Then we copy the selected pem file from our entitlement certificate to a local file named nvidia.pem and add it to the downloaded machine config YAML template:

# cp export/entitlement_certificates/1747164807728364228.pem nvidia.pem
# sed "s/BASE64_ENCODED_PEM_FILE/$(base64 -w0 nvidia.pem)/g" 0003-cluster-wide-machineconfigs.yaml.template > 0003-cluster-wide-machineconfigs.yaml
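The embedding step above can be sanity-checked locally without a cluster. This standalone sketch uses a throwaway file in place of the real entitlement pem to show that the base64 value substituted into the template decodes back to the original content:

```shell
# Standalone demo of the base64-embedding step, using fake files.
printf 'FAKE PEM CONTENT\n' > demo.pem
echo 'entitlement: BASE64_ENCODED_PEM_FILE' > demo.template
sed "s/BASE64_ENCODED_PEM_FILE/$(base64 -w0 demo.pem)/g" demo.template > demo.yaml
# Extract the embedded value and decode it back to confirm a lossless round trip.
embedded=$(sed 's/^entitlement: //' demo.yaml)
echo "$embedded" | base64 -d    # prints: FAKE PEM CONTENT
rm -f demo.pem demo.template demo.yaml
```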

Now we are ready to apply the machine config to the OpenShift cluster.

IMPORTANT: This next step will trigger an update driven by the OpenShift machine config operator and initiate a restart of all worker nodes, one by one:

# oc create -f 0003-cluster-wide-machineconfigs.yaml
machineconfig.machineconfiguration.openshift.io/50-rhsm-conf configured
machineconfig.machineconfiguration.openshift.io/50-entitlement-pem configured
machineconfig.machineconfiguration.openshift.io/50-entitlement-key-pem configured

# oc get machineconfig | grep entitlement
50-entitlement-key-pem                                                                                       2.2.0             4d1h
50-entitlement-pem                                                                                           2.2.0             4d1h

You can see the progress by issuing the following command to query the machine config pool (see the column UPDATING: True for the worker nodes, which means an update is in progress):

# oc get mcp 
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-30627edc9bc847e2f6a7a9755561f65c   True      False      False      3              3                   3                     0                      24d
worker   rendered-worker-91786f8ba3ba5fc1f913c4d755db6e17   False     True       False      4              0                   0                     0                      24d

The worker nodes will be cordoned (i.e. set to SchedulingDisabled) and restarted one after another during the process:

# oc get nodes 
NAME                      STATUS                       ROLES    AGE   VERSION
master01.ocp4.scale.com   Ready                        master   24d   v1.18.3+65bd32d
master02.ocp4.scale.com   Ready                        master   24d   v1.18.3+65bd32d
master03.ocp4.scale.com   Ready                        master   24d   v1.18.3+65bd32d
worker01.ocp4.scale.com   Ready                        worker   24d   v1.18.3+65bd32d
worker02.ocp4.scale.com   NotReady,SchedulingDisabled  worker   24d   v1.18.3+65bd32d
worker03.ocp4.scale.com   Ready                        worker   24d   v1.18.3+65bd32d
worker04.ocp4.scale.com   Ready                        worker   24d   v1.18.3+65bd32d

When the update process has successfully completed, all nodes will be available again for scheduling (STATUS: Ready) and you will see a status of UPDATED: True for the worker nodes in the machine config pool:

# oc get mcp 
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-30627edc9bc847e2f6a7a9755561f65c   True      False      False      3              3                   3                     0                      24d
worker   rendered-worker-8d10dd8f6569ba1505d831e95a6b7d6c   True      False      False      4              4                   4                     0                      24d
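Before moving on, you can verify that the cluster-wide entitlement actually works by running a throwaway UBI8 pod that searches for a package from an entitled repository. This is a common validation sketch (the pod name here is arbitrary, and the expected package comes from the entitled repos discussed in Step 1):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: entitlement-test
spec:
  restartPolicy: Never
  containers:
  - name: entitlement-test
    image: registry.access.redhat.com/ubi8:latest
    command: ["dnf", "search", "kernel-devel", "--showduplicates"]
```

Create the pod with oc create -f and check its logs with oc logs entitlement-test: with a working entitlement the search should list matching kernel-devel packages instead of failing with "Unable to find a match".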

Now we can proceed with the actual installation of the

  • Node Feature Discovery (NFD) operator and the
  • NVIDIA GPU operator

from the OperatorHub catalog in the OpenShift web console, following the NVIDIA documentation at Installing via OpenShift OperatorHub.

STEP 3: Install the Node Feature Discovery (NFD) operator

The Node Feature Discovery (NFD) operator is a prerequisite for the NVIDIA GPU operator and can be installed from the Red Hat OperatorHub catalog in the OpenShift web console. Be sure to select and install the non-community version (here shown on the right):

After you have installed the NFD operator on your OpenShift cluster you can select it from the Installed Operators panel and create an instance of the Node Feature Discovery operator by selecting the Create Instance link.

The NFD operator will deploy the following pods in the openshift-operators namespace:

# oc get pods -o wide -n openshift-operators
NAME                            READY   STATUS    RESTARTS   AGE     IP            NODE                    
nfd-master-5gwrx                1/1     Running   0          9m45s   10.128.0.80    master01.ocp4.scale.com
nfd-master-l8wv6                1/1     Running   0          9m45s   10.130.0.69    master03.ocp4.scale.com
nfd-master-x4t9w                1/1     Running   0          9m45s   10.129.0.64    master02.ocp4.scale.com
nfd-operator-9d469984-h52fr     1/1     Running   0          11m     10.129.0.63    master02.ocp4.scale.com
nfd-worker-9rtgw                1/1     Running   0          9m45s   10.10.1.18     worker04.ocp4.scale.com
nfd-worker-gl79k                1/1     Running   0          9m45s   10.10.1.17     worker03.ocp4.scale.com
nfd-worker-hz4t4                1/1     Running   0          9m45s   10.10.1.16     worker02.ocp4.scale.com
nfd-worker-zl87v                1/1     Running   0          9m45s   10.10.1.15     worker01.ocp4.scale.com

Once the NFD operator has successfully labeled all the worker nodes you will see the following PCI label with the NVIDIA vendor ID (10de) on every node that contains an NVIDIA GPU:

# oc get nodes -l feature.node.kubernetes.io/pci-10de.present=true
NAME                      STATUS   ROLES    AGE   VERSION
worker01.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d
worker02.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d
worker03.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d

STEP 4: Install the NVIDIA GPU operator

With the proper Red Hat OpenShift entitlement in place and the NFD operator installed we can continue with the final step and install the NVIDIA GPU operator.

First we create a new project (namespace) for the NVIDIA GPU operator named gpu-operator-resources (the default name used by the operator), where the NVIDIA GPU operator components will be installed once an instance of the GPU operator is deployed. You can create the namespace either with "oc new-project gpu-operator-resources" or by using the web console and selecting Home -> Projects -> Create Project:



The NVIDIA GPU operator will be installed from the Red Hat OperatorHub catalog in the OpenShift web console. Select and install the regular version (here shown on the right):



After you have installed the NVIDIA GPU operator in addition to the already deployed NFD operator you can select the NVIDIA GPU operator from the Installed Operators panel

and create an instance (here also called ClusterPolicy) by selecting the Create Instance link:

You can leave the customization form for the Create ClusterPolicy panel as is and simply click Create:

The NVIDIA GPU operator will now be deployed in the gpu-operator-resources namespace and install all the required components to configure supported NVIDIA GPUs on the nodes in the OpenShift cluster. This may take a while, so be patient and wait at least 10-20 minutes before digging deeper into any form of troubleshooting.

The status of the newly deployed ClusterPolicy gpu-cluster-policy for the NVIDIA GPU operator will change to State: ready once the installation has succeeded:
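You can also query this state from the CLI. A minimal sketch, assuming the ClusterPolicy exposes it under .status.state (field path inferred from the status shown in the web console; it may differ between GPU operator versions):

```shell
# Sketch: query the deployed ClusterPolicy state via jsonpath.
# Falls back to "unknown" when oc is missing or the query fails.
state=unknown
if command -v oc >/dev/null 2>&1; then
  state=$(oc get clusterpolicy gpu-cluster-policy -o jsonpath='{.status.state}' 2>/dev/null)
  state=${state:-unknown}
fi
echo "gpu-cluster-policy state: $state"
```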

You will see an nvidia-driver-daemonset pod on each worker node that contains a supported NVIDIA GPU in the gpu-operator-resources namespace:

# oc get pods -o wide
NAME                                       READY   STATUS      RESTARTS   AGE   IP             NODE
gpu-feature-discovery-7b2mg                1/1     Running     0          25m   10.131.0.111   worker01.ocp4.scale.com
gpu-feature-discovery-gkjth                1/1     Running     0          25m   10.128.2.55    worker02.ocp4.scale.com
gpu-feature-discovery-wj6vf                1/1     Running     0          25m   10.129.2.36    worker03.ocp4.scale.com
nvidia-container-toolkit-daemonset-dmprl   1/1     Running     0          29m   10.129.2.33    worker03.ocp4.scale.com
nvidia-container-toolkit-daemonset-krlq6   1/1     Running     0          29m   10.128.2.50    worker02.ocp4.scale.com
nvidia-container-toolkit-daemonset-lg6jq   1/1     Running     0          29m   10.131.0.108   worker01.ocp4.scale.com
nvidia-dcgm-exporter-b6qjt                 1/1     Running     0          25m   10.129.2.35    worker03.ocp4.scale.com
nvidia-dcgm-exporter-fk2dn                 1/1     Running     0          25m   10.131.0.110   worker01.ocp4.scale.com
nvidia-dcgm-exporter-v2bt4                 1/1     Running     0          25m   10.128.2.53    worker02.ocp4.scale.com
nvidia-device-plugin-daemonset-dg5cc       1/1     Running     0          26m   10.131.0.109   worker01.ocp4.scale.com
nvidia-device-plugin-daemonset-v9r96       1/1     Running     0          26m   10.129.2.34    worker03.ocp4.scale.com
nvidia-device-plugin-daemonset-xwk7z       1/1     Running     0          26m   10.128.2.51    worker02.ocp4.scale.com
nvidia-device-plugin-validation            0/1     Completed   0          25m   10.128.2.52    worker02.ocp4.scale.com
nvidia-driver-daemonset-8fg8r              1/1     Running     0          29m   10.131.0.107   worker01.ocp4.scale.com
nvidia-driver-daemonset-rr8k7              1/1     Running     0          29m   10.128.2.49    worker02.ocp4.scale.com
nvidia-driver-daemonset-snh27              1/1     Running     0          29m   10.129.2.32    worker03.ocp4.scale.com

You can run the nvidia-smi command in the nvidia-driver-daemonset pods to gather more information about the particular GPUs on the respective worker nodes:

# oc get pods | grep nvidia-driver-daemonset | while read a b; do echo "## $a ##"; oc exec $a -- nvidia-smi ; done
## nvidia-driver-daemonset-8fg8r ##
Mon May 10 17:25:54 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:06:00.0 Off |                  N/A |
| 29%   23C    P8    N/A /  75W |      1MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
## nvidia-driver-daemonset-rr8k7 ##
Mon May 10 17:25:54 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce GTX 105...  On   | 00000000:06:00.0 Off |                  N/A |
| 29%   24C    P8    N/A /  75W |      1MiB /  4040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
## nvidia-driver-daemonset-snh27 ##
Mon May 10 17:25:54 2021       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.32.03    Driver Version: 460.32.03    CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:06:00.0 Off |                    0 |
| N/A   38C    P8    15W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

The nvidia.com/gpu.present=true node label is attached to each worker node that has a supported NVIDIA GPU:

# oc get nodes -l nvidia.com/gpu.present=true 
NAME                      STATUS   ROLES    AGE   VERSION
worker01.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d
worker02.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d
worker03.ocp4.scale.com   Ready    worker   28d   v1.18.3+65bd32d

In this example we have the following GPUs installed in the worker nodes:

  • Node: worker01.ocp4.scale.com: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
  • Node: worker02.ocp4.scale.com: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
  • Node: worker03.ocp4.scale.com: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

as you can see from the output of the following command:

# oc get nodes -l node-role.kubernetes.io/worker --no-headers | while read a b; do echo "## Node: $a ##"; ( oc debug node/$a -- chroot /host lspci|grep NVIDIA ) 2>/dev/null ; done 
## Node: worker01.ocp4.scale.com ##
06:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
## Node: worker02.ocp4.scale.com ##
06:00.0 VGA compatible controller: NVIDIA Corporation GP107 [GeForce GTX 1050 Ti] (rev a1)
06:00.1 Audio device: NVIDIA Corporation GP107GL High Definition Audio Controller (rev a1)
## Node: worker03.ocp4.scale.com ##
06:00.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)
## Node: worker04.ocp4.scale.com ##
(-- no GPU --)

You are now able to run GPU-accelerated workloads on OpenShift, for example, with IBM Cloud Pak for Data and Watson Machine Learning Accelerator.
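As a quick smoke test, you can schedule a pod that requests a GPU through the nvidia.com/gpu resource exposed by the NVIDIA device plugin. A minimal sketch (the CUDA vector-add sample image name is an assumption; substitute any CUDA-capable image available to your cluster):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: cuda-vectoradd
spec:
  restartPolicy: OnFailure
  containers:
  - name: cuda-vectoradd
    image: nvidia/samples:vectoradd-cuda11.2.1   # assumed sample image
    resources:
      limits:
        nvidia.com/gpu: 1   # request one GPU from the device plugin
```

Create the pod with oc create -f; if scheduling and the driver stack work, oc logs cuda-vectoradd should report a successful vector addition.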

Should you encounter any issues with the NVIDIA GPU operator, please check the logs of the pods in the gpu-operator-resources namespace (oc logs [pod]) and consult the Troubleshooting section in the NVIDIA documentation.

