How to enable GPU Operator on OCP 4.5 - Series 1: Preparing certificate for applying entitlement in the OCP cluster
Authors:
bjhwjia@cn.ibm.com
huangdk@cn.ibm.com
We recommend using the GPU Operator to automate the management of all NVIDIA software components needed to provision GPUs on OCP 4.5. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.
There are two main scenarios for installing the GPU Operator on OCP 4.5:
- Install the GPU Operator in an internet-connected environment.
- Install the GPU Operator in an air-gapped environment.
Because the two scenarios differ and both require some preparation work, I'll deliver the GPU Operator enablement content as a series.
- Series 1 - Preparing certificate for applying entitlement in the OCP 4.5 cluster
- Series 2 - Enable GPU Operator on OCP 4.5 in internet connected environment
- Series 3 - Considerations for enabling GPU Operator on OCP 4.5 in air-gapped environment
In this article, we focus on Series 1: Preparing certificate for applying entitlement in the OCP cluster.
Prerequisites - Red Hat entitlement
Cloud Pak for Data customers are entitled to a Red Hat account for accessing the Red Hat Customer Portal. This account is required to download the certificate, which is mandatory for installing the GPU Operator.
Preparing certificate for applying entitlement in the OCP cluster
1. Set up a RHEL 8 machine with internet access and register it with the Red Hat account
This machine is used to generate the certificate that the GPU Operator later needs in order to download the packages (for example, the kernel-devel-4.18-xxxx package) used to build the driver container.
Note: If your bastion node runs RHEL 8 and has internet access, you can do this directly on the bastion node. Otherwise, you may have to set up a new RHEL 8 machine.
[root@jhwubi81 ~]# subscription-manager register --username=bjhwjia@cn.ibm.com --password='xxxxxxxxx'
[root@jhwubi81 ~]# subscription-manager refresh
[root@jhwubi81 ~]# subscription-manager list --available | grep 'Red Hat OpenShift Container Platform for IBM Cloud Pak for Data' -A 72 -B 72
The output looks similar to the following:
+-------------------------------------------+
Available Subscriptions
+-------------------------------------------+
Subscription Name: Red Hat OpenShift Container Platform for IBM Cloud Pak for Data (x Core, Business Partner Supported)
Provides: Red Hat Developer Tools Beta (for RHEL Server for System Z)
…………………………………………………………
Red Hat OpenShift Container Platform
Red Hat OpenShift Container Platform for IBM Z and LinuxONE
Oracle Java (for RHEL Server)
Red Hat Enterprise Linux for Power 9
Red Hat Enterprise Linux for x86_64
…………………………………………………………
Red Hat CodeReady Linux Builder for x86_64
Red Hat OpenShift Container Platform for Power
Red Hat CoreOS
Red Hat Enterprise Linux for Power, little endian Beta
Red Hat Openshift Serverless
Red Hat Enterprise Linux Atomic Host
Red Hat OpenShift Pipelines for IBM Z and LinuxONE
Red Hat Software Collections Beta (for RHEL Server)
SKU: MWXXXX
Contract: XXXXXX
Pool ID: 8a85f99975f75bf20xxxxxxx07065652
Provides Management: No
Available: 30
Service Type: L3
Service Level: Self-Support
Subscription Type: Stackable
Entitlement Type: Physical
[root@jhwubi81 ~]# subscription-manager attach --pool=8a85f99975f75bf20xxxxxxx07065652
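The pool ID in the listing above has to be copied by hand into the attach command. As a minimal sketch, assuming the listing keeps the "Subscription Name:" and "Pool ID:" field labels shown above, the lookup could be scripted. The function name extract_pool_id is hypothetical and not part of subscription-manager.

```shell
# Hypothetical helper: reads `subscription-manager list --available` output on
# stdin and prints the Pool ID of each subscription whose name contains $1.
extract_pool_id() {
    awk -v want="$1" '
        # Remember whether the current subscription block matches the name.
        /^Subscription Name:/ { matched = (index($0, want) > 0) }
        # The pool ID is the last field on the "Pool ID:" line.
        /^Pool ID:/ && matched { print $NF; matched = 0 }
    '
}

# Possible usage on the registered machine (not executed here):
#   pool=$(subscription-manager list --available |
#          extract_pool_id 'Red Hat OpenShift Container Platform for IBM Cloud Pak for Data')
#   subscription-manager attach --pool="$pool"
```

The name is matched as a literal substring rather than a regular expression, so the parentheses in the subscription name need no escaping.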
2. Download the subscription certificate from the Red Hat Customer Portal
Log in to the Red Hat Customer Portal at the following URL.
https://access.redhat.com/
Download the subscription certificate from the Systems tab.
1) Go to My Subscriptions and click the icon.
2) Select the system registered in Step 1. In this example, it is jhwbui81.fyre.ibm.com. Click it to open the page that lists the subscriptions of this system.
3) Navigate to the Subscriptions tab and download the certificate.
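Step 5 mounts a single file as both entitlement.pem and entitlement-key.pem, so that file needs to carry the certificate and the private key together. The following is a rough sanity check, assuming the downloaded certificate has been extracted to one PEM file with the usual "-----BEGIN ...-----" markers. The function name pem_has_cert_and_key is hypothetical.

```shell
# Hypothetical helper: succeeds only if the file given as $1 contains both a
# certificate block and a private key block (RSA or generic PKCS#8 header).
pem_has_cert_and_key() {
    grep -q -e '-----BEGIN CERTIFICATE-----' "$1" &&
    grep -q -e '-----BEGIN .*PRIVATE KEY-----' "$1"
}

# Possible usage with the path used in Step 5 (not executed here):
#   pem_has_cert_and_key /ibm/gpu-helm/nvidia.pem && echo "certificate and key found"
```

This only checks that the block headers are present. It does not verify that the certificate is valid or unexpired, which is what Step 5 does.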
3. Register the bastion node with the same Red Hat account as in Step 1 (optional)
If you performed Step 1 on your bastion node, you can skip this step.
Otherwise, follow the same registration procedure as in Step 1.
Note:
You need to attach the same pool ID that you used in Step 1.
4. Install Helm 3 on the bastion node
[root@bastion01 gpu-helm]# curl -L https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64 -o /usr/local/bin/helm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.6M  100 38.6M    0     0   301k      0  0:02:11  0:02:11 --:--:--  238k
[root@bastion01 gpu-helm]# chmod +x /usr/local/bin/helm
[root@bastion01 gpu-helm]# helm version
version.BuildInfo{Version:"v3.3.4+5.el8", GitCommit:"1e63a4770a20072ed9f574013c01cc6e59881e48", GitTreeState:"clean", GoVersion:"go1.13.15"}
[root@bastion01 gpu-helm]#
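Since this step calls for Helm 3 specifically, a scripted installation may want to confirm the major version instead of reading the BuildInfo line by eye. The sketch below assumes the output format shown above. The function name helm_major_version is hypothetical.

```shell
# Hypothetical helper: reads `helm version` output on stdin and prints the
# major version number taken from the Version:"vX.Y.Z" field.
helm_major_version() {
    sed -n 's/.*Version:"v\([0-9][0-9]*\)\..*/\1/p' | head -n 1
}

# Possible usage on the bastion node (not executed here):
#   [ "$(helm version | helm_major_version)" = "3" ] || echo "Helm 3 is required" >&2
```

The pattern requires a "v" right after the opening quote, so the GoVersion:"go1.13.15" field on the same line is not picked up by mistake.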
5. Validate the certificate or entitlement
[root@bastion01 gpu-helm]# podman run -ti --mount type=bind,source=/ibm/gpu-helm/nvidia.pem,target=/etc/pki/entitlement/entitlement.pem --mount type=bind,source=/ibm/gpu-helm/nvidia.pem,target=/etc/pki/entitlement/entitlement-key.pem registry.access.redhat.com/ubi8:latest bash -c "dnf search kernel-devel --showduplicates "
If you see output like the following, the certificate or entitlement is working.
//Output
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - AppStre 15 MB/s | 14 MB 00:00
Red Hat Enterprise Linux 8 for x86_64 - BaseOS 15 MB/s | 13 MB 00:00
Red Hat Universal Base Image 8 (RPMs) - BaseOS 493 kB/s | 760 kB 00:01
Red Hat Universal Base Image 8 (RPMs) - AppStre 2.0 MB/s | 3.1 MB 00:01
Red Hat Universal Base Image 8 (RPMs) - CodeRea 12 kB/s | 9.1 kB 00:00
====================== Name Exactly Matched: kernel-devel ======================
kernel-devel-4.18.0-80.1.2.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-80.el8.x86_64 : Development package for building kernel
: modules to match the kernel
kernel-devel-4.18.0-80.4.2.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-80.7.1.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-80.11.1.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-147.el8.x86_64 : Development package for building kernel
: modules to match the kernel
kernel-devel-4.18.0-80.11.2.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-80.7.2.el8_0.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-147.0.3.el8_1.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-147.0.2.el8_1.x86_64 : Development package for building
: kernel modules to match the kernel
kernel-devel-4.18.0-147.3.1.el8_1.x86_64 : Development package for building
: kernel modules to match the kernel
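Because the driver container is built against a specific kernel-devel package, it can help to check that the listing includes the exact kernel release you need. The sketch below assumes the release string (for example, as reported by uname -r on a GPU worker node) is known. The function name has_kernel_devel is hypothetical.

```shell
# Hypothetical helper: reads `dnf search kernel-devel --showduplicates` output
# on stdin and succeeds if a kernel-devel package for release $1 is listed.
has_kernel_devel() {
    # -F matches the text literally, so dots in the release are not wildcards.
    grep -q -F -- "kernel-devel-$1 :"
}

# Possible usage, with the search output saved to a file (not executed here):
#   has_kernel_devel '4.18.0-147.3.1.el8_1.x86_64' < dnf-search.txt && echo "available"
```

Including the trailing " :" in the match means that a release such as 4.18.0-147.el8.x86_64 does not match the longer 4.18.0-147.0.3.el8_1.x86_64 entry.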
With the entitlement validated, we can move on to Series 2 - Enable GPU Operator on OCP 4.5 in internet connected environment.