How to enable GPU Operator on OCP4.5 - Series 1: Preparing certificate for applying entitlement in the OCP cluster

By Hong Wei Jia posted Mon January 18, 2021 05:10 AM

  

                                                             Authors:

                                                             bjhwjia@cn.ibm.com

                                                             huangdk@cn.ibm.com


We recommend using GPU Operator to automate the management of all NVIDIA software components needed to provision GPUs on OCP 4.5. These components include the NVIDIA drivers (to enable CUDA), the Kubernetes device plugin for GPUs, the NVIDIA Container Runtime, automatic node labelling, DCGM-based monitoring, and others.

There are two scenarios for installing GPU Operator on OCP 4.5:

  • Install GPU Operator in an internet-connected environment.
  • Install GPU Operator in an air-gapped environment.

Because the two scenarios differ and both require some preparation work, I'll deliver the GPU Operator enablement content as a series.

  • Series 1 - Preparing certificate for applying entitlement in the OCP 4.5 cluster
  • Series 2 - Enable GPU Operator on OCP 4.5 in internet connected environment
  • Series 3 - Considerations for enabling GPU Operator on OCP 4.5 in air-gapped environment

In this article, we focus on Series 1: Preparing certificate for applying entitlement in the OCP cluster.

Prerequisites - Red Hat entitlement
A Cloud Pak for Data customer is entitled to a Red Hat account for accessing the Red Hat Customer Portal. This account is required to download the certificate, which is mandatory for installing GPU Operator.

Preparing certificate for applying entitlement in the OCP cluster

1. Set up a RHEL 8 machine with internet access and register it with the Red Hat account

This machine is used to generate the certificate that GPU Operator later needs to download the packages (e.g. the kernel-devel-4.18-xxxx package) used to build the driver container.

Note: If your Bastion node runs RHEL 8 and has internet access, you can do this directly on the Bastion node. Otherwise, you have to set up a new RHEL 8 machine.

[root@jhwubi81 ~]# subscription-manager register --username=bjhwjia@cn.ibm.com --password='xxxxxxxxx'
[root@jhwubi81 ~]# subscription-manager refresh
[root@jhwubi81 ~]# subscription-manager list --available | grep 'Red Hat OpenShift Container Platform for IBM Cloud Pak for Data' -A 72 -B 72


The output looks like the following. Note the Pool ID, which is needed for the attach command that follows the output:

+-------------------------------------------+
    Available Subscriptions
+-------------------------------------------+
Subscription Name:   Red Hat OpenShift Container Platform for IBM Cloud Pak for Data (x Core, Business Partner Supported)
Provides:            Red Hat Developer Tools Beta (for RHEL Server for System Z)
                     …………………………………………………………
                     Red Hat OpenShift Container Platform
                     Red Hat OpenShift Container Platform for IBM Z and LinuxONE
                     Oracle Java (for RHEL Server)
                     Red Hat Enterprise Linux for Power 9
                     Red Hat Enterprise Linux for x86_64
                     …………………………………………………………
                     Red Hat CodeReady Linux Builder for x86_64
                     Red Hat OpenShift Container Platform for Power
                     Red Hat CoreOS
                     Red Hat Enterprise Linux for Power, little endian Beta
                     Red Hat Openshift Serverless
                     Red Hat Enterprise Linux Atomic Host
                     Red Hat OpenShift Pipelines for IBM Z and LinuxONE
                     Red Hat Software Collections Beta (for RHEL Server)
SKU:                 MWXXXX
Contract:            XXXXXX
Pool ID:             8a85f99975f75bf20xxxxxxx07065652
Provides Management: No
Available:           30
Service Type:        L3
Service Level:       Self-Support
Subscription Type:   Stackable
Entitlement Type:    Physical

[root@jhwubi81 ~]# subscription-manager attach --pool=8a85f99975f75bf20xxxxxxx07065652
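
If you prefer not to copy the Pool ID by hand, the lookup can be scripted. The following is only a minimal sketch, assuming the output of subscription-manager list --available keeps the "Subscription Name:" and "Pool ID:" field labels shown above. The function name extract_pool_id is a hypothetical helper introduced here for illustration; it is not part of subscription-manager.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of subscription-manager):
# reads `subscription-manager list --available` output on stdin and prints
# the Pool ID of the first subscription whose name contains the given text.
extract_pool_id() {
  local name="$1"
  awk -v name="$name" '
    # Remember whether the current subscription block is the one we want.
    /^Subscription Name:/ { wanted = (index($0, name) > 0) }
    # The Pool ID is the last field on its line.
    /^Pool ID:/ && wanted { print $NF; exit }
  '
}

# Example usage (commented out because it needs a registered system):
# POOL_ID="$(subscription-manager list --available \
#   | extract_pool_id 'Red Hat OpenShift Container Platform for IBM Cloud Pak for Data')"
# subscription-manager attach --pool="$POOL_ID"
```

If several subscriptions match the same name, this sketch simply takes the first one, so check the result before attaching.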

2. Download the subscription certificate from the Red Hat Customer Portal

Log in to the Red Hat Customer Portal at the following URL.

https://access.redhat.com/

Download the subscription certificate from the Systems tab.

1) Go to My Subscriptions and click the icon



2) Click the Systems tab and find the machine that was registered in Step 1 (Set up a RHEL 8 machine with internet access and register it with the Red Hat account).

In this example, it is jhwubi81.fyre.ibm.com. Click it to open the page that lists the subscriptions of this system.

3) Navigate to the Subscriptions tab and download the certificate.



Click Download Certificates to start the download. The downloaded file is in zip format. Extract the zip file to find the certificate in pem format.

Copy the certificate to the Bastion node where we want to build the entitled container image.

E:\>scp 4613003528435112196.pem root@192.168.168.7:/ibm/gpu-helm/nvidia.pem
root@192.168.168.7's password:
4613003528435112196.pem                                                               100%   41KB 331.3KB/s   00:00
E:\>
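
Before going further, it can be worth a quick sanity check that the copied file is complete. The sketch below assumes that the downloaded pem bundles both the entitlement certificate and its private key in one file, which is what Step 5 relies on when it mounts the same file as both entitlement.pem and entitlement-key.pem. The function name has_cert_and_key is a hypothetical helper, not part of any Red Hat tooling.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds only if the given pem file contains both
# a certificate block and a private key block.
# Assumption: the entitlement pem bundles certificate and key together.
has_cert_and_key() {
  local pem="$1"
  grep -q -- '-----BEGIN CERTIFICATE-----' "$pem" &&
    grep -Eq -- '-----BEGIN (RSA )?PRIVATE KEY-----' "$pem"
}

# Example usage on the Bastion node (commented out because it needs the real file):
# has_cert_and_key /ibm/gpu-helm/nvidia.pem && echo "pem looks complete"
```

If the check fails, the file was probably truncated during the copy or the wrong file was extracted from the zip. You can also inspect the expiry date of the certificate with openssl x509 -noout -enddate -in /ibm/gpu-helm/nvidia.pem.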

3. Register the Bastion node with the same Red Hat account as in Step 1 (Optional)

If you performed Step 1 on your Bastion node, skip this step.

Otherwise, repeat the registration procedure from Step 1 on the Bastion node.

Note:

You need to attach the same Pool ID that was used in Step 1.

4. Install Helm 3 on the Bastion node

[root@bastion01 gpu-helm]# curl -L https://mirror.openshift.com/pub/openshift-v4/clients/helm/latest/helm-linux-amd64 -o /usr/local/bin/helm
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 38.6M  100 38.6M    0     0   301k      0  0:02:11  0:02:11 --:--:--  238k

[root@bastion01 gpu-helm]# chmod +x /usr/local/bin/helm

[root@bastion01 gpu-helm]# helm version
version.BuildInfo{Version:"v3.3.4+5.el8", GitCommit:"1e63a4770a20072ed9f574013c01cc6e59881e48", GitTreeState:"clean", GoVersion:"go1.13.15"}

[root@bastion01 gpu-helm]#
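
If the installation is scripted, the version check can be automated as well. This is a small sketch that only inspects the version string; is_helm3 is a hypothetical helper name, and the assumption is that helm version --short prints a string starting with the version number, as Helm 3 does.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the given version string belongs to Helm 3.
is_helm3() {
  case "$1" in
    v3.*) return 0 ;;
    *)    return 1 ;;
  esac
}

# Example usage (commented out because it needs helm on the PATH):
# is_helm3 "$(helm version --short)" || echo "Helm 3 is required" >&2
```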

5. Validate the certificate or entitlement

[root@bastion01 gpu-helm]# podman run -ti --mount type=bind,source=/ibm/gpu-helm/nvidia.pem,target=/etc/pki/entitlement/entitlement.pem  --mount type=bind,source=/ibm/gpu-helm/nvidia.pem,target=/etc/pki/entitlement/entitlement-key.pem registry.access.redhat.com/ubi8:latest bash -c "dnf search kernel-devel --showduplicates "

If you see output like the following, the certificate (entitlement) is valid and the entitled repositories are reachable.

//Output
Updating Subscription Management repositories.
Unable to read consumer identity
Subscription Manager is operating in container mode.
Red Hat Enterprise Linux 8 for x86_64 - AppStre  15 MB/s |  14 MB     00:00    
Red Hat Enterprise Linux 8 for x86_64 - BaseOS   15 MB/s |  13 MB     00:00    
Red Hat Universal Base Image 8 (RPMs) - BaseOS  493 kB/s | 760 kB     00:01    
Red Hat Universal Base Image 8 (RPMs) - AppStre 2.0 MB/s | 3.1 MB     00:01    
Red Hat Universal Base Image 8 (RPMs) - CodeRea  12 kB/s | 9.1 kB     00:00    
====================== Name Exactly Matched: kernel-devel ======================
kernel-devel-4.18.0-80.1.2.el8_0.x86_64 : Development package for building
                                        : kernel modules to match the kernel
kernel-devel-4.18.0-80.el8.x86_64 : Development package for building kernel
                                  : modules to match the kernel
kernel-devel-4.18.0-80.4.2.el8_0.x86_64 : Development package for building
                                        : kernel modules to match the kernel
kernel-devel-4.18.0-80.7.1.el8_0.x86_64 : Development package for building
                                        : kernel modules to match the kernel
kernel-devel-4.18.0-80.11.1.el8_0.x86_64 : Development package for building
                                         : kernel modules to match the kernel
kernel-devel-4.18.0-147.el8.x86_64 : Development package for building kernel
                                   : modules to match the kernel
kernel-devel-4.18.0-80.11.2.el8_0.x86_64 : Development package for building
                                         : kernel modules to match the kernel
kernel-devel-4.18.0-80.7.2.el8_0.x86_64 : Development package for building
                                        : kernel modules to match the kernel
kernel-devel-4.18.0-147.0.3.el8_1.x86_64 : Development package for building
                                         : kernel modules to match the kernel
kernel-devel-4.18.0-147.0.2.el8_1.x86_64 : Development package for building
                                         : kernel modules to match the kernel
kernel-devel-4.18.0-147.3.1.el8_1.x86_64 : Development package for building
                                         : kernel modules to match the kernel
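
The list above can be long, so if you want to check for one particular kernel release, the output can be filtered. The following is a minimal sketch, assuming the dnf search output keeps the "kernel-devel-<release> : description" layout shown above. The function name has_kernel_devel is a hypothetical helper, and the idea of comparing against the kernel release of the cluster nodes is an assumption about what the driver container build will look for, not something this article verifies.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `dnf search kernel-devel --showduplicates`
# output on stdin and succeeds if kernel-devel for the given kernel
# release (for example 4.18.0-147.3.1.el8_1.x86_64) is listed.
has_kernel_devel() {
  local release="$1"
  # Fixed-string match; the trailing space prevents prefix matches.
  grep -Fq -- "kernel-devel-${release} "
}

# Example usage (commented out because it needs podman, oc and the entitlement):
# KERNEL="$(oc get nodes -o jsonpath='{.items[0].status.nodeInfo.kernelVersion}')"
# podman run ... registry.access.redhat.com/ubi8:latest \
#   bash -c "dnf search kernel-devel --showduplicates" | has_kernel_devel "$KERNEL"
```

The podman arguments are elided in the usage comment; use the same mounts as in the validation command of this step.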


With the entitlement validated, we can move on to Series 2 - Enable GPU Operator on OCP 4.5 in internet connected environment.



