Cloud Platform as a Service

Introducing Kubernetes and OpenShift clusters on IBM Cloud with NVIDIA H100 GPUs

By Elvin Galarza posted Mon September 16, 2024 10:34 AM

Co-authored with Hidematsu Sueki

At IBM, we’re dedicated to offering state-of-the-art technology to organizations. That’s why we’re excited to expand our GX3 family with another GPU, this one built on the NVIDIA Hopper architecture.

IBM is thrilled to announce that the NVIDIA H100 Tensor Core GPU is generally available for IBM Cloud Kubernetes Service (IKS) and Red Hat OpenShift on IBM Cloud (ROKS) clusters running on IBM Cloud VPC.

As artificial intelligence (AI) and machine learning (ML) models continue to grow and take on more business requirements worldwide, the compute needed to train and serve those models grows too. The NVIDIA H100 GPU is built to meet that demand, accelerating the most demanding AI workloads with unprecedented performance and efficiency. The H100 inherits many design principles from the NVIDIA A100 Tensor Core GPU, with a focus on improved architectural efficiency and scaling. Designed for massive scale, it enables organizations to train and deploy the largest and most complex AI models. NVIDIA H100 GPUs are up to 6x faster chip-to-chip compared to the A100. We’ve found that workloads switching from A100 to H100 GPUs see speed improvements of up to 30x in AI inference and up to 9x in AI training.

Available GX3D (NVIDIA H100 GPU) flavors 

The following H100 GPU flavor is available for IBM Cloud VPC clusters that run on any version of Red Hat OpenShift for both RHEL and RHCOS operating systems. 

  • gx3d.160x1792.8h100: 8 GPUs, 160 cores, 1.8 TB memory, 100 GB primary storage, 8x 7.7 TB additional storage, 32 Gbps network speed
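The flavor name appears to encode the specs listed above. Assuming the layout `<family>.<vCPUs>x<memory in GB>.<GPU count><GPU model>` (inferred from this single flavor, not from an official IBM naming specification), a small Python sketch can decode it, which may be handy when scripting worker-pool requests. The `Flavor` class and `parse_flavor` function are illustrative names.

```python
import re
from dataclasses import dataclass

# Assumed layout, inferred from gx3d.160x1792.8h100:
#   <family>.<vCPUs>x<memory GB>.<GPU count><GPU model>
FLAVOR_RE = re.compile(
    r"^(?P<family>[a-z0-9]+)\."
    r"(?P<cpus>\d+)x(?P<memory_gb>\d+)\."
    r"(?P<gpus>\d+)(?P<gpu_model>[a-z]+\d+)$"
)


@dataclass(frozen=True)
class Flavor:
    family: str
    cpus: int
    memory_gb: int
    gpus: int
    gpu_model: str


def parse_flavor(name: str) -> Flavor:
    """Split a GPU flavor name into its (assumed) components."""
    m = FLAVOR_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized GPU flavor name: {name!r}")
    return Flavor(
        family=m["family"],
        cpus=int(m["cpus"]),
        memory_gb=int(m["memory_gb"]),
        gpus=int(m["gpus"]),
        gpu_model=m["gpu_model"],
    )
```

For example, `parse_flavor("gx3d.160x1792.8h100")` returns `Flavor(family='gx3d', cpus=160, memory_gb=1792, gpus=8, gpu_model='h100')`; 1792 GB is the 1.8 TB of memory listed for this flavor.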

Getting started with GX3D (NVIDIA H100 GPUs) on IBM Cloud Kubernetes Service

Enjoy a plug-and-play experience when provisioning an IBM Cloud Kubernetes Service cluster: GPU drivers are installed automatically, so no additional configuration is required to set up the GPUs. To get started, provision a new cluster at version 1.31 or later with GX3D worker nodes. If you already have a 1.31+ cluster, simply add a worker pool of GX3D nodes to it. For more information, see Deploying an app on a GPU machine for IBM Cloud Kubernetes Service.
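Once GX3D workers join the cluster, a workload asks for GPUs through the `nvidia.com/gpu` extended resource in its container limits. The following minimal Python sketch builds such a Pod manifest as JSON. The pod name, container name, and CUDA image tag are illustrative placeholders rather than values from IBM documentation, and the 8-GPU ceiling simply reflects the per-node GPU count of the gx3d.160x1792.8h100 flavor.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int) -> dict:
    """Build a minimal Pod manifest that requests `gpus` NVIDIA GPUs."""
    # A pod runs on a single node, so it cannot use more GPUs than one
    # gx3d.160x1792.8h100 worker provides.
    if not 1 <= gpus <= 8:
        raise ValueError("a gx3d.160x1792.8h100 worker exposes at most 8 GPUs")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "cuda",
                    "image": image,
                    # Prints the GPUs visible inside the container, then exits.
                    "command": ["nvidia-smi"],
                    # GPUs are requested as an extended resource in limits.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; substitute a CUDA image you trust.
    manifest = gpu_pod_manifest(
        "h100-smoke-test", "nvidia/cuda:12.4.1-base-ubuntu22.04", 1
    )
    print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and running `kubectl apply -f <file>` (or `oc apply -f <file>` on Red Hat OpenShift) should schedule the pod onto a GPU worker; if the GPU stack is working, its logs should show the `nvidia-smi` output.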

Getting started with GX3D (NVIDIA H100 GPU) on Red Hat OpenShift on IBM Cloud 

With Red Hat OpenShift on IBM Cloud, provision a new cluster at version 4.15 or later with GX3D worker nodes. If you already have a 4.15+ cluster, simply add a worker pool of GX3D nodes to it. Then install the NVIDIA GPU Operator, which automates the management of all the necessary NVIDIA software components. For more information, see Deploying an app on a GPU machine for Red Hat OpenShift on IBM Cloud.
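After the GPU Operator finishes rolling out, its device plugin advertises each node's GPUs as the `nvidia.com/gpu` extended resource under `status.allocatable`. As a quick sanity check, the following Python sketch takes the parsed output of `oc get nodes -o json` (or `kubectl get nodes -o json`) and reports how many GPUs each node exposes. The function name is illustrative; on a healthy gx3d.160x1792.8h100 worker you would expect a count of 8.

```python
from typing import Dict


def allocatable_gpus(node_list: dict) -> Dict[str, int]:
    """Map each node name to its allocatable nvidia.com/gpu count (0 if absent)."""
    counts: Dict[str, int] = {}
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        allocatable = node.get("status", {}).get("allocatable", {})
        # Kubernetes reports resource quantities as strings, e.g. "8".
        counts[name] = int(allocatable.get("nvidia.com/gpu", "0"))
    return counts
```

For example, `allocatable_gpus(json.loads(subprocess.run(["oc", "get", "nodes", "-o", "json"], capture_output=True, text=True, check=True).stdout))` returns the per-node counts. A GX3D node that reports 0 suggests the operator's driver or device-plugin pods are not ready yet.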

Furthermore, Red Hat OpenShift AI on GX3D worker nodes can be used to rapidly develop, train, serve, and monitor machine learning models on-premises, in the public cloud, and at the edge. To learn more, see Installing Red Hat OpenShift AI.

Source: NVIDIA H100 Tensor Core GPU

