Cloud Platform as a Service


Introducing OpenShift clusters on IBM Cloud with Intel® Gaudi® 3 AI Accelerator

By Elvin Galarza posted 21 days ago

  

Co-authored by Elvin Galarza and Hidematsu Sueki 

 

At IBM, we’re dedicated to offering state-of-the-art technology to our customers as the world continues to evolve. That’s why we’re excited to expand our GX3 family with the Intel Gaudi 3 AI accelerator. 

 

Recently, we made Intel Gaudi 3 generally available for Red Hat OpenShift on IBM Cloud (ROKS) clusters running on IBM Cloud VPC. This AI accelerator is built for both training and inference. It has 64 Tensor Processor Cores (5th-generation TPCs) for a wide array of deep learning workloads and 8 matrix multiplication engines (MMEs) for the high-performance matrix math that AI depends on. Each accelerator carries 128 GB of HBM2e memory with up to 3.7 TB/s of read/write bandwidth, plus 96 MB of on-die SRAM with 12.8 TB/s of bandwidth for fast data access; the larger memory capacity is designed to make LLM workloads more efficient and cost-effective. Its 24 × 200 Gbps RoCE v2 ports provide 9.6 Tbps of bi-directional networking capacity, which supports flexible scaling to large configurations.
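The networking figure follows directly from the port count. As a quick check of the arithmetic, the short sketch below multiplies out the numbers quoted above; the variable names are illustrative only and do not come from any Intel or IBM tooling.

```python
# Figures quoted in the paragraph above.
ROCE_PORTS = 24          # RoCE v2 ports per accelerator
PORT_SPEED_GBPS = 200    # line rate of each port, in Gbps

# Aggregate bandwidth in one direction, converted from Gbps to Tbps.
one_way_tbps = ROCE_PORTS * PORT_SPEED_GBPS / 1000

# Counting both directions doubles the figure.
bidirectional_tbps = 2 * one_way_tbps

print(f"{one_way_tbps} Tbps per direction, {bidirectional_tbps} Tbps bi-directional")
# prints "4.8 Tbps per direction, 9.6 Tbps bi-directional"
```

In other words, the 9.6 Tbps figure counts traffic in both directions; the aggregate line rate in a single direction is 4.8 Tbps.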

 

New GX3 flavor now available 

The following Intel Gaudi 3 AI accelerator flavor is available for Red Hat OpenShift on IBM Cloud VPC cluster version 4.18 with Red Hat CoreOS (RHCOS): 

 

| Flavor name | vCPUs | Memory | # of Accelerators (Intel Gaudi 3) | Instance storage | Network |
| --- | --- | --- | --- | --- | --- |
| gx3-160x1792x8gaudi3 | 160 | 1.8 TB | 8 | 8 x 3.2 TB | 32 Gbps |
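The flavor name appears to encode the same figures as the table. The sketch below splits the name into its parts and derives a few per-node totals. The naming pattern (family, vCPUs, memory, accelerator count, accelerator type) is an assumption inferred from this single flavor, not a documented rule, and `parse_flavor` is a hypothetical helper. The 128 GB of HBM per accelerator comes from the specifications earlier in this post.

```python
import re

# ASSUMPTION: pattern inferred from "gx3-160x1792x8gaudi3":
#   <family>-<vCPUs>x<memory in GB>x<accelerator count><accelerator type>
FLAVOR_PATTERN = re.compile(
    r"^(?P<family>[a-z0-9]+)-(?P<vcpus>\d+)x(?P<memory_gb>\d+)"
    r"x(?P<count>\d+)(?P<accelerator>[a-z]\w*)$"
)

def parse_flavor(name):
    """Split a flavor name into its components (hypothetical helper)."""
    match = FLAVOR_PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized flavor name: {name!r}")
    parts = match.groupdict()
    # Convert the numeric fields from strings to integers.
    for key in ("vcpus", "memory_gb", "count"):
        parts[key] = int(parts[key])
    return parts

flavor = parse_flavor("gx3-160x1792x8gaudi3")

# Per-node totals derived from figures in this post.
HBM_PER_ACCELERATOR_GB = 128   # HBM2e per Gaudi 3 accelerator
DISK_SIZE_TB = 3.2             # each of the 8 instance storage disks
total_hbm_gb = flavor["count"] * HBM_PER_ACCELERATOR_GB
total_instance_storage_tb = 8 * DISK_SIZE_TB

print(flavor)
print(f"HBM per node: {total_hbm_gb} GB")
print(f"Instance storage per node: {total_instance_storage_tb} TB")
```

Read this way, the 1792 in the name corresponds to the 1.8 TB of memory in the table, and one worker node offers 1024 GB of accelerator HBM across its 8 accelerators.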

For more information about the new GX3 flavor, including regional availability and secondary storage options, see VPC Flavors for Red Hat OpenShift on IBM Cloud. 

 

Getting started with GX3 flavors on Red Hat OpenShift on IBM Cloud 

Once you are approved for access, provision a new cluster at version 4.18 or later with GX3 worker nodes. If you already have a 4.18+ cluster, simply add a worker pool that uses the GX3 flavor to your existing cluster. With Red Hat OpenShift on IBM Cloud, installing the Intel Gaudi Base Operator automates the management of all the necessary Intel software components. For more information, see Deploying an app on a GPU machine for Red Hat OpenShift on IBM Cloud. 

1. Request access to this flavor.
2. Create a 4.18 cluster and/or an Intel Gaudi 3 RHCOS worker pool.
3. Install the Intel Gaudi Base Operator v1.20.1 from the OpenShift certified operators catalog.
4. Follow the Intel Gaudi documentation to install the Intel Gaudi software (driver, runtime, device plugin).
5. Follow the documentation to deploy workloads that use the Intel Gaudi 3 accelerator.
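To make the last step more concrete, here is a minimal sketch that generates a pod manifest requesting one accelerator. It is not an official IBM or Intel example. It assumes the Intel Gaudi device plugin advertises accelerators under the extended resource name `habana.ai/gaudi` (you can confirm this on your cluster with `oc describe node`), and it assumes the container image ships the `hl-smi` utility. The image reference, pod name, and `gaudi_pod_manifest` helper are hypothetical placeholders.

```python
import json

def gaudi_pod_manifest(name, image, accelerators=1):
    """Build a pod manifest that requests Gaudi accelerators (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # ASSUMPTION: the image includes hl-smi, which lists
                    # the accelerators visible inside the container.
                    "command": ["hl-smi"],
                    "resources": {
                        # ASSUMPTION: resource name exposed by the
                        # Intel Gaudi device plugin.
                        "limits": {"habana.ai/gaudi": accelerators}
                    },
                }
            ],
        },
    }

manifest = gaudi_pod_manifest(
    name="gaudi-smoke-test",
    # Placeholder image reference; replace with an image from the
    # Intel Gaudi documentation.
    image="registry.example.com/your-gaudi-image:tag",
)

# Kubernetes accepts JSON manifests as well as YAML.
print(json.dumps(manifest, indent=2))
```

Saving the output to a file and running `oc apply -f` on it should schedule the pod onto a GX3 worker node once the operator and device plugin are running; the pod logs would then show whatever `hl-smi` reports.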

 

Additional Resources 

Intel Gaudi documentation (latest): Link 

Intel Gaudi documentation (v1.20.1): Link 

Intel Gaudi OpenShift installation guide: Link 

Intel Gaudi Kubernetes quick start guide: Link 

Red Hat OpenShift Catalog: Intel Gaudi Base Operator: Link 

