Co-authored by Elvin Galarza and Hidematsu Sueki
At IBM, we’re dedicated to offering state-of-the-art technology to our customers as the world continues to evolve. That’s why we’re excited to expand our GX3 family with the Intel Gaudi 3 AI accelerator.
Recently, we made Intel Gaudi 3 generally available for Red Hat OpenShift on IBM Cloud (ROKS) clusters running on IBM Cloud VPC. This AI accelerator is built for training and inference, with 64 fifth-generation Tensor Processor Cores (TPCs) for a wide array of deep learning workloads and 8 matrix multiplication engines (MMEs) for the high-performance matrix math that AI depends on. Intel Gaudi 3 pairs 128 GB of HBM2e memory, which provides up to 3.7 TB/s of HBM bandwidth for read/write operations, with 96 MB of on-die SRAM at 12.8 TB/s for fast data access; this larger memory capacity is designed for LLM efficiency and cost-effectiveness. Its 24 × 200 Gbps RoCE v2 ports provide 9.6 Tbps of bidirectional networking capacity for flexible, large-scale scale-out.
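The networking figure follows directly from the port count and per-port speed quoted above. As a quick sanity check, here is a minimal Python sketch of that arithmetic; the function name `aggregate_tbps` is our own illustration, not part of any Intel or IBM tooling.

```python
# Port count and per-port speed as quoted in the paragraph above.
PORTS = 24
GBPS_PER_PORT = 200


def aggregate_tbps(ports: int, gbps_per_port: int, bidirectional: bool = True) -> float:
    """Return aggregate Ethernet bandwidth in Tbps (1 Tbps = 1000 Gbps)."""
    one_way = ports * gbps_per_port / 1000
    # Bidirectional capacity counts traffic in both directions.
    return one_way * 2 if bidirectional else one_way


if __name__ == "__main__":
    print(aggregate_tbps(PORTS, GBPS_PER_PORT, bidirectional=False))  # 4.8
    print(aggregate_tbps(PORTS, GBPS_PER_PORT))  # 9.6
```

In other words, the 9.6 Tbps figure is 4.8 Tbps in each direction.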
New GX3 flavor now available
The following Intel Gaudi 3 AI accelerator flavor is available for Red Hat OpenShift on IBM Cloud VPC cluster version 4.18 with Red Hat CoreOS (RHCOS):
[Flavor table: columns include "# of Accelerators (Intel Gaudi 3)"; cell values are missing from the source.]
For more information about the new GX3 flavor, including regional availability and secondary storage options, see VPC Flavors for Red Hat OpenShift on IBM Cloud.
Getting started with GX3 flavors on Red Hat OpenShift on IBM Cloud
Once you are approved for access, provision a new cluster at version 4.18 or later with GX3 worker nodes. If you already have a 4.18+ cluster, simply add a worker pool that uses the GX3 nodes to your existing cluster. With Red Hat OpenShift on IBM Cloud, installing the Intel Gaudi Base Operator then automates the management of all the necessary Intel software components. For more information, see Deploying an app on a GPU machine for Red Hat OpenShift on IBM Cloud.
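If you script your cluster setup, it can help to confirm the 4.18-or-later requirement numerically rather than by comparing strings, since "4.9" sorts after "4.18" as text. The following is a minimal sketch under stated assumptions: the helper name `supports_gx3` is hypothetical, and we assume only that the version string begins with `major.minor` (any suffix is ignored).

```python
import re

# Minimum cluster version stated in this post.
MIN_VERSION = (4, 18)


def supports_gx3(version: str, minimum: tuple = MIN_VERSION) -> bool:
    """Return True if a version string starting with 'major.minor' meets the minimum."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # Compare as integer tuples so that 4.9 < 4.18, unlike a string comparison.
    return (int(match.group(1)), int(match.group(2))) >= minimum


if __name__ == "__main__":
    for candidate in ("4.17.3", "4.18.0", "4.9.1"):
        print(candidate, supports_gx3(candidate))
```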
• Access to this flavor can be obtained here
• Create a 4.18 cluster and/or create an Intel Gaudi 3 RHCOS worker pool
• Install Intel Gaudi Base Operator v1.20.1 from the OpenShift certified operators catalog
• Follow the Intel Gaudi documentation to install the Intel Gaudi software (driver, runtime, device plugin)
• Follow the documentation to deploy workloads that use the Intel Gaudi 3 accelerator
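To make the last step more concrete, here is a minimal Python sketch that generates a Pod manifest requesting Gaudi accelerators. It rests on several assumptions that you should verify against the Intel Gaudi documentation for your version: that the device plugin advertises accelerators under the extended resource name `habana.ai/gaudi`, and that your container image ships the `hl-smi` utility. The function name, the Pod name, and the image reference are hypothetical placeholders.

```python
import json


def gaudi_pod_manifest(name: str, image: str, accelerators: int = 1) -> dict:
    """Build a Pod manifest that requests Intel Gaudi accelerators."""
    if accelerators < 1:
        raise ValueError("accelerators must be at least 1")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "workload",
                    "image": image,
                    # Assumption: the image includes hl-smi to list visible devices.
                    "command": ["hl-smi"],
                    "resources": {
                        # Assumption: resource name exposed by the Gaudi device plugin.
                        "limits": {"habana.ai/gaudi": accelerators}
                    },
                }
            ],
        },
    }


if __name__ == "__main__":
    # Hypothetical image reference; replace it with an image you have access to.
    manifest = gaudi_pod_manifest(
        "gaudi-smoke-test", "registry.example.com/gaudi-workload:latest"
    )
    print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests, the printed output could be saved to a file and applied with `oc apply -f`, assuming the operator and device plugin from the earlier steps are already running.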
Additional Resources
• Intel Gaudi documentation (latest): Link
• Intel Gaudi documentation (v1.20.1): Link
• Intel Gaudi OpenShift installation guide: Link
• Intel Gaudi Kubernetes quick start guide: Link
• Red Hat OpenShift Catalog: Intel Gaudi Base Operator: Link