Cloud Platform as a Service

Introducing OpenShift clusters on IBM Cloud with AMD Instinct™ MI300X GPUs

By Elvin Galarza posted 8 hours ago

  

Co-authored by Elvin Galarza and Bruce Cong

At IBM, we’re dedicated to offering state-of-the-art technology to our customers as the world continues to evolve. That’s why we’re excited to expand our GX3 family with the AMD Instinct™ MI300X GPU.  

Recently, we made AMD MI300X GPUs generally available for Red Hat OpenShift on IBM Cloud clusters running on IBM Cloud VPC. OpenShift clusters provide a powerful platform to build, deploy, and manage containerized applications at scale. They offer intelligent scheduling, self-healing, horizontal scaling, service discovery and load balancing, automated rollouts and rollbacks, and secret and configuration management for your apps. With an intuitive user experience, built-in security and isolation, and advanced tools to secure, manage, and monitor your cluster workloads, you can rapidly deliver highly available, secure containerized apps in public and hybrid cloud environments.

The AMD Instinct MI300X GPU is purpose-built for large-scale generative AI workloads, with strong performance and total cost of ownership advantages. Each GPU carries 192 GB of HBM3 memory, the highest in its class, so larger models can run entirely in a single GPU's memory, which reduces the need for multi-GPU setups and cuts infrastructure costs. Built on the AMD CDNA™ 3 architecture with 304 high-throughput compute units and AI-oriented features, including optimized data types and media decoding, the MI300X is designed for efficient training and inference at scale.
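To make the memory claim concrete, here is a back-of-the-envelope Python sketch that estimates whether a model's weights alone fit in one MI300X. It assumes weights only (no KV cache, activations, or framework overhead), 1 GB = 10^9 bytes, and common bytes-per-parameter values for each data type. The function names are illustrative and are not part of any AMD or IBM tool.

```python
# Rough estimate: do a model's weights fit in one GPU's HBM?
# Assumptions (illustrative): weights only, no KV cache, activations,
# or framework overhead; 1 GB = 1e9 bytes.

MI300X_HBM_GB = 192  # per-GPU HBM3 capacity of the AMD Instinct MI300X

# Typical storage size of one parameter for each data type.
BYTES_PER_PARAM = {
    "fp32": 4,
    "fp16": 2,
    "bf16": 2,
    "fp8": 1,
}


def weight_footprint_gb(num_params: float, dtype: str) -> float:
    """Return the approximate size of the model weights in GB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def fits_on_one_gpu(num_params: float, dtype: str,
                    hbm_gb: float = MI300X_HBM_GB) -> bool:
    """True if the weights alone fit in a single GPU's memory."""
    return weight_footprint_gb(num_params, dtype) <= hbm_gb


if __name__ == "__main__":
    for params, dtype in [(70e9, "fp16"), (70e9, "fp32"), (180e9, "fp8")]:
        size = weight_footprint_gb(params, dtype)
        print(f"{params / 1e9:.0f}B params @ {dtype}: {size:.0f} GB, "
              f"fits on one MI300X: {fits_on_one_gpu(params, dtype)}")
```

For example, a 70-billion-parameter model in FP16 needs about 140 GB for its weights, which fits within 192 GB on a single GPU, while the same model in FP32 (about 280 GB) would have to be split across GPUs. Real deployments also need headroom for the KV cache and activations, so treat this as a lower bound.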

New GX3 flavor now available

The following AMD MI300X GPU flavor is available for Red Hat OpenShift on IBM Cloud VPC clusters running version 4.18 with Red Hat CoreOS (RHCOS):

• Flavor name: gx3d.208x1792.8mi300x

• vCPUs: 208

• Memory: 1.8 TB

• GPUs: 8 × AMD MI300X

• Instance storage: 8 × 3.2 TB

• Network bandwidth: 32 Gbps
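The flavor name appears to encode the worker shape. The following Python sketch assumes the pattern <family>.<vCPUs>x<memory GB>.<GPU count><GPU model>, inferred from this single name rather than from IBM documentation, and uses the 192 GB per-GPU HBM3 figure for the MI300X to compute aggregate GPU memory. Flavor and parse_flavor are hypothetical helpers.

```python
import re
from dataclasses import dataclass

# Assumed naming pattern: <family>.<vCPUs>x<memory GB>.<GPU count><GPU model>
FLAVOR_RE = re.compile(
    r"^(?P<family>[a-z0-9]+)\.(?P<vcpus>\d+)x(?P<memory_gb>\d+)"
    r"\.(?P<gpu_count>\d+)(?P<gpu_model>[a-z][a-z0-9]*)$"
)

# Per-GPU HBM capacity in GB, keyed by the model suffix in the flavor name.
HBM_GB_PER_GPU = {"mi300x": 192}


@dataclass(frozen=True)
class Flavor:
    family: str
    vcpus: int
    memory_gb: int
    gpu_count: int
    gpu_model: str

    @property
    def total_hbm_gb(self) -> int:
        """Aggregate GPU memory across all cards on the worker."""
        return self.gpu_count * HBM_GB_PER_GPU[self.gpu_model]


def parse_flavor(name: str) -> Flavor:
    """Split a flavor name into its (assumed) components."""
    m = FLAVOR_RE.match(name)
    if m is None:
        raise ValueError(f"unrecognized flavor name: {name!r}")
    return Flavor(
        family=m["family"],
        vcpus=int(m["vcpus"]),
        memory_gb=int(m["memory_gb"]),
        gpu_count=int(m["gpu_count"]),
        gpu_model=m["gpu_model"],
    )


if __name__ == "__main__":
    flavor = parse_flavor("gx3d.208x1792.8mi300x")
    print(flavor)
    print(f"Total GPU memory: {flavor.total_hbm_gb} GB")
```

For gx3d.208x1792.8mi300x this yields 208 vCPUs, 1,792 GB of memory (the 1.8 TB listed above), and 1,536 GB of HBM3 across the eight GPUs (8 × 192 GB).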

For more information about the new GX3 flavor, including regional availability and secondary storage options, see VPC Flavors for Red Hat OpenShift on IBM Cloud.

Getting started with AMD MI300X flavors on Red Hat OpenShift on IBM Cloud 

Once you are approved for access, provision a new cluster at version 4.18 or later with MI300X worker nodes. If you already have a 4.18+ cluster, add a worker pool that uses the MI300X flavor to your existing cluster. On Red Hat OpenShift on IBM Cloud, the operators you install automate the management of the required AMD software components. For more information, see Deploying an app on a GPU machine for Red Hat OpenShift on IBM Cloud.

• Request access to the MI300X flavor.

• Create a cluster at version 4.18 or later, or add an MI300X RHCOS worker pool to an existing 4.18+ cluster.

• Install the Kernel Module Management (KMM) Operator v2.40+ from the OpenShift Red Hat operators catalog.

• Install the Node Feature Discovery (NFD) Operator from the OpenShift Red Hat operators catalog.

• Install the AMD GPU Operator v1.2.1+ from the OpenShift certified operators catalog.

• Follow the AMD documentation to install the AMD software (driver and device plugin).

• Follow the documentation to deploy workloads that use the AMD MI300X GPUs.
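Once the driver and device plugin are running, workloads request GPUs through a Kubernetes extended resource. The following Python sketch builds a minimal Pod manifest as JSON. It assumes the device plugin advertises the amd.com/gpu resource name used by AMD's Kubernetes device plugin; the pod name, container image, and command are placeholders chosen for illustration, so check the AMD GPU Operator documentation for the values that apply to your environment.

```python
from __future__ import annotations

import json

# Resource name advertised by AMD's Kubernetes device plugin (assumption:
# verify against the AMD GPU Operator docs for your version).
AMD_GPU_RESOURCE = "amd.com/gpu"

# GPUs per gx3d.208x1792.8mi300x worker node, from the flavor specs.
GPUS_PER_NODE = 8


def gpu_pod_manifest(name: str, image: str, gpus: int,
                     command: list[str]) -> dict:
    """Build a minimal Pod manifest that requests `gpus` AMD GPUs."""
    if not 1 <= gpus <= GPUS_PER_NODE:
        raise ValueError(f"request between 1 and {GPUS_PER_NODE} GPUs per pod")
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",  # run once, like a smoke test
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "command": command,
                    # Extended resources are set as limits; Kubernetes
                    # defaults the request to the limit when only a limit
                    # is given.
                    "resources": {"limits": {AMD_GPU_RESOURCE: str(gpus)}},
                }
            ],
        },
    }


if __name__ == "__main__":
    # Placeholder values: swap in your own image and command.
    manifest = gpu_pod_manifest(
        name="mi300x-smoke-test",
        image="rocm/pytorch:latest",
        gpus=1,
        command=["rocm-smi"],
    )
    print(json.dumps(manifest, indent=2))
```

Save the output to a file (for example, pod.json) and create the pod with oc apply -f pod.json. The scheduler places the pod only on a worker node that advertises enough free amd.com/gpu capacity.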

 

Additional Resources

• AMD GPU Operator documentation (latest)

• AMD GPU Operator OpenShift installation guide

• NFD Operator installation guide (latest)

• KMM Operator installation guide (latest)

• Red Hat OpenShift Catalog: AMD GPU Operator
