Installing and Configuring the Spyre AI Accelerator Card and Deploying a Sample AI Model on IBM Z and IBM® LinuxONE: A Step-by-Step Deployment Guide

Authors: Sindhuja BD (sindhujabd@ibm.com), Dilip B (Dilip.Bhagavan@ibm.com), Modassar Rana (modassar.rana@ibm.com), Rishika Kedia (rishika.kedia@in.ibm.com)

Introduction

This document provides a step-by-step guide to deploying an AI model on the Red Hat OpenShift AI platform, leveraging IBM Spyre cards on IBM Z or IBM® LinuxONE systems with the IBM Spyre Operator.

The IBM Spyre Operator for IBM Z and IBM® LinuxONE automates the setup and configuration of the necessary software stack to make the Spyre accelerators available to AI workloads running on Red Hat OpenShift.

At a high level, here is an overview of the workflow covered in this guide:

  1. Mount the Spyre cards to the OCP worker node.
  2. Apply the machine configs to the worker node.
  3. Install the dependency operators.
  4. Install and configure the IBM Spyre Operator.
  5. Install the Red Hat OpenShift AI Operator.
  6. Configure a hardware profile for model deployment.
  7. Deploy the model on Red Hat OpenShift AI.
  8. Make an inference request.

Prerequisites

  • A running OpenShift cluster, version 4.19.10 or later installed on an IBM Z or IBM® LinuxONE machine with Spyre cards.
  • At least one Spyre card is required to proceed. The supported configurations are 1, 2, or 4 Spyre cards, with a maximum of 4.
  • Cluster administrator privileges for your OpenShift cluster.

Verification

Log in to the OCP Cluster and verify Spyre card availability:

On KVM:

$ lspci

0000:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0001:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0002:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0003:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0004:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0005:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0006:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0007:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)

On z/VM:

$ /sbin/vmcp q pcif

0000:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0001:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0002:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
...

Note

A maximum of 4 cards is a vLLM constraint when the granite-3.3-8b-instruct model is used along with the built-in model cache.

Spyre setup

Mounting Spyre cards to the worker nodes on KVM:

Note: If you are on a z/VM machine, skip to the next section to mount the cards on z/VM.

  • Log in to the OpenShift cluster and identify the worker node where Spyre cards will be mounted:

Note: Available Spyre cards can be distributed across multiple worker nodes if you intend to run multiple workload pod replicas.

For simplicity, this guide demonstrates mounting the card(s) on a single worker node to deploy a single workload pod.

$ oc get no

NAME                      STATUS   ROLES                  AGE   VERSION
ocpz-standard-1-comts-0   Ready    worker                 22h   v1.32.9
ocpz-standard-1-condi-0   Ready    control-plane,master   22h   v1.32.9
ocpz-standard-1-conlz-2   Ready    control-plane,master   22h   v1.32.9
ocpz-standard-1-connt-1   Ready    control-plane,master   22h   v1.32.9

  • On the KVM host, edit the libvirt domain of the worker node you want to mount the cards on:

virsh edit <worker-node-name>

  • Add the following hostdev entry after the <audio id='1' type='none'/> element in the domain XML. The domain value can be obtained from the lspci output: the first four digits of each PCI ID are the domain. Each card must have its own hostdev entry (a filled-in example follows the template below).

<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='<domain-value>' bus='0x00' slot='0x00' function='0x0'/>
      </source>
</hostdev>
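
For example, for the card listed as 0002:00:00.0 in the lspci output above, the entry would look like this (illustrative):

<hostdev mode='subsystem' type='pci' managed='yes'>
      <source>
        <address domain='0x0002' bus='0x00' slot='0x00' function='0x0'/>
      </source>
</hostdev>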

  • After adding the entry, save and exit, then restart the worker node:

virsh destroy <worker-node-name>
virsh start <worker-node-name>

  • Ensure that the worker node is ready before proceeding.

Note:

You can also use the following script to attach the cards.

#!/bin/bash
set -e
 
# Usage: ./attach_device.sh <VM_NAME> pci_id1 pci_id2 pci_id3 ...
# Example: ./attach_device.sh worker1 0000:00:00.0 0001:00:00.0
 
VM_NAME=$1
shift
 
if [ -z "$VM_NAME" ] || [ $# -eq 0 ]; then
  echo "Usage: $0 <VM_NAME> <PCI_ID1> [PCI_ID2 ...]"
  exit 1
fi
 
PCI_IDS=("$@")
echo "[*] Attaching PCI devices to VM $VM_NAME: ${PCI_IDS[*]}"
 
# Fetch currently attached PCI devices for target VM
ATTACHED=$(virsh dumpxml "$VM_NAME" | grep -oP "domain='0x\K[0-9a-f]+(?=')" || true)
 
for PCI_ID in "${PCI_IDS[@]}"; do
  DOMAIN=$(echo $PCI_ID | cut -d: -f1)
  BUS=$(echo $PCI_ID | cut -d: -f2)
  SLOT=$(echo $PCI_ID | cut -d: -f3 | cut -d. -f1)
  FUNCTION=$(echo $PCI_ID | cut -d. -f2)
 
  # Check if device is already attached to this VM
  if echo "$ATTACHED" | grep -qi "$DOMAIN"; then
    echo "[*] PCI device $PCI_ID is already attached to $VM_NAME. Skipping."
    continue
  fi
 
  # Check if device is attached to any other VM
  for OTHER_VM in $(virsh list --all --name | grep -v "^$VM_NAME$"); do
    # dumpxml lists addresses as domain='0x....', so match on the domain attribute
    if virsh dumpxml "$OTHER_VM" | grep -qi "domain='0x$DOMAIN'"; then
      echo "[!] ERROR: PCI device $PCI_ID is already attached to VM $OTHER_VM. Aborting."
      exit 1
    fi
  done
 
  # Create temporary XML for this device
  XML_FILE=$(mktemp /tmp/pci_XXXX.xml)
  cat > "$XML_FILE" <<EOF
<hostdev mode='subsystem' type='pci' managed='yes'>
  <source>
    <address domain='0x$DOMAIN' bus='0x$BUS' slot='0x$SLOT' function='0x$FUNCTION'/>
  </source>
</hostdev>
EOF
 
  echo "[*] Attaching PCI device $PCI_ID to $VM_NAME..."
  virsh attach-device "$VM_NAME" "$XML_FILE" --persistent
 
  # Cleanup
  rm -f "$XML_FILE"
done
 
echo "[+] All PCI devices processed successfully!"
 
virsh destroy "$VM_NAME"
virsh start "$VM_NAME"

Usage

./attach_device.sh <worker-node-name> <pci_id1> <pci_id2> <pci_id3> ...

Example

./attach_device.sh ocpz-standard-1-comts-0 0000:00:00.0 0001:00:00.0 0002:00:00.0 0003:00:00.0

Mounting Spyre cards to the worker nodes on z/VM:

  • Identify the z/VM guest name where you want to attach the PCI cards.
  • Check that the cards are present by running the query command /sbin/vmcp q pcif.
  • Before attaching PCI cards, set the IO_OPT UID to OFF for the target guest:

vmcp SET IO_OPT UID OFF <guest-name>

Note: Wait a few seconds after this command before proceeding to the next step.

  • Attach each PCI card to the z/VM guest using the following command format:

vmcp att pcif <pci-id> to <guest-name>

Where:
<pci-id> is the PCI function identifier (e.g., 302, 312, 322, etc.)
<guest-name> is the name of your z/VM guest

Note:
• Attach cards one at a time
• Wait a few seconds between each attachment to ensure proper initialization
• Each card must be attached individually
• Run /sbin/vmcp q pcif (and lspci on the worker) after each attachment batch to confirm the card is visible before proceeding
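
For example, assuming a guest named LNXWRK01 and PCI function IDs 302 and 312 (hypothetical values), the sequence would look like this:

vmcp SET IO_OPT UID OFF LNXWRK01
vmcp att pcif 302 to LNXWRK01
/sbin/vmcp q pcif
vmcp att pcif 312 to LNXWRK01
/sbin/vmcp q pcif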

  • After attaching all cards, verify the attachment by querying the PCI functions:

/sbin/vmcp q pcif

  • Verify from the worker node: SSH into the worker node and confirm that the cards are visible using lspci.
  • You should see output similar to the following:

0000:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0001:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
0002:00:00.0 Processing accelerators: IBM Spyre Accelerator Virtual Function (rev 02)
...

Each attached card should appear as an "IBM Spyre Accelerator Virtual Function" device.

Apply machine config

Prerequisites

  • Before applying the machine config, label the worker node on which the cards are mounted with the spyre role to identify it as a Spyre-enabled node.

oc label node <worker-node-name> node-role.kubernetes.io/spyre=""

  • Verify that the label has been applied correctly. The node should show both spyre and worker in the ROLES column.

$ oc get no

NAME                      STATUS   ROLES                  AGE   VERSION
ocpz-standard-1-comts-0   Ready    spyre,worker           22h   v1.32.9
ocpz-standard-1-condi-0   Ready    control-plane,master   22h   v1.32.9
ocpz-standard-1-conlz-2   Ready    control-plane,master   22h   v1.32.9
ocpz-standard-1-connt-1   Ready    control-plane,master   22h   v1.32.9

  • Create a MachineConfigPool (MCP) specifically for Spyre nodes. This allows you to apply machine configurations only to nodes with the Spyre role.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfigPool
metadata:
  name: spyre
spec:
  machineConfigSelector:
    matchExpressions:
      - key: machineconfiguration.openshift.io/role
        operator: In
        values:
          - worker
          - spyre
  nodeSelector:
    matchLabels:
      node-role.kubernetes.io/spyre: ""

  • Apply the MachineConfigPool to your cluster

oc apply -f worker-mcp.yaml

  • You should see output confirming the creation

machineconfigpool.machineconfiguration.openshift.io/spyre created

Create and apply the MachineConfig files

  • Create a YAML file with the following vhostuser-bind MachineConfig contents and apply it with oc apply -f 99-vhostuser-bind.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: spyre
    kubernetes.io/arch: s390x
  name: 99-vhostuser-bind
spec:
  config:
    ignition:
      version: 3.4.0
    systemd:
      units:
      - name: vhostuser-bind.service
        enabled: true
        contents: |
          [Unit]
          Description=Vhostuser Interface vfio-pci Bind
          Wants=network-online.target
          After=network-online.target ignition-firstboot-complete.service
          ConditionPathExists=/etc/modprobe.d/vfio.conf
          [Service]
          Type=oneshot
          TimeoutSec=900
          ExecStart=/usr/local/bin/vhostuser
          [Install]
          WantedBy=multi-user.target
    storage:
      files:
      - contents:
          inline: vfio-pci
        filesystem: root
        mode: 0644
        path: /etc/modules-load.d/vfio-pci.conf
      - contents:
          # This b64 string is an encoded shell script (bind-vfio.sh), run `make power-update-machineconfig`
          # to re-encode if you need to update the script
          source: data:text/plain;charset=utf-8;base64,IyEvYmluL2Jhc2gKc2V0IC1lCgpQQ0lfREVWSUNFUz0kKGxzcGNpIC1uIC1kIDEwMTQ6MDZhNyB8IGN1dCAtZCAiICIgLWYxIHwgcGFzdGUgLXNkICIsIiAtKQoKZWNobyAiJFBDSV9ERVZJQ0VTIgpJRlM9JywnIHJlYWQgLXJhIERFVklDRVMgPDw8ICIkUENJX0RFVklDRVMiCgpmb3IgVkZJT0RFVklDRSBpbiAiJHtERVZJQ0VTW0BdfSI7IGRvCiAgICBjZCAvc3lzL2J1cy9wY2kvZGV2aWNlcy8iJHtWRklPREVWSUNFfSIgfHwgY29udGludWUKCiAgICBpZiBbICEgLWYgImRyaXZlci91bmJpbmQiIF07IHRoZW4KICAgICAgICBlY2hvICJGaWxlIGRyaXZlci91bmJpbmQgbm90IGZvdW5kIGZvciAke1ZGSU9ERVZJQ0V9IgogICAgICAgIGV4aXQgMQogICAgZmkKCiAgICBpZiAhIGVjaG8gLW4gInZmaW8tcGNpIiA+IGRyaXZlcl9vdmVycmlkZTsgdGhlbgogICAgICAgIGVjaG8gIkNvdWxkIG5vdCB3cml0ZSB2ZmlvLXBjaSB0byBkcml2ZXJfb3ZlcnJpZGUiCiAgICAgICAgZXhpdCAxCiAgICBmaQoKICAgIGlmICEgWyAtZiBkcml2ZXIvdW5iaW5kIF0gJiYgZWNobyAtbiAiJFZGSU9ERVZJQ0UiID4gZHJpdmVyL3VuYmluZDsgdGhlbgogICAgICAgIGVjaG8gIkNvdWxkIG5vdCB3cml0ZSB0aGUgVkZJT0RFVklDRTogJHtWRklPREVWSUNFfSB0byBkcml2ZXIvdW5iaW5kIgogICAgICAgIGV4aXQgMQogICAgZmkKCmRvbmUK
        filesystem: root
        mode: 0744
        path: /usr/local/bin/vhostuser
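
The base64 payload above is the encoded bind-vfio.sh script that binds the Spyre PCI devices (vendor:device 1014:06a7) to the vfio-pci driver. If you need to inspect or modify it, one way to do so locally (assuming GNU coreutils) is:

# Decode the embedded script for inspection (paste the base64 string from the MachineConfig)
echo '<base64-string>' | base64 -d

# Re-encode an updated bind-vfio.sh before placing it back into the MachineConfig
base64 -w0 bind-vfio.sh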

  • Create another YAML file with the following worker-aiu-kernel MachineConfig contents and apply it with oc apply -f 05-worker-aiu-kernel-vfiopci.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    kubernetes.io/arch: s390x
    machineconfiguration.openshift.io/role: spyre
  name: 05-worker-aiu-kernel-vfiopci
spec:
  config:
    ignition:
      version: 3.4.0
    storage:
      files:
      - contents:
          compression: gzip
          source: data:;base64,H4sIAAAAAAAC/2TMQQrDIBCF4X1O4QFaUBqMFDzLYOsUHqgjGRvw9oVCFyW7B+/nkz4gTc3xglz7EwZZo7NuvVuftstvheUUckuPwqQ75Iju7ydIrW8as7MzuSbiNvZJBRUj+s3ZEPx6FjP0SyIXpnyLk3X5BAAA//9a8ioOoAAAAA==
        mode: 420
        path: /etc/modprobe.d/vfio-pci.conf
      - contents:
          compression: ""
          source: data:,vfio-pci%0Avfio_iommu_type1%0A
        mode: 420
        path: /etc/modules-load.d/vfio-pci.conf
      - contents:
          compression: ""
          source: data:;base64,W2NyaW8ucnVudGltZV0KZGVmYXVsdF91bGltaXRzID0gWwogICJtZW1sb2NrPS0xOi0xIgpdCg==
        mode: 420
        path: /etc/crio/crio.conf.d/10-custom
      - contents:
          compression: ""
          source: data:,SUBSYSTEM%3D%3D%22vfio%22%2C%20MODE%3D%220666%22%0A
        mode: 420
        path: /etc/udev/rules.d/90-vfio-3.rules
      - contents:
          compression: ""
          source: data:,%40sentient%20-%20memlock%20134217728%0A
        mode: 420
        path: /etc/security/limits.d/memlock.conf

Apply SELinux Policy

Note: This step is required for IBM Spyre Operator versions above 1.0.0. This guide installs IBM Spyre Operator 1.1.0.

  • Create a YAML file with the following SELinux policy MachineConfig contents and apply it with oc apply -f 50-spyre-device-plugin-selinux-minimal.yaml:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: spyre
  name: 50-spyre-device-plugin-selinux-minimal
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - contents:
          source: data:text/plain;charset=utf-8;base64,bW9kdWxlIHNweXJlX2RldmljZV9wbHVnaW5fbWluaW1hbCAxLjA7CgpyZXF1aXJlIHsKICAgIHR5cGUgY29udGFpbmVyX3Q7CiAgICB0eXBlIGNvbnRhaW5lcl9ydW50aW1lX3Q7CiAgICB0eXBlIGNvbnRhaW5lcl92YXJfcnVuX3Q7CiAgICBjbGFzcyB1bml4X3N0cmVhbV9zb2NrZXQgY29ubmVjdHRvOwogICAgY2xhc3Mgc29ja19maWxlIHdyaXRlOwp9CgojIEdyYW50IE9OTFkgdGhlIHNwZWNpZmljIHBlcm1pc3Npb25zIG5lZWRlZCBmb3IgQ1JJLU8gY29tbXVuaWNhdGlvbgphbGxvdyBjb250YWluZXJfdCBjb250YWluZXJfcnVudGltZV90OnVuaXhfc3RyZWFtX3NvY2tldCBjb25uZWN0dG87CmFsbG93IGNvbnRhaW5lcl90IGNvbnRhaW5lcl92YXJfcnVuX3Q6c29ja19maWxlIHdyaXRlOwo=
        mode: 0644
        path: /etc/selinux/spyre_device_plugin_minimal.te
    systemd:
      units:
      - contents: |
          [Unit]
          Description=Install minimal SELinux policy for spyre device plugin
          After=multi-user.target
 
          [Service]
          Type=oneshot
          ExecStartPre=/bin/bash -c 'if [ ! -f /etc/selinux/spyre_device_plugin_minimal.pp ]; then checkmodule -M -m -o /etc/selinux/spyre_device_plugin_minimal.mod /etc/selinux/spyre_device_plugin_minimal.te && semodule_package -o /etc/selinux/spyre_device_plugin_minimal.pp -m /etc/selinux/spyre_device_plugin_minimal.mod; fi'
          ExecStart=/bin/bash -c 'semodule -i /etc/selinux/spyre_device_plugin_minimal.pp || true'
          RemainAfterExit=true
 
          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: install-spyre-selinux-minimal-policy.service
      - contents: |
          [Unit]
          Description=Setup device plugin directories with permissions and SELinux context
          After=network-online.target
          Before=kubelet.service
 
          [Service]
          Type=oneshot
          # Fix kubelet directory permissions for device plugin socket operations
          ExecStart=/usr/bin/chmod 770 /var/lib/kubelet/plugins_registry
          ExecStart=/usr/bin/chmod 770 /var/lib/kubelet/device-plugins
 
          # delete device plugin directories before creating it
          ExecStart=/usr/bin/rm -f /usr/local/etc/device-plugins/complete
          ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/spyre-config
          ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/spyre-metrics
          ExecStart=/usr/bin/rm -rf /usr/local/etc/device-plugins/metadata
 
          # Create device plugin directories
          ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/spyre-config
          ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/spyre-metrics
          ExecStart=/usr/bin/mkdir -p /usr/local/etc/device-plugins/metadata
 
          # Set permissions for group write access (device plugin runs as UID 1001, GID 0)
          ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins
          ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/spyre-config
          ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/spyre-metrics
          ExecStart=/usr/bin/chmod 770 /usr/local/etc/device-plugins/metadata
 
          # Fix SELinux context for container access
          ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins
          ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/spyre-config
          ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/spyre-metrics
          ExecStart=/usr/bin/chcon -R -t container_file_t /usr/local/etc/device-plugins/metadata
          RemainAfterExit=true
 
          [Install]
          WantedBy=multi-user.target
        enabled: true
        name: setup-device-plugin-directories.service

  • Once the machine configs are applied, monitor the rollout using the following command. Wait until the spyre MachineConfigPool reports the following status (UPDATED=True, DEGRADED=False).

$ oc get mcp spyre -w
NAME    CONFIG                UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
spyre   rendered-spyre-dd..   True      False      False      1              1                   1                     0                      133m
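
To double-check that the MachineConfigs took effect on the node, you can verify the vfio modules and the SELinux policy module directly from a debug shell; a quick sanity check could look like this:

# Confirm the vfio-pci module is loaded on the Spyre worker node
oc debug node/<worker-node-name> -- chroot /host lsmod | grep vfio

# Confirm the minimal SELinux policy module from the SELinux MachineConfig is installed
oc debug node/<worker-node-name> -- chroot /host semodule -l | grep spyre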

Optional:

If you want to prevent workloads from being scheduled on Spyre nodes until configuration is complete, you can temporarily taint the worker node:

oc adm taint nodes <worker-node-name> ibm.com/spyre=:NoSchedule
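
If you later want to allow general workloads back onto the node, the taint can be removed by appending a trailing hyphen to the same specification:

oc adm taint nodes <worker-node-name> ibm.com/spyre=:NoSchedule-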

Install Dependency Operators

      Prerequisites

      The following dependency operators need to be installed before installing the IBM Spyre Operator:

      • Node Feature Discovery (NFD) Operator: an OpenShift Operator that automates the detection of hardware features and system configurations across cluster nodes.
      • Red Hat Cert Manager Operator: a cluster-wide service that automates application certificate lifecycle management.
      • Secondary Scheduler Operator: allows you to deploy a custom scheduler alongside the default OpenShift scheduler.

      Node Feature Discovery (NFD) Operator

      Installation Steps:

      • Log in to the OpenShift web console as a cluster administrator.
      • In the left panel, navigate to Operators → OperatorHub.
      • On the OperatorHub page, type Node Feature Discovery into Filter by Keyword box.

      • Click on the Node Feature Discovery provided by Red Hat.
      • Select Stable from Channel drop down and choose latest version available from Version drop down.
      • Click Install. The Install Operator page opens.

      • Choose A specific namespace on the cluster under Installation mode.
      • Choose Operator recommended Namespace under Installed Namespace which will be created for you.
      • Choose Automatic under Update approval.
      • Click Install.

      • The Installing Operator pane appears. When the installation finishes, a checkmark appears next to the Operator name.

      Verification

      • In the OpenShift web console, from the side panel, navigate to Operators → Installed Operators and confirm that the Node Feature Discovery is in Succeeded state.

      Configure the Node Feature Discovery Operator

      • Click on Node Feature Discovery Operator.
      • Navigate to Node Feature Discovery tab.
      • Select Create NodeFeatureDiscovery.

      • Provide the following YAML contents and then click Create at the bottom of the page.

      apiVersion: nfd.openshift.io/v1
      kind: NodeFeatureDiscovery
      metadata:
        name: nfd-instance
        namespace: openshift-nfd
      spec:
        extraLabelNs:
        - ibm.com
        instance: "" # instance is empty by default
        operand:
          image: 'registry.redhat.io/openshift4/ose-node-feature-discovery-rhel9:v4.16'
          imagePullPolicy: Always
          servicePort: 12000
        workerConfig:
          configData: |
            #core:
            #  labelWhiteList:
            #  noPublish: false
            #  sleepInterval: 60s
            #  sources: [all]
            #  klog:
            #    addDirHeader: false
            #    alsologtostderr: false
            #    logBacktraceAt:
            #    logtostderr: true
            #    skipHeaders: false
            #    stderrthreshold: 2
            #    v: 0
            #    vmodule:
            ##   NOTE: the following options are not dynamically run-time configurable
            ##         and require a nfd-worker restart to take effect after being changed
            #    logDir:
            #    logFile:
            #    logFileMaxSize: 1800
            #    skipLogHeaders: false
            #sources:
            #  cpu:
            #    cpuid:
            ##     NOTE: whitelist has priority over blacklist
            #      attributeBlacklist:
            #        - "BMI1"
            #        - "BMI2"
            #        - "CLMUL"
            #        - "CMOV"
            #        - "CX16"
            #        - "ERMS"
            #        - "F16C"
            #        - "HTT"
            #        - "LZCNT"
            #        - "MMX"
            #        - "MMXEXT"
            #        - "NX"
            #        - "POPCNT"
            #        - "RDRAND"
            #        - "RDSEED"
            #        - "RDTSCP"
            #        - "SGX"
            #        - "SSE"
            #        - "SSE2"
            #        - "SSE3"
            #        - "SSE4.1"
            #        - "SSE4.2"
            #        - "SSSE3"
            #      attributeWhitelist:
            #  kernel:
            #    kconfigFile: "/path/to/kconfig"
            #    configOpts:
            #      - "NO_HZ"
            #      - "X86"
            #      - "DMI"
            #  pci:
            #    deviceClassWhitelist:
            #      - "0200"
            #      - "03"
            #      - "12"
            #    deviceLabelFields:
            #      - "class"
            #      - "vendor"
            #      - "device"
            #      - "subsystem_vendor"
            #      - "subsystem_device"
            #  usb:
            #    deviceClassWhitelist:
            #      - "0e"
            #      - "ef"
            #      - "fe"
            #      - "ff"
            #    deviceLabelFields:
            #      - "class"
            #      - "vendor"
            #      - "device"
            #  custom:
            #    - name: "my.kernel.feature"
            #      matchOn:
            #        - loadedKMod: ["example_kmod1", "example_kmod2"]
            #    - name: "my.pci.feature"
            #      matchOn:
            #        - pciId:
            #            class: ["0200"]
            #            vendor: ["15b3"]
            #            device: ["1014", "1017"]
            #        - pciId :
            #            vendor: ["8086"]
            #            device: ["1000", "1100"]
            #    - name: "my.usb.feature"
            #      matchOn:
            #        - usbId:
            #          class: ["ff"]
            #          vendor: ["03e7"]
            #          device: ["2485"]
            #        - usbId:
            #          class: ["fe"]
            #          vendor: ["1a6e"]
            #          device: ["089a"]
            #    - name: "my.combined.feature"
            #      matchOn:
            #        - pciId:
            #            vendor: ["15b3"]
            #            device: ["1014", "1017"]
            #          loadedKMod : ["vendor_kmod1", "vendor_kmod2"]
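
      Optionally, you can spot-check NFD from the CLI after the instance is created; NFD publishes feature labels on the nodes, so something like the following should show PCI-related labels (the grep pattern assumes the standard NFD label prefix):

      oc get pods -n openshift-nfd
      oc describe node <worker-node-name> | grep feature.node.kubernetes.io/pci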

      Red Hat Cert Manager Operator

      Installation Steps:

      • Log in to the OpenShift web console as a cluster administrator.
      • In the left panel, navigate to Operators → OperatorHub.
      • On the OperatorHub page, type cert-manager Operator for Red Hat OpenShift into Filter by Keyword box.

      • Click on cert-manager Operator for Red Hat OpenShift (not community version).
      • Select stable-v1 from Channel drop down and choose latest version available from Version drop down.

      • Click Install. The Install Operator page opens.
      • Select A specific namespace on the cluster under Installation mode.
        • Select Operator Recommended Namespace under Installed Namespace.
        • Choose Automatic under Update approval.
        • Click Install.

        • The Installing Operator pane appears. When the installation finishes, a checkmark appears next to the Operator name.

        Verification

        • In the OpenShift web console, from the side panel, navigate to Operators → Installed Operators and confirm that the cert-manager Operator for Red Hat OpenShift is in Succeeded state.
        • Verify if the pods under cert-manager are in running state.
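
        You can also check the operand pods from the CLI, for example:

        oc get pods -n cert-manager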

        Secondary Scheduler Operator

        Installation Steps:

        • Log in to the OpenShift web console as a cluster administrator.
        • In the left panel, navigate to Operators → OperatorHub.
        • On the OperatorHub page, type Secondary Scheduler Operator for Red Hat OpenShift into Filter by Keyword box.

        • Click on Secondary Scheduler Operator for Red Hat OpenShift.
        • Select stable from Channel drop down and choose latest version available from Version drop down.

        • Click Install. The Install Operator page opens.
        • Select A specific namespace on the cluster under Installation mode.
          • Select openshift-secondary-scheduler-operator from Installed Namespace drop down.
          • Choose Automatic under Update approval.
          • Click Install.

          • The Installing Operator pane appears. When the installation finishes, a checkmark appears next to the Operator name.

          Verification

          • In the OpenShift web console, from the side panel, navigate to Operators → Installed Operators and confirm that the Secondary scheduler Operator for Red Hat OpenShift is in Succeeded state.
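
          A quick CLI check of the operator pods, for example:

          oc get pods -n openshift-secondary-scheduler-operator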

          Install IBM Spyre Operator

          • Log in to the OpenShift web console as a cluster administrator.
          • In the left panel, navigate to Operators → OperatorHub.
          • On the OperatorHub page, type IBM Spyre Operator into Filter by Keyword box.

          • Click the IBM Spyre Operator tile. The IBM Spyre Operator information pane opens.
          • Select stable from channel dropdown.
          • Select 1.1.0 from version dropdown.

          • Click Install. The Install Operator page opens.

          • Select All namespaces on the cluster under Installation mode.
            • Select Operator recommended Namespace from Installed Namespace drop down.
            • Choose Automatic under Update approval.
            • Click Install.
            • The Installing Operator pane appears. When the installation finishes, a checkmark appears next to the Operator name.

            Verification

            • In the OpenShift web console, from the side panel, navigate to Operators → Installed Operators and confirm that the IBM Spyre operator shows one of the following statuses:
              • Installing: Installation is in progress; wait for this to change to Succeeded. This might take several minutes.
              • Succeeded: Installation is successful.
            • Navigate to Workloads → Pods from side panel of OpenShift web console and verify if pods under spyre-operator namespace are in running state.
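
            The same check can be done from the CLI (assuming the spyre-operator namespace, as above):

            oc get pods -n spyre-operator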

            Create Spyre Cluster Policy

            • Once the Operator is in the Succeeded state, click on IBM Spyre Operator and navigate to the Spyre Cluster Policy tab.

            • Click the Create SpyreClusterPolicy option and paste the following YAML contents in the YAML view.

            apiVersion: spyre.ibm.com/v1alpha1
            kind: SpyreClusterPolicy
            metadata:
              name: spyreclusterpolicy
            spec:
              experimentalMode:
              - perDeviceAllocation
              loglevel: info
              devicePlugin:
                repository: "quay.io/ibm-aiu"
                image: "spyre-device-plugin"
                version: 1.1.0
                configPath: /etc/aiu
                configName: senlib_config.json
                initContainer:
                  repository: "quay.io/ibm-aiu"
                  image: "spyre-device-plugin-init"
                  version: 1.1.0
                  executePolicy: IfNotPresent
              scheduler:
                repository: "quay.io/ibm-aiu"
                image: "spyre-scheduler"
                version: 1.1.0
              podValidator:
                enabled: true
                repository: "quay.io/ibm-aiu"
                image: "spyre-webhook-validator"
                version: 1.1.0
              cardManagement:
                enabled: false
              metricsExporter:
                enabled: false
                repository: "quay.io/ibm-aiu"
                image: "spyre-exporter"
                version: 1.1.0

            Note: The externalDeviceReservation flag under experimentalMode enables the use of the spyre-scheduler. If you want to use the default scheduler, remove this flag from the SpyreClusterPolicy YAML.
            • Click Create and wait for the spyreclusterpolicy to reach the Ready state.

            Verification

            • Navigate to Administration → CustomResourceDefinitions in the side panel of the OpenShift web console.
            • Search for SpyreNodeState in the search box and click SpyreNodeState in the results.

            • Navigate to the Instances tab and click the worker node on which the cards are mounted.

            • Click the YAML tab and verify that the cards are reported as healthy under the spec section of the YAML.
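
            The same information can be checked from the CLI; assuming the resource name follows the CRD kind, something like:

            oc get spyrenodestate
            oc get spyrenodestate <worker-node-name> -o yaml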

            Install Red Hat OpenShift AI Operator

            Refer to the Red Hat OpenShift AI Operator installation documentation for installing the operator. Proceed to Model Serving only after Red Hat OpenShift AI is successfully installed.

            Model Serving

            In this guide, we deploy the granite-3.3-8b-instruct model using the vLLM Spyre s390x ServingRuntime for KServe in RawDeployment mode.

            Prerequisites

            This guide is based on Red Hat OpenShift AI Operator version 3.0.0.

            Ensure that the operator is installed and running successfully before proceeding.

            Resources

            The following minimum resources are required to deploy the granite-3.3-8b-instruct model.

            Resource     Minimum requirement
            vCPUs        6
            RAM (GiB)    160

            Model Storage

            • To download the model, visit https://huggingface.co/ibm-granite/granite-3.3-8b-instruct and clone the repo.
            • Upload the cloned model to one of the supported storage backends in Red Hat OpenShift AI:
              • S3-compatible object storage
              • URI-based repository
              • OCI-compliant registry
            • This guide demonstrates accessing the model from an S3-compatible object storage backend.
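
            As an illustration, the model could be cloned and pushed to an S3-compatible bucket roughly as follows; the bucket name, endpoint URL, and use of the AWS CLI are assumptions, so substitute the tooling that matches your storage backend:

            # Clone the model repository (requires git-lfs for the weight files)
            git lfs install
            git clone https://huggingface.co/ibm-granite/granite-3.3-8b-instruct

            # Sync the model files to a hypothetical S3-compatible bucket
            aws s3 sync ./granite-3.3-8b-instruct s3://models/granite-3.3-8b-instruct \
              --endpoint-url https://s3.example.internal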

            Serving Runtime

            A serving runtime is a template that defines how a model server should run on KServe, including its container image, supported protocols, and runtime configuration.

            You need to create a serving runtime that uses Red Hat AI Inference Server 3.2.5 as the container image.

            Procedure

            • Click on Settings in the left panel of the dashboard and expand Model resources and operations.

            • Click on Serving runtimes and then on Add serving runtime.
            • Select REST from the API protocol drop-down.
            • Select Generative AI model from the model type drop-down.

            • Click on Start from scratch in the YAML window and provide the following YAML contents:

            apiVersion: serving.kserve.io/v1alpha1
            kind: ServingRuntime
            metadata:
              annotations:
                opendatahub.io/recommended-accelerators: '["ibm.com/spyre_vf"]'
                opendatahub.io/runtime-version: v0.10.2.0
                openshift.io/display-name: vLLM Spyre s390x ServingRuntime for KServe - RHAIIS - 3.2.5
                opendatahub.io/apiProtocol: REST
              labels:
                opendatahub.io/dashboard: "true"
              name: vllm-spyre-s390x-runtime-copy
            spec:
              annotations:
                opendatahub.io/kserve-runtime: vllm
                prometheus.io/path: /metrics
                prometheus.io/port: "8080"
              containers:
                - args:
                    - --model=/mnt/models
                    - --port=8000
                    - --served-model-name={{.Name}}
                  command:
                    - /bin/bash
                    - -c
                    - source /etc/profile.d/ibm-aiu-setup.sh && exec python3 -m vllm.entrypoints.openai.api_server "$@"
                    - --
                  env:
                    - name: HF_HOME
                      value: /tmp/hf_home
                    - name: FLEX_DEVICE
                      value: VF
                    - name: TOKENIZERS_PARALLELISM
                      value: "false"
                    - name: DTLOG_LEVEL
                      value: error
                    - name: TORCH_SENDNN_LOG
                      value: CRITICAL
                    - name: VLLM_SPYRE_USE_CB
                      value: "1"
                    - name: LD_PRELOAD
                      value: ""
                  image: registry.redhat.io/rhaiis/vllm-spyre-rhel9:3.2.5-1765361213
                  name: kserve-container
                  ports:
                    - containerPort: 8000
                      protocol: TCP
                  volumeMounts:
                    - mountPath: /dev/shm
                      name: shm
              multiModel: false
              supportedModelFormats:
                - autoSelect: true
                  name: vLLM
              volumes:
                - emptyDir:
                    medium: Memory
                    sizeLimit: 2Gi
                  name: shm

            • Click on Create and verify that a serving runtime named vLLM Spyre s390x ServingRuntime for KServe - RHAIIS - 3.2.5 is listed under the available serving runtimes.

            Create a route to the gateway to access the Red Hat OpenShift AI Dashboard

            OpenShift AI 3.0 uses the Gateway API and a dynamically provisioned load balancer service to expose its services. If you are deploying OpenShift AI 3.0 in a private or on-premises environment, you must manually configure a route to access the OpenShift AI Dashboard.

            • Navigate to Networking → Routes in the OpenShift web console.
            • Click on Create Route and go to the YAML view.
            • Provide the YAML contents below, change the host to match your cluster details, and click Create.

            apiVersion: route.openshift.io/v1
            kind: Route
            metadata:
              name: data-science-gateway-data-science-gateway-class
              namespace: openshift-ingress
            spec:
              host: data-science-gateway.apps.<CHANGEME>
              port:
                targetPort: https
              tls:
                termination: passthrough
              to:
                kind: Service
                name: data-science-gateway-data-science-gateway-class
                weight: 100
              wildcardPolicy: None
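
            You can confirm that the route was admitted and note the resulting URL with:

            oc get route data-science-gateway-data-science-gateway-class -n openshift-ingress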

            Deployment steps

            Accessing Red Hat OpenShift AI Dashboard

            • Log in to the OpenShift web console. Navigate to Operators → Installed Operators in the left panel.
            • Verify that the Red Hat OpenShift AI Operator is installed and in the Succeeded state.
            • Red Hat OpenShift Service Mesh 3 is a dependency operator and is installed along with the Red Hat OpenShift AI Operator.
            • Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.
            • Click the Application Launcher icon in the top right corner of the console.

            • Click Red Hat OpenShift AI under OpenShift AI Self Managed Services to open the AI dashboard.
            • Verify that the dashboard loads.

            Hardware profiles

            The default profile available under Hardware Profiles limits vCPU to 2 and Memory to 4 GiB.

            Create a new hardware profile that supports the minimum vCPU and memory requirements of granite-3.3-8b-instruct.

            • Click on Settings option from left panel on dashboard and expand Environment Setup.

            • Click on Hardware profiles and then on Create Hardware Profile.
            • Provide a unique name and scroll down to edit the default and minimum allowed values.
            • Set the default and minimum CPU value to 6
            • Set the default and minimum Memory to 160 GiB.
            • Set the maximum values of both CPU and Memory based on the resource availability of your OpenShift Cluster.

            • Click on Add Resource to add Spyre Accelerator resource.

            • Provide a unique name, set the Resource identifier to ibm.com/spyre_vf, the Resource type to Accelerator, and the default and maximum allowed values to 4.
            • Click Update.
            • Scroll down to add a toleration to the deployment so that it can be scheduled onto the worker node that was previously tainted.
            • Click on Add toleration and provide values matching the taint applied earlier, for example Key ibm.com/spyre, Operator Exists, and Effect NoSchedule.

            • Click on Add and then on Create Hardware profile.

            • Save the profile.

            Deploy the model

            • On the Dashboard, click on Projects on the left panel.
            • Click Create Project, provide a name, and click Create. A project details page will appear with multiple tabs.

            • Go to Connections tab.
            • Click on Create connection and select S3 compatible object storage from the drop down.

            • Enter the following details and click create.
              • Connection name
              • Access Key
              • Secret Key
              • Endpoint URL
              • Region
              • Bucket Name

            • Go to Deploy tab.
              • Select Existing Connection under Model Location.
              • Choose the S3 connection created earlier.
              • Provide the path to the model in your S3 bucket.
              • Select Model type as Generative AI model from dropdown.

             

              • Click Next and give a unique Model Deployment name.
              • Under Hardware Profiles, select the profile you created earlier for granite 3.3 8b.
              • Select vLLM Spyre s390x ServingRuntime for KServe - RHAIIS - 3.2.5  created earlier from dropdown.
              • Set Model server replicas to 1.

            • Click Next.
            • Under Model Route, enable Make deployed models available through an external route to allow external access.
            • For test environments, token authentication is optional.
            • For production environments:
              • Select Require Token Authentication.
              • Enter the Service Account Name for token generation.
            • Check the Add custom runtime arguments check box and enter the following custom runtime arguments in the text box:

            --max-model-len=32768
            --max-num-seqs=32
            --tensor-parallel-size=4

              • Click on Deploy Model.

              • Wait until the deployment reaches the Starting state.
              • Once active, verify that the model deployment endpoints have been generated.

                • Hover over Internal and external endpoint under Inference endpoints and copy the external endpoint to proceed with inferencing. You can use the internal endpoint to send an inference request from within the cluster.

              Inferencing

              Inference Request

              After deployment, use the generated external endpoint to send inference requests.

              curl -k https://<external-endpoint>/v1/completions \
              -H "Content-Type: application/json" \
              -d '{
                "model": "<model-name>",
                "prompt": "<prompt>",
                "max_tokens": <max-tokens>,
                "temperature": 0
              }' | jq

              Example Request:

              curl -k https://granite-8b-granite-8b.apps.ocpz-standard-spyre.b39-ocpai.pok.stglabs.ibm.com/v1/completions \
                -H "Content-Type: application/json" \
                -d '{
                  "model": "granite-8b",
                  "prompt": "What is London famous for?",
                  "max_tokens": 300,
                  "temperature": 0
                }' | jq

              Example Response:

              {
                "id": "cmpl-85d173db3625491385450feae1534ffd",
                "object": "text_completion",
                "created": 1765525457,
                "model": "granite-8b",
                "choices": [
                  {
                    "index": 0,
                    "text": "\n\nLondon is famous for its iconic landmarks such as the Tower of London, Buckingham Palace, the Houses of Parliament, and Big Ben. It's also known for its rich history, cultural institutions like the British Museum and the National Gallery, and its vibrant arts and entertainment scene. The city is a global hub for finance, fashion, and tourism.",
                    "logprobs": null,
                    "finish_reason": "stop",
                    "stop_reason": null,
                    "token_ids": null,
                    "prompt_logprobs": null,
                    "prompt_token_ids": null
                  }
                ],
                "service_tier": null,
                "system_fingerprint": null,
                "usage": {
                  "prompt_tokens": 7,
                  "total_tokens": 102,
                  "completion_tokens": 95,
                  "prompt_tokens_details": null
                },
                "kv_transfer_params": null
              }

              external-endpoint = the external endpoint generated under the Models tab
              model-name = the model deployment name
              max-tokens = the maximum number of tokens to generate in the response

              Note:

              If the inference request times out, increase the timeout on the HAProxy route as follows.

              Through the OpenShift CLI:

              oc annotate isvc <isvc-name> -n <namespace> haproxy.router.openshift.io/timeout=5m --overwrite

              For this example:

              oc annotate isvc granite-8b -n granite-8b haproxy.router.openshift.io/timeout=5m --overwrite

              Get the isvc-name with:

              oc get isvc -n <namespace>

              where <namespace> is the namespace in which the model is deployed (granite-8b in this demo).

              Through UI:

              • In the OpenShift console, navigate to Administration → CustomResourceDefinitions.

              • Search for InferenceService in the search bar.

              • Click on the InferenceService CRD and go to the Instances tab.

              • Select the entry corresponding to your model deployment

              • Go to YAML tab 

              • Under the metadata → annotations section, add the following line:

              haproxy.router.openshift.io/timeout: 5m

              • Click Save to apply the change.

              Conclusion

              This guide walked through the complete end-to-end process, from preparing the environment to validating model responses, enabling you to confidently run large language models on OpenShift AI with IBM Spyre acceleration. As organizations increasingly adopt AI workloads on IBM Z and LinuxONE, this setup provides a reliable, efficient, and secure way to operationalize model serving at scale.
