Authors: Sindhuja BD (sindhujabd@ibm.com), Dilip B (Dilip.Bhagavan@ibm.com), Modassar Rana (modassar.rana@ibm.com), Rishika Kedia (rishika.kedia@in.ibm.com)
Red Hat OpenShift AI (RHOAI) is a platform for data scientists and developers of artificial intelligence and machine learning (AI/ML) applications.
OpenShift AI provides a platform to develop, train, serve, test, and monitor AI/ML models and applications across on-premises and cloud environments.
Operator Installation:
The Red Hat OpenShift AI Operator supports installation using either the command-line interface (CLI) or the OpenShift web console.
This guide demonstrates installation through the OpenShift web console.
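For reference, a CLI-based installation typically creates a Namespace, an OperatorGroup, and a Subscription through OLM. The sketch below is illustrative only: the package name (rhods-operator), catalog source (redhat-operators), and channel are assumptions that you should confirm for your cluster, for example with oc get packagemanifests -n openshift-marketplace, before applying.
|
# Illustrative sketch only - confirm the package, catalog, and channel names before applying
oc create namespace redhat-ods-operator

cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator            # package name (assumption)
  channel: fast-3.x               # channel used in this guide
  source: redhat-operators        # catalog source (assumption)
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
|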
Red Hat OpenShift AI Installation using OpenShift Web Console
Prerequisites
- A running OpenShift cluster, version 4.19 or later, configured with a default storage class that can be dynamically provisioned (a quick CLI check follows this list).
- Cluster administrator privileges for your OpenShift cluster.
- If you are using custom namespaces, ensure that you have created and labeled them as required.
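A quick way to confirm the default storage class prerequisite from the CLI (the storage class name varies by environment):
|
# Exactly one storage class should be marked "(default)" in the output
oc get storageclass
|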
Installation Steps
1. Log in to the OpenShift web console as a cluster administrator.
2. In the left panel, navigate to Operators → OperatorHub.
3. On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing RHOAI into the Filter by keyword box.
4. Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.
5. Select fast-3.x from the Channel drop-down list.
6. Select 3.0.0 from the Version drop-down list.
7. Click Install. The Install Operator page opens.
8. For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.
9. For Installed Namespace, choose one of the following options:
   - To use the predefined operator namespace, select the Operator recommended Namespace: redhat-ods-operator option.
   - To use the custom operator namespace that you created, select the Select a Namespace option, and then select the namespace from the drop-down list.
10. For Update approval, select one of the following update strategies:
    - Automatic: New updates in the update channel are installed as soon as they become available.
    - Manual: A cluster administrator must approve any new updates before installation begins.
11. Click Install.
12. The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name. You can also confirm the installation from the CLI, as sketched after this list.
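A minimal CLI check that the Operator installed successfully, assuming the default redhat-ods-operator namespace:
|
# The ClusterServiceVersion for the Operator should reach the Succeeded phase
oc get csv -n redhat-ods-operator

# The Operator pod should be Running
oc get pods -n redhat-ods-operator
|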

Verification

- Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.
- Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.
- Click the Red Hat OpenShift AI Operator.
- Go to the Data Science Cluster tab and click Create DataScienceCluster.
- Paste the following YAML contents. (A CLI check of the resulting DataScienceCluster is sketched after the YAML.)
- In the spec.components section of the CR, set the managementState field of each OpenShift AI component to either Managed or Removed, depending on whether you want that component enabled or disabled on the OpenShift AI Dashboard. These values are defined as follows:
  - Managed:
    - The Operator actively manages the component, installs it, and tries to keep it active.
    - The Operator upgrades the component only when it is safe to do so.
  - Removed:
    - The Operator actively manages the component but does not install it.
    - If the component is already installed, the Operator tries to remove it.
|
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
      configuration:
        disableGenAIUI: true
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Removed
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    modelregistry:
      managementState: Managed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Removed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed
|
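After the DataScienceCluster is created, you can watch its status from the CLI; the resource name default-dsc comes from the YAML above:
|
# The DataScienceCluster conditions should eventually report that it is Ready
oc get datasciencecluster default-dsc

# Inspect per-component conditions if a component does not come up
oc describe datasciencecluster default-dsc
|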



Model Serving
While OpenShift AI provides an environment to develop, train, serve, test, and monitor AI/ML models and applications on-premises or in the cloud, this guide focuses on the Model Serving Component of Red Hat OpenShift AI. It demonstrates how easily an AI model can be served using RHOAI on IBM Z.
For this demo, we deploy the TinyLlama-1.1B-Chat-v1.0 model using the vLLM CPU(ppc64le/s390x) ServingRuntime for KServe in RawDeployment mode.
Prerequisites
Operator
This guide is based on the Red Hat OpenShift AI Operator version 3.0.0.
Ensure that the operator is installed and running successfully before proceeding.
Resources
The following minimum resources are required to deploy the TinyLlama-1.1B-Chat-v1.0 model:
| Resource  | Minimum Requirement |
| vCPUs     | 4                   |
| RAM (GiB) | 12                  |
Model Storage
- To download the model, visit https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 and clone the repository.
- Upload the cloned model to one of the supported storage backends in Red Hat OpenShift AI:
  - S3-compatible object storage
  - URI-based repository
  - OCI-compliant registry
This guide demonstrates accessing the model from an S3-compatible object storage backend.
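As an illustration, the model can be cloned with Git LFS and copied into an S3-compatible bucket with the AWS CLI; the bucket name, endpoint, and target path below are placeholders rather than values from this guide:
|
# Clone the model repository (git-lfs is required for the model weights)
git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Upload the model files to an S3-compatible bucket (placeholder bucket and endpoint)
aws s3 cp ./TinyLlama-1.1B-Chat-v1.0 s3://<your-bucket>/models/TinyLlama-1.1B-Chat-v1.0 \
  --recursive --endpoint-url https://<s3-endpoint>
|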
Serving Runtime
The vLLM CPU(ppc64le/s390x) ServingRuntime for KServe comes preinstalled with the Red Hat OpenShift AI Operator. We use this runtime to serve the TinyLlama-1.1B-Chat-v1.0 model.
Create a Route to the Gateway to access the Red Hat OpenShift AI Dashboard:
OpenShift AI 3.0 uses the Gateway API and a dynamically provisioned LoadBalancer Service to expose its services. If you are deploying OpenShift AI 3.0 in a private or on-premises environment, you must manually configure a route to access the OpenShift AI Dashboard.
For more information, see https://access.redhat.com/articles/7133770
- Navigate to Networking → Routes in the OpenShift web console.
- Click Create Route and switch to the YAML view.
- Provide the following YAML contents, change the host to match your cluster details, and click Create. (A quick verification of the Route is sketched after the YAML.)
|
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: data-science-gateway-data-science-gateway-class
  namespace: openshift-ingress
spec:
  host: data-science-gateway.apps.<CHANGEME>
  port:
    targetPort: https
  tls:
    termination: passthrough
  to:
    kind: Service
    name: data-science-gateway-data-science-gateway-class
    weight: 100
  wildcardPolicy: None
|
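Once the Route is created, you can confirm it and check that the Dashboard host responds; the host value depends on your cluster:
|
# Confirm the route exists and note its host
oc get route data-science-gateway-data-science-gateway-class -n openshift-ingress

# The Dashboard should respond on the route host
curl -k -I https://data-science-gateway.apps.<CHANGEME>
|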

Deployment Steps
Accessing Red Hat OpenShift AI Dashboard
1. Log in to the OpenShift web console and navigate to Operators → Installed Operators in the left panel.
2. Verify that the Red Hat OpenShift AI Operator is installed and in the Succeeded state.
3. Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.
4. Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.
5. Click the Application Launcher icon in the top-right corner of the console.
6. Click Red Hat OpenShift AI under OpenShift Self Managed Services to open the AI Dashboard.
7. Verify that the Dashboard is loaded.

Hardware Profiles
The default profile available under Hardware profiles limits vCPU to 2 and Memory to 4 GiB.
Create a new hardware profile that meets TinyLlama's minimum vCPU and memory requirements.
1. Click Settings in the left pane of the Dashboard and expand Environment Setup.
2. Click Hardware Profiles.
3. Click Create Hardware Profile.
4. Provide a unique name and scroll down to edit the default values. Set the CPU default value to 6 and Memory to 12 GiB.
5. Save the profile.
Deploy the model
1. On the Dashboard, click Projects in the left panel.
2. Click Create project, provide a name, and click Create. A project details page appears with multiple tabs.
3. Go to the Connections tab.
4. Click Create connection and select S3 compatible object storage from the drop-down list.
5. Enter the following details and click Create:
   - Connection name
   - Access Key
   - Secret Key
   - Endpoint URL
   - Region
   - Bucket name
6. Go to the Deploy tab:
   - Select Existing Connection under Model Location.
   - Choose the S3 connection created earlier.
   - Provide the path to the model in your S3 bucket.
   - Select the Model type Generative AI model from the drop-down list, since TinyLlama is a generative model.
7. Click Next and give a unique Model Deployment Name.
8. Under Hardware profiles, select the profile you created earlier for TinyLlama.
9. Select vLLM CPU(ppc64le/s390x) ServingRuntime for KServe from the drop-down list.
10. Set Model server replicas to 1.
11. Under Model Route, enable Make deployed models available through an external route to allow external access.
    - For test environments, token authentication is optional.
    - For production environments:
      - Select Require token authentication.
      - Enter the Service Account Name for token generation.
      - (Optional) Click Add a service account to include multiple accounts.
12. Check the Add custom runtime arguments check box and enter the following custom runtime argument in the text box below:
    --dtype=float
13. Wait until the deployment reaches the Starting state.
14. Once active, verify that the model endpoint has been generated. You can also confirm the deployment from the CLI, as sketched after this list.
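As a CLI-level check of the deployment, you can inspect the underlying KServe resources; <project> below stands for the data science project you created:
|
# The InferenceService should report READY=True and expose a URL
oc get inferenceservice -n <project>

# The model server pod(s) should be Running
oc get pods -n <project>
|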

Inferencing
Inference Request
After deployment, use the generated external endpoint to send inference requests.
Request Format
|
curl -k https://<external-endpoint>/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-name>",
        "prompt": "<prompt>",
        "max_tokens": <max-tokens>,
        "temperature": 0
      }' | jq
|
external-endpoint = the external endpoint generated under the Models tab
model-name = the model deployment name
max-tokens = the maximum number of tokens to generate in the response
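If you enabled token authentication, the request must also carry a bearer token. A sketch, assuming oc create token is available (OpenShift 4.11 or later) and using placeholder service account and project names:
|
# Create a short-lived token for the service account configured on the deployment (placeholder names)
TOKEN=$(oc create token <service-account-name> -n <project>)

# Include the token in the inference request
curl -k https://<external-endpoint>/v1/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "model": "<model-name>", "prompt": "<prompt>", "max_tokens": 50, "temperature": 0 }' | jq
|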
Note:
If the inference request times out, increase the timeout imposed on the HAProxy route as follows:
Through OpenShift CLI:
|
oc annotate isvc <isvc-name> -n <namespace> haproxy.router.openshift.io/timeout=5m --overwrite
|
Get the isvc-name with:
oc get isvc -n <namespace>
where <namespace> is the namespace in which the model is deployed.
Through UI:
1. In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
2. Search for InferenceService in the search bar.
3. Click the InferenceService CRD and go to the Instances tab.
4. Select the entry corresponding to your model deployment.
5. Go to the YAML tab.
6. Under the metadata → annotations section, add the following line:
|
haproxy.router.openshift.io/timeout: 10m
|
7. Click Save to apply the change.