Authors: Sindhuja BD (sindhujabd@ibm.com), Dilip B (Dilip.Bhagavan@ibm.com), Modassar Rana (modassar.rana@ibm.com), Rishika Kedia (rishika.kedia@in.ibm.com)
Red Hat OpenShift AI (RHOAI) is a platform for data scientists and developers of artificial intelligence and machine learning (AI/ML) applications.
OpenShift AI provides a platform to develop, train, serve, test, and monitor AI/ML models and applications across on-premises and cloud environments.
Operator Installation:
The Red Hat OpenShift AI Operator supports installation using either the command-line interface (CLI) or the OpenShift web console.
This guide demonstrates installation through the OpenShift web console.
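For reference, a CLI-based installation typically creates a Namespace, an OperatorGroup, and a Subscription through OLM. The sketch below is illustrative only: the package name (rhods-operator), catalog source (redhat-operators), and channel are assumptions that you should confirm for your cluster, for example with oc get packagemanifests -n openshift-marketplace, before applying.
|
# Illustrative sketch only - confirm the package, catalog, and channel names before applying
oc create namespace redhat-ods-operator

cat <<'EOF' | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator            # package name (assumption)
  channel: fast-3.x               # channel used in this guide
  source: redhat-operators        # catalog source (assumption)
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic
EOF
|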
Red Hat OpenShift AI Installation using OpenShift Web Console
Prerequisites
- A running OpenShift cluster, version 4.19 or later, configured with a default storage class that can be dynamically provisioned (a quick CLI check follows this list).
- Cluster administrator privileges for your OpenShift cluster.
- If you are using custom namespaces, ensure that you have created and labeled them as required.
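A quick way to confirm the default storage class prerequisite from the CLI (the storage class name varies by environment):
|
# Exactly one storage class should be marked "(default)" in the output
oc get storageclass
|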
Installation Steps
1. Log in to the OpenShift web console as a cluster administrator.
2. In the left panel, navigate to Operators → OperatorHub.
3. On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing RHOAI into the Filter by keyword box.
4. Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.
5. Select fast-3.x from the Channel drop-down list.
6. Select 3.0.0 from the Version drop-down list.
7. Click Install. The Install Operator page opens.
8. For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.
9. For Installed Namespace, choose one of the following options:
   - To use the predefined operator namespace, select the Operator recommended Namespace: redhat-ods-operator option.
   - To use the custom operator namespace that you created, select the Select a Namespace option, and then select the namespace from the drop-down list.
10. For Update approval, select one of the following update strategies:
    - Automatic: New updates in the update channel are installed as soon as they become available.
    - Manual: A cluster administrator must approve any new updates before installation begins.
11. Click Install.
12. The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name. You can also confirm the installation from the CLI, as sketched after this list.
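A minimal CLI check that the Operator installed successfully, assuming the default redhat-ods-operator namespace:
|
# The ClusterServiceVersion for the Operator should reach the Succeeded phase
oc get csv -n redhat-ods-operator

# The Operator pod should be Running
oc get pods -n redhat-ods-operator
|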

Verification

- Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.
- Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.
- Click the Red Hat OpenShift AI Operator.
- Go to the Data Science Cluster tab and click Create DataScienceCluster.
- Paste the following YAML contents. (A CLI check of the resulting DataScienceCluster is sketched after the YAML.)
- In the spec.components section of the CR, set the managementState field of each OpenShift AI component to either Managed or Removed, depending on whether you want that component enabled or disabled on the OpenShift AI Dashboard. These values are defined as follows:
  - Managed:
    - The Operator actively manages the component, installs it, and tries to keep it active.
    - The Operator upgrades the component only when it is safe to do so.
  - Removed:
    - The Operator actively manages the component but does not install it.
    - If the component is already installed, the Operator tries to remove it.
|
apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
      configuration:
        disableGenAIUI: true
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Removed
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    modelregistry:
      managementState: Managed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Removed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed
|
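After the DataScienceCluster is created, you can watch its status from the CLI; the resource name default-dsc comes from the YAML above:
|
# The DataScienceCluster conditions should eventually report that it is Ready
oc get datasciencecluster default-dsc

# Inspect per-component conditions if a component does not come up
oc describe datasciencecluster default-dsc
|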



Model Serving
While OpenShift AI provides an environment to develop, train, serve, test, and monitor AI/ML models and applications on-premises or in the cloud, this guide focuses on the Model Serving Component of Red Hat OpenShift AI. It demonstrates how easily an AI model can be served using RHOAI on IBM Z.
For this demo, we deploy the TinyLlama-1.1B-Chat-v1.0 model using the vLLM CPU(ppc64le/s390x) ServingRuntime for KServe in RawDeployment mode.
Prerequisites
Operator
This guide is based on the Red Hat OpenShift AI Operator version 3.0.0.
Ensure that the operator is installed and running successfully before proceeding.
Resources
The following minimum resources are required to deploy the TinyLlama-1.1B-Chat-v1.0 model:
| Resource  | Minimum Requirement |
| vCPUs     | 4                   |
| RAM (GiB) | 12                  |
Model Storage
- To download the model, visit https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 and clone the repository.
- Upload the cloned model to one of the supported storage backends in Red Hat OpenShift AI:
  - S3-compatible object storage
  - URI-based repository
  - OCI-compliant registry
This guide demonstrates accessing the model from an S3-compatible object storage backend.
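As an illustration, the model can be cloned with Git LFS and copied into an S3-compatible bucket with the AWS CLI; the bucket name, endpoint, and target path below are placeholders rather than values from this guide:
|
# Clone the model repository (git-lfs is required for the model weights)
git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Upload the model files to an S3-compatible bucket (placeholder bucket and endpoint)
aws s3 cp ./TinyLlama-1.1B-Chat-v1.0 s3://<your-bucket>/models/TinyLlama-1.1B-Chat-v1.0 \
  --recursive --endpoint-url https://<s3-endpoint>
|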
Serving Runtime
The vLLM CPU(ppc64le/s390x) ServingRuntime for KServe comes preinstalled with the Red Hat OpenShift AI Operator. We use this runtime to serve the TinyLlama-1.1B-Chat-v1.0 model.
Create a Route to the Gateway to access the Red Hat OpenShift AI Dashboard:
OpenShift AI 3.0 uses the Gateway API and a dynamically provisioned LoadBalancer Service to expose its services. If you are deploying OpenShift AI 3.0 in a private or on-premises environment, you must manually configure a route to access the OpenShift AI Dashboard.
For more information, see https://access.redhat.com/articles/7133770
- Navigate to Networking → Routes in the OpenShift web console.
- Click Create Route and switch to the YAML view.
- Provide the following YAML contents, change the host to match your cluster details, and click Create. (A quick verification of the Route is sketched after the YAML.)
|
apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: data-science-gateway-data-science-gateway-class
  namespace: openshift-ingress
spec:
  host: data-science-gateway.apps.<CHANGEME>
  port:
    targetPort: https
  tls:
    termination: passthrough
  to:
    kind: Service
    name: data-science-gateway-data-science-gateway-class
    weight: 100
  wildcardPolicy: None
|
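Once the Route is created, you can confirm it and check that the Dashboard host responds; the host value depends on your cluster:
|
# Confirm the route exists and note its host
oc get route data-science-gateway-data-science-gateway-class -n openshift-ingress

# The Dashboard should respond on the route host
curl -k -I https://data-science-gateway.apps.<CHANGEME>
|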

Deployment Steps
Accessing Red Hat OpenShift AI Dashboard
1. Log in to the OpenShift web console and navigate to Operators → Installed Operators in the left panel.
2. Verify that the Red Hat OpenShift AI Operator is installed and in the Succeeded state.
3. Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.
4. Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.
5. Click the Application Launcher icon in the top-right corner of the console.
6. Click Red Hat OpenShift AI under OpenShift Self Managed Services to open the AI Dashboard.
7. Verify that the Dashboard is loaded.

Hardware Profiles
The default profile available under Hardware profiles limits vCPU to 2 and Memory to 4 GiB.
Create a new hardware profile that meets TinyLlama's minimum vCPU and memory requirements.
1. Click Settings in the left pane of the Dashboard and expand Environment Setup.
2. Click Hardware Profiles.
3. Click Create Hardware Profile.
4. Provide a unique name and scroll down to edit the default values. Set the CPU default value to 6 and Memory to 12 GiB.
5. Save the profile.
Deploy the model
1. On the Dashboard, click Projects in the left panel.
2. Click Create project, provide a name, and click Create. A project details page appears with multiple tabs.
3. Go to the Connections tab.
4. Click Create connection and select S3 compatible object storage from the drop-down list.
5. Enter the following details and click Create:
   - Connection name
   - Access Key
   - Secret Key
   - Endpoint URL
   - Region
   - Bucket name
6. Go to the Deploy tab:
   - Select Existing Connection under Model Location.
   - Choose the S3 connection created earlier.
   - Provide the path to the model in your S3 bucket.
   - Select the Model type Generative AI model from the drop-down list, since TinyLlama is a generative model.
7. Click Next and give a unique Model Deployment Name.
8. Under Hardware profiles, select the profile you created earlier for TinyLlama.
9. Select vLLM CPU(ppc64le/s390x) ServingRuntime for KServe from the drop-down list.
10. Set Model server replicas to 1.
11. Under Model Route, enable Make deployed models available through an external route to allow external access.
    - For test environments, token authentication is optional.
    - For production environments:
      - Select Require token authentication.
      - Enter the Service Account Name for token generation.
      - (Optional) Click Add a service account to include multiple accounts.
12. Check the Add custom runtime arguments check box and enter the following custom runtime argument in the text box below:
    --dtype=float
13. Wait until the deployment reaches the Starting state.
14. Once active, verify that the model endpoint has been generated. You can also confirm the deployment from the CLI, as sketched after this list.
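As a CLI-level check of the deployment, you can inspect the underlying KServe resources; <project> below stands for the data science project you created:
|
# The InferenceService should report READY=True and expose a URL
oc get inferenceservice -n <project>

# The model server pod(s) should be Running
oc get pods -n <project>
|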

Inferencing
Inference Request
After deployment, use the generated external endpoint to send inference requests.
Request Format
|
curl -k https://<external-endpoint>/v1/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "<model-name>",
        "prompt": "<prompt>",
        "max_tokens": <max-tokens>,
        "temperature": 0
      }' | jq
|
external-endpoint = the external endpoint generated under the Models tab
model-name = the model deployment name
max-tokens = the maximum number of tokens to generate in the response
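If you enabled token authentication, the request must also carry a bearer token. A sketch, assuming oc create token is available (OpenShift 4.11 or later) and using placeholder service account and project names:
|
# Create a short-lived token for the service account configured on the deployment (placeholder names)
TOKEN=$(oc create token <service-account-name> -n <project>)

# Include the token in the inference request
curl -k https://<external-endpoint>/v1/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{ "model": "<model-name>", "prompt": "<prompt>", "max_tokens": 50, "temperature": 0 }' | jq
|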
Note:
If the inference request times out, increase the timeout imposed on the HAProxy route as follows:
Through OpenShift CLI:
|
oc annotate isvc <isvc-name> -n <namespace> haproxy.router.openshift.io/timeout=5m --overwrite
|
Get the isvc-name with:
oc get isvc -n <namespace>
where <namespace> is the namespace in which the model is deployed.
Through UI:
1. In the OpenShift console, navigate to Administration → CustomResourceDefinitions.
2. Search for InferenceService in the search bar.
3. Click the InferenceService CRD and go to the Instances tab.
4. Select the entry corresponding to your model deployment.
5. Go to the YAML tab.
6. Under the metadata → annotations section, add the following line:
|
haproxy.router.openshift.io/timeout: 10m
|
7. Click Save to apply the change.