
Getting Started with Red Hat OpenShift AI on IBM Z and IBM® LinuxONE: Install, Configure & Deploy Your First Model


Authors: Sindhuja BD (sindhujabd@ibm.com), Dilip B (Dilip.Bhagavan@ibm.com), Modassar Rana (modassar.rana@ibm.com), Rishika Kedia (rishika.kedia@in.ibm.com)

Red Hat OpenShift AI (RHOAI) is a platform for data scientists and developers of artificial intelligence and machine learning (AI/ML) applications.

OpenShift AI provides a platform to develop, train, serve, test, and monitor AI/ML models and applications across on-premises and cloud environments.

Operator Installation:

The Red Hat OpenShift AI Operator supports installation using either the command-line interface (CLI) or the OpenShift web console.

This guide demonstrates installation through the OpenShift web console. 
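If you prefer the CLI instead, the sketch below outlines the equivalent objects to create: the operator namespace, an OperatorGroup, and a Subscription. This is a minimal sketch, not the full documented procedure; the package name rhods-operator and the redhat-operators catalog source are assumptions that you should verify against your cluster's catalog before applying.

# Minimal CLI sketch (assumption: package "rhods-operator" in the "redhat-operators"
# catalog; confirm with the first command before applying the rest).
oc get packagemanifests -n openshift-marketplace | grep -i rhods

oc create namespace redhat-ods-operator

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: rhods-operator
  namespace: redhat-ods-operator
spec:
  name: rhods-operator                # package name (verify in your catalog)
  channel: fast-3.x                   # channel used in this guide
  source: redhat-operators
  sourceNamespace: openshift-marketplace
  installPlanApproval: Automatic      # or Manual
EOF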

Red Hat OpenShift AI Installation using OpenShift Web Console

Prerequisites

  • A running OpenShift cluster, version 4.19 or later, configured with a default storage class that can be dynamically provisioned.

  • Cluster administrator privileges for your OpenShift cluster.

  • If you are using custom namespaces, ensure that you have created and labeled them as required.
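For example, a custom operator namespace can be created and labeled from the CLI. The namespace name below is hypothetical, and the label is only a placeholder; use the exact label key and value required by your OpenShift AI version as described in the product documentation.

# Hypothetical example: create and label a custom operator namespace.
# Replace <label-key>=<label-value> with the label your RHOAI release requires.
oc create namespace my-rhoai-operator
oc label namespace my-rhoai-operator <label-key>=<label-value>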

Installation Steps

  1. Log in to the OpenShift web console as a cluster administrator.

  2. In the left panel, navigate to Operators → OperatorHub.


  3. On the OperatorHub page, locate the Red Hat OpenShift AI Operator by scrolling through the available Operators or by typing RHOAI into the Filter by keyword box.

  4. Click the Red Hat OpenShift AI tile. The Red Hat OpenShift AI information pane opens.

  5. Select fast-3.x from the Channel drop-down list.

  6. Select 3.0.0 from the Version drop-down list.

  7. Click Install. The Install Operator page opens.

  8. For Installation mode, note that the only available value is All namespaces on the cluster (default). This installation mode makes the Operator available to all namespaces in the cluster.

  9. For Installed Namespace, choose one of the following options:

  • To use the predefined operator namespace, select the Operator recommended Namespace: redhat-ods-operator option.
  • To use a custom operator namespace that you created, select the Select a Namespace option, and then select the namespace from the drop-down list.

  10. For Update approval, select one of the following update strategies:

  • Automatic: New updates in the update channel are installed as soon as they become available.
  • Manual: A cluster administrator must approve any new updates before installation begins.

  11. Click Install.

  12. The Installing Operators pane appears. When the installation finishes, a checkmark appears next to the Operator name.


Verification

  • In the OpenShift web console, from the side panel, navigate to Operators → Installed Operators and confirm that the Red Hat OpenShift AI Operator shows one of the following statuses:

    • Installing — installation is in progress; wait for this to change to Succeeded. This might take several minutes.

    • Succeeded — installation is successful.

  • Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.

  • Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.

  • Click on Red Hat OpenShift AI Operator.

  • Go to the Data Science Cluster tab and click on Create DataScienceCluster.

  • Paste the following YAML contents. 

  • In the spec.components section of the CR, set the managementState field of each OpenShift AI component to either Managed or Removed, depending on whether you want that component enabled or disabled on the OpenShift AI Dashboard.

  • These values are defined as follows:

Managed:

    •  The Operator actively manages the component, installs it, and tries to keep it active. 
    •  The Operator will upgrade the component only if it is safe to do so.

Removed:

    •  The Operator actively manages the component but does not install it.
    •  If the component is already installed, the Operator will try to remove it.

apiVersion: datasciencecluster.opendatahub.io/v1
kind: DataScienceCluster
metadata:
  name: default-dsc
spec:
  components:
    codeflare:
      managementState: Removed
    dashboard:
      managementState: Managed
      configuration:
        disableGenAIUI: true
    datasciencepipelines:
      managementState: Removed
    kserve:
      managementState: Managed
      defaultDeploymentMode: RawDeployment
      serving:
        managementState: Removed
        name: knative-serving
    kueue:
      managementState: Removed
    modelmeshserving:
      managementState: Removed
    modelregistry:
      managementState: Managed
    ray:
      managementState: Removed
    trainingoperator:
      managementState: Removed
    trustyai:
      managementState: Managed
    workbenches:
      managementState: Managed


  • Go to the Data Science Cluster tab and ensure that it is in the Ready state.

  • Go to the DSCInitialization tab and ensure that it is in the Ready state.
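The same checks can be run from the CLI. This is a sketch that assumes the default operator namespace and the DataScienceCluster name default-dsc used in the YAML above; status column names can vary slightly between releases.

# Operator and dependency CSVs should report "Succeeded"
oc get csv -n redhat-ods-operator

# DataScienceCluster and DSCInitialization should report Ready
oc get datasciencecluster default-dsc
oc get dscinitialization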

Model Serving

While OpenShift AI provides an environment to develop, train, serve, test, and monitor AI/ML models and applications on-premises or in the cloud, this guide focuses on the Model Serving Component of Red Hat OpenShift AI. It demonstrates how easily an AI model can be served using RHOAI on IBM Z.

For this demo, we use the TinyLlama-1.1B-Chat-v1.0 model, deployed with the vLLM CPU(ppc64le/s390x) ServingRuntime for KServe in RawDeployment mode.

Prerequisites

Operator

This guide is based on the Red Hat OpenShift AI Operator version 3.0.0.

Ensure that the operator is installed and running successfully before proceeding.

Resources

The following minimum resources are required to deploy the TinyLlama-1.1B-Chat-v1.0 model:

  • vCPUs: 4
  • RAM (GiB): 12

 

 Model Storage

  1. To download the model, visit https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0 and clone the repository.

  2. Upload the cloned model to one of the supported storage backends in Red Hat OpenShift AI:

  •  S3-compatible object storage
  •  URI-based repository
  •  OCI-compliant registry

This guide demonstrates accessing the model from an S3-compatible object storage backend.
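As a rough sketch of those two steps, the model can be cloned with git-lfs and copied to the bucket with any S3 client. The bucket name, endpoint, and use of the AWS CLI below are assumptions; substitute your own storage details.

# Clone the model repository (requires git-lfs for the weight files)
git lfs install
git clone https://huggingface.co/TinyLlama/TinyLlama-1.1B-Chat-v1.0

# Upload it to S3-compatible storage (AWS CLI shown as one example)
aws s3 cp --recursive TinyLlama-1.1B-Chat-v1.0/ \
  s3://<your-bucket>/TinyLlama-1.1B-Chat-v1.0/ \
  --endpoint-url https://<s3-endpoint>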

Serving Runtime

The vLLM CPU(ppc64le/s390x) ServingRuntime for KServe comes preinstalled with the Red Hat OpenShift AI Operator. We use this runtime to serve the TinyLlama-1.1B-Chat-v1.0 model.
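You can optionally confirm that the runtime is available before deploying. The command below assumes that the out-of-the-box serving runtimes ship as templates in the redhat-ods-applications namespace, which may differ in your installation.

# Optional check for the preinstalled vLLM CPU runtime template (namespace assumed)
oc get templates -n redhat-ods-applications | grep -i vllm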

Create a Route to the Gateway to access the Red Hat OpenShift AI Dashboard:

OpenShift AI 3.0 uses the Gateway API and a dynamically provisioned LoadBalancer Service to expose its services. If you are deploying OpenShift AI 3.0 in a private or on-premises environment, you must manually configure a route to access the OpenShift AI Dashboard.

For more information, see https://access.redhat.com/articles/7133770

  1. Navigate to Networking → Routes in the OpenShift web console.

  2. Click Create Route and switch to the YAML view.

  3. Provide the YAML contents below, change the host according to your cluster details, and click Create.

 

apiVersion: route.openshift.io/v1
kind: Route
metadata:
  name: data-science-gateway-data-science-gateway-class
  namespace: openshift-ingress
spec:
  host: data-science-gateway.apps.<CHANGEME>
  port:
    targetPort: https
  tls:
    termination: passthrough
  to:
    kind: Service
    name: data-science-gateway-data-science-gateway-class
    weight: 100
  wildcardPolicy: None
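After creating the route, a quick sanity check (a sketch, with <CHANGEME> standing in for your cluster's apps domain) is to confirm that the gateway Service and Route exist and that the host responds:

oc get svc -n openshift-ingress | grep data-science-gateway
oc get route data-science-gateway-data-science-gateway-class -n openshift-ingress

# Should print an HTTP status code rather than a connection error
curl -k -s -o /dev/null -w "%{http_code}\n" https://data-science-gateway.apps.<CHANGEME>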




Deployment Steps

Accessing Red Hat OpenShift AI Dashboard

 

  1. Log in to the OpenShift web console and navigate to Operators → Installed Operators in the left panel.

  2. Verify that the Red Hat OpenShift AI Operator is installed and in the Succeeded state. 



  3. Red Hat OpenShift Service Mesh 3 is a dependency Operator and is installed along with the Red Hat OpenShift AI Operator.

  4. Verify that Red Hat OpenShift Service Mesh 3 is in the Succeeded state.

  5. Click the Application Launcher icon in the top right corner of the console.

  6. Click Red Hat OpenShift AI under OpenShift Self Managed Services to open the AI Dashboard.

  7. Verify that the Dashboard is loaded.

Hardware Profiles

The default profile available under Hardware profiles limits vCPU to 2 and Memory to 4 GiB. 

Create a new hardware profile to meet TinyLlama's minimum vCPU and memory requirements.

  1. On the dashboard, click Settings in the left pane and expand Environment Setup.

  2. Click Hardware Profiles.

  3. Click Create Hardware Profile.

  4. Provide a unique name and scroll down to edit the default values. Set the CPU default value to 6 and Memory to 12 GiB.

  5. Save the profile.

Deploy the model

  1. On the Dashboard, click on Projects on the left panel.

  2. Click Create project, provide a name, and click Create. A project details page will appear with multiple tabs.

 

  3. Go to the Connections tab.

  4. Click Create connection and select S3 compatible object storage from the drop-down list.

  5. Enter the following details and click Create:

    • Connection name

    • Access Key

    • Secret Key

    • Endpoint URL

    • Region

    • Bucket name

  6. Go to the Deploy tab and configure the deployment:

  • Select Existing Connection under Model Location

  • Choose the S3 connection created earlier.

  • Provide the path to the model in your S3 bucket

  • Select Generative AI model as the Model type from the drop-down list, since TinyLlama is a generative model.


  • Click Next and give a unique Model Deployment Name.

  • Under Hardware profiles, select the profile you created earlier for TinyLlama.

  • Select vLLM CPU(ppc64le/s390x) ServingRuntime for KServe from the drop-down list.

  • Set Model server replicas to 1.


    

  • Click Next.

  • Under Model Route, enable Make deployed models available through an external route to allow external access

  • For test environments, token authentication is optional.

  • For production environments:

    • Select Require token authentication.

    • Enter the Service Account Name for token generation.

    • (Optional) Click Add a service account to include multiple accounts.

  • Select the Add custom runtime arguments check box and enter the following runtime argument in the text box:

--dtype=float

  • Click on Deploy Model.


  7. Wait until the deployment reaches the Starting state.

  8. Once the deployment is active, verify that the model endpoint has been generated. A few CLI checks are sketched below.
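These checks can be run from the CLI against the project namespace; replace the placeholders with your own project and deployment names.

# The InferenceService should show READY=True and the generated URL
oc get inferenceservice -n <project-namespace>

# The predictor pod(s) for the deployment should be Running
oc get pods -n <project-namespace> | grep <model-deployment-name>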

Inferencing

Inference Request

  1. After deployment, use the generated external endpoint to send inference requests.

Request Format

curl -k https://<external-endpoint>/v1/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "<model-name>",
  "prompt": "<prompt>",
  "max_tokens": <max-tokens>,
  "temperature": 0
}' | jq

external-endpoint = the external endpoint generated under the Models tab

model-name = the model deployment name

max-tokens = the maximum number of tokens to generate in the response
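For illustration only, here is the same request with the placeholders filled in; the endpoint, deployment name, and prompt below are hypothetical values, not outputs from this setup.

# Hypothetical example request (substitute your own endpoint and deployment name)
curl -k https://tinyllama-demo.apps.example.com/v1/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "tinyllama-demo",
  "prompt": "What is IBM LinuxONE?",
  "max_tokens": 100,
  "temperature": 0
}' | jq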

Note:

If the inference request times out, increase the timeout on the HAProxy route as follows:

Through OpenShift CLI:

oc annotate isvc <isvc-name> -n <namespace> haproxy.router.openshift.io/timeout=5m --overwrite

 

Get the isvc-name with:

oc get isvc -n <namespace>

namespace = the namespace in which the model is deployed
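Optionally, confirm that the annotation was applied (same placeholders as above):

oc get isvc <isvc-name> -n <namespace> -o jsonpath='{.metadata.annotations}'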

Through UI:

  1. In the OpenShift console, navigate to Administration → CustomResourceDefinitions.

  2. Search for InferenceService in the search bar.

  3. Click on the InferenceService CRD and go to the Instances tab.

  4. Select the entry corresponding to your model deployment

  5. Go to YAML tab 

  6. Under the metadata → annotations section, add the following line:

haproxy.router.openshift.io/timeout: 10m

  7. Click Save to apply the change.
