Effortless AI Model Deployment with OpenShift AI Toolkit For IBM Z Operator on RedHat OpenShift

By Pathivada Mahalakshmi posted Wed February 19, 2025 11:10 PM

  

As AI adoption grows, deploying and managing machine learning models efficiently at scale becomes increasingly important. OpenShift Container Platform (OCP), with its robust Kubernetes-based orchestration, is a powerful platform for containerized applications. However, deploying and managing AI models requires additional support for scaling and optimizing inference workloads. This is where the OpenShift AI Toolkit For IBM Z Operator comes into play.

By combining OpenShift's powerful container orchestration with IBM Z Accelerated for NVIDIA Triton™ Inference Server, this solution automates the deployment, scaling, and management of AI models, ensuring smooth and high-performance operations in production (Redbook).

What is the OpenShift AI Toolkit For IBM Z Operator?

The OpenShift AI Toolkit For IBM Z Operator streamlines the deployment of IBM Z Accelerated for NVIDIA Triton™ Inference Server models on OpenShift through Kubernetes-native automation (github-OpenShift-AI-Toolkit-Operator).

It handles the setup, scaling, and monitoring of AI models, making it easier to run machine learning workloads on OpenShift.

 

Why Use Triton Inference Server (Triton IS) for Model Deployment?

Triton Inference Server on IBM Z and LinuxONE currently supports three backends:

  • Python backend: deploys machine learning models written in Python for inference.
  • ONNX-MLIR backend: deploys onnx-mlir or zDLC compiled models (model.so).
  • Snap ML C++ backend: a custom backend for efficient deployment of machine learning model pipelines on IBM Z and LinuxONE hardware.
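
Regardless of the backend, each model in the repository carries a small Triton configuration file (config.pbtxt) that names its backend and declares its inputs and outputs. The example below is purely illustrative for a hypothetical Python-backend model; the model name, tensor names, and shapes are made up, and the exact identifiers for the ONNX-MLIR and Snap ML backends should be taken from the container image repository linked in the references.

# config.pbtxt for a hypothetical Python-backend model
name: "mnist_python"
backend: "python"       # standard Triton Python backend identifier
max_batch_size: 8
input [
  {
    name: "INPUT0"
    data_type: TYPE_FP32
    dims: [ 784 ]
  }
]
output [
  {
    name: "OUTPUT0"
    data_type: TYPE_FP32
    dims: [ 10 ]
  }
]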

 

Benefits of Using the OpenShift AI Toolkit For IBM Z with Triton IS

(github-OpenShift-AI-Toolkit-Operator)

  • Automated Deployment: The operator automatically deploys and configures Triton IS models on OpenShift, reducing manual work.
  • Scalable Inference: OpenShift’s Kubernetes-based system scales models based on demand, ensuring high performance.
  • Multi-Framework Support: Triton IS supports models from different machine learning frameworks, and the operator makes it easy to manage these diverse models.
  • Efficient Model Management: The operator simplifies tasks like model updates, monitoring, and scaling, ensuring seamless operation. 

 

Steps to Deploy Triton IS on OpenShift Using the OpenShift AI Toolkit For IBM Z Operator

1. Prerequisites

Ensure the following before starting the deployment process:

  • IBM Z and LinuxONE Container Registry: Required for accessing resources.
  • OpenShift Cluster on IBM Z: The deployment requires a configured OpenShift environment.
  • OpenShift CLI Installed: Ensure the oc CLI is set up and authenticated with the cluster.
  • Model Repository Packaged: Prepare your model repository in the required format. Refer to the IBM Z Accelerated for NVIDIA Triton™ Inference Server Container Image repository for details on directory structure and formats; a layout sketch follows this list.
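
As a rough sketch (directory and file names are illustrative; confirm the exact layout against the container image repository linked in the references), a Triton model repository follows a per-model, per-version structure:

model_repository/
└── mnist_python/            # one directory per model
    ├── config.pbtxt         # backend, inputs, and outputs for this model
    └── 1/                   # numeric version directory
        └── model.py         # Python backend artifact (ONNX-MLIR models ship a model.so instead)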

2. Setting Environment Variables

Configure the environment for secure access to the IBM Z and LinuxONE Container Registry:

  • Define the required variables, such as the registry URL, API key, and email.
  • Update the cluster's pull-secret (in the openshift-config namespace) with these credentials so the cluster can authenticate with icr.io and pull the IBM Z Accelerated for NVIDIA Triton™ Inference Server container image, as sketched below.
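
A minimal sketch of this step with the oc CLI is shown below. The variable names are assumptions for illustration; the documented OpenShift procedure for updating the global pull secret is to extract it, add the new registry credentials, and write it back.

# Illustrative variable names -- substitute your own values
export REGISTRY_URL="icr.io"
export REGISTRY_APIKEY="<your IBM Cloud API key>"

# Extract the current global pull secret
oc get secret/pull-secret -n openshift-config \
  --template='{{index .data ".dockerconfigjson" | base64decode}}' > pull-secret.json

# Add credentials for the IBM Z and LinuxONE Container Registry
# (icr.io accepts "iamapikey" as the user name with an API key as the password)
oc registry login --registry="${REGISTRY_URL}" \
  --auth-basic="iamapikey:${REGISTRY_APIKEY}" --to=pull-secret.json

# Write the updated pull secret back to the cluster
oc set data secret/pull-secret -n openshift-config \
  --from-file=.dockerconfigjson=pull-secret.json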

3. Setting Up Namespace, PVC, Pods, and Syncing Models

Prepare the environment for Triton IS:

  • Create a namespace to isolate resources.
  • Define a Persistent Volume Claim (PVC) to store the model files.
  • Sync your model repository to the PVC using the rsync command, as sketched below.
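
A rough outline of these three steps with the oc CLI follows. The namespace, PVC, and pod names are illustrative, and the helper pod that mounts the PVC is an assumption (any running pod with the PVC mounted can receive the synced files):

# 1. Namespace to isolate the Triton resources (name is illustrative)
oc new-project triton-demo

# 2. PVC that will hold the model repository
cat <<EOF | oc apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: triton-model-repo
  namespace: triton-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF

# 3. Sync the local model repository into a running pod that mounts the PVC
#    ("model-sync-pod" and the /models mount path are assumptions)
oc rsync ./model_repository/ model-sync-pod:/models/ -n triton-demo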

4. Configuring TLS Certificates for gRPC Server

Enable gRPC for secure communication:

  • Enable HTTP/2: Configure the ingress controller to support HTTP/2 across the cluster.
  • Generate TLS Certificates: Use OpenSSL to create a TLS certificate and private key.
  • Store in an OpenShift Secret: Store the generated certificate and key in a secret for secure reference (see the sketch below).
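
The commands below sketch one way to do this; the route hostname and secret name are placeholders to adjust for your cluster:

# Enable HTTP/2 for ingress across the whole cluster
oc annotate ingresses.config/cluster \
  ingress.operator.openshift.io/default-enable-http2=true

# Generate a self-signed TLS certificate and private key
# (the CN is a placeholder for your gRPC route hostname)
openssl req -x509 -newkey rsa:4096 -nodes -days 365 \
  -keyout grpc-server.key -out grpc-server.crt \
  -subj "/CN=<grpc-route-hostname>"

# Store the certificate and key in an OpenShift TLS secret
oc create secret tls triton-grpc-tls \
  --cert=grpc-server.crt --key=grpc-server.key -n triton-demo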

5. Installing the OpenShift AI Toolkit For IBM Z Operator

Follow these steps to install the OpenShift AI Toolkit for IBM Z Operator using the OpenShift Web Console:

  • Verify whether the operator is already installed. If it is not, or if a specific version is required, apply the operator's CatalogSource to the OpenShift cluster using the oc CLI (see the sketch below); this makes the operator available in OperatorHub.
  • Open OperatorHub, search for the OpenShift AI Toolkit for IBM Z Operator, and click Install.
  • Choose the desired version of the operator and complete the installation.
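
As a sketch, a CatalogSource can be applied as shown below; the metadata name is illustrative, and the catalog index image must be taken from the operator's GitHub repository (it is not reproduced here):

cat <<EOF | oc apply -f -
apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: openshift-ai-toolkit-catalog        # illustrative name
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: <catalog index image from the operator's repository>
  displayName: OpenShift AI Toolkit For IBM Z
  publisher: IBM
EOF

# Confirm the catalog is available before visiting OperatorHub
oc get catalogsource -n openshift-marketplace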

6. Creating Custom Resource (TritonInterfaceServer)

After the operator is installed, deploy IBM Z Accelerated for NVIDIA Triton™ Inference Server using the operator’s custom resource:

  • Create a TritonInterfaceServer resource, specifying the required parameters for HTTP, gRPC, and metrics.
  • Reference the synced model repository (PVC) and the gRPC TLS secret in the configuration, along the lines of the sketch below.
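
The general shape of such a resource is sketched below. The apiVersion and the spec field names are assumptions for illustration only; start from the sample custom resource shipped with the operator (see the GitHub link in the references) and adapt it.

cat <<EOF | oc apply -f -
apiVersion: <group/version defined by the operator's CRD>
kind: TritonInterfaceServer
metadata:
  name: triton-sample
  namespace: triton-demo
spec:
  # Field names below are illustrative; copy them from the operator's sample CR.
  modelRepository:
    pvcName: triton-model-repo       # PVC synced in step 3
  grpc:
    tlsSecret: triton-grpc-tls       # TLS secret created in step 4
  http:
    enabled: true
  metrics:
    enabled: true
EOF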

7. Validating Model Deployment

Once the TritonInterfaceServer is created, validate the deployment to ensure functionality:

  • Check the Triton Inference Server logs for any errors or deployment status updates.
  • Confirm that the models in the repository have loaded successfully.
  • Test inference functionality using the HTTP and gRPC endpoints with sample input files.
  • Validate the metrics endpoint using curl or monitoring tools to confirm observability (a few illustrative checks are sketched below).
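
For example (the deployment name, route hostnames, and model name are placeholders; the /v2/... paths are the standard inference-protocol endpoints that Triton exposes):

# Deployment status and server logs
oc get pods -n triton-demo
oc logs deployment/<triton-deployment-name> -n triton-demo

# Server readiness and model metadata over HTTP
curl -s http://<http-route-hostname>/v2/health/ready -o /dev/null -w "%{http_code}\n"
curl -s http://<http-route-hostname>/v2/models/<model-name>

# Sample inference request (payload format depends on your model)
curl -s -X POST http://<http-route-hostname>/v2/models/<model-name>/infer \
  -H "Content-Type: application/json" -d @sample_input.json

# Prometheus metrics endpoint
curl -s http://<metrics-route-hostname>/metrics | head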

 

Note: All code and YAML files referenced in these steps can be found in the link provided in the reference section of this blog.

 

Conclusion

With the OpenShift AI Toolkit For IBM Z Operator, there is no need to manually create deployments, services, or routes for IBM Z Accelerated for NVIDIA Triton™ Inference Server (Triton IS) protocols such as HTTP, gRPC, and metrics; without the operator, these configurations must be set up by hand to link and access Triton IS. By pairing the operator with Triton IS, deploying and managing machine learning models becomes significantly easier and more scalable: the operator handles model deployment, scaling, and monitoring automatically, while Triton IS delivers high-performance inference. By following the steps outlined above, including configuring a Persistent Volume Claim for the model repository, you can quickly deploy Triton IS on OpenShift for real-time, high-performance AI model serving. (github-OpenShift-AI-Toolkit-Operator)

Reference Links

  • https://github.com/IBM/ibmz-accelerated-for-nvidia-triton-inference-server
  • https://github.com/IBM/OpenShift-AI-Toolkit-Operator

Disclaimer: The information provided in this blog is for informational and educational purposes only and is subject to change based on product updates, documentation, and real-world variations.
