Data and AI on Power

Optimizing LLM Deployment on IBM Power10 with Ollama and Open WebUI

By Marvin Gießing posted Wed March 12, 2025 07:21 AM


Introduction

Large language models (LLMs) have transformed natural language processing (NLP) applications, enabling capabilities such as real-time conversational AI, document analysis, and content generation. However, deploying LLMs efficiently—especially on CPU-based systems—presents unique computational challenges. While GPUs remain the dominant choice for AI model training, advancements in software optimization and CPU architectures have significantly improved LLM inference on CPUs.

This blog showcases Ollama, a lightweight backend optimized for serving LLMs on CPUs, and Open WebUI, a user-friendly frontend for seamless model interaction. Specifically, we explore how IBM Power10 servers, with their built-in AI acceleration and high-performance computing capabilities, provide an ideal platform for CPU-based LLM inference.

Why IBM Power10 for LLM Inference?

While GPUs provide parallelized execution for deep learning, deploying LLMs on IBM Power10 offers distinct advantages, particularly for enterprises seeking cost-effective and scalable AI solutions. Key benefits include:

  • Power10 AI Acceleration – IBM Power10 CPUs feature Matrix Math Acceleration (MMA) units, significantly enhancing AI and deep learning workloads without requiring discrete GPUs.
  • Cost-Effectiveness – Leveraging existing Power10 infrastructure reduces the need for additional hardware investments.
  • Scalability – Power10 servers support high memory bandwidth and multi-threading, making them well-suited for large-scale AI inference.
  • Optimized AI Workloads – Power10’s enhanced SIMD capabilities improve vector operations, benefiting model quantization and inference performance.

Deploying Ollama with Open WebUI on IBM Power10

Prerequisites

To set up Ollama and Open WebUI on IBM Power10, ensure you have:

  • An IBM Power10 server running a Linux distribution (e.g., RHEL or Ubuntu)

  • Podman installed for containerized deployment

  • Internet access for pulling necessary images

Deployment Steps

1.) Create a pod network for container communication:

podman network create ollama-webui

2.) Start Ollama:

podman run -d --rm --network ollama-webui -v ollama:/root/.ollama -p 11434:11434 --name ollama quay.io/mgiessing/ollama:v0.5.12

This initializes the Ollama backend, exposing it on port 11434.
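Before moving on, it is worth confirming that the backend is actually responding. The sketch below queries Ollama's /api/version endpoint with Python's standard library; the base URL assumes the port mapping from the podman command above, and the function simply returns None if the server is not reachable yet.

```python
import json
import urllib.error
import urllib.request

def ollama_version(base_url: str = "http://localhost:11434",
                   timeout: float = 5.0):
    """Return the Ollama server version string, or None if unreachable."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/version",
                                    timeout=timeout) as resp:
            return json.load(resp).get("version")
    except (urllib.error.URLError, OSError):
        return None

print(ollama_version() or "Ollama backend not reachable yet")
```

If the container started correctly, this prints the server version (matching the image tag pulled above).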

3.) Launch Open WebUI:

podman run -d --rm --network ollama-webui -p 3000:8080 -e OLLAMA_BASE_URL=http://ollama:11434 -e WEBUI_SECRET_KEY="" -v open-webui:/app/backend/data quay.io/mgiessing/open-webui:v0.5.12

This starts the Open WebUI frontend and connects it to the Ollama backend.

4.) Access the Interface:

Open a browser and navigate to http://<SERVER_IP>:3000

Usage

On first access, Open WebUI prompts you to create an admin account for managing the instance. To get started with downloading and using models, consult the Open WebUI Quickstart Guide, which provides step-by-step instructions for managing models and running inferences efficiently.
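Besides the web interface, the Ollama backend can also be driven programmatically over its REST API, which is useful for scripting or integrating inference into applications. The sketch below is a minimal non-streaming call to /api/generate; the model name is illustrative (any model pulled into the instance can be substituted), and the base URL assumes the deployment above.

```python
import json
import urllib.request

def build_generate_request(prompt: str,
                           model: str = "granite3.1-dense:2b") -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}

def generate(prompt: str,
             base_url: str = "http://localhost:11434", **kwargs) -> str:
    """Send a prompt to the Ollama backend and return the completion text."""
    data = json.dumps(build_generate_request(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        f"{base_url}/api/generate",
        data=data,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"]
```

With a model already pulled, `generate("What is MMA on Power10?")` returns the model's completion as a string.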

Advanced Topics

Beyond basic model serving, Ollama and Open WebUI on Power10 can support advanced AI workflows such as Retrieval-Augmented Generation (RAG) for knowledge-based question answering and agentic AI systems that integrate LLMs with external tools and APIs. These capabilities unlock new possibilities for enterprise AI applications, enhancing automation and decision-making processes.
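The core of a RAG workflow is grounding the model's answer in retrieved text. As a minimal sketch, the helper below assembles a grounded prompt from document chunks that would, in a real deployment, come from a vector store or Open WebUI's built-in document pipeline; the template wording is illustrative.

```python
def build_rag_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a grounded prompt from retrieved document chunks."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the numbered context passages "
        "below. Cite passage numbers in your answer.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

print(build_rag_prompt(
    "What accelerates AI inference on Power10?",
    ["Power10 cores include Matrix Math Acceleration (MMA) units.",
     "MMA units speed up matrix operations used in LLM inference."],
))
```

The resulting prompt string is then passed to the Ollama generate endpoint like any other prompt, so the same serving setup covers both plain chat and RAG.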

Conclusion

Combining Ollama for backend model serving and Open WebUI for frontend interaction provides an efficient and cost-effective solution for deploying LLMs on CPUs. When running on IBM Power10, this solution benefits from hardware-accelerated inference, optimized memory bandwidth, and enterprise-grade scalability.

For organizations leveraging Power10 infrastructure, deploying LLMs without GPUs becomes a viable, high-performance alternative. As AI adoption expands, solutions like Ollama and Open WebUI on Power10 highlight the potential of CPU-optimized LLM deployment, making advanced AI more practical and scalable in enterprise computing environments.
