Power Data and AI

IBM Power systems provide a robust and scalable platform for a wide range of data and AI workloads, offering benefits in performance, security, and ease of use.

Set up and run vLLM on IBM Power

By Maryam Nezamabadi posted Mon February 09, 2026 08:04 AM

  

Deploying large language models (LLMs) on IBM Power CPUs can require specific package versions, toolchain configuration, and runtime tuning to achieve reliable, high‑throughput inference—especially when running CPU‑only with bfloat16 (bf16). This blog provides a tested, repeatable setup for vLLM on IBM Power11, covering environment creation, dependency installation (including gcc-toolset-14), build and install steps, execution using fmwork, and what to measure to validate performance.

Scope and outcome

By following the setup, installation, execution steps, and recommended configuration settings described in the blog, you will be able to:

  • Set up a Python environment on Power11 for vLLM with bf16.
  • Install and build the Power‑ready vLLM and dependencies.
  • Run a server/client workflow (via fmwork) to exercise inference.
  • Collect metrics and confirm stable throughput and predictable behavior on Power11.

Prerequisites

Before you begin, ensure the following are in place:

System and OS

  • You must have an IBM Power11 system with a Linux distribution supported on IBM Power (ppc64le), such as Red Hat Enterprise Linux.
  • You must have shell access with permissions to modify environment variables and install Python packages.

Language and tools

  • You must have Python 3.12 available on the system.
  • You must have gcc-toolset-14 installed or accessible because it is the minimum required GCC version on Power11.
  • You must have network access to the IBM Python wheels repository: https://wheels.developerfirst.ibm.com/ppc64le/linux.

Resources

  • You must have adequate disk space to build and install packages and models.
  • You must have stable internet connectivity to clone repositories and download dependencies.

Start by setting up a Python virtual environment

Use a dedicated virtual environment to isolate dependencies.

python3.12 -m venv testenv
source testenv/bin/activate
pip install --upgrade pip

Then, create a requirements.txt file and copy the following packages into it. These package versions are known to work well with the vLLM 0.11.1 build.

Before you proceed, review the following note to understand when you may not need to install every library from the list.

Note: Some dependencies (for example: ffmpeg, libprotobuf, openblas) may already be present as system libraries in certain Power environments. The listed versions reflect a tested configuration using IBM‑provided wheels.

abseil-cpp
argon2-cffi-bindings
cachetools
cffi
cmake
dill
datasets
ffmpeg
grpc-cpp
grpcio==1.76.0
h5py==3.13.0
hdf5
httptools
ibm_db
ipykernel
jedi
libprotobuf
libvpx
MarkupSafe
matplotlib
matplotlib-inline
ml-dtypes
mpmath
msgspec
ninja
numba
numpy
onnxruntime
openblas
opencv-python-headless
opus
outlines_core
pandas
pillow
pip
protobuf
psutil
pyarrow==19.0.0
pydantic
pydantic_core
pydantic-extra-types
PyYAML
pyzmq
regex
scikit-learn
scipy==1.15.3
sentencepiece
setuptools
setuptools-scm
sklearn-pandas
sympy
termcolor
tiktoken
tokenizers
torch==2.8.0
torchaudio==2.8.0
torchvision==0.23.0
transformers
tzdata
wrapt
yarl

Once you create the requirements.txt file, install the packages using the following command:

pip install --prefer-binary -r requirements.txt --extra-index-url=https://wheels.developerfirst.ibm.com/ppc64le/linux
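Once the install finishes, it can help to confirm that the version-pinned packages resolved as expected. The following is a small sanity-check sketch (the pins mirror the ones listed above); it reports missing packages instead of raising:

```python
# Sanity-check sketch: compare installed package versions against the pins
# used in requirements.txt. Missing packages are reported as None.
from importlib import metadata

PINNED = {"torch": "2.8.0", "grpcio": "1.76.0", "pyarrow": "19.0.0", "scipy": "1.15.3"}

def check_pins(pins):
    """Return {name: (installed, expected, matches)} for each pinned package."""
    report = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        report[name] = (installed, expected, installed == expected)
    return report

for name, (installed, expected, ok) in check_pins(PINNED).items():
    print(f"{name}: installed={installed}, expected={expected}, {'OK' if ok else 'CHECK'}")
```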

Update gcc-toolset

To successfully build and run vLLM on IBM Power11, you need an updated GCC toolchain because certain dependencies require a modern compiler. The recommended version for Power11 is gcc-toolset-14. This step ensures that your environment uses the correct compiler before proceeding with installation.

Run the following commands to enable the toolset in your current shell, and then confirm that gcc --version reports GCC 14.

# Enable gcc-toolset-14 for the current shell
# (alternatively, start a new shell with: scl enable gcc-toolset-14 bash)
source scl_source enable gcc-toolset-14
export PATH=/opt/rh/gcc-toolset-14/root/usr/bin/:$PATH

Set the environment variables

To ensure that libraries and build tools are correctly located during runtime and compilation, you need to configure several environment variables. These variables define paths for Python packages, shared libraries, and compiler settings, as well as vLLM-specific tuning parameters.

Set the following environment variables:

Important

  • Set LD_LIBRARY_PATH entries only if your environment does not already provide these libraries system‑wide; over‑specifying library paths can lead to application binary interface (ABI) conflicts or degraded performance.
  • Adjust SITE_PACKAGE_PATH if your virtual environment uses lib instead of lib64.
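If you are unsure which layout your virtual environment uses, the standard-library sysconfig module can tell you before you set SITE_PACKAGE_PATH:

```python
# Print this interpreter's site-packages path to see whether the active
# virtual environment uses lib or lib64.
import sysconfig

purelib = sysconfig.get_path("purelib")
print(purelib)
print("layout:", "lib64" if "/lib64/" in purelib else "lib")
```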

export SITE_PACKAGE_PATH=$VIRTUAL_ENV/lib64/python3.12/site-packages

export LD_LIBRARY_PATH="$LD_LIBRARY_PATH:$SITE_PACKAGE_PATH/libprotobuf/lib64/:$SITE_PACKAGE_PATH/openblas/lib/:$SITE_PACKAGE_PATH/:$SITE_PACKAGE_PATH/ffmpeg/lib/:$SITE_PACKAGE_PATH/libvpx/lib/:$SITE_PACKAGE_PATH/lame/lib/"

export CMAKE_PREFIX_PATH=$SITE_PACKAGE_PATH/libprotobuf:$CMAKE_PREFIX_PATH

export CC=/opt/rh/gcc-toolset-14/root/usr/bin/gcc

export CXX=/opt/rh/gcc-toolset-14/root/usr/bin/g++

export VLLM_CPU_KVCACHE_SPACE=40

export VLLM_CPU_OMP_THREADS_BIND="auto"
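For context on the VLLM_CPU_KVCACHE_SPACE value (the GiB of CPU memory vLLM reserves for the KV cache), you can estimate the worst-case cache size from the model shape. The sketch below uses illustrative shape values, not the actual Granite configuration; substitute your model's layer count, KV-head count, and head dimension:

```python
# Rough KV-cache sizing sketch. Per token, each layer stores K and V:
# 2 * num_kv_heads * head_dim elements, at 2 bytes each for bf16.
def kv_cache_gib(num_layers, num_kv_heads, head_dim,
                 max_model_len, max_num_seqs, dtype_bytes=2):
    per_token = 2 * num_layers * num_kv_heads * head_dim * dtype_bytes
    total_bytes = per_token * max_model_len * max_num_seqs
    return total_bytes / 1024**3

# Illustrative shape (NOT the real Granite config): 40 layers, 8 KV heads,
# head_dim 128, with max-model-len 8192 and max-num-seqs 16.
print(f"{kv_cache_gib(40, 8, 128, 8192, 16):.1f} GiB")  # -> 20.0 GiB
```

If the estimate approaches the configured space, reduce max-model-len or max-num-seqs, or raise VLLM_CPU_KVCACHE_SPACE.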

Install vLLM

Once the environment is prepared and the required toolchain is in place, the next step is to install vLLM. This involves cloning the vLLM repository, installing its dependencies, and building it for CPU execution. Use the following commands to complete the installation:

git clone https://github.com/vllm-project/vllm.git
cd vllm 
pip install -r requirements/common.txt 
VLLM_TARGET_DEVICE=cpu python3 setup.py install
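After the build completes, a lightweight smoke check confirms that the modules resolve without paying the cost of a full import (find_spec only locates a package; it does not load it):

```python
# Post-install smoke check: verify that torch and vllm are resolvable
# from the current environment without importing them.
import importlib.util

for mod in ("torch", "vllm"):
    found = importlib.util.find_spec(mod) is not None
    print(f"{mod}: {'found' if found else 'NOT FOUND'}")
```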

Example of running vLLM (using fmwork)

The following example demonstrates server and client parameters, tensor parallel size, dtype, and sequence lengths.

fmwork is used here to orchestrate server/client execution and load generation.

WORKSPACE="/home/user/fmwork/infer/vllm"
MODEL_ROOT="/home/user"
MODEL_NAME="granite-3.3-8b-instruct"

# --- Execution ---

./runner \
    --dir_work "$WORKSPACE" \
    --mode server \
    --model_root "$MODEL_ROOT" \
    --model_name "$MODEL_NAME" \
    -- \
server \
    --env PYTHONUNBUFFERED=1 \
    --env VLLM_USE_V1=1 \
    --tensor-parallel-size 1 \
    --max-num-seqs 16 \
    --dtype bfloat16 \
    --max-model-len 8192 \
    --max-num-batched-tokens 32768 \
    -- \
client \
    --env PYTHONUNBUFFERED=1 \
    --dataset-name random \
    --random-input-len 2048 \
    --random-output-len 1024 \
    --num-prompts 1 \
    --max-concurrency 1 \
    --ignore-eos

What to record per run

To evaluate performance and ensure reproducibility, it is important to capture key configuration details and metrics for each run. Recording this information will help you compare different setups, identify bottlenecks, and validate tuning changes. Use the following checklist:

  • Model and vLLM version: for example, Granite-3.3-8B-Instruct with vLLM 0.11.1
  • dtype: bf16
  • Threading: VLLM_CPU_OMP_THREADS_BIND, SMT level
  • Load shape: input/output token lengths, concurrency, batch limits
  • Metrics: TTFT, ITL, throughput (output tokens per second)
  • From nmon: CPU utilization, context switches, cache miss rate, average run queue

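The latency metrics in the checklist above can be derived from client-side token timestamps. The sketch below uses made-up arrival times purely to illustrate the arithmetic:

```python
# Deriving TTFT, ITL, and throughput from client-side token timestamps.
# The timestamps here are invented only to demonstrate the calculation.
def summarize(request_start, token_times):
    """token_times: arrival time in seconds of each output token, ascending."""
    ttft = token_times[0] - request_start              # time to first token
    gaps = [b - a for a, b in zip(token_times, token_times[1:])]
    itl = sum(gaps) / len(gaps) if gaps else 0.0       # mean inter-token latency
    tput = len(token_times) / (token_times[-1] - request_start)
    return ttft, itl, tput

ttft, itl, tput = summarize(0.0, [0.50, 0.55, 0.60, 0.65, 0.70])
print(f"TTFT={ttft*1000:.0f} ms  ITL={itl*1000:.0f} ms  throughput={tput:.1f} tok/s")
# -> TTFT=500 ms  ITL=50 ms  throughput=7.1 tok/s
```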
Troubleshooting

When working through the installation and configuration steps, you may encounter issues related to toolchain paths, library dependencies, or version mismatches. This section provides quick checks and corrective actions for common problems, helping you resolve errors efficiently and continue with the setup.

Use the following checks if you encounter any errors:

  • Toolchain/path issues
    Symptom: Build fails or compilers not found
    Fix:
    gcc --version
    which gcc
    echo $PATH
    Ensure that PATH includes /opt/rh/gcc-toolset-14/root/usr/bin/.
  • Library resolution issues
    Symptom: Runtime errors about missing shared libraries such as libprotobuf, openblas, ffmpeg, or libvpx
    Fix:
    echo $LD_LIBRARY_PATH
    Confirm paths include the exported directories. Then, source the virtual environment again and export the variables if needed.
  • Python environment conflicts
    Symptom: Version mismatches or pip install failures
    Fix:
    python3.12 -m venv testenv
    source testenv/bin/activate
    pip install --upgrade pip
    Recreate the venv and reinstall from requirements.txt.
  • vLLM or Torch version mismatches
    Symptom: Import errors or API incompatibility
    Fix:
    python -c "import torch, sys; print('Torch:', torch.__version__)"
    python -c "import vllm; print('vLLM imported OK')"
    Verify Torch 2.8.0 and the installed vLLM build.
  • Runtime configuration issues
    Symptom: Poor throughput or unstable performance
    Fix: Adjust VLLM_CPU_OMP_THREADS_BIND and SMT settings; re‑run and compare TTFT/ITL/throughput.

Validation and benchmarking

After completing the installation and running vLLM, it is essential to validate that the setup works as expected and measure the performance. This section outlines key checks, metrics to capture, and commands to confirm reproducibility and benchmark throughput on Power11.

After you run the server and client:

  • Check that the server started successfully. Make sure there are no errors in the logs.
  • Verify the configuration. Confirm that the data type (dtype) and sequence lengths match what you intended.
  • Measure the performance. Record the following key metrics:
    • TTFT (Time to First Token) – how quickly the first token is generated
    • ITL (Inter-Token Latency) – time between tokens
    • Throughput – number of output tokens per second
  • Monitor the system usage (optional). Use tools like nmon to check:
    • CPU utilization
    • Context switches
    • Cache miss rate
    • Average run queue
  • Save the environment details for reproducibility. Run the following commands and keep the output for future reference:
    python --version
    pip freeze | sort
    echo $VLLM_CPU_OMP_THREADS_BIND
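The same details can be captured programmatically. The sketch below writes a snapshot file (the file name run_snapshot.json is arbitrary) that can be diffed between runs:

```python
# Reproducibility snapshot sketch: record the interpreter version, installed
# packages, and the vLLM tuning variables into a JSON file for later diffing.
import json
import os
import sys
from importlib import metadata

snapshot = {
    "python": sys.version.split()[0],
    "packages": sorted(f"{d.metadata['Name'] or 'unknown'}=={d.version}"
                       for d in metadata.distributions()),
    "env": {k: os.environ.get(k) for k in
            ("VLLM_CPU_KVCACHE_SPACE", "VLLM_CPU_OMP_THREADS_BIND")},
}
with open("run_snapshot.json", "w") as f:
    json.dump(snapshot, f, indent=2)
print("wrote run_snapshot.json")
```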

Summary

Running vLLM on IBM Power11 with bf16 is achievable with a repeatable setup that covers environment preparation, toolchain alignment, and CPU‑focused build steps. With the recommended configuration and tuning, you can obtain predictable behavior and measure performance consistently across runs.

Additionally, keep the following critical points in view for a smooth and effective setup:

  • Proven setup path on Power11 using Python 3.12 and gcc-toolset-14, ensuring compiler compatibility for the build.
  • Executable workflow to install dependencies, build vLLM for CPU, and run a server/client sequence with fmwork.
  • Practical tuning controls—notably VLLM_CPU_OMP_THREADS_BIND and SMT settings—to stabilize performance and improve throughput.
  • Focused validation guidance (TTFT, ITL, throughput) and optional system metrics to benchmark and compare configurations reliably.
  • Targeted troubleshooting to resolve common toolchain, library path, and version issues quickly.
