Hi Ahmed -
It IS possible to run Granite 4.0-Tiny on a desktop CPU/GPU, and one of the easiest ways to start is with Ollama (https://ollama.com/library/granite4:tiny-h). Ollama (and the underlying llama.cpp) will attempt to use the GPU to accelerate inference, but its ability to do so depends heavily on the system configuration. Subjectively, on an M4 MacBook Pro, it's quite fast.
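For a quick start, the basic Ollama CLI flow looks like the sketch below (model tag taken from the library page linked above; exact output will vary by system):

```shell
# Pull the Granite 4.0-Tiny model from the Ollama library
ollama pull granite4:tiny-h

# Run an interactive chat session with the model
ollama run granite4:tiny-h

# Or pass a one-shot prompt non-interactively
ollama run granite4:tiny-h "Summarize what a hybrid Mamba/transformer model is."
```

On a CPU-only box the same commands work unchanged; llama.cpp simply falls back to CPU inference, so expect lower tokens/sec than on a machine with a supported GPU or Apple Silicon.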
------------------------------
Brian Bissell
------------------------------
Original Message:
Sent: Wed October 29, 2025 03:29 AM
From: Ahmed Alsareti
Subject: Running Granite 4.0 Tiny Preview on CPU
Is it possible to deploy the Granite 4.0 Tiny Preview model without GPUs? If so, what are the performance implications and recommended optimizations for CPU-only deployment?
#watsonx.ai
------------------------------
Ahmed Alsareti
------------------------------