Granite 4.0 Tiny Preview is extremely compact and compute-efficient: at FP8 precision, several concurrent sessions performing long-context (128K) tasks can run on consumer-grade hardware, including GPUs commonly available for under $350 USD. Though the model is only partially trained (it has seen only 2.5T of a planned 15T or more training tokens), it already offers performance rivaling that of IBM Granite 3.3 2B Instruct, despite fewer active parameters and a roughly 72% reduction in memory requirements. We anticipate that Granite 4.0 Tiny's performance will be on par with that of Granite 3.3 8B Instruct by the time it has completed training and post-training.
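As a rough illustration of running the preview on commodity hardware, the sketch below loads the checkpoint with Hugging Face Transformers and generates a short response. The model ID, the bf16 dtype, and the chat-template usage are assumptions rather than details confirmed by this article; FP8 serving of many concurrent long-context sessions would typically go through a dedicated inference stack (for example vLLM) rather than a snippet like this.

```python
# Minimal sketch, assuming the preview is published on Hugging Face under the
# ID below. The hybrid architecture may require a recent build of transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-tiny-preview"  # assumed model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # bf16 here; FP8 would use a quantized serving path
    device_map="auto",
)

# Build a chat prompt and generate a short completion.
messages = [{"role": "user", "content": "Summarize what Granite 4.0 Tiny Preview is."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

outputs = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(outputs[0][input_ids.shape[-1]:], skip_special_tokens=True))
```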