Lowest-Latency IBM Models

Quick answer · Updated June 2026

Granite 4.1 8B has the lowest latency of any IBM model, responding in about 144ms to first token. Granite 4.0 Micro (301ms) is next.

Latency: 144ms
Intelligence: 12.4
Speed: 118 t/s
Input /M: $0.050
Context: 131K

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
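As a rough illustration of what ranking by p50 time-to-first-token means, the sketch below takes the median of several latency samples per model and sorts from lowest to highest. The function name `rank_by_p50_ttft` and the sample values are hypothetical, chosen only so the medians match the figures quoted on this page; they are not real measurements and this is not necessarily how modelgrep computes its ranking.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    # p50 is the median: half of the observed requests were faster, half slower.
    p50 = {model: median(values) for model, values in samples_ms.items()}
    return sorted(p50.items(), key=lambda item: item[1])


# Illustrative samples only (assumed, not measured), in milliseconds.
samples_ms = {
    "Granite 4.0 Micro": [280, 301, 350],
    "Granite 4.1 8B": [130, 144, 190],
}

ranking = rank_by_p50_ttft(samples_ms)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms} ms")
```

The median is less sensitive to occasional slow requests than the mean, which is one reason a p50 figure is commonly used for this kind of comparison.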

Frequently asked

Which IBM model has the lowest latency?

Granite 4.1 8B has the lowest latency of any IBM model, responding in about 144ms to first token. Granite 4.0 Micro (301ms) is next.

What's a good alternative to Granite 4.1 8B?

Granite 4.0 Micro (301ms) is the closest alternative on this metric. See the full ranking above for the tradeoffs.

How many IBM models are there?

modelgrep tracks 2 IBM models with live benchmarks, speed, latency and per-provider pricing. Granite 4.1 8B leads them on intelligence, and both qualify for this ranking.
