modelgrep

Lowest-Latency IBM Models

Quick answer · Updated June 2026

Granite 4.1 8B has the lowest latency of any IBM model, responding in about 144ms to first token. Granite 4.0 Micro (301ms) is next.

Latency: 144ms
Intelligence: 12.4
Speed: 118 t/s
Input: $0.050/M
Context: 131K

Models ranked by median (p50) time to first token, lowest first. These are the most responsive large language models for real-time and interactive use cases.

  1. granite-4.1-8b
    Tools · JSON · 12.4 intel · $0.050/M · 118 t/s
    Latency: 144ms
  2. granite-4.0-h-micro
    7.7 intel · $0.017/M · 27 t/s
    Latency: 301ms
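
As a rough illustration of how a ranking like the one above could be produced, the sketch below takes the median (p50) of a set of time-to-first-token samples and sorts model records by it, lowest first. The field names (`ttft_p50_ms`, `tokens_per_s`) and the helper functions are hypothetical and are not modelgrep's actual schema or method; the figures are copied from the list above.

```python
from statistics import median


def p50(samples_ms):
    """Median (50th percentile) of time-to-first-token samples, in ms."""
    return median(samples_ms)


def rank_by_latency(models):
    """Return model records sorted by ascending p50 time to first token."""
    return sorted(models, key=lambda m: m["ttft_p50_ms"])


# Figures copied from the ranking above; the field names are assumptions.
MODELS = [
    {"id": "granite-4.0-h-micro", "ttft_p50_ms": 301, "tokens_per_s": 27},
    {"id": "granite-4.1-8b", "ttft_p50_ms": 144, "tokens_per_s": 118},
]

RANKED = rank_by_latency(MODELS)

for position, model in enumerate(RANKED, start=1):
    print(f"{position}. {model['id']}: {model['ttft_p50_ms']}ms")
```

Using the median rather than the mean keeps a single slow request from shifting a model's position, which is presumably why the ranking reports p50.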

Frequently asked

Which IBM model has the lowest latency?

Granite 4.1 8B has the lowest latency of any IBM model, responding in about 144ms to first token. Granite 4.0 Micro (301ms) is next.

What's a good alternative to Granite 4.1 8B?

Granite 4.0 Micro (301ms) is the closest alternative on this metric. See the full ranking above for the tradeoffs.

How many IBM models are there?

modelgrep tracks 2 IBM models with live benchmarks, speed, latency, and per-provider pricing, led on intelligence by Granite 4.1 8B. Both qualify for this ranking.
