Gemma 4 31B (free) has the lowest latency of any Google model, with a time to first token of about 276ms. Gemma 4 31B (276ms) and Gemma 3n 4B (276ms) round out the top three, all at the same rounded figure.
AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
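As a rough illustration of what a p50 time-to-first-token ranking involves, the sketch below takes per-request latency samples for each model, reduces them to a median, and sorts fastest first. It assumes p50 means the median of individual request measurements; how modelgrep actually collects and aggregates its samples is not described here. The function name `rank_by_p50_ttft`, the model names, and all numbers are hypothetical.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Rank models by median (p50) time-to-first-token, fastest first.

    samples_ms maps a model name to a list of per-request
    time-to-first-token measurements in milliseconds.
    Models with no samples are skipped.
    """
    p50 = {model: median(values) for model, values in samples_ms.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical measurements in milliseconds; not real benchmark data.
samples = {
    "model-a": [250, 300, 280],
    "model-b": [400, 390, 1200],  # one slow outlier request
    "model-c": [310, 305, 330],
}

ranking = rank_by_p50_ttft(samples)
for model, ttft in ranking:
    print(f"{model}: {ttft} ms")
```

Using the median rather than the mean keeps a single slow request (the 1200 ms sample for `model-b`) from dominating that model's score, which is one common reason latency rankings report p50.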
Gemma 4 31B (276ms) is the closest alternative on this metric, followed by Gemma 3n 4B (276ms). See the full ranking above for the tradeoffs.
modelgrep tracks 26 Google models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Gemini 3 Flash Preview. 25 of them qualify for this ranking.