Lowest-Latency Google Models

Quick answer · Updated June 2026

Gemma 4 31B (free) has the lowest latency of any Google model, at about 276ms to first token. Gemma 4 31B and Gemma 3n 4B tie with it at 276ms and complete the top three.

Latency: 276ms · Intelligence: 39.2 · Speed: 64 t/s · Input: Free/M · Context: 262K

Google models ranked by median (p50) time to first token, from most to least responsive. Lower latency suits real-time and interactive use cases.
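As a rough illustration of the metric, the sketch below times how long a token stream takes to yield its first item and then takes the median (p50) over several runs. The names `time_to_first_token`, `p50_ms`, and `fake_stream` are hypothetical, and `fake_stream` is only a stand-in for a real streaming API call. This is not how modelgrep collects its numbers.

```python
import statistics
import time
from typing import Iterable, Iterator


def time_to_first_token(stream: Iterable[str]) -> float:
    """Return the seconds elapsed until the stream yields its first token."""
    start = time.perf_counter()
    for _ in stream:
        # Stop at the first token; later tokens affect speed (t/s), not latency.
        return time.perf_counter() - start
    raise ValueError("stream produced no tokens")


def p50_ms(samples_s: list[float]) -> float:
    """Median (p50) of latency samples given in seconds, returned in milliseconds."""
    return statistics.median(samples_s) * 1000.0


def fake_stream(delay_s: float) -> Iterator[str]:
    # Hypothetical stand-in for a streaming model response (assumption).
    time.sleep(delay_s)
    yield "hello"
    yield " world"


if __name__ == "__main__":
    samples = [time_to_first_token(fake_stream(d)) for d in (0.01, 0.03, 0.02)]
    print(f"p50 TTFT: {p50_ms(samples):.0f} ms")
```

The median is used rather than the mean so that a single slow request does not dominate the reported figure.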

1. gemma-4-31b-it:free (276ms latency)
   Reasoning, Tools, JSON, +1 · 39.2 intel · Free/M · 64 t/s
2. gemma-4-31b-it (276ms latency)
   Reasoning, Tools, JSON, +1 · 39.2 intel · $0.120/M · 64 t/s
3. gemma-3n-e4b-it (276ms latency)
   $0.060/M · 29 t/s · 33K ctx
4. gemini-2.5-flash-lite (395ms latency)
   Reasoning, Tools, JSON, +2 · 12.7 intel · $0.100/M · 113 t/s
5. gemma-3-27b-it (399ms latency)
   Tools, JSON, Vision · 10.3 intel · $0.080/M · 43 t/s
6. gemma-4-26b-a4b-it:free (447ms latency)
   Reasoning, Tools, JSON, +1 · 31.2 intel · Free/M · 46 t/s
7. gemma-4-26b-a4b-it (447ms latency)
   Reasoning, Tools, JSON, +1 · 31.2 intel · $0.060/M · 46 t/s
8. gemini-2.5-flash-lite-preview-09-2025 (514ms latency)
   Reasoning, Tools, JSON, +2 · 19.4 intel · $0.100/M · 188 t/s
9. gemma-3-4b-it (540ms latency)
   JSON, Vision · 6.3 intel · $0.050/M · 20 t/s
10. gemini-3.1-flash-lite-preview (608ms latency)
    Reasoning, Tools, JSON, +2 · 33.5 intel · $0.250/M · 90 t/s
11. gemma-3-12b-it (616ms latency)
    Tools, JSON, Vision · 8.8 intel · $0.050/M · 29 t/s
12. gemini-2.5-flash (620ms latency)
    Reasoning, Tools, JSON, +2 · $0.300/M · 87 t/s · 1.0M ctx
13. gemini-3.1-flash-lite (668ms latency)
    Reasoning, Tools, JSON, +2 · $0.250/M · 112 t/s · 1.0M ctx
14. gemma-2-27b-it (725ms latency)
    JSON · $0.650/M · 44 t/s · 8K ctx
15. gemini-2.5-pro (948ms latency)
    Reasoning, Tools, JSON, +2 · 34.6 intel · $1.25/M · 96 t/s
16. gemini-2.5-pro-preview (948ms latency)
    Reasoning, Tools, JSON, +2 · $1.25/M · 96 t/s · 1.0M ctx
17. gemini-2.5-pro-preview-05-06 (948ms latency)
    Reasoning, Tools, JSON, +2 · $1.25/M · 96 t/s · 1.0M ctx
18. gemini-3-flash-preview (1.3s latency)
    Reasoning, Tools, JSON, +2 · 46.4 intel · $0.500/M · 68 t/s
19. gemini-3.5-flash (1.7s latency)
    Reasoning, Tools, JSON, +2 · 43.3 intel · $1.50/M · 174 t/s
20. gemini-3.1-pro-preview (3.2s latency)
    Reasoning, Tools, JSON, +2 · 41.3 intel · $2.00/M · 82 t/s
21. lyria-3-clip-preview (3.2s latency)
    JSON, Vision · Free/M · 11 t/s · 1.0M ctx
22. gemini-3-pro-image-preview (3.6s latency)
    Reasoning, JSON, Vision, +1 · $2.00/M · 75 t/s · 66K ctx
23. gemini-3.1-pro-preview-customtools (3.6s latency)
    Reasoning, Tools, JSON, +2 · $2.00/M · 58 t/s · 1.0M ctx
24. gemini-2.5-flash-image (4.6s latency)
    JSON, Vision, Image out · $0.300/M · 247 t/s · 33K ctx
25. lyria-3-pro-preview (6.2s latency)
    JSON, Vision · Free/M · 1 t/s · 1.0M ctx

Frequently asked

Which Google model has the lowest latency?

Gemma 4 31B (free) has the lowest latency of any Google model, at about 276ms to first token. Gemma 4 31B and Gemma 3n 4B tie with it at 276ms and complete the top three.

What's a good alternative to Gemma 4 31B (free)?

Gemma 4 31B (276ms) is the closest alternative on this metric, followed by Gemma 3n 4B (276ms). See the full ranking above for the tradeoffs.

How many Google models are there?

modelgrep tracks 26 Google models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Gemini 3 Flash Preview. 25 of them qualify for this ranking.
