modelgrep

Lowest-Latency Google Models

Quick answer · Updated June 2026

Gemma 4 31B (free) ranks first among Google models for latency, with a median time to first token of about 276ms. Gemma 4 31B and Gemma 3n 4B are listed at the same 276ms and complete the top three.

Latency: 276ms
Intelligence: 39.2
Speed: 64 t/s
Input /M: Free
Context: 262K

Models ranked by median (p50) time to first token, fastest first. Lower values mean a more responsive model for real-time and interactive use cases.
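As a rough illustration of what a p50 figure means, the sketch below takes the median of several time-to-first-token measurements. The sample values and the function name `p50_ttft_ms` are hypothetical and are not modelgrep data or modelgrep's method.

```python
import statistics


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return statistics.median(samples_ms)


# Hypothetical measurements for a single model (assumed values).
samples = [250, 262, 276, 301, 940]

print("p50:", p50_ttft_ms(samples), "ms")
print("mean:", statistics.mean(samples), "ms")
```

The median ignores the single slow 940ms request that pulls the mean upward, which is one reason a p50 is a common choice for a typical-case latency figure.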

  1. gemma-4-31b-it:free
    Reasoning, Tools, JSON, +1 · 39.2 intel · Free/M · 64 t/s
    Latency: 276ms
  2. gemma-4-31b-it
    Reasoning, Tools, JSON, +1 · 39.2 intel · $0.120/M · 64 t/s
    Latency: 276ms
  3. gemma-3n-e4b-it
    $0.060/M · 29 t/s · 33K ctx
    Latency: 276ms
  4. gemini-2.5-flash-lite
    Reasoning, Tools, JSON, +2 · 12.7 intel · $0.100/M · 113 t/s
    Latency: 395ms
  5. gemma-3-27b-it
    Tools, JSON, Vision · 10.3 intel · $0.080/M · 43 t/s
    Latency: 399ms
  6. gemma-4-26b-a4b-it:free
    Reasoning, Tools, JSON, +1 · 31.2 intel · Free/M · 46 t/s
    Latency: 447ms
  7. gemma-4-26b-a4b-it
    Reasoning, Tools, JSON, +1 · 31.2 intel · $0.060/M · 46 t/s
    Latency: 447ms
  8. gemini-2.5-flash-lite-preview-09-2025
    Reasoning, Tools, JSON, +2 · 19.4 intel · $0.100/M · 188 t/s
    Latency: 514ms
  9. gemma-3-4b-it
    JSON, Vision · 6.3 intel · $0.050/M · 20 t/s
    Latency: 540ms
  10. gemini-3.1-flash-lite-preview
    Reasoning, Tools, JSON, +2 · 33.5 intel · $0.250/M · 90 t/s
    Latency: 608ms
  11. gemma-3-12b-it
    Tools, JSON, Vision · 8.8 intel · $0.050/M · 29 t/s
    Latency: 616ms
  12. gemini-2.5-flash
    Reasoning, Tools, JSON, +2 · $0.300/M · 87 t/s · 1.0M ctx
    Latency: 620ms
  13. gemini-3.1-flash-lite
    Reasoning, Tools, JSON, +2 · $0.250/M · 112 t/s · 1.0M ctx
    Latency: 668ms
  14. gemma-2-27b-it
    JSON · $0.650/M · 44 t/s · 8K ctx
    Latency: 725ms
  15. gemini-2.5-pro
    Reasoning, Tools, JSON, +2 · 34.6 intel · $1.25/M · 96 t/s
    Latency: 948ms
  16. gemini-2.5-pro-preview
    Reasoning, Tools, JSON, +2 · $1.25/M · 96 t/s · 1.0M ctx
    Latency: 948ms
  17. gemini-2.5-pro-preview-05-06
    Reasoning, Tools, JSON, +2 · $1.25/M · 96 t/s · 1.0M ctx
    Latency: 948ms
  18. gemini-3-flash-preview
    Reasoning, Tools, JSON, +2 · 46.4 intel · $0.500/M · 68 t/s
    Latency: 1.3s
  19. gemini-3.5-flash
    Reasoning, Tools, JSON, +2 · 43.3 intel · $1.50/M · 174 t/s
    Latency: 1.7s
  20. gemini-3.1-pro-preview
    Reasoning, Tools, JSON, +2 · 41.3 intel · $2.00/M · 82 t/s
    Latency: 3.2s
  21. lyria-3-clip-preview
    JSON, Vision · Free/M · 11 t/s · 1.0M ctx
    Latency: 3.2s
  22. gemini-3-pro-image-preview
    Reasoning, JSON, Vision, +1 · $2.00/M · 75 t/s · 66K ctx
    Latency: 3.6s
  23. gemini-3.1-pro-preview-customtools
    Reasoning, Tools, JSON, +2 · $2.00/M · 58 t/s · 1.0M ctx
    Latency: 3.6s
  24. gemini-2.5-flash-image
    JSON, Vision, Image out · $0.300/M · 247 t/s · 33K ctx
    Latency: 4.6s
  25. lyria-3-pro-preview
    JSON, Vision · Free/M · 1 t/s · 1.0M ctx
    Latency: 6.2s
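Latency is only part of how long a full response takes. The sketch below estimates total response time as time to first token plus streaming time, using latency and speed figures copied from the ranking above. It assumes the listed t/s is a steady output rate after the first token, which the page does not state, and `estimated_total_seconds` is a hypothetical helper, not a modelgrep formula.

```python
def estimated_total_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Rough estimate: time to first token plus time to stream the output.

    Assumes a constant output rate after the first token (an assumption).
    """
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_ms / 1000 + output_tokens / tokens_per_second


# (latency in ms, speed in t/s) as listed in the ranking.
models = {
    "gemma-4-31b-it": (276, 64),
    "gemini-2.5-flash-lite": (395, 113),
}

for name, (ttft_ms, tps) in models.items():
    for output_tokens in (10, 500):
        total = estimated_total_seconds(ttft_ms, tps, output_tokens)
        print(f"{name}: {output_tokens} tokens in about {total:.2f}s")
```

Under these assumptions the lower-latency model finishes very short replies first, while the higher-throughput model finishes long replies sooner, so the right pick depends on typical output length.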

Frequently asked

Which Google model has the lowest latency?

Gemma 4 31B (free), at about 276ms to first token. Gemma 4 31B and Gemma 3n 4B are listed at the same 276ms and complete the top three.

What's a good alternative to Gemma 4 31B (free)?

Gemma 4 31B ($0.120/M) is the closest alternative, matching it at 276ms with the same 64 t/s speed. Gemma 3n 4B ($0.060/M) also matches at 276ms but is slower at 29 t/s. See the full ranking above for the other tradeoffs.

How many Google models are there?

modelgrep tracks 26 Google models with live benchmarks, speed, latency, and per-provider pricing. Gemini 3 Flash Preview leads them on intelligence. Of the 26, 25 qualify for this ranking.
