modelgrep

Small & Fast Google Models

Quick answer · Updated June 2026

The fastest small Google model is Nano Banana (Gemini 2.5 Flash Image), at 224 tokens/sec and $0.300 per million input tokens. Like the rest of the efficient tier, it trades a few points of raw intelligence for speed and cost, which is the right call for high-volume, latency-sensitive work. Gemini 2.5 Flash Lite Preview 09-2025 (217 t/s) and Gemini 3.5 Flash (156 t/s) round out the top three.

Speed: 224 t/s
Input: $0.300/M
Context: 33K

This page ranks compact, efficient models (the small/mini/flash/haiku tier) by output speed. These models trade a little raw intelligence for low cost and high throughput, which is the right tradeoff for chat, classification, extraction and other high-volume work.

  1. gemini-2.5-flash-image
    JSON, Vision, Image out · $0.300/M · 894ms ttft · 33K ctx
    Speed: 224 t/s
  2. gemini-2.5-flash-lite-preview-09-2025
    Reasoning, Tools, JSON, +2 more · 19.4 intel · $0.100/M · 385ms ttft
    Speed: 217 t/s
  3. gemini-3.5-flash
    Reasoning, Tools, JSON, +2 more · 43.3 intel · $1.50/M · 1.7s ttft
    Speed: 156 t/s
  4. gemini-3.1-flash-image-preview
    Reasoning, JSON, Vision, +1 more · $0.500/M · 9.7s ttft · 131K ctx
    Speed: 144 t/s
  5. gemini-2.5-flash-lite
    Reasoning, Tools, JSON, +2 more · 17.6 intel · $0.100/M · 369ms ttft
    Speed: 118 t/s
  6. gemini-3.1-flash-lite
    Reasoning, Tools, JSON, +2 more · $0.250/M · 611ms ttft · 1.0M ctx
    Speed: 116 t/s
  7. gemini-3.1-flash-lite-preview
    Reasoning, Tools, JSON, +2 more · 33.5 intel · $0.250/M · 624ms ttft
    Speed: 96 t/s
  8. gemini-2.5-flash
    Reasoning, Tools, JSON, +2 more · $0.300/M · 601ms ttft · 1.0M ctx
    Speed: 91 t/s
  9. gemini-3-pro-image-preview
    Reasoning, JSON, Vision, +1 more · $2.00/M · 3.5s ttft · 66K ctx
    Speed: 79 t/s
  10. gemma-4-26b-a4b-it:free
    Reasoning, Tools, JSON, +1 more · 31.2 intel · Free · 356ms ttft
    Speed: 68 t/s
  11. gemma-4-26b-a4b-it
    Reasoning, Tools, JSON, +1 more · 31.2 intel · $0.060/M · 356ms ttft
    Speed: 68 t/s
  12. gemini-3-flash-preview
    Reasoning, Tools, JSON, +2 more · 46.4 intel · $0.500/M · 1.2s ttft
    Speed: 66 t/s
  13. gemma-4-31b-it:free
    Reasoning, Tools, JSON, +1 more · 39.2 intel · Free · 309ms ttft
    Speed: 55 t/s
  14. gemma-4-31b-it
    Reasoning, Tools, JSON, +1 more · 39.2 intel · $0.120/M · 309ms ttft
    Speed: 55 t/s
  15. gemma-3-12b-it
    Tools, JSON, Vision · 8.8 intel · $0.050/M · 501ms ttft
    Speed: 37 t/s
  16. gemma-3n-e4b-it
    $0.060/M · 268ms ttft · 33K ctx
    Speed: 35 t/s
  17. gemma-3-4b-it
    JSON, Vision · 6.3 intel · $0.050/M · 566ms ttft
    Speed: 20 t/s
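
The ranking orders models by output speed alone, but time to first token (ttft) also shapes how long a response takes, especially a short one. The sketch below is a rough illustration, not modelgrep's method: it assumes total response time is roughly ttft plus output tokens divided by throughput, with a constant streaming rate. The function names are hypothetical; the ttft and speed figures are copied from four entries in the ranking above.

```python
# (time to first token in seconds, output tokens per second), from the ranking above.
MODELS = {
    "gemini-2.5-flash-image": (0.894, 224),
    "gemini-2.5-flash-lite-preview-09-2025": (0.385, 217),
    "gemini-3.5-flash": (1.7, 156),
    "gemini-2.5-flash-lite": (0.369, 118),
}


def estimated_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Assumed model: wait for the first token, then stream at a constant rate."""
    return ttft_s + output_tokens / tokens_per_s


def rank_by_response_time(output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, estimated seconds) pairs, fastest first."""
    estimates = [
        (name, estimated_response_seconds(ttft, speed, output_tokens))
        for name, (ttft, speed) in MODELS.items()
    ]
    return sorted(estimates, key=lambda item: item[1])


if __name__ == "__main__":
    for n in (100, 5000):
        print(f"{n} output tokens:")
        for name, seconds in rank_by_response_time(n):
            print(f"  {name}: {seconds:.2f}s")
```

Under these assumptions, gemini-2.5-flash-lite-preview-09-2025 comes out ahead for a 100-token response because of its lower ttft, while gemini-2.5-flash-image only takes the lead for long outputs such as 5,000 tokens. Real latency varies by provider and load, so treat the numbers as a starting point for your own measurements.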

Frequently asked

What is the smallest, fastest Google model?

This ranking measures output speed rather than model size. On that measure, the fastest model in Google's small tier is Nano Banana (Gemini 2.5 Flash Image), at 224 tokens/sec and $0.300 per million input tokens. Gemini 2.5 Flash Lite Preview 09-2025 (217 t/s) and Gemini 3.5 Flash (156 t/s) round out the top three.

What's a good alternative to Nano Banana (Gemini 2.5 Flash Image)?

Gemini 2.5 Flash Lite Preview 09-2025 (217 t/s) is the closest alternative on this metric, followed by Gemini 3.5 Flash (156 t/s). See the full ranking above for the tradeoffs.
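
One of those tradeoffs is input price. As a minimal sketch, the arithmetic for a high-volume workload can be written out as below. The workload (1,000,000 requests of 500 input tokens each) and the function name are hypothetical; the per-million input prices are the ones listed in the ranking. Output-token pricing is not shown on this page, so it is left out.

```python
# Input prices in USD per million tokens, copied from the ranking on this page.
INPUT_PRICE_PER_M = {
    "gemini-2.5-flash-image": 0.300,
    "gemini-2.5-flash-lite-preview-09-2025": 0.100,
    "gemini-3.5-flash": 1.50,
}


def input_cost_usd(model: str, requests: int, input_tokens_per_request: int) -> float:
    """Input-side cost only: total input tokens / 1M * price per million."""
    total_tokens = requests * input_tokens_per_request
    return total_tokens / 1_000_000 * INPUT_PRICE_PER_M[model]


if __name__ == "__main__":
    # Hypothetical workload: 1,000,000 requests, 500 input tokens each.
    for model in INPUT_PRICE_PER_M:
        cost = input_cost_usd(model, 1_000_000, 500)
        print(f"{model}: ${cost:,.2f}")
```

For this assumed workload of 500 million input tokens, the estimate is about $150 for gemini-2.5-flash-image, $50 for gemini-2.5-flash-lite-preview-09-2025 and $750 for gemini-3.5-flash.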

How many Google models are there?

modelgrep tracks 26 Google models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Gemini 3 Flash Preview. 17 of them qualify for this ranking.
