modelgrep

Best Google Vision Models

Quick answer · Updated June 2026

Gemini 3 Flash Preview is the best vision-capable Google model, pairing 46.4 intelligence with image and document understanding. Gemini 3.5 Flash (43.3) and Gemini 3.1 Pro Preview (41.3) round out the top three.

Intelligence: 46.4
Speed: 72 t/s
Input price: $0.500/M
Context: 1.0M

This ranking lists Google's multimodal large language models that accept image input, ordered by intelligence score. These are the vision-capable models for understanding images, documents and charts.

  1. gemini-3-flash-preview
    Reasoning · Tools · JSON · +2
    46.4 intel · $0.500/M · 72 t/s
  2. gemini-3.5-flash
    Reasoning · Tools · JSON · +2
    43.3 intel · $1.50/M · 151 t/s
  3. gemini-3.1-pro-preview
    Reasoning · Tools · JSON · +2
    41.3 intel · $2.00/M · 91 t/s
  4. gemma-4-31b-it:free
    Reasoning · Tools · JSON · +1
    39.2 intel · Free · 65 t/s
  5. gemma-4-31b-it
    Reasoning · Tools · JSON · +1
    39.2 intel · $0.120/M · 65 t/s
  6. gemini-2.5-pro
    Reasoning · Tools · JSON · +2
    34.6 intel · $1.25/M · 97 t/s
  7. gemini-3.1-flash-lite-preview
    Reasoning · Tools · JSON · +2
    33.5 intel · $0.250/M · 95 t/s
  8. gemma-4-26b-a4b-it:free
    Reasoning · Tools · JSON · +1
    31.2 intel · Free · 52 t/s
  9. gemma-4-26b-a4b-it
    Reasoning · Tools · JSON · +1
    31.2 intel · $0.060/M · 50 t/s
  10. gemini-2.5-flash-lite-preview-09-2025
    Reasoning · Tools · JSON · +2
    19.4 intel · $0.100/M · 211 t/s
  11. gemini-2.5-flash-lite
    Reasoning · Tools · JSON · +2
    17.6 intel · $0.100/M · 1.0M ctx
  12. gemma-3-27b-it
    Tools · JSON · Vision
    10.3 intel · $0.080/M · 45 t/s
  13. gemma-3-12b-it
    Tools · JSON · Vision
    8.8 intel · $0.050/M · 35 t/s
  14. gemma-3-4b-it
    JSON · Vision
    6.3 intel · $0.050/M · 17 t/s
  15. gemini-3.1-flash-lite
    Reasoning · Tools · JSON · +2
    $0.250/M · 104 t/s · 587ms ttft
  16. lyria-3-pro-preview
    JSON · Vision
    Free · 2 t/s · 5.5s ttft
  17. lyria-3-clip-preview
    JSON · Vision
    Free · 1.0M ctx
  18. gemini-3.1-flash-image-preview
    Reasoning · JSON · Vision · +1
    $0.500/M · 139 t/s · 9.8s ttft
  19. gemini-3.1-pro-preview-customtools
    Reasoning · Tools · JSON · +2
    $2.00/M · 70 t/s · 3.1s ttft
  20. gemini-3-pro-image-preview
    Reasoning · JSON · Vision · +1
    $2.00/M · 81 t/s · 3.3s ttft
  21. gemini-2.5-flash-image
    JSON · Vision · Image out
    $0.300/M · 171 t/s · 2.4s ttft
  22. gemini-2.5-flash
    Reasoning · Tools · JSON · +2
    $0.300/M · 79 t/s · 677ms ttft
  23. gemini-2.5-pro-preview
    Reasoning · Tools · JSON · +2
    $1.25/M · 97 t/s · 1.1s ttft
  24. gemini-2.5-pro-preview-05-06
    Reasoning · Tools · JSON · +2
    $1.25/M · 97 t/s · 1.1s ttft
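
The ordering used in this ranking (highest intelligence first, models without a published score at the end) can be sketched in a few lines of Python. This is only an illustration under assumptions: the `Model` type and `rank_by_intelligence` helper are hypothetical names, the rows are a small sample copied from the listing, and keeping unscored models in their input order is a guess rather than the site's documented rule.

```python
from typing import List, NamedTuple, Optional


class Model(NamedTuple):
    name: str
    intelligence: Optional[float]  # None when the listing shows no score
    input_price: float             # USD per million input tokens; 0.0 means free


# A sample of rows copied from the listing, deliberately out of order.
MODELS = [
    Model("gemini-2.5-flash", None, 0.30),
    Model("gemini-3.1-pro-preview", 41.3, 2.00),
    Model("gemma-4-31b-it:free", 39.2, 0.0),
    Model("gemini-3-flash-preview", 46.4, 0.50),
    Model("gemini-3.5-flash", 43.3, 1.50),
    Model("gemma-3-4b-it", 6.3, 0.05),
]


def rank_by_intelligence(models: List[Model]) -> List[Model]:
    """Sort by score, highest first; unscored models go last (assumed rule)."""
    # The first key element pushes unscored models to the end; the second
    # orders scored models descending. sorted() is stable, so ties and
    # unscored models keep their input order.
    return sorted(
        models,
        key=lambda m: (m.intelligence is None, -(m.intelligence or 0.0)),
    )


ranked = rank_by_intelligence(MODELS)
for position, model in enumerate(ranked, start=1):
    score = "n/a" if model.intelligence is None else f"{model.intelligence:.1f}"
    print(f"{position}. {model.name} ({score})")
```

Sorting on a tuple keeps the "has a score" check and the score itself in a single pass, which avoids splitting the list into two groups by hand.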

Frequently asked

What is the best Google model for vision?

Gemini 3 Flash Preview, with an intelligence score of 46.4. Gemini 3.5 Flash (43.3) and Gemini 3.1 Pro Preview (41.3) follow.

What's a good alternative to Gemini 3 Flash Preview?

Gemini 3.5 Flash (43.3) is the closest alternative on intelligence, followed by Gemini 3.1 Pro Preview (41.3). Both cost more per million input tokens ($1.50 and $2.00 versus $0.500) but generate faster (151 t/s and 91 t/s versus 72 t/s).
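
One way to weigh these tradeoffs is to filter the top three by a price ceiling or a speed floor and then take the highest intelligence score that remains. The sketch below assumes exactly that selection rule; `Candidate` and `pick_best` are hypothetical names, and the figures are the ones quoted in the ranking.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str
    intelligence: float
    input_price: float  # USD per million input tokens
    speed: float        # tokens per second


# Figures for the top three entries, as listed in the ranking.
TOP_THREE = [
    Candidate("gemini-3-flash-preview", 46.4, 0.50, 72),
    Candidate("gemini-3.5-flash", 43.3, 1.50, 151),
    Candidate("gemini-3.1-pro-preview", 41.3, 2.00, 91),
]


def pick_best(
    candidates: List[Candidate],
    max_price: Optional[float] = None,
    min_speed: Optional[float] = None,
) -> Optional[Candidate]:
    """Return the most intelligent candidate that meets both limits, or None."""
    eligible = [
        c for c in candidates
        if (max_price is None or c.input_price <= max_price)
        and (min_speed is None or c.speed >= min_speed)
    ]
    return max(eligible, key=lambda c: c.intelligence, default=None)


print(pick_best(TOP_THREE).name)                 # no constraints
print(pick_best(TOP_THREE, min_speed=100).name)  # needs at least 100 t/s
```

With no constraints the top-ranked model wins; requiring at least 100 t/s leaves only Gemini 3.5 Flash among these three.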

How many Google models are there?

modelgrep tracks 26 Google models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Gemini 3 Flash Preview. 24 of them qualify for this ranking.
