modelgrep

Best Meta Vision Models

Quick answer · Updated June 2026

Llama 4 Maverick is the best vision-capable Meta model, with an intelligence score of 18.4 and support for image and document understanding. Llama 4 Scout (13.5) and Llama 3.2 11B Vision Instruct (8.7) round out the top three.

Intelligence: 18.4
Speed: 72 t/s
Input price: $0.150 per million tokens
Context window: 1.0M tokens

Meta's multimodal large language models that accept image input, ranked by intelligence score. These are the best vision-capable Meta models for understanding images, documents and charts.

  1. llama-4-maverick
    Tools · JSON · Vision · 18.4 intelligence · $0.150/M · 72 t/s
  2. llama-4-scout
    Tools · JSON · Vision · 13.5 intelligence · $0.100/M · 130 t/s
  3. llama-3.2-11b-vision-instruct
    JSON · Vision · 8.7 intelligence · $0.345/M · 35 t/s
  4. llama-guard-4-12b
    JSON · Vision · no intelligence score · $0.180/M · 18 t/s · 120 ms TTFT

Frequently asked

What is the best Meta model for vision?

Llama 4 Maverick, with an intelligence score of 18.4. Llama 4 Scout (13.5) and Llama 3.2 11B Vision Instruct (8.7) follow in second and third place.

What's a good alternative to Llama 4 Maverick?

Llama 4 Scout (13.5) is the closest alternative on intelligence. It also costs less ($0.100/M vs $0.150/M) and runs faster (130 t/s vs 72 t/s). Llama 3.2 11B Vision Instruct (8.7) comes next. See the full ranking above for the remaining tradeoffs.

How many Meta models are there?

modelgrep tracks 13 Meta models with live benchmarks, speed, latency and per-provider pricing. Llama 4 Maverick leads them on intelligence. Four of the 13 qualify for this vision ranking.
