modelgrep

Best xAI Vision Models

Quick answer · Updated June 2026

Grok 4.3 is the best vision-capable xAI model, pairing an intelligence score of 53.2 with image and document understanding. Grok 4.20 (29.7) and Grok Build 0.1 (no score listed) round out the top three.

53.2 Intelligence
127 t/s Speed
$1.25 Input /M
1M Context

Multimodal large language models from xAI that accept image input, ranked by intelligence. These are the vision-capable models for understanding images, documents and charts.

  1. grok-4.3
    Reasoning · Tools · JSON · +1
    53.2 intel · $1.25/M · 127 t/s
  2. grok-4.20
    Reasoning · Tools · JSON · +1
    29.7 intel · $1.25/M · 78 t/s
  3. grok-build-0.1
    Reasoning · Tools · JSON · +1
    no intel score listed · $1.00/M · 123 t/s · 1.1s ttft
  4. grok-4.20-multi-agent
    Reasoning · JSON · Vision
    no intel score listed · $1.25/M · 336 t/s · 11.0s ttft
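
As a rough illustration of "ranked by intelligence", the ordering above can be reproduced with a small sort. This is only a sketch: the `Model` type and `rank_by_intelligence` function are hypothetical names, not modelgrep's API, and placing unscored models last in their input order is an assumption, since the page does not say how models without a score are ordered.

```python
from typing import List, NamedTuple, Optional


class Model(NamedTuple):
    name: str
    intelligence: Optional[float]  # None when the page lists no score


# Scores copied from the ranking above; input order is deliberately shuffled.
MODELS = [
    Model("grok-4.20", 29.7),
    Model("grok-build-0.1", None),
    Model("grok-4.3", 53.2),
    Model("grok-4.20-multi-agent", None),
]


def rank_by_intelligence(models: List[Model]) -> List[Model]:
    # Scored models come first, highest score first.
    # Assumption: unscored models go last and keep their input order
    # (sorted() is stable, so ties are not reordered).
    return sorted(
        models,
        key=lambda m: (m.intelligence is None, -(m.intelligence or 0.0)),
    )


ranked = [m.name for m in rank_by_intelligence(MODELS)]
print(ranked)
```

Sorting on a `(is_unscored, -score)` tuple keeps the missing-score case explicit instead of treating it as a score of zero.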

Frequently asked

What is the best xAI model for vision?

Grok 4.3 is the best vision-capable xAI model, pairing an intelligence score of 53.2 with image and document understanding. Grok 4.20 (29.7) and Grok Build 0.1 (no score listed) round out the top three.

What's a good alternative to Grok 4.3?

Grok 4.20 (29.7) is the closest alternative on intelligence, followed by Grok Build 0.1 (no score listed). See the full ranking above for the tradeoffs.

How many xAI models are there?

modelgrep tracks 4 xAI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Grok 4.3. All 4 qualify for this ranking.
