Grok 4.3 is the best vision-capable xAI model, combining an intelligence score of 53.2 with image and document understanding. Grok 4.20 (29.7) and Grok Build 0.1 (no score listed) round out the top three.
Multimodal large language models that accept image input, ranked by intelligence. The best vision-capable AI models for understanding images, documents and charts.
Grok 4.20 (29.7) is the closest alternative on intelligence, followed by Grok Build 0.1 (no score listed). See the full ranking above for the tradeoffs.
modelgrep tracks 4 xAI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Grok 4.3. All 4 qualify for this ranking.