Llama 4 Maverick is the best vision-capable Meta model, pairing an intelligence score of 18.4 with image and document understanding. Llama 4 Scout (13.5) and Llama 3.2 11B Vision Instruct (8.7) round out the top three.
Multimodal large language models that accept image input, ranked by intelligence. The best vision-capable AI models for understanding images, documents and charts.
Llama 4 Scout (13.5) is the closest alternative on intelligence, followed by Llama 3.2 11B Vision Instruct (8.7). See the full ranking above for the tradeoffs.
modelgrep tracks 13 Meta models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Llama 4 Maverick. Four of them qualify for this ranking.