Best Meta Vision Models

Updated July 2026

Llama 4 Maverick is the best vision-capable Meta model, combining an intelligence score of 14.3 with image and document understanding. Llama 4 Scout (10.0) and Llama Guard 4 12B (no score listed) round out the top three.

Intelligence: 14.3

Input price: $0.200 /M

Context: 1.0M

This ranking covers multimodal large language models from Meta that accept image input, ordered by intelligence score. These are the vision language models (VLMs) best suited to understanding images, documents and charts.
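The ordering rule described above (keep only models that accept image input, then sort by intelligence) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how modelgrep actually computes the ranking: the `Model` class and its field names are hypothetical, and placing unscored models last is an assumption inferred from the top three listed on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Model:
    # Hypothetical record shape, used for illustration only.
    name: str
    accepts_images: bool
    intelligence: Optional[float]  # None when no score is listed


def rank_vision_models(models: List[Model]) -> List[Model]:
    """Keep image-capable models; highest intelligence first, unscored last."""
    vision = [m for m in models if m.accepts_images]
    # The sort key puts scored models (False) before unscored ones (True),
    # then orders scored models by descending intelligence.
    return sorted(
        vision,
        key=lambda m: (m.intelligence is None, -(m.intelligence or 0.0)),
    )


# Scores are the ones quoted on this page; input order is deliberately shuffled.
models = [
    Model("Llama Guard 4 12B", True, None),
    Model("Llama 4 Scout", True, 10.0),
    Model("Llama 4 Maverick", True, 14.3),
]

ranked = rank_vision_models(models)
for position, m in enumerate(ranked, start=1):
    score = "no score" if m.intelligence is None else f"{m.intelligence:.1f}"
    print(position, m.name, score)
```

Sorting on a two-part key keeps the unscored model in the list instead of dropping it, which matches how Llama Guard 4 12B still appears in the top three here.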

Frequently asked

What is the best Meta model for vision?

Llama 4 Maverick is the best vision-capable Meta model, combining an intelligence score of 14.3 with image and document understanding. Llama 4 Scout (10.0) and Llama Guard 4 12B (no score listed) round out the top three.

What's a good alternative to Llama 4 Maverick?

Llama 4 Scout (10.0) is the closest alternative on this metric, followed by Llama Guard 4 12B (no score listed). See the full ranking above for the tradeoffs.

How many Meta models are there?

modelgrep tracks 8 Meta models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Llama 4 Maverick. Of these, 3 qualify for this ranking.

More Meta rankings

Meta: Smartest LLMs
Meta: Best LLMs for Coding
Meta: Best LLMs for Design & Frontend
Meta: Fastest LLMs
Meta: Lowest-Latency LLMs
Meta: Cheapest LLMs
Meta: Best Free LLMs
Meta: Best Reasoning LLMs
Meta: Best LLMs for Agents
Meta: Best Open-Source LLMs
Meta: Longest-Context LLMs
Meta: Best LLMs for Writing
Meta: Best LLMs for Math & Science
Meta: Best LLMs for RAG
Meta: Best LLMs for SQL & Data Analysis
Meta: Best LLMs for Roleplay
Meta: Best Uncensored LLMs
