Best MiniMax Vision Models

Updated July 2026

MiniMax M3 is the best vision-capable MiniMax model, pairing an intelligence score of 44.4 with image and document understanding. MiniMax M3 (batch) (44.4) and MiniMax-01 (no score listed) round out the top three.

44.4 Intelligence

$0.300 Input /M

1.0M Context

This ranking covers multimodal large language models that accept image input, ordered by intelligence. These are the best vision language models (VLMs) for understanding images, documents and charts.

Frequently asked

What is the best MiniMax model for vision?

MiniMax M3 is the best vision-capable MiniMax model, pairing an intelligence score of 44.4 with image and document understanding. MiniMax M3 (batch) (44.4) and MiniMax-01 (no score listed) round out the top three.

What's a good alternative to MiniMax M3?

MiniMax M3 (batch) (44.4) is the closest alternative on this metric, followed by MiniMax-01 (no score listed). See the full ranking above for the tradeoffs.

How many MiniMax models are there?

modelgrep tracks 9 MiniMax models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by MiniMax M3. Three of them qualify for this ranking.
