GPT-5.4 is the best vision-capable OpenAI model, pairing an intelligence score of 56.8 with image and document understanding. GPT-5.5 (56.7) and GPT-5.3-Codex (53.6) round out the top three.
This page ranks multimodal large language models that accept image input by intelligence, covering the best vision-capable AI models for understanding images, documents and charts.
GPT-5.5 (56.7) is the closest alternative to GPT-5.4 on intelligence, followed by GPT-5.3-Codex (53.6). See the full ranking above for the tradeoffs.
modelgrep tracks 62 OpenAI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GPT-5.4. Of these, 25 qualify for this ranking.