modelgrep

Best Baidu Vision Models

Quick answer · Updated June 2026

ERNIE 4.5 VL 424B A47B is the best vision-capable Baidu model, pairing reasoning with image and document understanding.

Intelligence: not listed
Speed: 30 t/s (output tokens per second)
Input price: $0.420 per million tokens
Context window: 131K tokens

Multimodal large language models that accept image input, ranked by intelligence score. These are the best vision-capable AI models for understanding images, documents and charts.
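The ranking rule described above (keep models that accept images, then order by intelligence) can be sketched as a filter followed by a sort. This is a minimal illustration, not modelgrep's actual implementation: the `Model` record and its field names (`accepts_image`, `intelligence`) are hypothetical, and placing unscored models last is an assumption, since the listing shows no intelligence score for its only entry.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Model:
    # Hypothetical record; field names are illustrative, not modelgrep's schema.
    name: str
    accepts_image: bool
    intelligence: Optional[float]  # None when no score is published


def rank_vision_models(models: list[Model]) -> list[Model]:
    """Keep image-capable models and sort them by intelligence, highest first.

    Models without a score are placed after all scored models; Python's
    stable sort keeps their original relative order.
    """
    vision = [m for m in models if m.accepts_image]
    return sorted(
        vision,
        # (False, -score) sorts before (True, ...), so scored models come first,
        # and negating the score puts the highest score at the front.
        key=lambda m: (m.intelligence is None, -(m.intelligence or 0.0)),
    )


catalog = [
    Model("ernie-4.5-vl-424b-a47b", accepts_image=True, intelligence=None),
    Model("text-only-example", accepts_image=False, intelligence=50.0),  # hypothetical
]
print([m.name for m in rank_vision_models(catalog)])
# ['ernie-4.5-vl-424b-a47b']
```

A text-only model is excluded regardless of its score, which is why a single vision-capable entry can top the list on its own.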

  1. ernie-4.5-vl-424b-a47b
    Reasoning · Vision · $0.420/M input · 30 t/s · 1.4 s time to first token (TTFT)
    Intelligence: not listed
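The listed figures can be turned into rough per-request estimates. This is a sketch under stated assumptions: cost covers input tokens only (the listing gives no output price), token counts for images depend on the provider's encoding and are assumed here, and response time is approximated as TTFT plus output tokens divided by a steady decoding speed, which ignores network and queueing variance.

```python
INPUT_PRICE_PER_M = 0.420  # USD per million input tokens, from the listing
SPEED_TPS = 30.0           # output tokens per second, from the listing
TTFT_S = 1.4               # seconds to first token, from the listing


def input_cost_usd(input_tokens: int, price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """Input-side cost only; the listing gives no output price."""
    return input_tokens / 1_000_000 * price_per_m


def est_response_seconds(output_tokens: int, ttft_s: float = TTFT_S,
                         tps: float = SPEED_TPS) -> float:
    """Rough wall-clock estimate: first-token latency plus steady decoding time."""
    return ttft_s + output_tokens / tps


# Example: a 10,000-token prompt (text plus encoded images) and a 300-token answer.
print(f"input cost ~ ${input_cost_usd(10_000):.4f}")          # input cost ~ $0.0042
print(f"response time ~ {est_response_seconds(300):.1f} s")   # response time ~ 11.4 s
```

With these numbers, decoding time (300 / 30 = 10 s) dominates the 1.4 s TTFT, so throughput matters more than first-token latency for long answers.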

Frequently asked

What is the best Baidu model for vision?

ERNIE 4.5 VL 424B A47B is the best vision-capable Baidu model, pairing reasoning with image and document understanding. It is also the only Baidu model that qualifies for this ranking.

How many Baidu models are there?

modelgrep tracks 1 Baidu model with live benchmarks, speed, latency and per-provider pricing, and it qualifies for this ranking.
