qwen/qwen3-vl-32b-instruct
Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...
| Provider | In $/M | Out $/M | Context | Uptime |
|---|---|---|---|---|
| Alibaba | $0.104 | $0.416 | 131K | 100% |
Qwen3 VL 32B Instruct costs $0.104 per million input tokens and $0.416 per million output tokens via OpenRouter, making it 63rd cheapest of 298 paid models.
Qwen3 VL 32B Instruct scores 24.7 on the Artificial Analysis Intelligence Index, ranking 100th of 178 benchmarked models, with a GPQA Diamond score of 73%.
Qwen3 VL 32B Instruct generates around 61 tokens per second with 699ms time-to-first-token (p50), the 137th fastest tracked model.
Qwen3 VL 32B Instruct supports a 262K-token context window and can output up to 33K tokens. It accepts text, image input.