modelgrep

Qwen: Qwen3 VL 32B Instruct

qwen/qwen3-vl-32b-instruct

100th smartest of 178 · Cheaper than 79% of paid · Tools · JSON · Vision
Intelligence: 24.7 (100th of 178)
Design Elo: (not listed)
Speed: 61 tokens per second (137th fastest)
Latency: 699ms to first token
Input price: $0.104 per million tokens (63rd cheapest)
Context: 262K (33K max out)

How it compares

Smarter than 44% of all ranked models
Faster than 54% of all ranked models
Cheaper than 79% of all ranked models

Overview

Qwen3-VL-32B-Instruct is a large-scale multimodal vision-language model designed for high-precision understanding and reasoning across text, images, and video. With 32 billion parameters, it combines deep visual perception with advanced text...

Benchmarks

independent · via OpenRouter
Artificial Analysis · 42nd percentile
Intelligence Index: 24.7
Coding Index: 14.5
Agentic Index: 23.4
GPQA Diamond: 73%
Humanity's Last Exam: 10%
SciCode: 28%
Tau²-Bench (agentic): 46%

Providers & pricing (1)

Provider    In $/M    Out $/M    Uptime
Alibaba     $0.104    $0.416     100%

Specifications

Context window: 262K
Max output: 33K
Knowledge cutoff: (not listed)
Input modalities: text, image
Output modalities: text
Prompt caching: (not listed)
Cache read price: (not listed)
Moderated: No

Qwen3 VL 32B Instruct FAQ

How much does Qwen3 VL 32B Instruct cost?

Qwen3 VL 32B Instruct costs $0.104 per million input tokens and $0.416 per million output tokens via OpenRouter, making it the 63rd cheapest of 298 paid models.
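
As a rough illustration of what those rates mean per request, the sketch below multiplies token counts by the listed prices. The function name and the example token counts are hypothetical, and the sketch assumes you already know the token counts; it does not cover how image inputs are converted to tokens, which this page does not state.

```python
# Prices copied from this page (USD per million tokens, Alibaba via OpenRouter).
INPUT_USD_PER_M = 0.104
OUTPUT_USD_PER_M = 0.416


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-token rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens and 1,000 completion tokens.
    print(f"${estimate_cost_usd(10_000, 1_000):.6f}")
```

At these rates, output tokens cost four times as much as input tokens, so long completions tend to dominate the bill.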

How smart is Qwen3 VL 32B Instruct?

Qwen3 VL 32B Instruct scores 24.7 on the Artificial Analysis Intelligence Index, ranking 100th of 178 benchmarked models, with a GPQA Diamond score of 73%.
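
The page does not say how it turns a rank into a "smarter than" or "cheaper than" percentage. One plausible reading, sketched below with a hypothetical helper, is the share of models ranked below this one. Under that assumption, the stated ranks (100th of 178 for intelligence, 63rd of 298 for price) reproduce the 44% and 79% figures shown on this page.

```python
def share_outranked(rank: int, total: int) -> int:
    """Percentage of models ranked below position `rank` out of `total`, rounded.

    Assumption: rank 1 is best and ties are ignored.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return round(100 * (total - rank) / total)


if __name__ == "__main__":
    print(share_outranked(100, 178))  # intelligence rank from this page
    print(share_outranked(63, 298))   # price rank from this page
```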

How fast is Qwen3 VL 32B Instruct?

Qwen3 VL 32B Instruct generates around 61 tokens per second with a 699ms time-to-first-token (p50), making it the 137th fastest tracked model.
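
To get a feel for what those two numbers imply together, the sketch below adds the time-to-first-token to the time needed to stream the remaining output at the listed rate. It is a back-of-the-envelope model only: the helper name is hypothetical, and it assumes a constant generation speed and the median latency, neither of which is guaranteed for any single request.

```python
# Figures copied from this page (p50 values via OpenRouter).
TTFT_SECONDS = 0.699
TOKENS_PER_SECOND = 61.0


def estimate_response_seconds(output_tokens: int) -> float:
    """Rough wall-clock time to receive `output_tokens` tokens."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


if __name__ == "__main__":
    # Hypothetical 610-token answer: about 0.7 s of waiting plus 10 s of streaming.
    print(f"{estimate_response_seconds(610):.1f} s")
```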

What is Qwen3 VL 32B Instruct's context window?

Qwen3 VL 32B Instruct supports a 262K-token context window and can output up to 33K tokens. It accepts text and image input.
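
A simple pre-flight check against these limits might look like the sketch below. It uses the rounded figures shown on this page (the exact limits may differ slightly) and assumes that output tokens count against the same context window as the prompt, which is common but not stated here. The function name is hypothetical.

```python
# Rounded limits as listed on this page; exact values may differ.
CONTEXT_WINDOW_TOKENS = 262_000
MAX_OUTPUT_TOKENS = 33_000


def fits_limits(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Return True if the request stays within both listed limits.

    Assumption: prompt and output share one context window.
    """
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


if __name__ == "__main__":
    print(fits_limits(200_000, 33_000))  # 233K total, within the window
    print(fits_limits(240_000, 33_000))  # 273K total, over the window
```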
