modelgrep

Qwen: Qwen3 VL 30B A3B Instruct

qwen/qwen3-vl-30b-a3b-instruct

138th smartest of 179 · Cheaper than 77% of paid · Tools · JSON · Vision
Intelligence: 16.0 (138th of 179)
Design Elo: not listed
Speed: 46 tokens per second (180th fastest)
Latency: 361ms to first token
Input price: $0.130 per million tokens (70th cheapest)
Context: 262K tokens (33K max output)

How it compares

Smarter than 23% of all ranked models
Faster than 39% of all ranked models
Cheaper than 77% of all ranked models

Overview

Qwen3-VL-30B-A3B-Instruct is a multimodal model that unifies strong text generation with visual understanding for images and videos. Its Instruct variant optimizes instruction-following for general multimodal tasks. It excels in perception...

Benchmarks

independent · via OpenRouter
Artificial Analysis · 22nd percentile
Intelligence Index: 16.0
Coding Index: 14.3
Agentic Index: 9.5
GPQA Diamond: 70%
Humanity's Last Exam: 6%
SciCode: 31%
Tau²-Bench (agentic): 19%

Providers & pricing (6)

Provider              In $/M    Out $/M   Uptime
Alibaba               $0.130    $0.520    100%
AtlasCloud (fp8)      $0.150    $0.600    98.7%
DeepInfra (fp8)       $0.150    $0.600    100%
Novita (bf16)         $0.200    $0.700    97.1%
Phala                 $0.200    $0.700    94.9%
SiliconFlow (fp8)     $0.290    $1.00     96.6%

Specifications

Context window: 262K
Max output: 33K
Knowledge cutoff: Mar 2025
Input modalities: text, image
Output modalities: text
Prompt caching: not listed
Cache read price: not listed
Moderated: No

Qwen3 VL 30B A3B Instruct FAQ

How much does Qwen3 VL 30B A3B Instruct cost?

Qwen3 VL 30B A3B Instruct costs $0.130 per million input tokens and $0.520 per million output tokens via OpenRouter, making it the 70th cheapest of 298 paid models.
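
As a rough illustration of what those per-million prices mean for a single request, the sketch below multiplies token counts by the listed rates. The function name `estimate_cost` is hypothetical, and the sketch assumes the token counts are already known; this page does not say how image inputs are converted to tokens.

```python
# Listed prices in USD per million tokens (taken from the answer above).
INPUT_PRICE_PER_M = 0.130
OUTPUT_PRICE_PER_M = 0.520


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 10,000-token prompt that produces a 2,000-token reply.
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.6f}")
```

Other providers in the pricing table charge different rates, so the two constants would need to change to match the provider actually serving the request.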

How smart is Qwen3 VL 30B A3B Instruct?

Qwen3 VL 30B A3B Instruct scores 16.0 on the Artificial Analysis Intelligence Index, ranking 138th of 179 benchmarked models, with a GPQA Diamond score of 70%.
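
The rank can be related to the "smarter than" share shown earlier on the page with a small calculation. This is a sketch under the assumption that the share means the fraction of ranked models placed below this one; the page does not document its exact formula, and `share_below` is a hypothetical helper.

```python
def share_below(rank: int, total: int) -> float:
    """Fraction of ranked models placed below the given rank (1 = best)."""
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# Rank 138 of 179, as quoted in the answer above.
share = share_below(138, 179)
print(f"{share * 100:.1f}%")
```

Under that assumption the result is just under 23%, which is roughly in line with both the "Smarter than 23%" figure and the 22nd-percentile label in the benchmarks section.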

How fast is Qwen3 VL 30B A3B Instruct?

Qwen3 VL 30B A3B Instruct generates around 46 tokens per second with a 361ms time to first token (p50), making it the 180th fastest tracked model.

What is Qwen3 VL 30B A3B Instruct's context window?

Qwen3 VL 30B A3B Instruct supports a 262K-token context window and can output up to 33K tokens. It accepts text and image input.
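
A simple pre-flight check can show how these two limits interact. The sketch below uses the rounded figures from this page (262K and 33K) rather than exact token limits, and it assumes the prompt and the reply share the same context window, which is common but not stated here. The name `fits_context` is hypothetical.

```python
# Rounded limits as listed on this page; the exact values may differ slightly.
CONTEXT_WINDOW = 262_000
MAX_OUTPUT = 33_000


def fits_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Check a request against the listed limits.

    Assumes prompt and output tokens count against one shared window.
    """
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_context(200_000, 33_000))  # prompt plus reply stays within the window
print(fits_context(240_000, 33_000))  # prompt plus reply exceeds the window
```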
