modelgrep

Qwen: Qwen3 VL 8B Thinking

qwen/qwen3-vl-8b-thinking

136th smartest of 178 · Cheaper than 78% of paid · Reasoning · Tools · JSON · Vision
Intelligence: 16.7 (136th of 178)
Design Elo: not listed
Speed: 126 tokens per second (41st fastest)
Latency: 483ms to first token
Input price: $0.117 per million tokens (65th cheapest)
Context: 256K (33K max output)

How it compares

Smarter than 24% of all ranked models
Faster than 86% of all ranked models
Cheaper than 78% of all ranked models

Overview

Qwen3-VL-8B-Thinking is the reasoning-optimized variant of the Qwen3-VL-8B multimodal model, designed for advanced visual and textual reasoning across complex scenes, documents, and temporal sequences. It integrates enhanced multimodal alignment and...

Benchmarks

independent · via OpenRouter
Artificial Analysis (24th percentile)
Intelligence Index: 16.7
Coding Index: 9.8
Agentic Index: 15.6
GPQA Diamond: 58%
Humanity's Last Exam: 3%
SciCode: 22%
Tau²-Bench (agentic): 23%

Providers & pricing (1)

Provider    In $/M    Out $/M    Uptime
Alibaba     $0.117    $1.36      not listed

Specifications

Context window: 256K
Max output: 33K
Knowledge cutoff: not listed
Input modalities: image, text
Output modalities: text
Prompt caching: not listed
Cache read price: not listed
Moderated: No

Qwen3 VL 8B Thinking FAQ

How much does Qwen3 VL 8B Thinking cost?

Qwen3 VL 8B Thinking costs $0.117 per million input tokens and $1.36 per million output tokens via OpenRouter, making it the 65th cheapest of 298 paid models.
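As a rough illustration of what these per-million prices mean for a single request, here is a minimal Python sketch. The function name `estimate_cost` and the example token counts are hypothetical; only the two prices come from the listing. Because this is a reasoning model, thinking tokens are typically billed as output tokens, so actual output counts may be higher than the visible answer suggests.

```python
# Prices from the listing above, in USD per million tokens (via OpenRouter).
INPUT_PRICE_PER_M = 0.117
OUTPUT_PRICE_PER_M = 1.36


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 input tokens and 2,000 output tokens.
cost = estimate_cost(10_000, 2_000)
print(f"Estimated cost: ${cost:.5f}")
```

At these prices, output tokens cost a little over eleven times as much as input tokens, so long reasoning traces tend to dominate the bill.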

How smart is Qwen3 VL 8B Thinking?

Qwen3 VL 8B Thinking scores 16.7 on the Artificial Analysis Intelligence Index, ranking 136th of 178 benchmarked models, with a GPQA Diamond score of 58%.

How fast is Qwen3 VL 8B Thinking?

Qwen3 VL 8B Thinking generates around 126 tokens per second with a 483ms time-to-first-token (p50), making it the 41st fastest tracked model.
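The two speed figures can be combined into a back-of-the-envelope response-time estimate. The sketch below assumes throughput stays constant at the quoted rate once the first token arrives, which is a simplification; `estimate_seconds` is a hypothetical helper, and real latency will vary with load and provider.

```python
# Median figures from the listing above.
TOKENS_PER_SECOND = 126.0
TIME_TO_FIRST_TOKEN_S = 0.483


def estimate_seconds(output_tokens: int) -> float:
    """Approximate wall-clock time to stream a response (hypothetical helper).

    Assumes steady generation at TOKENS_PER_SECOND after the first token.
    """
    return TIME_TO_FIRST_TOKEN_S + output_tokens / TOKENS_PER_SECOND


# Hypothetical response of 1,000 output tokens (thinking tokens included).
print(f"{estimate_seconds(1_000):.1f} s")
```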

What is Qwen3 VL 8B Thinking's context window?

Qwen3 VL 8B Thinking supports a 256K-token context window and can output up to 33K tokens. It accepts image and text input.
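A simple pre-flight check against these limits might look like the following sketch. It treats the rounded "256K" and "33K" figures as 256,000 and 33,000 tokens, which is an approximation since the listing does not give exact values, and it assumes the prompt and the output share the same context window. The name `fits_in_context` is hypothetical.

```python
# Rounded limits from the listing; exact token counts are not given there.
CONTEXT_WINDOW = 256_000
MAX_OUTPUT = 33_000


def fits_in_context(prompt_tokens: int, max_output_tokens: int) -> bool:
    """Check a request against the listed limits (hypothetical helper).

    Assumes prompt and output tokens both count toward the context window.
    """
    if max_output_tokens > MAX_OUTPUT:
        return False
    return prompt_tokens + max_output_tokens <= CONTEXT_WINDOW


print(fits_in_context(200_000, 33_000))
print(fits_in_context(230_000, 33_000))
```

Image inputs also consume context tokens, so a prompt that includes images should budget for them in `prompt_tokens`.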
