modelgrep

Qwen: Qwen3 32B

qwen/qwen3-32b

Cheaper than 87% of paid models · Supports: reasoning, tool calling, JSON output
Speed: 361 tokens/s (9th fastest)
Latency: 322 ms to first token
Input price: $0.080 per million tokens (40th cheapest)
Context: 131K tokens (16K max output)

How it compares

Faster than 97% of all ranked models
Cheaper than 87% of all ranked models

Overview

Qwen3-32B is a dense 32.8B parameter causal language model from the Qwen3 series, optimized for both complex reasoning and efficient dialogue. It supports seamless switching between a "thinking" mode for...

Benchmarks

Independent results from Artificial Analysis, via OpenRouter:
GPQA Diamond: 54%
Humanity's Last Exam: 4%
SciCode: 28%

Providers & pricing (6)

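Prices in USD per million tokens; "-" means no quantization is listed.

Provider     Quant  Input $/M  Output $/M  Uptime
DeepInfra    fp8    $0.080     $0.280      99.9%
Nebius       fp8    $0.100     $0.300      99.6%
AtlasCloud   fp8    $0.100     $1.200      99.4%
Alibaba      -      $0.104     $0.416      99.3%
SiliconFlow  fp8    $0.140     $0.570      98.5%
Groq         -      $0.290     $0.590      99.7%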
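Which provider is cheapest depends on the mix of input and output tokens, because the output prices spread much more widely than the input prices. The sketch below copies the per-million-token prices from the table and ranks providers for a workload; the function names and the 10M-input / 2M-output workload are hypothetical, and it ignores caching, minimum charges, and any provider-specific billing rules.

```python
# Per-million-token prices (USD) copied from the provider table above.
PRICES = {
    "DeepInfra": (0.080, 0.280),
    "Nebius": (0.100, 0.300),
    "AtlasCloud": (0.100, 1.200),
    "Alibaba": (0.104, 0.416),
    "SiliconFlow": (0.140, 0.570),
    "Groq": (0.290, 0.590),
}


def workload_cost(provider: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload (no caching or minimum charges)."""
    price_in, price_out = PRICES[provider]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000


def rank_providers(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Providers sorted from cheapest to most expensive for this workload."""
    costs = [(p, workload_cost(p, input_tokens, output_tokens)) for p in PRICES]
    return sorted(costs, key=lambda pc: pc[1])


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens, 2M output tokens.
    for provider, cost in rank_providers(10_000_000, 2_000_000):
        print(f"{provider:<12} ${cost:.2f}")
```

For this workload DeepInfra comes out cheapest at about $1.36, while AtlasCloud, tied for second-cheapest on input, drops to fifth because of its $1.200 output price.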

Specifications

Context window: 131K tokens
Max output: 16K tokens
Knowledge cutoff: Mar 2025
Input modalities: text
Output modalities: text
Moderated: No
Open weights: Qwen/Qwen3-32B

Qwen3 32B FAQ

How much does Qwen3 32B cost?

Qwen3 32B costs $0.080 per million input tokens and $0.280 per million output tokens via OpenRouter (the lowest listed provider rates, from DeepInfra), making it the 40th cheapest of 298 paid models.
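The rank and the "cheaper than 87%" figure are consistent if the percentage is read as the share of the 298 paid models priced above this one. The site does not state its formula, so the hypothetical helper below is only a consistency check that assumes no price ties.

```python
def share_more_expensive(rank: int, total: int) -> float:
    """Fraction of models priced above the model at `rank` (1 = cheapest).

    Assumes no ties, so exactly total - rank models are more expensive.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# 40th cheapest of 298: 258 / 298 is roughly 0.866, which rounds to 87%.
print(f"{share_more_expensive(40, 298):.0%}")
```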

How fast is Qwen3 32B?

Qwen3 32B generates around 361 tokens per second with a median (p50) time to first token of 322 ms, making it the 9th fastest tracked model.
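The two speed figures combine into a rough estimate of how long one response takes: the first token arrives after the time-to-first-token, and the remaining tokens stream at the throughput rate. The sketch below assumes both medians hold steady for the whole response and ignores network overhead; the function name is hypothetical. In thinking mode, any reasoning tokens the model generates would also count toward `output_tokens`.

```python
def estimated_response_seconds(output_tokens: int,
                               ttft_ms: float = 322.0,
                               tokens_per_second: float = 361.0) -> float:
    """Rough wall-clock time for one response.

    The first token arrives after ttft_ms; the other output_tokens - 1
    tokens are assumed to stream at a constant tokens_per_second.
    """
    if output_tokens < 1:
        raise ValueError("output_tokens must be at least 1")
    return ttft_ms / 1000 + (output_tokens - 1) / tokens_per_second


# A 1,000-token answer: about 0.322 s + 999 / 361 s, roughly 3.09 s.
print(round(estimated_response_seconds(1000), 2))
```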

What is Qwen3 32B's context window?

Qwen3 32B supports a 131K-token context window and can output up to 16K tokens. It accepts text input.
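When planning long prompts, it helps to check how much output room is left. The sketch below assumes that the prompt and the generated output share the 131K window and that output is separately capped at 16K. It also assumes the rounded figures stand for 131,072 and 16,384 tokens; the page does not give exact values, and providers may enforce different limits.

```python
CONTEXT_WINDOW = 131_072  # assumption: exact value behind "131K"
MAX_OUTPUT = 16_384       # assumption: exact value behind "16K"


def max_output_for_prompt(prompt_tokens: int) -> int:
    """Largest output budget that still fits for a given prompt size.

    Assumes prompt and output share the context window, and output is
    additionally capped at MAX_OUTPUT.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(0, min(MAX_OUTPUT, CONTEXT_WINDOW - prompt_tokens))


print(max_output_for_prompt(1_000))    # short prompt: full 16K cap applies
print(max_output_for_prompt(120_000))  # long prompt: window is the limit
```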
