modelgrep

DeepSeek: R1 Distill Qwen 32B

deepseek/deepseek-r1-distill-qwen-32b

Reasoning · JSON
Speed: 23 tokens/s (270th fastest)
Latency: 970ms to first token
Input price: $0.290 per million tokens (127th cheapest)
Context: 128K (33K max output)

How it compares

Faster than 13% of all ranked models
Cheaper than 57% of all ranked models

Overview

DeepSeek R1 Distill Qwen 32B is a distilled large language model based on [Qwen 2.5 32B](https://huggingface.co/Qwen/Qwen2.5-32B), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). It outperforms OpenAI's o1-mini across various benchmarks, achieving new...

Benchmarks

independent · via OpenRouter
Artificial Analysis
GPQA Diamond: 62%
Humanity's Last Exam: 6%
SciCode: 38%

Providers & pricing (1)

| Provider | In $/M | Out $/M | Uptime |
| --- | --- | --- | --- |
| NextBit (fp8) | $0.290 | $0.290 | |

Specifications

Context window: 128K
Max output: 33K
Knowledge cutoff: Jul 2024
Input modalities: text
Output modalities: text
Prompt caching
Cache read price
Moderated: No

R1 Distill Qwen 32B FAQ

How much does R1 Distill Qwen 32B cost?

R1 Distill Qwen 32B costs $0.290 per million input tokens and $0.290 per million output tokens via OpenRouter, making it the 127th cheapest of 298 paid models.
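
As a rough illustration of what those rates mean per request, the sketch below multiplies token counts by the listed prices. The function name and the example token counts are hypothetical, and the calculation assumes that every generated token, including any reasoning tokens, is billed at the output rate.

```python
# Listed OpenRouter rates from this page, in USD per million tokens.
INPUT_USD_PER_M = 0.290
OUTPUT_USD_PER_M = 0.290


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request at the listed per-million-token rates.

    Assumption: all generated tokens are billed at the output rate.
    """
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Hypothetical request: 10,000 prompt tokens and 2,000 generated tokens.
print(f"${estimate_cost_usd(10_000, 2_000):.6f}")
```

Because the input and output rates are identical here, the estimate depends only on the total number of tokens in the request.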

How fast is R1 Distill Qwen 32B?

R1 Distill Qwen 32B generates around 23 tokens per second with 970ms time-to-first-token (p50), the 270th fastest tracked model.
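
To put those two figures together, a simple back-of-the-envelope model adds the time-to-first-token to the time needed to stream the remaining tokens. This is only a sketch: the function name is hypothetical, and it assumes the median throughput and latency hold constant for the whole response, which real requests will not match exactly.

```python
# Median figures listed on this page.
TOKENS_PER_SECOND = 23.0
TTFT_SECONDS = 0.970  # 970 ms time-to-first-token (p50)


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time for a response of the given length.

    Assumption: constant throughput after the first token arrives.
    """
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


# Hypothetical response lengths, in tokens.
for n in (100, 1_000, 10_000):
    print(f"{n} tokens: about {estimate_response_seconds(n):.1f} s")
```

Under this model the generation rate, not the first-token latency, dominates for long outputs, which matters for a reasoning model that may emit many tokens before its final answer.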

What is R1 Distill Qwen 32B's context window?

R1 Distill Qwen 32B supports a 128K-token context window and can output up to 33K tokens. It accepts text input.
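
One way to use these limits is to check how much output room a prompt leaves. The sketch below is illustrative only: the function name is hypothetical, the rounded "128K" and "33K" figures are taken as 128,000 and 33,000 tokens (the exact limits may differ), and it assumes prompt and output tokens share the same context window.

```python
# Rounded limits from this page; the exact token counts are an assumption.
CONTEXT_WINDOW_TOKENS = 128_000
MAX_OUTPUT_TOKENS = 33_000


def max_output_budget(prompt_tokens: int) -> int:
    """Return the largest output length that still fits alongside the prompt.

    Assumption: prompt and output tokens count against the same window.
    """
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    return max(0, min(MAX_OUTPUT_TOKENS, remaining))


# Hypothetical prompt sizes, in tokens.
print(max_output_budget(10_000))   # short prompt: capped by the output limit
print(max_output_budget(120_000))  # long prompt: capped by remaining context
print(max_output_budget(200_000))  # prompt alone exceeds the window
```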
