Fastest Qwen Models

Quick answer · Updated June 2026

The fastest Qwen model is Qwen3 32B at 328 output tokens per second. Qwen3.6 35B A3B (172 t/s) and Qwen3 Next 80B A3B Thinking (172 t/s) round out the top three.

Speed: 328 t/s · Input: $0.080/M · Context: 131K

Qwen models ranked by median (p50) output speed, measured in output tokens per second. Use this ranking to pick a model for low-latency and high-throughput applications.
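As a rough illustration of what a p50 ranking means, the sketch below takes per-request speed measurements, computes the median for each model, and sorts fastest first. The model names and sample values are hypothetical placeholders, and `rank_by_p50` is an illustrative helper; it is not a description of how modelgrep collects or aggregates its data.

```python
from statistics import median

# Hypothetical per-request measurements of output speed (tokens per second).
samples = {
    "model-a": [310.0, 328.0, 335.0, 290.0, 340.0],
    "model-b": [170.0, 172.0, 180.0, 150.0, 175.0],
    "model-c": [90.0, 95.0, 120.0],
}


def rank_by_p50(samples_by_model):
    """Return (model, p50 speed) pairs sorted from fastest to slowest."""
    # The median (p50) is less sensitive to occasional slow or fast requests
    # than the mean. Models without any samples are skipped.
    p50 = {
        name: median(values)
        for name, values in samples_by_model.items()
        if values
    }
    return sorted(p50.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_p50(samples)
for position, (name, speed) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {speed:.0f} t/s")
```

Ranking on the median means half of the sampled requests ran at least this fast, so a single unusually slow request does not move a model down the list.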

1. qwen3-32b · 328 t/s · Reasoning, Tools, JSON · $0.080/M · 321ms ttft · 131K ctx
2. qwen3.6-35b-a3b · 172 t/s · Reasoning, Tools, JSON, +1 more · 31.5 intel · $0.150/M · 180ms ttft
3. qwen3-next-80b-a3b-thinking · 172 t/s · Reasoning, Tools, JSON · 26.7 intel · $0.098/M · 252ms ttft
4. qwen3.5-35b-a3b · 153 t/s · Reasoning, Tools, JSON, +1 more · 30.7 intel · $0.140/M · 150ms ttft
5. qwen3-vl-8b-thinking · 139 t/s · Reasoning, Tools, JSON, +1 more · 16.7 intel · $0.117/M · 508ms ttft
6. qwen3-coder-next · 111 t/s · Tools, JSON · 28.3 intel · $0.110/M · 636ms ttft
7. qwen3.6-flash · 109 t/s · Reasoning, Tools, JSON, +1 more · $0.188/M · 872ms ttft · 1M ctx
8. qwen3-30b-a3b-thinking-2507 · 95 t/s · Reasoning, Tools, JSON · 22.4 intel · $0.080/M · 374ms ttft
9. qwen3-30b-a3b-instruct-2507 · 91 t/s · Tools, JSON · 15.0 intel · $0.048/M · 274ms ttft
10. qwen3-30b-a3b · 91 t/s · Reasoning, Tools, JSON · 15.3 intel · $0.120/M · 279ms ttft
11. qwen3-235b-a22b-2507 · 84 t/s · Tools, JSON · 25.0 intel · $0.090/M · 298ms ttft
12. qwen3.5-397b-a17b · 83 t/s · Reasoning, Tools, JSON, +1 more · 40.1 intel · $0.390/M · 856ms ttft
13. qwen3.5-flash-02-23 · 77 t/s · Reasoning, Tools, JSON, +1 more · $0.065/M · 642ms ttft · 1M ctx
14. qwen3.6-27b · 76 t/s · Reasoning, Tools, JSON, +1 more · 37.1 intel · $0.288/M · 531ms ttft
15. qwen3-next-80b-a3b-instruct:free · 76 t/s · Tools, JSON · 20.1 intel · Free · 583ms ttft
16. qwen3-next-80b-a3b-instruct · 76 t/s · Tools, JSON · 20.1 intel · $0.090/M · 583ms ttft
17. qwen3.5-9b · 75 t/s · Reasoning, Tools, JSON, +1 more · 32.4 intel · $0.100/M · 370ms ttft
18. qwen-2.5-7b-instruct · 73 t/s · $0.040/M · 405ms ttft · 131K ctx
19. qwen3-vl-30b-a3b-thinking · 69 t/s · Reasoning, Tools, JSON, +1 more · 19.7 intel · $0.130/M · 480ms ttft
20. qwen3-coder-30b-a3b-instruct · 69 t/s · Tools, JSON · 20.0 intel · $0.070/M · 983ms ttft
21. qwen3-14b · 66 t/s · Reasoning, Tools, JSON · 16.2 intel · $0.100/M · 349ms ttft
22. qwen-plus-2025-07-28:thinking · 63 t/s · Reasoning, Tools, JSON · $0.260/M · 504ms ttft · 1M ctx
23. qwen-plus-2025-07-28 · 63 t/s · Tools, JSON · $0.260/M · 504ms ttft · 1M ctx
24. qwen3-vl-8b-instruct · 60 t/s · Tools, JSON, Vision · 14.3 intel · $0.080/M · 455ms ttft
25. qwen3-235b-a22b · 58 t/s · Reasoning, Tools, JSON · 19.8 intel · $0.455/M · 604ms ttft

Frequently asked

What is the fastest Qwen model?

The fastest Qwen model is Qwen3 32B at 328 output tokens per second. Qwen3.6 35B A3B (172 t/s) and Qwen3 Next 80B A3B Thinking (172 t/s) round out the top three.

What's a good alternative to Qwen3 32B?

Qwen3.6 35B A3B (172 t/s) is the closest alternative on this metric, followed by Qwen3 Next 80B A3B Thinking (172 t/s). See the full ranking above for the tradeoffs.
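One way to weigh those tradeoffs is to combine output speed with time to first token (ttft). The sketch below estimates response time as ttft plus output tokens divided by speed, using the speed and ttft figures listed for the top three models. It is a simplified model under stated assumptions: it ignores queueing, prompt length, and any reasoning tokens a thinking model emits before its visible answer, and `estimated_seconds` and `compare` are illustrative helper names.

```python
# Figures copied from the ranking: (output speed in t/s, time to first token in ms).
MODELS = {
    "qwen3-32b": (328, 321),
    "qwen3.6-35b-a3b": (172, 180),
    "qwen3-next-80b-a3b-thinking": (172, 252),
}


def estimated_seconds(speed_tps, ttft_ms, output_tokens):
    """Rough response time: wait for the first token, then stream the rest."""
    return ttft_ms / 1000 + output_tokens / speed_tps


def compare(output_tokens):
    """Return (model, estimated seconds) pairs, quickest first."""
    estimates = {
        name: estimated_seconds(speed, ttft, output_tokens)
        for name, (speed, ttft) in MODELS.items()
    }
    return sorted(estimates.items(), key=lambda item: item[1])


for tokens in (20, 500):
    print(f"{tokens} output tokens:")
    for name, seconds in compare(tokens):
        print(f"  {name}: {seconds:.2f} s")
```

Under this estimate, the lower ttft of qwen3.6-35b-a3b makes it quicker for very short replies, while qwen3-32b pulls ahead once responses run past a few dozen tokens.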

How many Qwen models are there?

modelgrep tracks 49 Qwen models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Qwen3.7 Max. 25 of them qualify for this ranking.
