Lowest-Latency Qwen Models

Quick answer · Updated June 2026

Qwen3 235B A22B Instruct 2507 has the lowest latency of any Qwen model, responding in about 167ms to first token. Qwen3 30B A3B Instruct 2507 (242ms) and Qwen3.6 35B A3B (295ms) round out the top three.

167ms latency · 25.0 intelligence · 78 t/s speed · $0.090/M input · 262K context

Qwen models ranked by median (p50) time to first token, most responsive first. Lower latency matters most for real-time and interactive use cases.
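As a rough illustration of the metric, the sketch below computes a p50 (median) time to first token from a set of latency samples and sorts models by it. The sample values and the names `model-a` and `model-b` are hypothetical and are not taken from this ranking; how modelgrep actually collects and aggregates its measurements is not stated on this page.

```python
from statistics import median


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms)


def rank_by_latency(samples_by_model):
    """Return (model, p50) pairs sorted by ascending p50 time to first token."""
    scored = {name: p50_ttft_ms(s) for name, s in samples_by_model.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical measurements for illustration only.
samples = {
    "model-a": [310, 290, 1200, 305, 300],
    "model-b": [180, 950, 170, 175, 160],
}

ranking = rank_by_latency(samples)
for name, p50 in ranking:
    print(f"{name}: {p50}ms")
```

The median is used rather than the mean so that a single slow request (the 1200ms and 950ms samples here) does not dominate the score.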

1. qwen3-235b-a22b-2507 · 167ms · Tools, JSON · 25.0 intel · $0.090/M · 78 t/s
2. qwen3-30b-a3b-instruct-2507 · 242ms · Tools, JSON · 15.0 intel · $0.048/M · 87 t/s
3. qwen3.6-35b-a3b · 295ms · Reasoning, Tools, JSON, +1 more · 31.5 intel · $0.150/M · 137 t/s
4. qwen3-32b · 301ms · Reasoning, Tools, JSON · $0.080/M · 418 t/s · 131K ctx
5. qwen3-30b-a3b · 355ms · Reasoning, Tools, JSON · 15.3 intel · $0.120/M · 102 t/s
6. qwen3-vl-30b-a3b-instruct · 356ms · Tools, JSON, Vision · 16.0 intel · $0.130/M · 48 t/s
7. qwen-2.5-7b-instruct · 364ms · $0.040/M · 48 t/s · 131K ctx
8. qwen3-235b-a22b-thinking-2507 · 373ms · Reasoning, Tools, JSON · 29.5 intel · $0.100/M · 65 t/s
9. qwen3.5-35b-a3b · 404ms · Reasoning, Tools, JSON, +1 more · 30.7 intel · $0.140/M · 165 t/s
10. qwen3.5-9b · 441ms · Reasoning, Tools, JSON, +1 more · 32.4 intel · $0.100/M · 95 t/s
11. qwen3-vl-8b-instruct · 441ms · Tools, JSON, Vision · 14.3 intel · $0.080/M · 62 t/s
12. qwen3-vl-8b-thinking · 457ms · Reasoning, Tools, JSON, +1 more · 16.7 intel · $0.117/M · 128 t/s
13. qwen-2.5-coder-32b-instruct · 458ms · $0.660/M · 23 t/s · 128K ctx
14. qwen3.5-397b-a17b · 473ms · Reasoning, Tools, JSON, +1 more · 45.0 intel · $0.390/M · 149 t/s
15. qwen-plus · 475ms · Tools, JSON · $0.260/M · 54 t/s · 1M ctx
16. qwen3-vl-30b-a3b-thinking · 486ms · Reasoning, Tools, JSON, +1 more · 19.7 intel · $0.130/M · 73 t/s
17. qwen3-30b-a3b-thinking-2507 · 489ms · Reasoning, Tools, JSON · 22.4 intel · $0.080/M · 134 t/s
18. qwen-plus-2025-07-28:thinking · 505ms · Reasoning, Tools, JSON · $0.260/M · 62 t/s · 1M ctx
19. qwen-plus-2025-07-28 · 505ms · Tools, JSON · $0.260/M · 62 t/s · 1M ctx
20. qwen-2.5-72b-instruct · 506ms · Tools, JSON · $0.360/M · 25 t/s · 131K ctx
21. qwen3.6-27b · 507ms · Reasoning, Tools, JSON, +1 more · 37.1 intel · $0.288/M · 80 t/s
22. qwen3-14b · 536ms · Reasoning, Tools, JSON · 16.2 intel · $0.100/M · 66 t/s
23. qwen3-next-80b-a3b-thinking · 537ms · Reasoning, Tools, JSON · 26.7 intel · $0.098/M · 184 t/s
24. qwen3-next-80b-a3b-instruct:free · 590ms · Tools, JSON · 20.1 intel · Free · 87 t/s
25. qwen3-next-80b-a3b-instruct · 590ms · Tools, JSON · 20.1 intel · $0.090/M · 87 t/s

Frequently asked

Which Qwen model has the lowest latency?

Qwen3 235B A22B Instruct 2507, at about 167ms to first token (p50).

What's a good alternative to Qwen3 235B A22B Instruct 2507?

Qwen3 30B A3B Instruct 2507 (242ms) is the closest alternative on this metric, followed by Qwen3.6 35B A3B (295ms). See the full ranking above for the tradeoffs.
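One way to weigh those tradeoffs is to fix a latency budget and pick the highest-intelligence model that fits within it. The sketch below does that for a handful of rows copied from the ranking; the function name `best_within_budget` and the selection rule itself are illustrative assumptions, not something modelgrep provides.

```python
from typing import Optional

# (model id, p50 latency in ms, intelligence score), copied from the ranking.
MODELS = [
    ("qwen3-235b-a22b-2507", 167, 25.0),
    ("qwen3-30b-a3b-instruct-2507", 242, 15.0),
    ("qwen3-30b-a3b", 355, 15.3),
    ("qwen3-235b-a22b-thinking-2507", 373, 29.5),
    ("qwen3-next-80b-a3b-thinking", 537, 26.7),
]


def best_within_budget(models, max_latency_ms: float) -> Optional[str]:
    """Return the highest-intelligence model whose latency fits the budget.

    Hypothetical helper: returns None when no model is fast enough.
    """
    eligible = [m for m in models if m[1] <= max_latency_ms]
    if not eligible:
        return None
    return max(eligible, key=lambda m: m[2])[0]


print(best_within_budget(MODELS, 300))
print(best_within_budget(MODELS, 400))
```

With a 300ms budget this selects qwen3-235b-a22b-2507; relaxing the budget to 400ms admits qwen3-235b-a22b-thinking-2507, which scores higher on intelligence at the cost of a slower first token.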

How many Qwen models are there?

modelgrep tracks 49 Qwen models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Qwen3.7 Max. 25 of them qualify for this ranking.
