modelgrep

Lowest-Latency Qwen Models

Quick answer · Updated June 2026

Qwen3 235B A22B Instruct 2507 has the lowest latency of any Qwen model, responding in about 167ms to first token. Qwen3 30B A3B Instruct 2507 (242ms) and Qwen3.6 35B A3B (295ms) round out the top three.

Latency: 167ms
Intelligence: 25.0
Speed: 78 t/s
Input: $0.090/M
Context: 262K

Qwen models ranked by median (p50) time to first token, lowest first. These are the most responsive Qwen models for real-time and interactive use cases.
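The p50 figure used in this ranking is a median: half of the measured requests reached their first token faster and half slower. The page does not say how modelgrep collects its samples, so the following is only a minimal sketch of the statistic itself. The function name `p50_ttft_ms` and the sample values are hypothetical, not modelgrep data.

```python
from statistics import median

def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms)

# Hypothetical measurements for one model (assumed values, not modelgrep data).
# The single slow request (900 ms) does not move the median.
samples = [150, 160, 167, 180, 900]
print(p50_ttft_ms(samples))  # 167
```

A median is a reasonable choice for latency because occasional slow requests distort a mean far more than they distort the p50.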

  1. qwen3-235b-a22b-2507
    Latency: 167ms
    Tools, JSON · 25.0 intel · $0.090/M · 78 t/s
  2. qwen3-30b-a3b-instruct-2507
    Latency: 242ms
    Tools, JSON · 15.0 intel · $0.048/M · 87 t/s
  3. qwen3.6-35b-a3b
    Latency: 295ms
    Reasoning, Tools, JSON, +1 · 31.5 intel · $0.150/M · 137 t/s
  4. qwen3-32b
    Latency: 301ms
    Reasoning, Tools, JSON · $0.080/M · 418 t/s · 131K ctx
  5. qwen3-30b-a3b
    Latency: 355ms
    Reasoning, Tools, JSON · 15.3 intel · $0.120/M · 102 t/s
  6. qwen3-vl-30b-a3b-instruct
    Latency: 356ms
    Tools, JSON, Vision · 16.0 intel · $0.130/M · 48 t/s
  7. qwen-2.5-7b-instruct
    Latency: 364ms
    $0.040/M · 48 t/s · 131K ctx
  8. qwen3-235b-a22b-thinking-2507
    Latency: 373ms
    Reasoning, Tools, JSON · 29.5 intel · $0.100/M · 65 t/s
  9. qwen3.5-35b-a3b
    Latency: 404ms
    Reasoning, Tools, JSON, +1 · 30.7 intel · $0.140/M · 165 t/s
  10. qwen3.5-9b
    Latency: 441ms
    Reasoning, Tools, JSON, +1 · 32.4 intel · $0.100/M · 95 t/s
  11. qwen3-vl-8b-instruct
    Latency: 441ms
    Tools, JSON, Vision · 14.3 intel · $0.080/M · 62 t/s
  12. qwen3-vl-8b-thinking
    Latency: 457ms
    Reasoning, Tools, JSON, +1 · 16.7 intel · $0.117/M · 128 t/s
  13. qwen-2.5-coder-32b-instruct
    Latency: 458ms
    $0.660/M · 23 t/s · 128K ctx
  14. qwen3.5-397b-a17b
    Latency: 473ms
    Reasoning, Tools, JSON, +1 · 45.0 intel · $0.390/M · 149 t/s
  15. qwen-plus
    Latency: 475ms
    Tools, JSON · $0.260/M · 54 t/s · 1M ctx
  16. qwen3-vl-30b-a3b-thinking
    Latency: 486ms
    Reasoning, Tools, JSON, +1 · 19.7 intel · $0.130/M · 73 t/s
  17. qwen3-30b-a3b-thinking-2507
    Latency: 489ms
    Reasoning, Tools, JSON · 22.4 intel · $0.080/M · 134 t/s
  18. qwen-plus-2025-07-28:thinking
    Latency: 505ms
    Reasoning, Tools, JSON · $0.260/M · 62 t/s · 1M ctx
  19. qwen-plus-2025-07-28
    Latency: 505ms
    Tools, JSON · $0.260/M · 62 t/s · 1M ctx
  20. qwen-2.5-72b-instruct
    Latency: 506ms
    Tools, JSON · $0.360/M · 25 t/s · 131K ctx
  21. qwen3.6-27b
    Latency: 507ms
    Reasoning, Tools, JSON, +1 · 37.1 intel · $0.288/M · 80 t/s
  22. qwen3-14b
    Latency: 536ms
    Reasoning, Tools, JSON · 16.2 intel · $0.100/M · 66 t/s
  23. qwen3-next-80b-a3b-thinking
    Latency: 537ms
    Reasoning, Tools, JSON · 26.7 intel · $0.098/M · 184 t/s
  24. qwen3-next-80b-a3b-instruct:free
    Latency: 590ms
    Tools, JSON · 20.1 intel · Free · 87 t/s
  25. qwen3-next-80b-a3b-instruct
    Latency: 590ms
    Tools, JSON · 20.1 intel · $0.090/M · 87 t/s

Frequently asked

Which Qwen model has the lowest latency?

Qwen3 235B A22B Instruct 2507 has the lowest latency of any Qwen model, responding in about 167ms to first token. Qwen3 30B A3B Instruct 2507 (242ms) and Qwen3.6 35B A3B (295ms) round out the top three.

What's a good alternative to Qwen3 235B A22B Instruct 2507?

Qwen3 30B A3B Instruct 2507 (242ms) is the closest alternative on this metric, followed by Qwen3.6 35B A3B (295ms). See the full ranking above for the tradeoffs.
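One tradeoff worth checking is time to first token against streaming speed: a model that starts fastest may not finish fastest on long answers. The sketch below uses the latency and t/s figures listed for the top three models. It assumes the listed t/s is a steady output rate and ignores any reasoning tokens a model emits before its answer, neither of which the ranking confirms. The helper `estimated_response_ms` is a hypothetical name introduced for illustration.

```python
def estimated_response_ms(ttft_ms, tokens_per_s, output_tokens):
    """Rough end-to-end time: time to first token plus time to stream the output.

    Assumes a constant output rate, which is a simplification.
    """
    return ttft_ms + 1000.0 * output_tokens / tokens_per_s

# (time to first token in ms, speed in t/s), taken from the ranking on this page.
models = {
    "qwen3-235b-a22b-2507": (167, 78),
    "qwen3-30b-a3b-instruct-2507": (242, 87),
    "qwen3.6-35b-a3b": (295, 137),
}

for tokens in (20, 500):
    fastest = min(models, key=lambda m: estimated_response_ms(*models[m], tokens))
    print(tokens, fastest)
# 20 qwen3-235b-a22b-2507
# 500 qwen3.6-35b-a3b
```

Under these assumptions, the lowest-latency model wins for very short replies, while the model with higher throughput pulls ahead once the output grows to a few hundred tokens.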

How many Qwen models are there?

modelgrep tracks 49 Qwen models with live benchmarks, speed, latency, and per-provider pricing; Qwen3.7 Max leads on intelligence. Of these, 25 qualify for this ranking.
