Qwen3 235B A22B Instruct 2507 has the lowest latency of any Qwen model, responding in about 167ms to first token. Qwen3 30B A3B Instruct 2507 (242ms) and Qwen3.6 35B A3B (295ms) round out the top three.
AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
Qwen3 235B A22B Instruct 2507 has the lowest latency of any Qwen model, responding in about 167ms to first token. Qwen3 30B A3B Instruct 2507 (242ms) and Qwen3.6 35B A3B (295ms) round out the top three.
Qwen3 30B A3B Instruct 2507 (242ms) is the closest alternative on this metric, followed by Qwen3.6 35B A3B (295ms). See the full ranking above for the tradeoffs.
modelgrep tracks 49 Qwen models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Qwen3.7 Max. 25 of them qualify for this ranking.