DeepSeek V3.1 has the lowest latency of any DeepSeek model, with a median (p50) time to first token of about 343ms. R1 0528 (570ms) and DeepSeek V4 Flash (574ms) round out the top three.
AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
R1 0528 (570ms) is the closest alternative on this metric, followed by DeepSeek V4 Flash (574ms). See the full ranking above for the tradeoffs.
modelgrep tracks 12 DeepSeek models with live benchmarks, speed, latency, and per-provider pricing. DeepSeek V4 Flash leads on intelligence. All 12 qualify for this ranking.
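
To make the p50 metric concrete, here is a minimal Python sketch of ranking models by median time to first token. It is an illustration only, not modelgrep's actual method: the model names (`model-a`, `model-b`, `model-c`), the sample values, and the `rank_by_p50` helper are all hypothetical.

```python
from statistics import median

# Hypothetical per-request time-to-first-token samples in milliseconds.
# These values are invented for illustration; they are not modelgrep measurements.
ttft_samples_ms = {
    "model-a": [310, 343, 420, 335, 900],
    "model-b": [560, 570, 575, 1200, 540],
    "model-c": [574, 580, 560, 610, 571],
}


def rank_by_p50(samples):
    """Return (model, p50_ms) pairs sorted from lowest to highest median."""
    p50 = {model: median(values) for model, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50(ttft_samples_ms)
for position, (model, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {p50_ms} ms")
```

Ranking on the median rather than the mean keeps a single slow request, such as the 900ms and 1200ms samples here, from dominating a model's score.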