The fastest DeepSeek model is DeepSeek V3.1 at 92 output tokens per second. DeepSeek V4 Flash (73 t/s) and R1 (73 t/s) round out the top three.
AI models ranked by median (p50) output speed in tokens per second: the fastest large language models for low-latency and high-throughput applications.
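To make the p50 output-speed metric concrete, here is a minimal sketch of how such a ranking could be computed. It assumes that each request's speed is generated tokens divided by generation time, and that a model's score is the median of its per-request speeds. The page does not say how modelgrep actually times requests, for example whether time to first token is excluded. The function names and the `model-a`/`model-b` sample runs are hypothetical and for illustration only. They are not real measurements.

```python
from statistics import median


def output_tps(output_tokens: int, generation_seconds: float) -> float:
    """Output speed of a single request, in tokens per second.

    Assumption: speed = generated tokens / seconds spent generating them.
    """
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return output_tokens / generation_seconds


def p50_tps(runs: list[tuple[int, float]]) -> float:
    """Median (p50) output speed across (tokens, seconds) runs."""
    return median(output_tps(tokens, secs) for tokens, secs in runs)


def rank_by_speed(
    runs_by_model: dict[str, list[tuple[int, float]]],
) -> list[tuple[str, float]]:
    """Rank models by p50 output speed, fastest first.

    sorted() is stable, so tied models keep their input order. The real
    page's tie-break rule is not stated.
    """
    scores = {name: p50_tps(runs) for name, runs in runs_by_model.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical runs: (output tokens, generation seconds) per request.
runs = {
    "model-a": [(600, 6.0), (550, 5.0), (1200, 10.0)],  # 100, 110, 120 t/s
    "model-b": [(400, 5.0), (900, 10.0), (420, 6.0)],   # 80, 90, 70 t/s
}
print(rank_by_speed(runs))  # [('model-a', 110.0), ('model-b', 80.0)]
```

Using the median rather than the mean keeps a few unusually slow or fast requests from skewing a model's score. This is the usual reason for reporting p50.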
DeepSeek V4 Flash (73 t/s) is the closest alternative to DeepSeek V3.1 on this metric, tied with R1 (73 t/s). See the full ranking above for the tradeoffs.
modelgrep tracks 12 DeepSeek models with live benchmarks, speed, latency and per-provider pricing; DeepSeek V4 Flash leads them on intelligence. All 12 qualify for this ranking.