modelgrep

Lowest-Latency DeepSeek Models

Quick answer · Updated June 2026

DeepSeek V3.1 has the lowest latency of any DeepSeek model, responding in about 343ms to first token. R1 0528 (570ms) and DeepSeek V4 Flash (574ms) round out the top three.

Latency: 343ms
Intelligence: 28.1
Speed: 91 t/s
Input: $0.210/M
Context: 164K

DeepSeek models ranked by median (p50) time to first token, from fastest to slowest. Lower values mean a more responsive model for real-time and interactive use cases.

  1. deepseek-chat-v3.1
    Reasoning · Tools · JSON · 28.1 intel · $0.210/M · 91 t/s
    Latency: 343ms
  2. deepseek-r1-0528
    Reasoning · Tools · JSON · 27.1 intel · $0.500/M · 34 t/s
    Latency: 570ms
  3. deepseek-v4-flash
    Reasoning · Tools · JSON · 46.0 intel · $0.098/M · 79 t/s
    Latency: 574ms
  4. deepseek-v4-pro
    Reasoning · Tools · JSON · 39.3 intel · $0.435/M · 53 t/s
    Latency: 596ms
  5. deepseek-v3.2
    Reasoning · Tools · JSON · 41.7 intel · $0.229/M · 33 t/s
    Latency: 605ms
  6. deepseek-v3.1-terminus
    Reasoning · Tools · JSON · 28.5 intel · $0.270/M · 29 t/s
    Latency: 811ms
  7. deepseek-r1-distill-llama-70b
    Reasoning · $0.800/M · 36 t/s · 128K ctx
    Latency: 824ms
  8. deepseek-chat
    Tools · JSON · $0.200/M · 23 t/s · 131K ctx
    Latency: 840ms
  9. deepseek-chat-v3-0324
    Tools · JSON · 22.3 intel · $0.200/M · 26 t/s
    Latency: 905ms
  10. deepseek-r1-distill-qwen-32b
    Reasoning · JSON · $0.290/M · 18 t/s · 128K ctx
    Latency: 1.2s
  11. deepseek-v3.2-exp
    Reasoning · Tools · JSON · 32.1 intel · $0.270/M · 26 t/s
    Latency: 1.4s
  12. deepseek-r1
    Reasoning · Tools · JSON · 18.8 intel · $0.700/M · 71 t/s
    Latency: 1.5s
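
The ranking above orders models by p50 (median) time to first token. As a rough sketch of how such an ordering could be computed from raw measurements, the Python below takes per-model latency samples, reduces each to its median, and sorts ascending. The `rank_by_p50` helper, the model names, and every sample value are illustrative assumptions, not modelgrep's data or pipeline.

```python
from statistics import median


def rank_by_p50(samples_ms):
    """Return (model, p50_ms) pairs sorted from lowest to highest median.

    samples_ms maps a model name to a list of time-to-first-token
    measurements in milliseconds. Models with no samples are skipped.
    """
    p50 = {model: median(values) for model, values in samples_ms.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical measurements, made up for illustration only.
example_samples = {
    "model-a": [410, 350, 980, 330, 360],
    "model-b": [600, 590, 610],
    "model-c": [300, 2000, 1900],
}

ranking = rank_by_p50(example_samples)
for position, (model, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {p50_ms}ms")
```

The median is used rather than the mean so that a single slow request (such as the 980ms sample for `model-a`) does not shift a model's position; a single fast outlier, like the 300ms sample for `model-c`, is discounted in the same way.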

Frequently asked

Which DeepSeek model has the lowest latency?

DeepSeek V3.1 has the lowest latency of any DeepSeek model, responding in about 343ms to first token. R1 0528 (570ms) and DeepSeek V4 Flash (574ms) round out the top three.

What's a good alternative to DeepSeek V3.1?

R1 0528 (570ms) is the closest alternative on this metric, followed by DeepSeek V4 Flash (574ms). See the full ranking above for the tradeoffs.
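
One way to weigh those tradeoffs is to filter on the other columns first and then take the lowest latency among what remains. The sketch below uses the figures for the top five models copied from the ranking on this page; the `fastest_meeting` helper and its thresholds are hypothetical and only illustrate the idea.

```python
# Figures copied from the ranking on this page:
# (model, latency_ms, intelligence, input_usd_per_m)
MODELS = [
    ("deepseek-chat-v3.1", 343, 28.1, 0.210),
    ("deepseek-r1-0528", 570, 27.1, 0.500),
    ("deepseek-v4-flash", 574, 46.0, 0.098),
    ("deepseek-v4-pro", 596, 39.3, 0.435),
    ("deepseek-v3.2", 605, 41.7, 0.229),
]


def fastest_meeting(models, min_intelligence=0.0, max_price=float("inf")):
    """Return the lowest-latency model that satisfies both thresholds.

    Returns None when no model qualifies.
    """
    candidates = [
        m for m in models
        if m[2] >= min_intelligence and m[3] <= max_price
    ]
    return min(candidates, key=lambda m: m[1], default=None)


# No constraints: the overall latency leader.
print(fastest_meeting(MODELS))
# Requiring an intelligence score of at least 40 changes the pick.
print(fastest_meeting(MODELS, min_intelligence=40.0))
```

With no constraints the helper returns deepseek-chat-v3.1; with a minimum intelligence score of 40 it returns deepseek-v4-flash, which is 231ms slower to first token but scores higher and costs less per million input tokens.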

How many DeepSeek models are there?

modelgrep tracks 12 DeepSeek models with live benchmarks, speed, latency and per-provider pricing. DeepSeek V4 Flash leads them on intelligence, and all 12 qualify for this ranking.
