modelgrep

Small & Fast DeepSeek Models

Quick answer · Updated June 2026

DeepSeek V3.1 is the fastest model in DeepSeek's efficient tier: 92 tokens/sec of output, 330 ms time to first token, and $0.210 per million input tokens. That makes it a fit for high-volume, latency-sensitive work. DeepSeek V4 Flash is next at 73 t/s; it is slower, but it scores higher on intelligence (46.0 vs. 28.1) and costs less ($0.090 per million input tokens).

Speed: 92 t/s
Intelligence: 28.1
Input price: $0.210 /M
Context: 164K

Compact, efficient models — the small/mini/flash/haiku tier — ranked by output speed. These trade a little raw intelligence for low cost and high throughput, which is the right tradeoff for chat, classification, extraction and other high-volume work.

  1. deepseek-chat-v3.1
    Reasoning · Tools · JSON
    28.1 intel · $0.210/M · 330ms ttft
    Speed: 92 t/s
  2. deepseek-v4-flash
    Reasoning · Tools · JSON
    46.0 intel · $0.090/M · 537ms ttft
    Speed: 73 t/s
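
The figures in the ranking can be combined into a rough per-request estimate. The sketch below is illustrative only: it assumes response time is approximately time to first token plus output tokens divided by output speed, which ignores network overhead, queueing and provider variance. The class and function names (Model, response_seconds, input_cost_usd, rank_by_speed) are hypothetical and not part of any modelgrep or DeepSeek API; the numbers are copied from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    slug: str
    tokens_per_sec: float         # output speed from the ranking
    ttft_ms: float                # time to first token, in milliseconds
    input_usd_per_million: float  # input price per million tokens


# Values taken from the ranking above.
MODELS = [
    Model("deepseek-v4-flash", 73.0, 537.0, 0.090),
    Model("deepseek-chat-v3.1", 92.0, 330.0, 0.210),
]


def response_seconds(model: Model, output_tokens: int) -> float:
    """Rough estimate (an assumption): TTFT plus steady-state generation time."""
    return model.ttft_ms / 1000 + output_tokens / model.tokens_per_sec


def input_cost_usd(model: Model, input_tokens: int) -> float:
    """Input-side cost only; output pricing is not listed on this page."""
    return input_tokens / 1_000_000 * model.input_usd_per_million


def rank_by_speed(models: list[Model]) -> list[Model]:
    """Order models by output speed, fastest first, as the ranking does."""
    return sorted(models, key=lambda m: m.tokens_per_sec, reverse=True)


if __name__ == "__main__":
    for m in rank_by_speed(MODELS):
        print(
            f"{m.slug}: ~{response_seconds(m, 500):.1f}s for 500 output tokens, "
            f"${input_cost_usd(m, 1_000_000):.3f} per 1M input tokens"
        )
```

Under this simplified model, the faster entry wins on response time for any output length, while the slower one wins on input cost, so the choice depends on whether latency or spend dominates the workload.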

Frequently asked

What is the smallest, fastest DeepSeek model?

DeepSeek V3.1 is the fastest, at 92 tokens/sec and $0.210 per million input tokens. DeepSeek V4 Flash follows at 73 t/s.

What's a good alternative to DeepSeek V3.1?

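DeepSeek V4 Flash is the closest alternative on output speed (73 t/s vs. 92 t/s). It has a higher intelligence score (46.0 vs. 28.1) and a lower input price ($0.090 vs. $0.210 per million tokens), but a longer time to first token (537 ms vs. 330 ms).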

How many DeepSeek models are there?

modelgrep tracks 12 DeepSeek models with live benchmarks, speed, latency and per-provider pricing. DeepSeek V4 Flash leads them on intelligence. Two of the 12 qualify for this ranking.
