Small & Fast DeepSeek Models

Quick answer · Updated June 2026

The fastest small DeepSeek model is DeepSeek V3.1, which leads the efficient tier at 92 tokens/sec and costs $0.210 per million input tokens. It trades a few points of raw intelligence for speed and cost, which is the right call for high-volume, latency-sensitive work. DeepSeek V4 Flash (73 t/s) is next.

Speed: 92 t/s

Intelligence: 28.1

Input price: $0.210 per million tokens

Context: 164K tokens
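To make the speed and price figures concrete, here is a back-of-envelope sketch in Python that uses only the two numbers quoted above for DeepSeek V3.1. It assumes a constant 92 t/s output rate, ignores time to first token, and covers input cost only, because this page lists no output price. The function names are hypothetical helpers for illustration, not part of any DeepSeek or modelgrep API.

```python
# Figures quoted above for DeepSeek V3.1 (modelgrep, June 2026).
INPUT_PRICE_PER_M = 0.210  # USD per million input tokens
OUTPUT_SPEED_TPS = 92      # output tokens per second

def input_cost_usd(input_tokens: int) -> float:
    """Input-token cost at the listed per-million rate.

    Assumption: output tokens are billed separately at a price not given here.
    """
    return input_tokens / 1_000_000 * INPUT_PRICE_PER_M

def generation_seconds(output_tokens: int) -> float:
    """Streaming time for the output at the listed throughput.

    Assumption: throughput is constant and time to first token is ignored.
    """
    return output_tokens / OUTPUT_SPEED_TPS

# Example: 10,000 requests of 500 input tokens each, 460-token replies.
print(f"input cost: ${input_cost_usd(10_000 * 500):.2f}")
print(f"time per reply: {generation_seconds(460):.1f} s")
```

With these assumptions, 5 million input tokens cost about $1.05, and a 460-token reply streams in roughly 5 seconds. Real throughput varies by provider and load, so treat these as rough estimates.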

Compact, efficient models (the small, mini, flash and haiku tier) are ranked here by output speed. These models trade a little raw intelligence for low cost and high throughput, which is the right tradeoff for chat, classification, extraction and other high-volume work.
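The ordering rule is simple: among models already in the compact tier, sort by output speed with the fastest first. The Python sketch below applies that rule to the two DeepSeek models named on this page. It is an illustration of the sort only. How modelgrep decides tier membership is not described here, so that filter is left out, and the `Model` class is a hypothetical structure.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    output_tps: float  # output speed in tokens per second

# The two DeepSeek models that qualify for this ranking, with their listed speeds.
models = [
    Model("DeepSeek V4 Flash", 73),
    Model("DeepSeek V3.1", 92),
]

def rank_by_speed(candidates: list[Model]) -> list[Model]:
    """Return the candidates ordered fastest first by output speed."""
    return sorted(candidates, key=lambda m: m.output_tps, reverse=True)

for rank, m in enumerate(rank_by_speed(models), start=1):
    print(f"{rank}. {m.name}: {m.output_tps:g} t/s")
```

This prints DeepSeek V3.1 (92 t/s) first and DeepSeek V4 Flash (73 t/s) second, which matches the order given above.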

Frequently asked

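What is the smallest, fastest DeepSeek model?

DeepSeek V3.1 leads this ranking at 92 tokens/sec output speed and $0.210 per million input tokens.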

What's a good alternative to DeepSeek V3.1?

DeepSeek V4 Flash (73 t/s) is the closest alternative on output speed. See the full ranking above for the tradeoffs.

How many DeepSeek models are there?

modelgrep tracks 12 DeepSeek models with live benchmarks, speed, latency and per-provider pricing; DeepSeek V4 Flash leads them on intelligence. Two of them qualify for this ranking.
