modelgrep

Fastest LLMs

Quick answer · Updated June 2026

The fastest LLM is Morph V3 Fast at 3.1k output tokens per second. Morph V3 Large (3.0k t/s) and gpt-oss-safeguard-20b (530 t/s) round out the top three.

Speed: 3.1k t/s
Input: $0.800/M
Context: 82K

AI models ranked by output speed (tokens per second, p50). The fastest large language models for low-latency and high-throughput applications.
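A p50 figure is the median of the measured samples. As a minimal sketch of how a p50 output speed could be derived, assuming each sample records the output token count and the generation time after the first token: the sample values and the function names `tokens_per_second` and `p50_speed` are hypothetical illustrations, not this site's actual methodology.

```python
from statistics import median
from typing import Iterable, Tuple

# One sample = (output tokens, seconds spent generating them).
# Assumption: generation time excludes time to first token (ttft).
Sample = Tuple[int, float]


def tokens_per_second(output_tokens: int, generation_seconds: float) -> float:
    """Output throughput for a single request."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return output_tokens / generation_seconds


def p50_speed(samples: Iterable[Sample]) -> float:
    """Median (p50) of per-request output speeds."""
    return median(tokens_per_second(tokens, secs) for tokens, secs in samples)


# Hypothetical measurements: 500, 400 and 450 tokens per second.
example_samples = [(500, 1.0), (600, 1.5), (450, 1.0)]
print(p50_speed(example_samples))
```

The median is less sensitive than the mean to a single unusually slow or fast request, which is one reason a ranking might report p50 rather than an average.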

  1. morph-v3-fast
    $0.800/M · 495ms ttft · 82K ctx
    Speed: 3.1k t/s
  2. morph-v3-large
    $0.900/M · 435ms ttft · 262K ctx
    Speed: 3.0k t/s
  3. gpt-oss-safeguard-20b
    Reasoning, Tools, JSON · $0.075/M · 235ms ttft · 131K ctx
    Speed: 530 t/s
  4. glm-4.7
    Reasoning, Tools, JSON · 42.1 intel · $0.400/M · 564ms ttft
    Speed: 474 t/s
  5. gpt-oss-120b:free
    Reasoning, Tools · 24.5 intel · Free/M · 181ms ttft
    Speed: 450 t/s
  6. gpt-oss-120b
    Reasoning, Tools, JSON · 24.5 intel · $0.039/M · 181ms ttft
    Speed: 450 t/s
  7. grok-4.20-multi-agent
    Reasoning, JSON, Vision · $2.00/M · 21.7s ttft · 2M ctx
    Speed: 355 t/s
  8. gpt-oss-20b:free
    Reasoning, Tools, JSON · 24.5 intel · Free/M · 235ms ttft
    Speed: 348 t/s
  9. gpt-oss-20b
    Reasoning, Tools, JSON · 24.5 intel · $0.029/M · 235ms ttft
    Speed: 348 t/s
  10. minimax-m2.7
    Reasoning, Tools, JSON · 49.6 intel · $0.250/M · 673ms ttft
    Speed: 334 t/s
  11. mercury-2
    Reasoning, Tools, JSON · 32.8 intel · $0.250/M · 303ms ttft
    Speed: 328 t/s
  12. qwen3-32b
    Reasoning, Tools, JSON · 16.5 intel · $0.080/M · 292ms ttft
    Speed: 303 t/s
  13. phi-4-mini-instruct
    JSON · 8.4 intel · $0.080/M · 140ms ttft
    Speed: 277 t/s
  14. nemotron-3-super-120b-a12b:free
    Reasoning, Tools, JSON · 36.0 intel · Free/M · 1.2s ttft
    Speed: 240 t/s
  15. nemotron-3-super-120b-a12b
    Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 1.2s ttft
    Speed: 240 t/s
  16. gemini-2.5-flash-lite-preview-09-2025
    Reasoning, Tools, JSON, +2 · 19.4 intel · $0.100/M · 397ms ttft
    Speed: 202 t/s
  17. minimax-m2.5
    Reasoning, Tools, JSON · 41.9 intel · $0.150/M · 521ms ttft
    Speed: 199 t/s
  18. nemotron-3-nano-omni-30b-a3b-reasoning:free
    Reasoning, Tools, Vision, +1 · 21.4 intel · Free/M · 436ms ttft
    Speed: 194 t/s
  19. gemini-2.5-flash-image
    JSON, Vision, Image out · $0.300/M · 1.7s ttft · 33K ctx
    Speed: 189 t/s
  20. qwen3.5-35b-a3b
    Reasoning, Tools, JSON, +1 · 30.7 intel · $0.140/M · 400ms ttft
    Speed: 174 t/s
  21. qwen3.6-35b-a3b
    Reasoning, Tools, JSON, +1 · 31.5 intel · $0.150/M · 310ms ttft
    Speed: 173 t/s
  22. trinity-large-thinking
    Reasoning, Tools, JSON · 31.9 intel · $0.220/M · 542ms ttft
    Speed: 168 t/s
  23. qwen3-next-80b-a3b-thinking
    Reasoning, Tools, JSON · 26.7 intel · $0.098/M · 352ms ttft
    Speed: 168 t/s
  24. gemini-3.5-flash
    Reasoning, Tools, JSON, +2 · 43.3 intel · $1.50/M · 1.7s ttft
    Speed: 164 t/s
  25. kimi-k2-0905
    Tools, JSON · 30.9 intel · $0.600/M · 258ms ttft
    Speed: 162 t/s
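
Output speed alone does not determine how long a response takes, because time to first token (ttft) adds a fixed delay before any output arrives. A rough sketch of how the two figures combine, using ttft and t/s values from the ranking above: it assumes a constant decode speed and ignores queueing, network time, and hidden reasoning tokens, and the `ModelStats` class and both function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ModelStats:
    name: str
    ttft_seconds: float       # time to first token
    tokens_per_second: float  # p50 output speed


def estimated_response_seconds(model: ModelStats, output_tokens: int) -> float:
    """Simplified estimate: fixed ttft plus decode time at constant speed."""
    return model.ttft_seconds + output_tokens / model.tokens_per_second


def rank_by_response_time(models: List[ModelStats], output_tokens: int) -> List[ModelStats]:
    """Order models from fastest to slowest estimated total response time."""
    return sorted(models, key=lambda m: estimated_response_seconds(m, output_tokens))


# Figures taken from the ranking; 3.1k t/s is a rounded value, used here as 3100.
MODELS = [
    ModelStats("morph-v3-fast", 0.495, 3100.0),
    ModelStats("gpt-oss-120b", 0.181, 450.0),
    ModelStats("grok-4.20-multi-agent", 21.7, 355.0),
]

for length in (20, 500):
    ordered = rank_by_response_time(MODELS, length)
    print(length, [m.name for m in ordered])
```

Under these assumptions, a lower ttft wins for very short replies, while higher output speed wins as replies get longer. A model with a multi-second ttft can rank well on speed and still feel slow in interactive use.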

Frequently asked

What is the fastest LLM?

The fastest LLM is Morph V3 Fast at 3.1k output tokens per second. Morph V3 Large (3.0k t/s) and gpt-oss-safeguard-20b (530 t/s) round out the top three.

What's a good alternative to Morph V3 Fast?

Morph V3 Large (3.0k t/s) is the closest alternative on this metric, followed by gpt-oss-safeguard-20b (530 t/s). See the full ranking above for the tradeoffs.
