modelgrep

Lowest-Latency Anthropic Models

Quick answer · Updated June 2026

Claude Haiku 4.5 has the lowest latency of any Anthropic model, responding in about 521ms to first token. Claude 3 Haiku (542ms) and Claude Sonnet 4 (699ms) round out the top three.

Latency: 521ms
Intelligence: 31.0
Speed: 82 t/s
Input: $1.00/M
Context: 200K

Anthropic models ranked by median (p50) time to first token, most responsive first. Lower latency matters most for real-time and interactive use cases.
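As a rough illustration of what a p50 ranking means, the sketch below takes the median of several time-to-first-token samples per model and sorts ascending. The sample values, the model names, and the `rank_by_p50` helper are hypothetical; this is not modelgrep's data or its measurement method.

```python
from statistics import median

def rank_by_p50(samples_ms):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    p50 = {model: median(values) for model, values in samples_ms.items()}
    return sorted(p50.items(), key=lambda item: item[1])

# Hypothetical TTFT samples in milliseconds (illustrative only).
samples_ms = {
    "model-a": [480, 510, 530, 900, 525],
    "model-b": [700, 690, 705, 710, 695],
}

ranking = rank_by_p50(samples_ms)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms} ms")
```

Using the median means a single slow outlier (the 900 ms sample for `model-a`) does not change the ordering.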

  1. claude-haiku-4.5 · 521ms latency
     Reasoning, Tools, JSON, +1 · 31.0 intel · $1.00/M · 82 t/s
  2. claude-3-haiku · 542ms latency
     Tools, Vision · 12.3 intel · $0.250/M · 68 t/s
  3. claude-sonnet-4 · 699ms latency
     Reasoning, Tools, Vision · 33.0 intel · $3.00/M · 49 t/s
  4. claude-opus-4.5 · 777ms latency
     Reasoning, Tools, JSON, +1 · 43.1 intel · $5.00/M · 60 t/s
  5. claude-3.5-haiku · 832ms latency
     Tools, Vision · 18.7 intel · $0.800/M · 35 t/s
  6. claude-sonnet-4.5 · 880ms latency
     Reasoning, Tools, JSON, +1 · 37.1 intel · $3.00/M · 46 t/s
  7. claude-sonnet-4.6 · 1.0s latency
     Reasoning, Tools, JSON, +1 · 42.6 intel · $3.00/M · 47 t/s
  8. claude-opus-4.6-fast · 1.3s latency
     Reasoning, Tools, JSON, +1 · $30.00/M · 11 t/s · 1M ctx
  9. claude-opus-4.6 · 1.5s latency
     Reasoning, Tools, JSON, +1 · 52.9 intel · $5.00/M · 41 t/s
  10. claude-opus-4.8-fast · 1.6s latency
     Reasoning, Tools, JSON, +1 · $10.00/M · 121 t/s · 1M ctx
  11. claude-opus-4.7 · 1.6s latency
     Reasoning, Tools, JSON, +1 · 57.3 intel · $5.00/M · 63 t/s
  12. claude-opus-4.8 · 1.8s latency
     Reasoning, Tools, JSON, +1 · 61.4 intel · $5.00/M · 59 t/s
  13. claude-opus-4.1 · 2.1s latency
     Reasoning, Tools, JSON, +1 · $15.00/M · 27 t/s · 200K ctx
  14. claude-opus-4 · 2.3s latency
     Reasoning, Tools, Vision · $15.00/M · 10 t/s · 200K ctx
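
Time to first token is only part of how long a full response takes. A minimal sketch, assuming the listed t/s figure is a steady output rate and that total time is roughly latency plus output tokens divided by that rate, shows how the fastest choice can change with response length. The formula and the helper names are illustrative assumptions; the numbers are copied from the ranking above.

```python
# (ttft_seconds, tokens_per_second) copied from the ranking above.
MODELS = {
    "claude-haiku-4.5": (0.521, 82),
    "claude-3-haiku": (0.542, 68),
    "claude-sonnet-4": (0.699, 49),
    "claude-opus-4.8-fast": (1.6, 121),
}

def estimated_seconds(ttft_s, tokens_per_s, output_tokens):
    """Approximate response time: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s

def fastest_for(output_tokens):
    """Name of the model with the lowest estimated total time (assumed model)."""
    return min(
        MODELS,
        key=lambda name: estimated_seconds(*MODELS[name], output_tokens),
    )

for n in (20, 500):
    print(n, fastest_for(n))
```

Under these assumptions claude-haiku-4.5 comes out ahead for a 20-token reply, while claude-opus-4.8-fast comes out ahead for a 500-token reply because of its higher throughput. Real timings also depend on network, load, and prompt size, which this estimate ignores.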

Frequently asked

Which Anthropic model has the lowest latency?

Claude Haiku 4.5 has the lowest latency of any Anthropic model, responding in about 521ms to first token. Claude 3 Haiku (542ms) and Claude Sonnet 4 (699ms) round out the top three.

What's a good alternative to Claude Haiku 4.5?

Claude 3 Haiku (542ms) is the closest alternative on this metric, followed by Claude Sonnet 4 (699ms). See the full ranking above for the tradeoffs.
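
One way to weigh those tradeoffs is to fix a latency budget and pick the highest-intelligence model that fits it. The sketch below does that for a subset of the listed models; the `smartest_within` helper and the budget values are hypothetical, and the figures are copied from the ranking above.

```python
# (latency_ms, intelligence) for some models listed with an intelligence score.
SCORES = {
    "claude-haiku-4.5": (521, 31.0),
    "claude-3-haiku": (542, 12.3),
    "claude-sonnet-4": (699, 33.0),
    "claude-opus-4.5": (777, 43.1),
    "claude-3.5-haiku": (832, 18.7),
    "claude-sonnet-4.5": (880, 37.1),
}

def smartest_within(budget_ms):
    """Highest-intelligence model whose latency fits the budget, or None."""
    eligible = {name: v for name, v in SCORES.items() if v[0] <= budget_ms}
    if not eligible:
        return None
    return max(eligible, key=lambda name: eligible[name][1])

print(smartest_within(600))
print(smartest_within(800))
```

With a 600ms budget this selects claude-haiku-4.5; relaxing the budget to 800ms selects claude-opus-4.5.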

How many Anthropic models are there?

modelgrep tracks 16 Anthropic models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Claude Fable 5. Of these, 14 qualify for this ranking.
