modelgrep

Lowest-Latency MoonshotAI Models

Quick answer · Updated June 2026

Kimi K2.5 has the lowest latency of any MoonshotAI model, responding in about 211ms to first token. Kimi K2 0905 (220ms) and Kimi K2.7 Code (378ms) round out the top three.

211ms Latency
37.3 Intelligence
89 t/s Speed
$0.375 Input /M
262K Context

MoonshotAI models ranked by median (p50) time to first token, from most to least responsive. Lower latency suits real-time and interactive use cases.
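As a rough illustration of what a p50 ranking means, the sketch below takes the median of several time-to-first-token samples per model and sorts ascending. The model names and sample values are hypothetical placeholders, not modelgrep measurements, and the site's actual methodology may differ.

```python
from statistics import median

# Hypothetical TTFT samples in milliseconds (illustrative only, not real data).
samples_ms = {
    "model-a": [205, 198, 240, 211, 650],
    "model-b": [230, 215, 220, 219, 225],
}

def p50_ttft(samples):
    """Return the median (p50) of a list of TTFT samples."""
    return median(samples)

def rank_by_p50(samples_by_model):
    """Return (model, p50) pairs sorted from lowest to highest p50 TTFT."""
    return sorted(
        ((name, p50_ttft(values)) for name, values in samples_by_model.items()),
        key=lambda pair: pair[1],
    )

ranking = rank_by_p50(samples_ms)
for name, p50 in ranking:
    print(f"{name}: {p50}ms")
```

Using the median rather than the mean keeps a single slow outlier (the 650ms sample for model-a) from changing the rank.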

  1. kimi-k2.5
    Reasoning · Tools · JSON · +1 more
    37.3 intel · $0.375/M · 89 t/s
    211ms latency
  2. kimi-k2-0905
    Tools · JSON
    30.9 intel · $0.600/M · 139 t/s
    220ms latency
  3. kimi-k2.7-code
    Reasoning · Tools · JSON · +1 more
    $0.750/M · 63 t/s · 262K ctx
    378ms latency
  4. kimi-k2-thinking
    Reasoning · Tools · JSON
    24.1 intel · $0.600/M · 103 t/s
    412ms latency
  5. kimi-k2.6
    Reasoning · Tools · JSON · +1 more
    42.9 intel · $0.680/M · 162 t/s
    458ms latency
  6. kimi-k2
    Tools
    14.4 intel · $0.570/M · 15 t/s
    1.6s latency

Frequently asked

Which MoonshotAI model has the lowest latency?

Kimi K2.5 has the lowest latency of any MoonshotAI model, responding in about 211ms to first token. Kimi K2 0905 (220ms) and Kimi K2.7 Code (378ms) round out the top three.

What's a good alternative to Kimi K2.5?

Kimi K2 0905 (220ms) is the closest alternative on this metric, followed by Kimi K2.7 Code (378ms). See the full ranking above for the tradeoffs.
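One way to weigh the tradeoff between latency and speed is a back-of-the-envelope estimate of total response time: time to first token plus output length divided by throughput. The sketch below uses the latency and t/s figures from the ranking above. It assumes the listed speed is sustained output throughput and that the two components simply add, which is a simplification rather than a measured result.

```python
# (latency in ms, speed in tokens/second), taken from the ranking on this page.
models = {
    "kimi-k2.5": (211, 89),
    "kimi-k2-0905": (220, 139),
    "kimi-k2.6": (458, 162),
}

def estimated_total_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Estimate total response time as TTFT plus generation time (assumed additive)."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

def fastest_for(output_tokens):
    """Return the model with the lowest estimated total time for a given output length."""
    return min(
        models,
        key=lambda name: estimated_total_seconds(*models[name], output_tokens),
    )

for n in (1, 100, 500):
    print(n, "tokens:", fastest_for(n))
```

Under these assumptions, the lowest-latency model comes out ahead only for very short outputs. For longer responses, higher throughput outweighs a slower first token.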

How many MoonshotAI models are there?

modelgrep tracks 6 MoonshotAI models with live benchmarks, speed, latency and per-provider pricing. Kimi K2.6 leads on intelligence, and all 6 models qualify for this ranking.
