Kimi K2.5 has the lowest latency of any MoonshotAI model, with a median time to first token of about 211 ms. Kimi K2 0905 (220 ms) and Kimi K2.7 Code (378 ms) round out the top three.
AI models ranked by median (p50) time to first token: the most responsive large language models for real-time and interactive use cases.
Kimi K2 0905 is the closest alternative on this metric, only about 9 ms behind Kimi K2.5, while Kimi K2.7 Code is a further 158 ms slower. See the full ranking for how these models trade latency against other metrics.
modelgrep tracks 6 MoonshotAI models with live benchmark, speed, latency, and per-provider pricing data; Kimi K2.6 leads on intelligence. All 6 qualify for this ranking.