MiniMax M2 has the lowest latency of any MiniMax model, with a median time to first token of about 340ms. MiniMax M2.7 (465ms) and MiniMax M2.5 (532ms) round out the top three.
AI models ranked by median (p50) time to first token: the most responsive large language models for real-time and interactive use cases.
MiniMax M2.7 (465ms) is the closest alternative on this metric, followed by MiniMax M2.5 (532ms). See the full ranking above for the tradeoffs.
modelgrep tracks 8 MiniMax models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by MiniMax M3. All 8 qualify for this ranking.
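For readers who want to reproduce a p50 time-to-first-token ranking on their own measurements, here is a minimal sketch. It is not modelgrep's actual method, which this page does not describe. The function names, the model labels (`model-a` and so on) and the sample values are all made up for illustration.

```python
from statistics import median


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms)


def rank_by_ttft(samples_by_model):
    """Return (model, p50) pairs sorted from fastest to slowest."""
    p50s = {model: p50_ttft_ms(s) for model, s in samples_by_model.items()}
    return sorted(p50s.items(), key=lambda item: item[1])


# Hypothetical measurements in milliseconds (not real benchmark data).
samples = {
    "model-a": [510, 545, 532, 600, 498],
    "model-b": [330, 340, 900, 352, 310],
    "model-c": [470, 465, 450, 480, 1200],
}

ranking = rank_by_ttft(samples)
for position, (model, p50) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {p50} ms")
```

Using the median rather than the mean keeps a single slow request, such as the 900 ms and 1200 ms outliers in the made-up data, from changing a model's position in the ranking.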