MiMo-V2.5-Pro has the lowest latency of any Xiaomi model, responding in about 198ms to first token. MiMo-V2-Flash (536ms) and MiMo-V2.5 (2.4s) round out the top three.
AI models ranked by median (p50) time to first token: the most responsive large language models for real-time and interactive use cases.
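As a rough illustration of what a p50 time-to-first-token ranking involves, the sketch below takes the median of each model's latency samples and sorts ascending. The function name `rank_by_ttft_p50`, the placeholder model names, and the sample values are all hypothetical, chosen only for illustration; this is not modelgrep's actual pipeline or data.

```python
from statistics import median


def rank_by_ttft_p50(samples_ms):
    """Return (model, p50) pairs sorted from lowest to highest median TTFT.

    samples_ms maps a model name to a list of time-to-first-token
    measurements in milliseconds. Models with no samples are skipped.
    """
    p50 = {model: median(values) for model, values in samples_ms.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical measurements in milliseconds, for illustration only.
samples = {
    "model-a": [210, 190, 205, 900, 198],
    "model-b": [540, 530, 536],
    "model-c": [2400, 2300, 2500, 2450],
}

ranking = rank_by_ttft_p50(samples)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms} ms")
```

Using the median rather than the mean keeps a single slow outlier (the 900 ms sample for `model-a` here) from distorting a model's position in the ranking.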
MiMo-V2-Flash (536ms) is the closest alternative on this metric, followed by MiMo-V2.5 (2.4s). See the full ranking above for the tradeoffs.
modelgrep tracks 3 Xiaomi models with live benchmarks, speed, latency, and per-provider pricing; MiMo-V2.5-Pro leads on intelligence. All 3 qualify for this ranking.