GLM 4.7 Flash has the lowest latency of any Z.ai model, reaching its first token in about 308ms. GLM 5 (310ms) and GLM 4.5 Air (378ms) round out the top three.
AI models ranked by median (p50) time-to-first-token: the most responsive large language models for real-time and interactive use cases.
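As a rough illustration of what a p50 time-to-first-token ranking involves, the sketch below takes the median of each model's latency samples and sorts from fastest to slowest. It is not modelgrep's actual pipeline: the function name `rank_by_p50_ttft`, the placeholder model names, and the sample values are all hypothetical and chosen only to make the example runnable.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Return (model, p50_ms) pairs sorted from fastest to slowest.

    samples_ms maps a model name to a list of time-to-first-token
    measurements in milliseconds. Models with no samples are skipped.
    """
    p50 = {
        model: median(values)
        for model, values in samples_ms.items()
        if values
    }
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical measurements, NOT real benchmark data.
samples = {
    "model-a": [290.0, 310.0, 305.0, 900.0, 300.0],
    "model-b": [320.0, 315.0, 330.0],
    "model-c": [400.0, 380.0],
}

ranking = rank_by_p50_ttft(samples)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms:.0f} ms")
```

Using the median rather than the mean keeps a single slow request (the 900 ms sample for `model-a` here) from dominating the score, which is one plausible reason a ranking like this would report p50.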
GLM 5 (310ms) is the closest alternative on this metric, followed by GLM 4.5 Air (378ms). See the full ranking above for the tradeoffs.
modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency, and per-provider pricing; GLM 5 Turbo leads on intelligence. All 10 qualify for this ranking.