The fastest Z.ai model is GLM 4.7 at 452 output tokens per second. GLM 5 (108 t/s) and GLM 5.1 (84 t/s) round out the top three.
AI models ranked by output speed, measured as median (p50) output tokens per second. The ranking surfaces the fastest large language models for low-latency and high-throughput applications.
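For readers unfamiliar with the metric, here is a minimal sketch of how a p50 output speed could be computed and used for ranking. It is an illustration only, not modelgrep's actual method: the function names, the model names, and the sample measurements are all hypothetical, and it assumes each sample is a pair of output tokens and generation time in seconds.

```python
from statistics import median

def p50_output_speed(samples):
    """Return the median tokens/second over (output_tokens, seconds) samples."""
    # Skip samples with a non-positive duration to avoid division by zero.
    rates = [tokens / seconds for tokens, seconds in samples if seconds > 0]
    if not rates:
        raise ValueError("no valid samples")
    return median(rates)

def rank_by_speed(measurements):
    """Sort models from fastest to slowest by p50 output speed."""
    scored = {name: p50_output_speed(s) for name, s in measurements.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical measurements: (output_tokens, seconds) per request.
measurements = {
    "model-a": [(500, 5.0), (480, 4.0), (600, 5.0)],
    "model-b": [(400, 2.0), (300, 2.0), (900, 3.0)],
}

ranking = rank_by_speed(measurements)
for name, speed in ranking:
    print(f"{name}: {speed:.0f} t/s")
```

Using the median rather than the mean keeps one unusually slow or fast request from dominating a model's score.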
The closest alternative to GLM 4.7 on this metric is GLM 5 (108 t/s), followed by GLM 5.1 (84 t/s). See the full ranking above for the tradeoffs.
modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency, and per-provider pricing; GLM 5 Turbo leads on intelligence. All 10 qualify for this ranking.