Lowest-Latency Z.ai Models

Quick answer · Updated June 2026

GLM 4.7 Flash has the lowest latency of any Z.ai model, responding in about 308ms to first token. GLM 5 (310ms) and GLM 4.5 Air (378ms) round out the top three.

308msLatency

30.1Intelligence

35 t/sSpeed

$0.060Input /M

203KContext

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.

Frequently asked

Which Z.ai model has the lowest latency?

GLM 4.7 Flash has the lowest latency of any Z.ai model, responding in about 308ms to first token. GLM 5 (310ms) and GLM 4.5 Air (378ms) round out the top three.

What's a good alternative to GLM 4.7 Flash?

GLM 5 (310ms) is the closest alternative on this metric, followed by GLM 4.5 Air (378ms). See the full ranking above for the tradeoffs.

How many Z.ai models are there?

modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GLM 5 Turbo. 10 of them qualify for this ranking.

More Z.ai rankings

Z.ai: Smartest LLMs Z.ai: Best LLMs for Coding Z.ai: Best LLMs for Design & Frontend Z.ai: Fastest LLMs Z.ai: Cheapest LLMs Z.ai: Best Free LLMs Z.ai: Best Reasoning LLMs Z.ai: Best Vision LLMs Z.ai: Best LLMs for Agents Z.ai: Best Open-Source LLMs Z.ai: Longest-Context LLMs

All rankings

Small & Fast LLMs Smartest LLMs Best LLMs for Coding Best LLMs for Design & Frontend Fastest LLMs Cheapest LLMs Best Free LLMs Best Reasoning LLMs Best Vision LLMs Best LLMs for Agents Best Open-Source LLMs Longest-Context LLMs