modelgrep

Lowest-Latency Z.ai Models

Quick answer · Updated June 2026

GLM 4.7 Flash has the lowest latency of any Z.ai model, responding in about 308ms to first token. GLM 5 (310ms) and GLM 4.5 Air (378ms) round out the top three.

308msLatency
30.1Intelligence
35 t/sSpeed
$0.060Input /M
203KContext

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.

  1. 1Z
    glm-4.7-flash
    ReasoningToolsJSON30.1 intel · $0.060/M · 35 t/s
    308ms
    Latency
  2. 2Z
    glm-5
    ReasoningToolsJSON40.6 intel · $0.600/M · 108 t/s
    310ms
    Latency
  3. 3Z
    glm-4.5-air
    ReasoningToolsJSON23.2 intel · $0.125/M · 59 t/s
    378ms
    Latency
  4. 4Z
    glm-5.1
    ReasoningToolsJSON43.8 intel · $0.980/M · 84 t/s
    475ms
    Latency
  5. 5Z
    glm-4.6
    ReasoningToolsJSON30.2 intel · $0.430/M · 37 t/s
    548ms
    Latency
  6. 6Z
    glm-4.7
    ReasoningToolsJSON42.1 intel · $0.400/M · 452 t/s
    657ms
    Latency
  7. 7Z
    glm-4.5
    ReasoningToolsJSON26.4 intel · $0.600/M · 42 t/s
    1.4s
    Latency
  8. 8Z
    glm-4.6v
    ReasoningToolsJSON+123.4 intel · $0.300/M · 60 t/s
    1.5s
    Latency
  9. 9Z
    glm-4.5v
    ReasoningToolsJSON+115.1 intel · $0.600/M · 19 t/s
    1.5s
    Latency
  10. 10Z
    glm-5-turbo
    ReasoningToolsJSON46.8 intel · $1.20/M · 29 t/s
    1.9s
    Latency

Frequently asked

Which Z.ai model has the lowest latency?

GLM 4.7 Flash has the lowest latency of any Z.ai model, responding in about 308ms to first token. GLM 5 (310ms) and GLM 4.5 Air (378ms) round out the top three.

What's a good alternative to GLM 4.7 Flash?

GLM 5 (310ms) is the closest alternative on this metric, followed by GLM 4.5 Air (378ms). See the full ranking above for the tradeoffs.

How many Z.ai models are there?

modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GLM 5 Turbo. 10 of them qualify for this ranking.

More Z.ai rankings

All rankings