modelgrep

Fastest Z.ai Models

Quick answer · Updated June 2026

The fastest Z.ai model is GLM 4.7 at 452 output tokens per second. GLM 5 (108 t/s) and GLM 5.1 (84 t/s) round out the top three.

Speed: 452 t/s
Intelligence: 42.1
Input price: $0.400 per 1M tokens
Context: 203K tokens

Z.ai models ranked by median (p50) output speed, in tokens per second. The fastest models suit low-latency and high-throughput applications.
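Ranking by p50 means each model is placed by the median of its measured speeds, not the average. This page does not describe how modelgrep collects its samples, so the sketch below assumes per-request speed measurements. The model names, sample values and the `rank_by_p50` helper are all hypothetical and only illustrate how a median-based ranking behaves.

```python
from statistics import mean, median

# Hypothetical per-request output-speed samples in tokens/second.
# Illustrative numbers only; these are not modelgrep measurements.
SAMPLES: dict[str, list[float]] = {
    "model-a": [300.0, 320.0, 310.0, 900.0, 305.0],
    "model-b": [200.0, 210.0, 190.0],
    "model-c": [50.0, 900.0, 60.0, 55.0, 58.0],  # one unusually fast request
}


def rank_by_p50(samples: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (model, p50 speed) pairs, fastest first; models with no samples are skipped."""
    p50 = {name: median(values) for name, values in samples.items() if values}
    return sorted(p50.items(), key=lambda item: item[1], reverse=True)


for rank, (name, speed) in enumerate(rank_by_p50(SAMPLES), start=1):
    # The mean is printed only for comparison; a single outlier pulls it up.
    print(f"{rank}. {name}: p50 {speed:.0f} t/s (mean {mean(SAMPLES[name]):.0f} t/s)")
```

In this made-up data, model-c would outrank model-b on the mean (about 225 vs 200 t/s) because of one fast request. On the median it ranks last at 58 t/s, which is closer to what a typical request would see.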

  1. glm-4.7 · 452 t/s
    Intelligence 42.1 · $0.400/M input · 657 ms TTFT · Reasoning, Tools, JSON
  2. glm-5 · 108 t/s
    Intelligence 40.6 · $0.600/M input · 310 ms TTFT · Reasoning, Tools, JSON
  3. glm-5.1 · 84 t/s
    Intelligence 43.8 · $0.980/M input · 475 ms TTFT · Reasoning, Tools, JSON
  4. glm-4.6v · 60 t/s
    Intelligence 23.4 · $0.300/M input · 1.5 s TTFT · Reasoning, Tools, JSON, +1 more
  5. glm-4.5-air · 59 t/s
    Intelligence 23.2 · $0.125/M input · 378 ms TTFT · Reasoning, Tools, JSON
  6. glm-4.5 · 42 t/s
    Intelligence 26.4 · $0.600/M input · 1.4 s TTFT · Reasoning, Tools, JSON
  7. glm-4.6 · 37 t/s
    Intelligence 30.2 · $0.430/M input · 548 ms TTFT · Reasoning, Tools, JSON
  8. glm-4.7-flash · 35 t/s
    Intelligence 30.1 · $0.060/M input · 308 ms TTFT · Reasoning, Tools, JSON
  9. glm-5-turbo · 29 t/s
    Intelligence 46.8 · $1.20/M input · 1.9 s TTFT · Reasoning, Tools, JSON
  10. glm-4.5v · 19 t/s
    Intelligence 15.1 · $0.600/M input · 1.5 s TTFT · Reasoning, Tools, JSON, +1 more
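Output speed only tells part of the story, because each request also waits for its first token. The sketch below is a rough approximation: it estimates completion time as TTFT plus output tokens divided by output speed, using the figures listed in the ranking. It assumes steady decoding at the listed speed and ignores network variance, prompt length and provider differences. The helper names are hypothetical.

```python
# Figures from the ranking: (time to first token in seconds, output tokens/second).
MODELS: dict[str, tuple[float, float]] = {
    "glm-4.7": (0.657, 452.0),
    "glm-5": (0.310, 108.0),
    "glm-5.1": (0.475, 84.0),
    "glm-4.6v": (1.5, 60.0),
    "glm-4.5-air": (0.378, 59.0),
    "glm-4.5": (1.4, 42.0),
    "glm-4.6": (0.548, 37.0),
    "glm-4.7-flash": (0.308, 35.0),
    "glm-5-turbo": (1.9, 29.0),
    "glm-4.5v": (1.5, 19.0),
}


def estimated_seconds(ttft: float, tokens_per_second: float, output_tokens: int) -> float:
    """Rough completion time: wait for the first token, then decode at a constant rate."""
    return ttft + output_tokens / tokens_per_second


def rank_by_completion(output_tokens: int) -> list[tuple[str, float]]:
    """Models sorted by estimated completion time for a reply of the given length."""
    times = {
        name: estimated_seconds(ttft, speed, output_tokens)
        for name, (ttft, speed) in MODELS.items()
    }
    return sorted(times.items(), key=lambda item: item[1])


for name, secs in rank_by_completion(500)[:3]:
    print(f"{name}: ~{secs:.2f} s for 500 output tokens")
```

Under this simple model, GLM 4.7 leads for a 500-token reply. For a very short reply of about 20 tokens, GLM 5 finishes first because its lower TTFT outweighs its slower decoding.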

Frequently asked

What is the fastest Z.ai model?

The fastest Z.ai model is GLM 4.7 at 452 output tokens per second. GLM 5 (108 t/s) and GLM 5.1 (84 t/s) round out the top three.

What's a good alternative to GLM 4.7?

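On output speed, the next-fastest Z.ai models are GLM 5 (108 t/s) and GLM 5.1 (84 t/s), though both are far slower than GLM 4.7's 452 t/s. GLM 5 has a lower time to first token (310 ms vs 657 ms), and GLM 5.1 scores higher on intelligence (43.8 vs 42.1) at a higher input price ($0.980/M vs $0.400/M). See the full ranking above for the other tradeoffs.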
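The GLM 4.7 vs GLM 5 tradeoff can be quantified as a break-even output length. This is a minimal sketch, assuming completion time is roughly TTFT + n / speed, with both models' listed figures used as-is. `break_even_tokens` is a hypothetical helper. It solves `ttft_a + n / speed_a == ttft_b + n / speed_b` for `n`.

```python
def break_even_tokens(ttft_a: float, speed_a: float, ttft_b: float, speed_b: float) -> float:
    """Output length n at which two models are estimated to finish at the same time.

    Solves ttft_a + n / speed_a == ttft_b + n / speed_b for n. A negative result
    means the curves do not cross for any positive reply length.
    """
    per_token_gap = 1.0 / speed_b - 1.0 / speed_a
    if per_token_gap == 0:
        raise ValueError("equal output speeds: the models never cross")
    return (ttft_a - ttft_b) / per_token_gap


# GLM 4.7 (slower first token, faster decoding) vs GLM 5 (faster first token).
n = break_even_tokens(0.657, 452.0, 0.310, 108.0)
print(f"GLM 5 finishes first below ~{n:.0f} output tokens; GLM 4.7 wins above that.")
```

By these figures, the crossover sits at roughly 49 output tokens. Real crossovers will vary with load, prompt size and provider.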

How many Z.ai models are there?

modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency, and per-provider pricing, and all 10 qualify for this ranking. GLM 5 Turbo leads them on intelligence (46.8).
