modelgrep

Small & Fast Z.ai Models

Quick answer · Updated June 2026

The fastest small Z.ai model is GLM 4.5 Air, the efficient tier, at 54 tokens/sec and $0.125 per million input tokens. It trades a few points of raw intelligence for speed and cost, which suits high-volume, latency-sensitive work. GLM 4.7 Flash is next at 47 t/s, with a higher intelligence score (30.1) and a lower input price ($0.060 per million tokens).

Speed: 54 t/s
Intelligence: 23.2
Input price: $0.125/M
Context: 131K

Compact, efficient models (the small/mini/flash/haiku tier), ranked by output speed. These trade a little raw intelligence for low cost and high throughput, which suits chat, classification, extraction and other high-volume work.

  1. glm-4.5-air (Z.ai)
    Reasoning · Tools · JSON
    23.2 intel · $0.125/M · 448ms ttft
    Speed: 54 t/s
  2. glm-4.7-flash (Z.ai)
    Reasoning · Tools · JSON
    30.1 intel · $0.060/M · 285ms ttft
    Speed: 47 t/s
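
The ordering above can be reproduced with a short sketch. The figures are copied from the list; the `Model` class, the helper names, and the latency formula (time to first token plus output tokens divided by throughput) are illustrative assumptions, not modelgrep's actual method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    speed_tps: float        # output tokens per second
    ttft_ms: float          # time to first token, in milliseconds
    intelligence: float     # intelligence score as listed
    input_usd_per_m: float  # USD per million input tokens


# Figures taken from the ranking above.
MODELS = [
    Model("glm-4.7-flash", 47, 285, 30.1, 0.060),
    Model("glm-4.5-air", 54, 448, 23.2, 0.125),
]


def rank_by_speed(models):
    """Sort fastest first, the metric this page ranks on."""
    return sorted(models, key=lambda m: m.speed_tps, reverse=True)


def estimated_seconds(model, output_tokens):
    """Assumed simple model: wait for the first token, then stream at a steady rate."""
    return model.ttft_ms / 1000 + output_tokens / model.speed_tps


ranking = rank_by_speed(MODELS)
for position, m in enumerate(ranking, start=1):
    estimate = estimated_seconds(m, 500)
    print(f"{position}. {m.name}: {m.speed_tps:g} t/s, ~{estimate:.1f}s for 500 output tokens")
```

Under this assumed latency model, throughput dominates for long responses, while for very short replies the lower time to first token of glm-4.7-flash can outweigh its lower speed. Real latency varies by provider and load.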

Frequently asked

What is the smallest, fastest Z.ai model?

By output speed, GLM 4.5 Air: 54 tokens/sec at $0.125 per million input tokens. GLM 4.7 Flash (47 t/s) is next.

What's a good alternative to GLM 4.5 Air?

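GLM 4.7 Flash (47 t/s) is the closest alternative on output speed. It is slower in throughput but has a lower time to first token (285ms vs 448ms), a higher intelligence score (30.1 vs 23.2) and a lower input price ($0.060 vs $0.125 per million tokens). See the full ranking above for the tradeoffs.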
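
As a rough sketch of the price side of that tradeoff, the input cost of a workload can be computed from the listed per-million rates. The monthly token volume is a hypothetical figure, and output-token pricing is not listed on this page, so the sketch covers input cost only.

```python
# Input prices in USD per million tokens, as listed in the ranking.
INPUT_USD_PER_MILLION = {
    "glm-4.5-air": 0.125,
    "glm-4.7-flash": 0.060,
}


def input_cost_usd(model, input_tokens):
    """Input-side cost only; output pricing is not covered here."""
    return INPUT_USD_PER_MILLION[model] * input_tokens / 1_000_000


monthly_input_tokens = 50_000_000  # hypothetical workload size

for model in INPUT_USD_PER_MILLION:
    cost = input_cost_usd(model, monthly_input_tokens)
    print(f"{model}: ${cost:.2f} per month in input tokens")
```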

How many Z.ai models are there?

modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GLM 5 Turbo. Two of them qualify for this ranking.
