Small & Fast Z.ai Models

Quick answer · Updated June 2026

The smallest, fastest Z.ai model is GLM 4.5 Air, the efficient tier, at 54 tokens/sec output speed and $0.125 per million input tokens. It trades a few points of raw intelligence for speed and cost, which is the right call for high-volume, latency-sensitive work. GLM 4.7 Flash (47 t/s) ranks second.

Speed: 54 t/s

Intelligence: 23.2

Input price: $0.125 /M tokens

Context: 131K

This ranking covers compact, efficient models (the small/mini/flash/haiku tier), ordered by output speed. These models trade a little raw intelligence for low cost and high throughput, which is the right tradeoff for chat, classification, extraction and other high-volume work.
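To make the speed and cost tradeoff concrete, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted on this page (54 and 47 tokens/sec, $0.125 per million input tokens for GLM 4.5 Air). The workload size and the helper names `input_cost_usd` and `generation_seconds` are hypothetical, and output-token pricing is left out because it is not listed here.

```python
# Figures quoted on this page; everything else is an illustrative assumption.
INPUT_PRICE_PER_M = 0.125  # USD per million input tokens (GLM 4.5 Air)
SPEEDS_TPS = {"GLM 4.5 Air": 54.0, "GLM 4.7 Flash": 47.0}  # output tokens/sec


def input_cost_usd(input_tokens: int, price_per_m: float = INPUT_PRICE_PER_M) -> float:
    """Input-side cost only; output pricing is not listed on this page."""
    return input_tokens / 1_000_000 * price_per_m


def generation_seconds(output_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream the output, ignoring time-to-first-token and queuing."""
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 100,000 requests, 800 input and 200 output tokens each.
    requests, tokens_in, tokens_out = 100_000, 800, 200
    print(f"input cost: ${input_cost_usd(requests * tokens_in):.2f}")
    for name, tps in SPEEDS_TPS.items():
        print(f"{name}: {generation_seconds(tokens_out, tps):.2f} s per response")
```

Under these assumptions the input side of the workload costs about $10.00 on GLM 4.5 Air, and a 200-token response streams in roughly 3.7 s at 54 t/s versus 4.3 s at 47 t/s. Real latency also depends on time-to-first-token and provider load, which this sketch ignores.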

Frequently asked

What is the smallest, fastest Z.ai model?

GLM 4.5 Air, at 54 tokens/sec and $0.125 per million input tokens.

What's a good alternative to GLM 4.5 Air?

GLM 4.7 Flash (47 t/s) is the closest alternative on output speed. See the full ranking above for the tradeoffs.

How many Z.ai models are there?

modelgrep tracks 10 Z.ai models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GLM 5 Turbo. Two of them qualify for this ranking.
