z-ai/glm-4.7-flash
As a 30B-class SOTA model, GLM-4.7-Flash offers a new option that balances performance and efficiency. It is further optimized for agentic coding use cases, strengthening coding capabilities, long-horizon task planning,...
| Provider | Input ($/M tokens) | Output ($/M tokens) | Context | Uptime |
|---|---|---|---|---|
| DeepInfra (bf16) | $0.060 | $0.400 | 203K | 95.9% |
| Cloudflare | $0.060 | $0.400 | 131K | 99.7% |
| Novita (bf16) | $0.070 | $0.400 | 200K | 71.2% |
| Phala | $0.100 | $0.430 | 203K | 78.6% |
| Venice (fp8) | $0.125 | $0.500 | 128K | — |
GLM 4.7 Flash costs as little as $0.060 per million input tokens and $0.400 per million output tokens via OpenRouter (the lowest rates among the listed providers), making it the 24th cheapest of 298 paid models.
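To make the per-million-token rates concrete, here is a minimal sketch of the cost arithmetic for a single request. The function name and the example token counts are illustrative assumptions. The rates are the lowest ones listed above, so other providers would cost more.

```python
# Lowest listed rates, in USD per million tokens (from the pricing above).
INPUT_PRICE_PER_M = 0.060
OUTPUT_PRICE_PER_M = 0.400


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 50,000-token prompt with a 2,000-token completion.
cost = estimate_cost_usd(50_000, 2_000)
print(f"${cost:.4f}")
```

Because output tokens cost several times more than input tokens at these rates, long completions tend to dominate the bill even when the prompt is much larger.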
GLM 4.7 Flash scores 30.1 on the Artificial Analysis Intelligence Index, ranking 81st of 178 benchmarked models, with a GPQA Diamond score of 58%.
GLM 4.7 Flash generates around 37 tokens per second with a 333 ms time-to-first-token (p50), making it the 210th fastest tracked model.
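These two figures can be combined into a rough estimate of how long a streamed response takes. The sketch below assumes a constant generation rate after the first token, which is a simplification. Real throughput varies by provider and load, and the function name is hypothetical.

```python
# Median figures quoted above.
TTFT_SECONDS = 0.333       # time-to-first-token (p50)
TOKENS_PER_SECOND = 37.0   # approximate generation speed


def estimate_latency_seconds(output_tokens: int) -> float:
    """Rough end-to-end latency, assuming a constant generation rate."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


# Example: a 1,000-token completion.
print(f"{estimate_latency_seconds(1_000):.1f} s")
```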
GLM 4.7 Flash supports a 203K-token context window and can output up to 16K tokens. It accepts text input.
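A request can be checked against these limits before it is sent. The sketch below makes three assumptions: that "203K" and "16K" mean roughly 203,000 and 16,000 tokens (the exact values are not given here), that prompt and output tokens share the same context window, and that token counts are already known. The helper name is hypothetical.

```python
# Approximate limits; the exact token counts behind "203K" and "16K" are assumed.
CONTEXT_WINDOW_TOKENS = 203_000
MAX_OUTPUT_TOKENS = 16_000


def fits_in_context(prompt_tokens: int, requested_output_tokens: int) -> bool:
    """Check a request against the output cap and the shared context window."""
    if requested_output_tokens > MAX_OUTPUT_TOKENS:
        return False
    return prompt_tokens + requested_output_tokens <= CONTEXT_WINDOW_TOKENS


print(fits_in_context(150_000, 8_000))   # within both limits
print(fits_in_context(195_000, 16_000))  # exceeds the context window
```

Several providers in the table expose a smaller window (for example 128K or 131K), so the context constant would need adjusting when routing to them.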