google/gemma-4-26b-a4b-it
Gemma 4 26B A4B IT is an instruction-tuned Mixture-of-Experts (MoE) model from Google DeepMind. Despite 25.2B total parameters, only 3.8B activate per token during inference — delivering near-31B quality at...
| Provider | Quantization | Input ($/M tokens) | Output ($/M tokens) | Context | Uptime |
|---|---|---|---|---|---|
| DekaLLM | bf16 | $0.060 | $0.330 | 262K | 98% |
| DeepInfra | fp8 | $0.070 | $0.340 | 262K | 98.7% |
| Cloudflare | | $0.100 | $0.300 | 256K | 99.2% |
| SiliconFlow | fp8 | $0.120 | $0.400 | 262K | 95.7% |
| Parasail | bf16 | $0.130 | $0.400 | 262K | 98.6% |
| Novita | bf16 | $0.130 | $0.400 | 262K | 50.4% |
| NextBit | bf16 | $0.130 | $0.400 | 262K | 98.4% |
| | | $0.150 | $0.600 | 262K | 97.4% |
| Venice | bf16 | $0.163 | $0.500 | 256K | 97.5% |
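Choosing among these providers usually means trading price against reliability. The sketch below is one illustrative way to do that, not a method used by the listing itself: it filters the rows above by a minimum uptime and picks the lowest blended price. The `Provider` type, the `cheapest` function, and the `input_share` weighting are assumptions introduced for this example, and the row with no provider name is left out.

```python
from typing import NamedTuple, Optional, Sequence


class Provider(NamedTuple):
    name: str
    input_usd_per_m: float   # USD per million input tokens
    output_usd_per_m: float  # USD per million output tokens
    uptime_pct: float


# Values copied from the provider table; the unnamed row is omitted.
PROVIDERS = [
    Provider("DekaLLM", 0.060, 0.330, 98.0),
    Provider("DeepInfra", 0.070, 0.340, 98.7),
    Provider("Cloudflare", 0.100, 0.300, 99.2),
    Provider("SiliconFlow", 0.120, 0.400, 95.7),
    Provider("Parasail", 0.130, 0.400, 98.6),
    Provider("Novita", 0.130, 0.400, 50.4),
    Provider("NextBit", 0.130, 0.400, 98.4),
    Provider("Venice", 0.163, 0.500, 97.5),
]


def cheapest(
    providers: Sequence[Provider],
    min_uptime_pct: float,
    input_share: float = 0.5,
) -> Optional[Provider]:
    """Return the lowest blended-price provider meeting the uptime floor.

    input_share is the assumed fraction of tokens that are input tokens
    (hypothetical weighting; adjust it to match your own traffic).
    """
    eligible = [p for p in providers if p.uptime_pct >= min_uptime_pct]
    if not eligible:
        return None

    def blended(p: Provider) -> float:
        return input_share * p.input_usd_per_m + (1 - input_share) * p.output_usd_per_m

    return min(eligible, key=blended)
```

With an even input/output split, a 98% uptime floor selects DekaLLM, while raising the floor to 99% leaves only Cloudflare.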
Via OpenRouter, Gemma 4 26B A4B starts at $0.060 per million input tokens and $0.330 per million output tokens, making it the 23rd cheapest of 298 paid models.
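To make the per-million rates concrete, here is a minimal sketch of the arithmetic for a single request. The function name `request_cost_usd` and its parameters are hypothetical; the default rates are the $0.060 and $0.330 figures quoted above, and actual billing may differ by provider.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_usd_per_m: float = 0.060,   # quoted input rate, USD per million tokens
    output_usd_per_m: float = 0.330,  # quoted output rate, USD per million tokens
) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * input_usd_per_m
    output_cost = output_tokens / 1_000_000 * output_usd_per_m
    return input_cost + output_cost


# Example: a 10,000-token prompt with a 1,000-token reply.
cost = request_cost_usd(10_000, 1_000)
print(f"${cost:.5f}")
```

At these rates a 10,000-token prompt with a 1,000-token reply comes to roughly a tenth of a cent, with the prompt accounting for about two thirds of it.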
Gemma 4 26B A4B scores 31.2 on the Artificial Analysis Intelligence Index, ranking 75th of 178 benchmarked models, with a GPQA Diamond score of 79%.
Gemma 4 26B A4B generates around 68 tokens per second with 356ms time-to-first-token (p50), the 115th fastest tracked model.
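These two figures can be combined into a rough response-time estimate: time-to-first-token plus output length divided by throughput. The sketch below assumes constant throughput at the quoted 68 tokens per second and the median 356 ms time-to-first-token; `estimate_latency_s` is a hypothetical helper, and real latency will vary with provider, load, and prompt length.

```python
def estimate_latency_s(
    output_tokens: int,
    tokens_per_second: float = 68.0,  # quoted generation speed
    ttft_ms: float = 356.0,           # quoted p50 time-to-first-token
) -> float:
    """Rough end-to-end latency in seconds for a reply of the given length."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


# Example: a 680-token reply takes about 10 s of generation plus the initial wait.
print(f"{estimate_latency_s(680):.2f} s")
```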
Gemma 4 26B A4B supports a context window of up to 262K tokens (256K on some providers) and accepts text, image, and video input.