google/gemma-4-31b-it
Gemma 4 31B Instruct is Google DeepMind's 30.7B-parameter dense multimodal model, supporting text and image input with text output. It features a 256K-token context window, a configurable thinking/reasoning mode, native function...
| Provider | Quantization | Input ($/M tokens) | Output ($/M tokens) | Context | Uptime |
|---|---|---|---|---|---|
| WandB | bf16 | $0.120 | $0.350 | 262K | 99.9% |
| Venice | bf16 | $0.120 | $0.360 | 256K | 98% |
| DeepInfra | fp4 | $0.120 | $0.370 | 262K | 97.5% |
| DeepInfra | fp8 | $0.130 | $0.380 | 262K | 97.4% |
| SiliconFlow | fp8 | $0.130 | $0.400 | 262K | 88% |
| Novita | bf16 | $0.140 | $0.400 | 262K | 99.4% |
| Parasail | fp8 | $0.150 | $0.400 | 262K | 93.9% |
| Chutes | fp4 | $0.150 | $0.420 | 131K | 97.3% |
| Phala | not listed | $0.150 | $0.460 | 262K | 90.9% |
| Together | not listed | $0.280 | $0.860 | 262K | 92.5% |
| Together | not listed | $0.390 | $0.970 | 262K | 96.7% |
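One way to read the provider table is as a small dataset to filter by uptime and quantization before comparing price. The sketch below is illustrative only: the rows are copied from the table, the bf16/fp8/fp4 label on each row is treated as the quantization, and the `cheapest` helper and its "input plus output price" ranking are assumptions made for this example, not part of any provider API.

```python
# Rows copied from the provider table:
# (provider, quantization, input $/M, output $/M, uptime %)
PROVIDERS = [
    ("WandB", "bf16", 0.120, 0.350, 99.9),
    ("Venice", "bf16", 0.120, 0.360, 98.0),
    ("DeepInfra", "fp4", 0.120, 0.370, 97.5),
    ("DeepInfra", "fp8", 0.130, 0.380, 97.4),
    ("SiliconFlow", "fp8", 0.130, 0.400, 88.0),
    ("Novita", "bf16", 0.140, 0.400, 99.4),
    ("Parasail", "fp8", 0.150, 0.400, 93.9),
    ("Chutes", "fp4", 0.150, 0.420, 97.3),
    ("Phala", None, 0.150, 0.460, 90.9),      # quantization not listed
    ("Together", None, 0.280, 0.860, 92.5),   # quantization not listed
    ("Together", None, 0.390, 0.970, 96.7),   # quantization not listed
]


def cheapest(min_uptime=0.0, quantization=None):
    """Return the lowest-priced row meeting the filters, or None.

    Hypothetical helper: ranks by input price + output price, which is
    only one of several reasonable ways to compare providers.
    """
    candidates = [
        row for row in PROVIDERS
        if row[4] >= min_uptime
        and (quantization is None or row[1] == quantization)
    ]
    return min(candidates, key=lambda row: row[2] + row[3], default=None)


if __name__ == "__main__":
    print(cheapest(min_uptime=99.0))
    print(cheapest(min_uptime=97.0, quantization="fp8"))
```

Ranking by the sum of both prices weights input and output tokens equally; a workload dominated by long prompts or long completions would call for a different weighting.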
Gemma 4 31B starts at $0.120 per million input tokens and $0.350 per million output tokens via OpenRouter (the lowest rates listed above), making it the 66th cheapest of 298 paid models.
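As a rough illustration of what per-million-token pricing means for a single request, the sketch below multiplies token counts by the rates quoted above. The function name `estimate_cost` and its parameters are hypothetical, and the result ignores anything the listing does not mention (caching discounts, image or video pricing, minimum charges).

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_m=0.120, output_price_per_m=0.350):
    """Estimate request cost in USD from per-million-token prices.

    Defaults are the lowest rates quoted in the listing; pass another
    provider's rates to compare.
    """
    input_cost = input_tokens / 1_000_000 * input_price_per_m
    output_cost = output_tokens / 1_000_000 * output_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # A 10,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(10_000, 2_000):.4f}")
```

At these rates a 10,000-token prompt with a 2,000-token reply comes to roughly $0.0019.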
Gemma 4 31B scores 39.2 on the Artificial Analysis Intelligence Index, ranking 48th of 178 benchmarked models, with a GPQA Diamond score of 86%.
Gemma 4 31B generates around 55 tokens per second with a 309 ms time-to-first-token (p50), making it the 150th fastest tracked model.
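The throughput and time-to-first-token figures can be combined into a back-of-the-envelope response time. This is a minimal sketch that assumes generation proceeds at a constant rate after the first token, which real deployments will not match exactly; `estimate_latency_seconds` is a hypothetical helper, and the defaults are the median figures quoted above.

```python
def estimate_latency_seconds(output_tokens, ttft_ms=309.0,
                             tokens_per_second=55.0):
    """Approximate wall-clock time to receive a full response.

    Simplifying assumption: time-to-first-token plus output length
    divided by a constant generation rate.
    """
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


if __name__ == "__main__":
    # A 550-token reply: 0.309 s to first token plus 10 s of generation.
    print(f"{estimate_latency_seconds(550):.3f} s")
```

Provider load, prompt length, and reasoning mode can all shift these numbers, so treat the result as an order-of-magnitude guide.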
Gemma 4 31B supports a 262K-token context window and can output up to 262K tokens. It accepts image, text, and video input.