modelgrep

Google: Gemma 3 4B

google/gemma-3-4b-it

177th smartest of 178 · Cheaper than 94% of paid · JSON · Vision
Intelligence: 6.3 (177th of 178)
Design Elo
Speed: 20 tokens per second (263rd fastest)
Latency: 566ms to first token
Input price: $0.050 (19th cheapest)
Context: 131K (16K max output)

How it compares

Smarter than 1% of all ranked models
Faster than 11% of all ranked models
Cheaper than 94% of all ranked models

Overview

Gemma 3 introduces multimodality, supporting vision-language input and text outputs. It handles context windows up to 128k tokens, understands over 140 languages, and offers improved math, reasoning, and chat capabilities,...

Benchmarks

independent · via OpenRouter
Artificial Analysis · 1st percentile
Intelligence Index: 6.3
Coding Index: 2.9
Agentic Index: 1.7
GPQA Diamond: 29%
Humanity's Last Exam: 5%
SciCode: 7%
Tau²-Bench (agentic): 5%

Providers & pricing (1)

Provider          In $/M   Out $/M  Uptime
DeepInfra (bf16)  $0.050   $0.100   100%

Specifications

Context window: 131K
Max output: 16K
Knowledge cutoff: Aug 2024
Input modalities: text, image
Output modalities: text
Prompt caching
Cache read price
Moderated: No

Gemma 3 4B FAQ

How much does Gemma 3 4B cost?

Gemma 3 4B costs $0.050 per million input tokens and $0.100 per million output tokens via OpenRouter, making it the 19th cheapest of 298 paid models.
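
As a rough illustration of the per-token pricing above, the sketch below estimates the cost of a single request. The function name `estimate_cost` and the example token counts are hypothetical; only the two prices come from this page, and actual billing may differ.

```python
# Prices listed on this page, in USD per million tokens.
INPUT_PRICE_PER_M = 0.050
OUTPUT_PRICE_PER_M = 0.100


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return an estimated request cost in USD (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example: a 10,000-token prompt with a 1,000-token reply.
cost = estimate_cost(input_tokens=10_000, output_tokens=1_000)
print(f"${cost:.6f}")  # $0.000600
```

At these rates, output tokens cost twice as much as input tokens, so long replies dominate the bill faster than long prompts.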

How smart is Gemma 3 4B?

Gemma 3 4B scores 6.3 on the Artificial Analysis Intelligence Index, ranking 177th of 178 benchmarked models, with a GPQA Diamond score of 29%.

How fast is Gemma 3 4B?

Gemma 3 4B generates around 20 tokens per second with 566ms time-to-first-token (p50), the 263rd fastest tracked model.
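
To put the speed figures in context, here is a minimal sketch that estimates how long a reply might take. It assumes a simple linear model (time to first token plus a constant generation rate), which is a simplification; the helper name `estimate_response_seconds` is hypothetical, and real throughput varies by provider and load.

```python
# Figures from this page: p50 time-to-first-token and approximate throughput.
TTFT_SECONDS = 0.566
TOKENS_PER_SECOND = 20.0


def estimate_response_seconds(output_tokens: int) -> float:
    """Estimate wall-clock time for a reply, assuming a constant token rate."""
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


# Example: a 500-token reply takes roughly 25.6 seconds under this model.
print(f"{estimate_response_seconds(500):.1f} s")
```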

What is Gemma 3 4B's context window?

Gemma 3 4B supports a 131K-token context window and can output up to 16K tokens. It accepts text and image input.
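
The sketch below shows one way to check how many output tokens remain for a given prompt. It makes two assumptions that this page does not confirm: that prompt and output tokens share the same context window, and that the rounded figures 131K and 16K can be treated as 131,000 and 16,000. The helper name `max_output_budget` is hypothetical.

```python
# Rounded limits from this page; the exact token limits are an assumption.
CONTEXT_WINDOW = 131_000
MAX_OUTPUT = 16_000


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many output tokens fit after the prompt (0 if none)."""
    remaining = CONTEXT_WINDOW - prompt_tokens
    return max(0, min(MAX_OUTPUT, remaining))


print(max_output_budget(50_000))   # 16000: the output cap is the limit
print(max_output_budget(125_000))  # 6000: the context window is the limit
print(max_output_budget(140_000))  # 0: the prompt alone exceeds the window
```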
