modelgrep
M

Meta: Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

171st smartest of 178JSONVision
Use via OpenRouter ↗
Intelligence
8.7
171st of 178
Design Elo
Speed
35
222nd fastest
Latency
164ms
first token
Input price
$0.345
140th cheapest
Context
131K
16K max out

How it compares

Smarter than4%
of all ranked models
Faster than25%
of all ranked models
Cheaper than53%
of all ranked models

Overview

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Benchmarks

independent · via OpenRouter
Artificial Analysis6th percentile
Intelligence Index
8.7
Coding Index
4.2
Agentic Index
4.9
GPQA Diamond
22%
Humanity's Last Exam
5%
SciCode
11%
Tau²-Bench (agentic)
15%

Providers & pricing (1)

ProviderIn $/MOut $/MUptime
DeepInfrafp8$0.345$0.345100%

Specifications

Context window131K
Max output16K
Knowledge cutoffDec 2023
Input modalitiestext, image
Output modalitiestext
Prompt caching
Cache read price
ModeratedNo

Llama 3.2 11B Vision Instruct FAQ

How much does Llama 3.2 11B Vision Instruct cost?

Llama 3.2 11B Vision Instruct costs $0.345 per million input tokens and $0.345 per million output tokens via OpenRouter, making it 140th cheapest of 298 paid models.

How smart is Llama 3.2 11B Vision Instruct?

Llama 3.2 11B Vision Instruct scores 8.7 on the Artificial Analysis Intelligence Index, ranking 171st of 178 benchmarked models, with a GPQA Diamond score of 22%.

How fast is Llama 3.2 11B Vision Instruct?

Llama 3.2 11B Vision Instruct generates around 35 tokens per second with 164ms time-to-first-token (p50), the 222nd fastest tracked model.

What is Llama 3.2 11B Vision Instruct's context window?

Llama 3.2 11B Vision Instruct supports a 131K-token context window and can output up to 16K tokens. It accepts text, image input.

Compare head-to-head