Meta: Llama 3.2 11B Vision Instruct

meta-llama/llama-3.2-11b-vision-instruct

171st smartest of 178JSONVision

Use via OpenRouter ↗

Intelligence

8.7

171st of 178

Design Elo

—

Speed

222nd fastest

Latency

164ms

first token

Input price

$0.345

140th cheapest

Context

131K

16K max out

How it compares

Smarter than4%

of all ranked models

Faster than25%

of all ranked models

Cheaper than53%

of all ranked models

Overview

Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...

Benchmarks

independent · via OpenRouter

Artificial Analysis6th percentile

Intelligence Index

8.7

Coding Index

4.2

Agentic Index

4.9

GPQA Diamond

22%

Humanity's Last Exam

SciCode

11%

Tau²-Bench (agentic)

15%

Providers & pricing (1)

Provider	In $/M	Out $/M	Context	Uptime
DeepInfrafp8	$0.345	$0.345	131K	100%

Specifications

Context window131K

Max output16K

Knowledge cutoffDec 2023

Input modalitiestext, image

Output modalitiestext

Prompt caching—

Cache read price—

ModeratedNo

Open weightsmeta-llama/Llama-3.2-11B-Vision-Instruct ↗

Llama 3.2 11B Vision Instruct FAQ

How much does Llama 3.2 11B Vision Instruct cost?

Llama 3.2 11B Vision Instruct costs $0.345 per million input tokens and $0.345 per million output tokens via OpenRouter, making it 140th cheapest of 298 paid models.

How smart is Llama 3.2 11B Vision Instruct?

Llama 3.2 11B Vision Instruct scores 8.7 on the Artificial Analysis Intelligence Index, ranking 171st of 178 benchmarked models, with a GPQA Diamond score of 22%.