meta-llama/llama-3.2-11b-vision-instruct
Llama 3.2 11B Vision is a multimodal model with 11 billion parameters, designed to handle tasks combining visual and textual data. It excels in tasks such as image captioning and...
| Provider | In $/M | Out $/M | Context | Uptime |
|---|---|---|---|---|
| DeepInfrafp8 | $0.345 | $0.345 | 131K | 100% |
Llama 3.2 11B Vision Instruct costs $0.345 per million input tokens and $0.345 per million output tokens via OpenRouter, making it 140th cheapest of 298 paid models.
Llama 3.2 11B Vision Instruct scores 8.7 on the Artificial Analysis Intelligence Index, ranking 171st of 178 benchmarked models, with a GPQA Diamond score of 22%.
Llama 3.2 11B Vision Instruct generates around 35 tokens per second with 164ms time-to-first-token (p50), the 222nd fastest tracked model.
Llama 3.2 11B Vision Instruct supports a 131K-token context window and can output up to 16K tokens. It accepts text, image input.