modelgrep

Meta: Llama 3.1 8B Instruct

meta-llama/llama-3.1-8b-instruct

161st smartest of 178 · Cheaper than 99% of paid · Tools · JSON
Intelligence: 11.8 (161st of 178)
Design Elo
Speed: 145 tokens per second (31st fastest)
Latency: 143ms to first token
Input price: $0.020 per million tokens (3rd cheapest)
Context: 131K (16K max output)

How it compares

Smarter than 10% of all ranked models
Faster than 89% of all ranked models
Cheaper than 99% of all ranked models

Overview

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 8B instruct-tuned version is fast and efficient. It has demonstrated strong performance compared to...

Benchmarks

independent · via OpenRouter
Artificial Analysis · 10th percentile
Intelligence Index: 11.8
Coding Index: 4.9
Agentic Index: 5.5
GPQA Diamond: 26%
Humanity's Last Exam: 5%
SciCode: 13%
Tau²-Bench (agentic): 16%

Providers & pricing (6)

Provider    Precision  In $/M   Out $/M  Uptime
DeepInfra   fp8        $0.020   $0.030   99.8%
Novita      fp8        $0.020   $0.050   99.9%
DeepInfra   bf16       $0.020   $0.050   100%
Groq        -          $0.050   $0.080   99.9%
Cloudflare  fp8        $0.152   $0.287   99.9%
WandB       bf16       $0.220   $0.220   100%
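
As a rough illustration of how the per-million-token prices above translate into a bill, the sketch below ranks the six listed providers by the cost of a hypothetical workload. The prices are copied from the provider list; the names `PROVIDERS`, `workload_cost`, and `rank_providers` and the example token counts are assumptions made for illustration, and actual charges may differ.

```python
# Prices in USD per million tokens, copied from the provider list above.
# Tuple layout: (provider, precision, input_price, output_price).
PROVIDERS = [
    ("DeepInfra", "fp8", 0.020, 0.030),
    ("Novita", "fp8", 0.020, 0.050),
    ("DeepInfra", "bf16", 0.020, 0.050),
    ("Groq", None, 0.050, 0.080),  # no precision label shown in the listing
    ("Cloudflare", "fp8", 0.152, 0.287),
    ("WandB", "bf16", 0.220, 0.220),
]


def workload_cost(input_tokens, output_tokens, input_price, output_price):
    """Return the USD cost of a workload at the given per-million prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def rank_providers(input_tokens, output_tokens):
    """Return (cost, provider, precision) tuples, cheapest first."""
    rows = [
        (workload_cost(input_tokens, output_tokens, p_in, p_out), name, precision)
        for name, precision, p_in, p_out in PROVIDERS
    ]
    # sorted() is stable, so providers with equal cost keep their listed order.
    return sorted(rows, key=lambda row: row[0])


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for cost, name, precision in rank_providers(10_000_000, 2_000_000):
        label = f"{name} ({precision})" if precision else name
        print(f"{label:<20} ${cost:.2f}")
```

Because output tokens are priced higher than input tokens at most of these providers, the ranking can shift for output-heavy workloads, so it is worth rerunning with a token mix close to the intended use.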

Specifications

Context window: 131K
Max output: 16K
Knowledge cutoff: Dec 2023
Input modalities: text
Output modalities: text
Prompt caching
Cache read price
Moderated: No

Llama 3.1 8B Instruct FAQ

How much does Llama 3.1 8B Instruct cost?

Llama 3.1 8B Instruct costs $0.020 per million input tokens and $0.030 per million output tokens via OpenRouter at the cheapest listed provider, making it the 3rd cheapest of 298 paid models.

How smart is Llama 3.1 8B Instruct?

Llama 3.1 8B Instruct scores 11.8 on the Artificial Analysis Intelligence Index, ranking 161st of 178 benchmarked models, with a GPQA Diamond score of 26%.

How fast is Llama 3.1 8B Instruct?

Llama 3.1 8B Instruct generates around 145 tokens per second with a 143ms time-to-first-token (p50), making it the 31st fastest tracked model.
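
A back-of-the-envelope response time follows from these two figures. The sketch below assumes that time to first token and generation time simply add, and that throughput stays constant for the whole response; the function name `estimate_response_seconds` and the 500-token example are hypothetical, and measured times will vary by provider and load.

```python
TTFT_SECONDS = 0.143       # 143 ms time-to-first-token (p50), from the listing
TOKENS_PER_SECOND = 145.0  # approximate generation speed, from the listing


def estimate_response_seconds(output_tokens, ttft=TTFT_SECONDS, tps=TOKENS_PER_SECOND):
    """Estimate wall-clock seconds to stream `output_tokens` tokens.

    Simplifying assumption: total time = time to first token + tokens / speed.
    """
    if output_tokens < 0:
        raise ValueError("output_tokens must be non-negative")
    return ttft + output_tokens / tps


if __name__ == "__main__":
    # Hypothetical 500-token reply.
    print(f"{estimate_response_seconds(500):.2f} s")
```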

What is Llama 3.1 8B Instruct's context window?

Llama 3.1 8B Instruct supports a 131K-token context window and can output up to 16K tokens. It accepts text input.
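
One way to reason about these two limits when sizing a request is sketched below. It assumes that prompt and completion tokens share the context window, which the listing does not state, and it uses the rounded "131K" and "16K" figures as 131,000 and 16,000 because exact limits are not given. The helper name `max_completion_tokens` and the prompt sizes are hypothetical.

```python
CONTEXT_WINDOW = 131_000  # "131K" as listed; the exact limit may differ
MAX_OUTPUT = 16_000       # "16K" as listed; the exact limit may differ


def max_completion_tokens(prompt_tokens, context_window=CONTEXT_WINDOW, max_output=MAX_OUTPUT):
    """Return how many output tokens can be requested for a given prompt size.

    Assumes prompt and completion share the context window, so the budget is
    the smaller of the output cap and the space left after the prompt.
    """
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    remaining = context_window - prompt_tokens
    return max(0, min(max_output, remaining))


if __name__ == "__main__":
    for prompt in (1_000, 120_000, 140_000):  # hypothetical prompt sizes
        print(prompt, max_completion_tokens(prompt))
```

Under these assumptions, the output cap is the binding limit for short prompts, and the remaining window becomes the binding limit only once the prompt exceeds roughly 115,000 tokens.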
