modelgrep
M

Meta: Llama 3.1 70B Instruct

meta-llama/llama-3.1-70b-instruct

157th smartest of 178ToolsJSON
Use via OpenRouter ↗
Intelligence
12.5
157th of 178
Design Elo
Speed
28
245th fastest
Latency
303ms
first token
Input price
$0.400
153rd cheapest
Context
131K
16K max out

How it compares

Smarter than12%
of all ranked models
Faster than17%
of all ranked models
Cheaper than49%
of all ranked models

Overview

Meta's latest class of model (Llama 3.1) launched with a variety of sizes & flavors. This 70B instruct-tuned version is optimized for high quality dialogue usecases. It has demonstrated strong...

Benchmarks

independent · via OpenRouter
Artificial Analysis12th percentile
Intelligence Index
12.5
Coding Index
10.9
Agentic Index
5.1
GPQA Diamond
41%
Humanity's Last Exam
5%
SciCode
27%
Tau²-Bench (agentic)
15%

Providers & pricing (3)

ProviderIn $/MOut $/MUptime
DeepInfrafp8$0.400$0.40099.6%
Amazon Bedrock$0.720$0.72099.7%
WandBbf16$0.800$0.800100%

Specifications

Context window131K
Max output16K
Knowledge cutoffDec 2023
Input modalitiestext
Output modalitiestext
Prompt caching
Cache read price
ModeratedNo

Llama 3.1 70B Instruct FAQ

How much does Llama 3.1 70B Instruct cost?

Llama 3.1 70B Instruct costs $0.400 per million input tokens and $0.400 per million output tokens via OpenRouter, making it 153rd cheapest of 298 paid models.

How smart is Llama 3.1 70B Instruct?

Llama 3.1 70B Instruct scores 12.5 on the Artificial Analysis Intelligence Index, ranking 157th of 178 benchmarked models, with a GPQA Diamond score of 41%.

How fast is Llama 3.1 70B Instruct?

Llama 3.1 70B Instruct generates around 28 tokens per second with 303ms time-to-first-token (p50), the 245th fastest tracked model.

What is Llama 3.1 70B Instruct's context window?

Llama 3.1 70B Instruct supports a 131K-token context window and can output up to 16K tokens. It accepts text input.

Compare head-to-head