modelgrep
D

DeepSeek: R1 Distill Llama 70B

deepseek/deepseek-r1-distill-llama-70b

Reasoning
Use via OpenRouter ↗
Intelligence
Design Elo
Speed
29
241st fastest
Latency
811ms
first token
Input price
$0.800
196th cheapest
Context
128K
8K max out

How it compares

Faster than18%
of all ranked models
Cheaper than34%
of all ranked models

Overview

DeepSeek R1 Distill Llama 70B is a distilled large language model based on [Llama-3.3-70B-Instruct](/meta-llama/llama-3.3-70b-instruct), using outputs from [DeepSeek R1](/deepseek/deepseek-r1). The model combines advanced distillation techniques to achieve high performance across...

Benchmarks

independent · via OpenRouter
Artificial Analysis
Coding Index
11.4
GPQA Diamond
40%
Humanity's Last Exam
6%
SciCode
31%
Tau²-Bench (agentic)
22%

Providers & pricing (1)

ProviderIn $/MOut $/MUptime
Novitabf16$0.800$0.800100%

Specifications

Context window128K
Max output8K
Knowledge cutoffJul 2024
Input modalitiestext
Output modalitiestext
Prompt caching
Cache read price
ModeratedNo

R1 Distill Llama 70B FAQ

How much does R1 Distill Llama 70B cost?

R1 Distill Llama 70B costs $0.800 per million input tokens and $0.800 per million output tokens via OpenRouter, making it 196th cheapest of 298 paid models.

How fast is R1 Distill Llama 70B?

R1 Distill Llama 70B generates around 29 tokens per second with 811ms time-to-first-token (p50), the 241st fastest tracked model.

What is R1 Distill Llama 70B's context window?

R1 Distill Llama 70B supports a 128K-token context window and can output up to 8K tokens. It accepts text input.

Compare head-to-head