meta-llama/llama-3.3-70b-instruct
The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction-tuned generative model with 70B parameters (text in/text out). The Llama 3.3 instruction-tuned, text-only model...
| Provider | Quantization | Input ($/M tokens) | Output ($/M tokens) | Context | Uptime |
|---|---|---|---|---|---|
| DeepInfra | fp8 | $0.100 | $0.320 | 131K | 95.5% |
| Nebius | fp8 | $0.130 | $0.400 | 131K | 97% |
| AkashML | fp8 | $0.130 | $0.400 | 131K | 99.1% |
| Novita | bf16 | $0.135 | $0.400 | 131K | 99.9% |
| Parasail | fp8 | $0.220 | $0.500 | 131K | 97.2% |
| Cloudflare | fp8 | $0.293 | $2.25 | 24K | 100% |
| SambaNova | bf16 | $0.450 | $0.900 | 16K | 99.6% |
| Groq | | $0.590 | $0.790 | 131K | 99.9% |
| SambaNova | bf16 | $0.600 | $1.20 | 131K | 99.3% |
| WandB | fp16 | $0.710 | $0.710 | 128K | 100% |
| | | $0.720 | $0.720 | 128K | 95% |
| | | $0.720 | $0.720 | 128K | 100% |
| Together | fp8 | $1.04 | $1.04 | 131K | 100% |
Llama 3.3 70B Instruct starts at $0.100 per million input tokens and $0.320 per million output tokens via OpenRouter (the lowest listed rate, from DeepInfra), making it the 62nd cheapest of 298 paid models.
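To make the per-million pricing concrete, here is a minimal sketch of the cost arithmetic for a single request. The function name `request_cost_usd` and the example token counts are hypothetical; the default prices are the lowest listed rates quoted above, and other providers charge more.

```python
def request_cost_usd(input_tokens, output_tokens,
                     in_price_per_m=0.100, out_price_per_m=0.320):
    """Estimate the USD cost of one request from per-million-token prices.

    Defaults are the lowest listed rates ($0.100 in, $0.320 out per 1M tokens).
    """
    return (input_tokens * in_price_per_m
            + output_tokens * out_price_per_m) / 1_000_000


# Hypothetical request: 2,000 prompt tokens and 500 completion tokens.
cost = request_cost_usd(2_000, 500)
print(f"${cost:.6f}")  # $0.000360
```

At these rates, roughly 2,800 such requests cost about one dollar.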
Llama 3.3 70B Instruct scores 14.5 on the Artificial Analysis Intelligence Index, ranking 148th of 178 benchmarked models, with a GPQA Diamond score of 50%.
Llama 3.3 70B Instruct generates around 115 tokens per second with a 244ms time-to-first-token (p50), making it the 55th fastest tracked model.
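The throughput and time-to-first-token figures can be combined into a rough end-to-end latency estimate. The sketch below assumes constant throughput once generation starts and uses the median (p50) figures as defaults; real latency also depends on the provider, queueing, and network conditions. The function name `estimated_latency_s` is hypothetical.

```python
def estimated_latency_s(output_tokens, tokens_per_s=115.0, ttft_s=0.244):
    """Rough response time: time-to-first-token plus steady-state generation.

    Assumes throughput stays constant for the whole completion.
    """
    return ttft_s + output_tokens / tokens_per_s


# Hypothetical 500-token completion at the quoted p50 figures.
print(f"{estimated_latency_s(500):.2f} s")  # 4.59 s
```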
Llama 3.3 70B Instruct supports a context window of up to 131K tokens (some providers listed above cap it lower, between 16K and 128K) and can output up to 16K tokens. It accepts text input only.
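As an illustration of how the two limits interact, the sketch below computes the largest completion budget for a given prompt size. It assumes that "131K" and "16K" mean 131,072 and 16,384 tokens, and that prompt and completion tokens share the context window; both are assumptions, and the exact limits vary by provider. The name `max_completion_tokens` is hypothetical.

```python
CONTEXT_WINDOW = 131_072  # assumed exact value behind "131K"
MAX_OUTPUT = 16_384       # assumed exact value behind "16K"


def max_completion_tokens(prompt_tokens,
                          context_window=CONTEXT_WINDOW,
                          max_output=MAX_OUTPUT):
    """Largest completion budget that fits, or 0 if the prompt is too long."""
    remaining = context_window - prompt_tokens
    return max(0, min(remaining, max_output))


print(max_completion_tokens(100_000))  # 16384 (capped by the output limit)
print(max_completion_tokens(120_000))  # 11072 (capped by remaining context)
print(max_completion_tokens(140_000))  # 0 (prompt exceeds the window)
```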