nvidia/llama-3.3-nemotron-super-49b-v1.5
Llama-3.3-Nemotron-Super-49B-v1.5 is a 49B-parameter, English-centric reasoning/chat model derived from Meta’s Llama-3.3-70B-Instruct with a 128K context. It’s post-trained for agentic workflows (RAG, tool calling) via SFT across math, code, science, and...
| Provider | In $/M | Out $/M | Context | Uptime |
|---|---|---|---|---|
| DeepInfra (fp8) | $0.400 | $0.400 | 131K | — |
Llama 3.3 Nemotron Super 49B V1.5 costs $0.400 per million input tokens and $0.400 per million output tokens via OpenRouter, making it the 147th cheapest of 298 paid models.
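As a rough illustration of what the listed prices mean per request, the sketch below multiplies token counts by the per-million rates. The function name `estimate_cost_usd` and the example token counts are hypothetical. Actual billing may differ, for example through provider-side rounding or fees not shown here.

```python
# Listed prices in USD per million tokens (taken from the pricing table).
INPUT_PRICE_PER_M = 0.400
OUTPUT_PRICE_PER_M = 0.400


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request from its token counts.

    Hypothetical helper: it applies only the listed per-token rates.
    """
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example request (assumed sizes): 10,000 prompt tokens, 2,000 completion tokens.
cost = estimate_cost_usd(10_000, 2_000)
print(f"${cost:.4f}")
```

Because the input and output rates are identical, the estimate depends only on the total token count: one million tokens of any mix comes to about $0.40.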
Llama 3.3 Nemotron Super 49B V1.5 scores 18.7 on the Artificial Analysis Intelligence Index, ranking 125th of 178 benchmarked models, with a GPQA Diamond score of 75%.
Llama 3.3 Nemotron Super 49B V1.5 generates around 49 tokens per second with a median (p50) time-to-first-token of 168 ms, making it the 167th fastest tracked model.
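The throughput and time-to-first-token figures can be combined into a back-of-the-envelope response time. This is a minimal sketch that assumes generation proceeds at a constant rate after the first token. Real latency varies with load, prompt length, and provider, and `estimate_latency_seconds` is a hypothetical helper.

```python
# Figures quoted above; both are approximate measurements, not guarantees.
TTFT_SECONDS = 0.168       # median (p50) time-to-first-token
TOKENS_PER_SECOND = 49.0   # approximate generation throughput


def estimate_latency_seconds(output_tokens: int) -> float:
    """Rough wall-clock time to stream a completion of the given length.

    Assumes a constant decode rate once the first token has arrived.
    """
    return TTFT_SECONDS + output_tokens / TOKENS_PER_SECOND


# Example (assumed size): a 490-token answer takes roughly ten seconds.
print(f"{estimate_latency_seconds(490):.1f} s")
```

At this rate, generation time dominates for anything longer than a few tokens. The 168 ms startup cost is under 2% of the total for a 490-token reply.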
Llama 3.3 Nemotron Super 49B V1.5 supports a 131K-token context window and can output up to 16K tokens. It accepts text input.
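One way to use these limits when sizing requests is sketched below. It assumes the prompt and the completion share the same context window, which is common but not stated on this page. It also uses the rounded figures 131,000 and 16,000, since the exact token limits are not given here. `max_output_budget` is a hypothetical helper.

```python
# Rounded limits from the text above; the exact values are an assumption.
CONTEXT_WINDOW_TOKENS = 131_000
MAX_OUTPUT_TOKENS = 16_000


def max_output_budget(prompt_tokens: int) -> int:
    """Return how many completion tokens can still be requested.

    Assumes prompt and completion tokens share one context window.
    """
    remaining = CONTEXT_WINDOW_TOKENS - prompt_tokens
    if remaining <= 0:
        return 0  # the prompt alone fills or exceeds the window
    return min(MAX_OUTPUT_TOKENS, remaining)


print(max_output_budget(1_000))    # short prompt: capped by the output limit
print(max_output_budget(120_000))  # long prompt: capped by remaining context
```

Under these assumptions, the output cap is the binding limit until the prompt exceeds about 115,000 tokens. Beyond that, the remaining context is the tighter bound.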