nvidia/nemotron-3-ultra-550b-a55b
NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...
| Provider | In $/M | Out $/M | Context | Uptime |
|---|---|---|---|---|
| DeepInfra (bf16) | $0.500 | $2.50 | 262K | 97.6% |
| Together | $0.600 | $3.60 | 512K | 99.7% |
Nemotron 3 Ultra costs $0.500 per million input tokens and $2.50 per million output tokens via OpenRouter, making it the 160th cheapest of 298 paid models.
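As a rough illustration of what these per-million-token rates mean for a single request, the sketch below multiplies token counts by the listed prices. The function name `request_cost_usd` and the example token counts are hypothetical. The default rates are the lowest listed prices ($0.500 in, $2.50 out), and actual billing may differ by provider.

```python
def request_cost_usd(
    input_tokens: int,
    output_tokens: int,
    in_price_per_m: float = 0.50,   # $ per million input tokens (listed rate)
    out_price_per_m: float = 2.50,  # $ per million output tokens (listed rate)
) -> float:
    """Estimate the cost of one request from its token counts."""
    input_cost = input_tokens / 1_000_000 * in_price_per_m
    output_cost = output_tokens / 1_000_000 * out_price_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical request: 20,000 prompt tokens and 1,500 completion tokens.
    print(f"${request_cost_usd(20_000, 1_500):.5f}")  # about $0.01375
```

Passing `in_price_per_m=0.60, out_price_per_m=3.60` gives the same estimate at the Together rates from the table.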
Nemotron 3 Ultra scores 47.7 on the Artificial Analysis Intelligence Index, ranking 19th of 178 benchmarked models, with a GPQA Diamond score of 87%.
Nemotron 3 Ultra generates around 112 tokens per second with a 746 ms time-to-first-token (p50), making it the 57th fastest tracked model.
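The throughput and time-to-first-token figures can be combined into a back-of-the-envelope estimate of how long a streamed response takes. This is a minimal sketch that assumes generation proceeds at a constant rate after the first token. The function name `estimate_latency_s` is hypothetical, and real latency varies with provider, load, and prompt length.

```python
def estimate_latency_s(
    output_tokens: int,
    ttft_ms: float = 746.0,       # median time-to-first-token from the listing
    tokens_per_s: float = 112.0,  # approximate generation speed from the listing
) -> float:
    """Estimate total response time in seconds as TTFT + tokens / throughput."""
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_ms / 1000.0 + output_tokens / tokens_per_s


if __name__ == "__main__":
    # Hypothetical 1,000-token response: roughly 9.7 seconds end to end.
    print(f"{estimate_latency_s(1_000):.1f} s")
```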
Nemotron 3 Ultra supports a 1M-token context window and can output up to 16K tokens. It accepts text input.