modelgrep
N

NVIDIA: Nemotron 3 Ultra

nvidia/nemotron-3-ultra-550b-a55b

19th smartest of 178ReasoningToolsJSON
Use via OpenRouter ↗
Intelligence
47.7
19th of 178
Design Elo
1223
Game Dev
Speed
112
57th fastest
Latency
746ms
first token
Input price
$0.500
160th cheapest
Context
1M
16K max out

How it compares

Smarter than89%
of all ranked models
Faster than81%
of all ranked models
Cheaper than46%
of all ranked models

Overview

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Benchmarks

independent · via OpenRouter
Artificial Analysis87th percentile
Intelligence Index
47.7
Coding Index
37.6
Agentic Index
57.1
GPQA Diamond
87%
Humanity's Last Exam
27%
SciCode
40%
Tau²-Bench (agentic)
83%
Design Arena · Elo365 tournaments
Game Dev
1223
Website
1136

Providers & pricing (2)

ProviderIn $/MOut $/MUptime
DeepInfrabf16$0.500$2.5097.6%
Together$0.600$3.6099.7%

Specifications

Context window1M
Max output16K
Knowledge cutoff
Input modalitiestext
Output modalitiestext
Prompt caching
Cache read price$0.150/M
ModeratedNo

Nemotron 3 Ultra FAQ

How much does Nemotron 3 Ultra cost?

Nemotron 3 Ultra costs $0.500 per million input tokens and $2.50 per million output tokens via OpenRouter, making it 160th cheapest of 298 paid models.

How smart is Nemotron 3 Ultra?

Nemotron 3 Ultra scores 47.7 on the Artificial Analysis Intelligence Index, ranking 19th of 178 benchmarked models, with a GPQA Diamond score of 87%.

How fast is Nemotron 3 Ultra?

Nemotron 3 Ultra generates around 112 tokens per second with 746ms time-to-first-token (p50), the 57th fastest tracked model.

What is Nemotron 3 Ultra's context window?

Nemotron 3 Ultra supports a 1M-token context window and can output up to 16K tokens. It accepts text input.

Compare head-to-head