NVIDIA: Nemotron 3 Ultra

nvidia/nemotron-3-ultra-550b-a55b

Reasoning · Tools · JSON

Intelligence: not listed

Design Elo: 1189

Speed (tokens/sec): not listed

Latency (first token): not listed

Input price: $0.600/M (181st cheapest)

Context: 512K

How it compares

Cheaper than 45% of all ranked models

Overview

NVIDIA Nemotron 3 Ultra is an open frontier-reasoning and orchestration model from NVIDIA, with 55B active parameters out of 550B total (MoE). Built on a hybrid Transformer-Mamba mixture-of-experts architecture, it...

Benchmarks

independent · Artificial Analysis & Design Arena

Design Arena · Elo · 2,341 tournaments

Design Arena Elo: 1189

Data Viz: 1152

Website: 1127

SVG: 1125

Providers & pricing (4)

Provider	Input $/M	Output $/M	Context	Uptime
DeepInfra (fp4)	$0.500	$2.20	262K	83.2%
BaseTen (fp4)	$0.600	$2.40	203K	100%
Together	$0.600	$3.60	512K	97.5%
Venice (fp8)	$0.625	$3.13	256K	90.4%
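To make the table concrete, here is a minimal sketch that estimates the cost of a single request at each provider and picks the cheapest one whose context window fits. The prices and context sizes are copied from the table; everything else is an assumption for illustration: the `Provider` class and the helper names are hypothetical, "262K" is read as 262,000 tokens, and the context limit is assumed to cover input plus output tokens. Cache pricing and uptime are ignored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Provider:
    name: str
    in_per_m: float       # USD per million input tokens (from the table)
    out_per_m: float      # USD per million output tokens (from the table)
    context_tokens: int   # assumption: "262K" is read as 262,000 tokens


PROVIDERS = [
    Provider("DeepInfra", 0.500, 2.20, 262_000),
    Provider("BaseTen", 0.600, 2.40, 203_000),
    Provider("Together", 0.600, 3.60, 512_000),
    Provider("Venice", 0.625, 3.13, 256_000),
]


def request_cost(p: Provider, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of one request, ignoring any cache discounts."""
    return (input_tokens * p.in_per_m + output_tokens * p.out_per_m) / 1_000_000


def cheapest(input_tokens: int, output_tokens: int) -> Optional[Provider]:
    """Cheapest provider whose context fits the request, or None if none fits."""
    # Assumption: the context window must hold input and output together.
    fits = [p for p in PROVIDERS
            if input_tokens + output_tokens <= p.context_tokens]
    return min(fits,
               key=lambda p: request_cost(p, input_tokens, output_tokens),
               default=None)


if __name__ == "__main__":
    for p in PROVIDERS:
        print(f"{p.name}: ${request_cost(p, 100_000, 2_000):.4f}")
    best = cheapest(300_000, 1_000)
    print("Cheapest for a 300K-token prompt:", best.name if best else "none")
```

Under these assumptions, a short prompt is cheapest at DeepInfra, while a prompt above roughly 262K tokens only fits at Together, so the lowest listed price is not always the one that applies.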

Specifications

Context window: 512K

Max output: not listed

Knowledge cutoff: not listed

Input modalities: text

Output modalities: text

Prompt caching: not listed

Cache read price: $0.200/M

Moderated: No

Open weights: nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16

Nemotron 3 Ultra FAQ

How much does Nemotron 3 Ultra cost?

Nemotron 3 Ultra costs $0.600 per million input tokens and $3.60 per million output tokens via OpenRouter, making it the 181st cheapest of 332 paid models.
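The rank can be related to the "cheaper than 45%" figure shown earlier on the page with a quick calculation. This is a sketch under an assumption, not the site's documented method: it treats the 332 paid models as the full ranked population and counts the models ranked below position 181 as the more expensive ones. The function name `share_cheaper_than` is hypothetical.

```python
def share_cheaper_than(rank: int, total: int) -> float:
    """Fraction of ranked models that are more expensive than the model at `rank`.

    Rank 1 is the cheapest model; `total` is the number of ranked models.
    """
    if not 1 <= rank <= total:
        raise ValueError("rank must be between 1 and total")
    return (total - rank) / total


# 181st cheapest of 332 paid models: (332 - 181) / 332
print(f"{share_cheaper_than(181, 332):.0%}")  # prints 45%
```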