modelgrep

Longest-Context NVIDIA Models

Quick answer · Updated June 2026

Nemotron 3 Ultra (free) has the largest context window of any NVIDIA model, at 1M tokens. The paid Nemotron 3 Ultra and the free Nemotron 3 Super, each also at 1M tokens, round out the top three. The paid Nemotron 3 Super matches that 1M window in fourth place.

Context: 1M tokens · Intelligence: 47.7 · Speed: 84 t/s · Input price: Free (per 1M tokens)

NVIDIA models ranked by context window, the number of tokens a model can process at once. Larger windows suit long documents, whole codebases and extended conversations.

  1. nemotron-3-ultra-550b-a55b:free
    Context: 1M · Intelligence: 47.7 · Input: Free · Speed: 84 t/s
    Capabilities: Reasoning, Tools
  2. nemotron-3-ultra-550b-a55b
    Context: 1M · Intelligence: 47.7 · Input: $0.500/M · Speed: 84 t/s
    Capabilities: Reasoning, Tools, JSON
  3. nemotron-3-super-120b-a12b:free
    Context: 1M · Intelligence: 36.0 · Input: Free · Speed: 238 t/s
    Capabilities: Reasoning, Tools, JSON
  4. nemotron-3-super-120b-a12b
    Context: 1M · Intelligence: 36.0 · Input: $0.090/M · Speed: 238 t/s
    Capabilities: Reasoning, Tools, JSON
  5. nemotron-3-nano-30b-a3b
    Context: 262K · Intelligence: 13.2 · Input: $0.050/M · Speed: 159 t/s
    Capabilities: Reasoning, Tools, JSON
  6. nemotron-3-nano-omni-30b-a3b-reasoning:free
    Context: 256K · Intelligence: 21.4 · Input: Free · Speed: 125 t/s
    Capabilities: Reasoning, Tools, Vision, +1 more
  7. nemotron-3-nano-30b-a3b:free
    Context: 256K · Intelligence: 13.2 · Input: Free · Speed: 159 t/s
    Capabilities: Reasoning, Tools
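
The list is ordered by context size first. The page does not say how models with the same context size are ordered, so the following is only a hypothetical sketch: one sort order that reproduces the ranking, assuming ties are broken by intelligence score (higher first) and then free before paid. The `Model` class and `rank_by_context` function are illustrative names, not part of modelgrep, and the context sizes are the rounded values shown above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    slug: str
    context_tokens: int       # rounded as displayed, e.g. "1M" -> 1_000_000
    intelligence: float
    input_price_per_m: float  # USD per 1M input tokens; 0.0 for free variants

    @property
    def is_free(self) -> bool:
        # On this page, free variants carry a ":free" suffix.
        return self.slug.endswith(":free")


# The seven ranked entries, deliberately listed out of order.
MODELS = [
    Model("nemotron-3-nano-30b-a3b:free", 256_000, 13.2, 0.0),
    Model("nemotron-3-super-120b-a12b", 1_000_000, 36.0, 0.090),
    Model("nemotron-3-ultra-550b-a55b", 1_000_000, 47.7, 0.500),
    Model("nemotron-3-nano-omni-30b-a3b-reasoning:free", 256_000, 21.4, 0.0),
    Model("nemotron-3-ultra-550b-a55b:free", 1_000_000, 47.7, 0.0),
    Model("nemotron-3-nano-30b-a3b", 262_000, 13.2, 0.050),
    Model("nemotron-3-super-120b-a12b:free", 1_000_000, 36.0, 0.0),
]


def rank_by_context(models):
    # Primary key: larger context window first.
    # Assumed tie-breaks: higher intelligence first, then free before paid
    # (False sorts before True, so "not is_free" puts free variants first).
    return sorted(
        models,
        key=lambda m: (-m.context_tokens, -m.intelligence, not m.is_free),
    )


if __name__ == "__main__":
    for rank, m in enumerate(rank_by_context(MODELS), start=1):
        print(f"{rank}. {m.slug} ({m.context_tokens:,} tokens)")
```

With these assumed tie-breaks, the four 1M-token models come first, followed by the 262K Nano and then the two 256K models.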

Frequently asked

Which NVIDIA model has the largest context window?

Nemotron 3 Ultra (free), with a 1M-token window. Three other models match that capacity: the paid Nemotron 3 Ultra and both variants of Nemotron 3 Super.

What's a good alternative to Nemotron 3 Ultra (free)?

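The paid Nemotron 3 Ultra is the closest alternative: it has the same 1M-token window and 47.7 intelligence score, at $0.500 per 1M input tokens. Next is the free Nemotron 3 Super, also at 1M tokens, which scores lower on intelligence (36.0) but runs faster (238 t/s). See the full ranking above for the other tradeoffs.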
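
To compare the paid options on price, the input cost of a prompt is its token count divided by 1,000,000, times the listed per-million price; filling a full 1M window costs about $0.50 on the paid Ultra and $0.09 on the paid Super. The sketch below, under stated assumptions, picks the cheapest paid model whose window holds a given prompt. The names `Offer`, `input_cost_usd` and `cheapest_fit` are hypothetical, context sizes are the rounded page values, and output tokens are ignored: they typically also count against the window and are billed at rates this page does not list.

```python
from typing import NamedTuple


class Offer(NamedTuple):
    slug: str
    context_tokens: int       # rounded as displayed on the page
    intelligence: float
    input_price_per_m: float  # USD per 1M input tokens


# Paid entries from the ranking above.
PAID = [
    Offer("nemotron-3-ultra-550b-a55b", 1_000_000, 47.7, 0.500),
    Offer("nemotron-3-super-120b-a12b", 1_000_000, 36.0, 0.090),
    Offer("nemotron-3-nano-30b-a3b", 262_000, 13.2, 0.050),
]


def input_cost_usd(prompt_tokens: int, price_per_m: float) -> float:
    """Input-side cost only: tokens / 1M * price per 1M tokens."""
    return prompt_tokens / 1_000_000 * price_per_m


def cheapest_fit(offers, prompt_tokens, min_intelligence=0.0):
    """Return the lowest-priced offer whose window holds the prompt, or None."""
    candidates = [
        o for o in offers
        if o.context_tokens >= prompt_tokens and o.intelligence >= min_intelligence
    ]
    return min(candidates, key=lambda o: o.input_price_per_m, default=None)


if __name__ == "__main__":
    for tokens in (200_000, 500_000, 2_000_000):
        pick = cheapest_fit(PAID, tokens)
        if pick is None:
            print(f"{tokens:,} tokens: no listed model fits")
        else:
            cost = input_cost_usd(tokens, pick.input_price_per_m)
            print(f"{tokens:,} tokens: {pick.slug} (~${cost:.3f} input)")
```

For example, a 500,000-token prompt does not fit the 262K Nano window, so `cheapest_fit(PAID, 500_000)` picks the paid Super, at an input cost of about $0.045; raising `min_intelligence` to 40 selects the paid Ultra instead.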

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency and per-provider pricing; Nemotron 3 Ultra (free) leads them on intelligence. Seven of them qualify for this ranking.
