modelgrep

Fastest NVIDIA Models

Quick answer · Updated June 2026

The fastest NVIDIA model is Nemotron 3 Super (free) at 238 output tokens per second. The paid Nemotron 3 Super matches it at 238 t/s, and Nemotron 3 Nano 30B A3B (free) is third at 159 t/s.

Speed: 238 t/s
Intelligence: 36.0
Input price (per M tokens): Free
Context: 1M

AI models ranked by output speed (tokens per second, p50). The fastest large language models for low-latency and high-throughput applications.
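As a rough illustration of what ranking by p50 output speed means, the sketch below takes the median of several throughput samples per model and sorts fastest first. The model names and sample values are hypothetical placeholders, not modelgrep measurements, and the page does not say how modelgrep actually collects or aggregates its samples.

```python
from statistics import median

# Hypothetical per-request throughput samples in tokens per second.
# These are placeholders for illustration, not real measurements.
samples = {
    "model-a": [210.0, 238.0, 251.0],
    "model-b": [150.0, 159.0, 171.0],
    "model-c": [80.0, 84.0, 90.0],
}


def rank_by_p50(samples_by_model):
    """Return (model, p50) pairs sorted fastest first; p50 is the median sample."""
    p50 = {name: median(values) for name, values in samples_by_model.items()}
    return sorted(p50.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_p50(samples)
for position, (name, speed) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {speed:.0f} t/s")
```

Using the median rather than the mean keeps a single unusually slow or fast request from shifting a model's position.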

  1. nemotron-3-super-120b-a12b:free
     Reasoning, Tools, JSON · 36.0 intel · Free · 739 ms TTFT
     Speed: 238 t/s
  2. nemotron-3-super-120b-a12b
     Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 739 ms TTFT
     Speed: 238 t/s
  3. nemotron-3-nano-30b-a3b:free
     Reasoning, Tools · 13.2 intel · Free · 651 ms TTFT
     Speed: 159 t/s
  4. nemotron-3-nano-30b-a3b
     Reasoning, Tools, JSON · 13.2 intel · $0.050/M · 651 ms TTFT
     Speed: 159 t/s
  5. nemotron-3-nano-omni-30b-a3b-reasoning:free
     Reasoning, Tools, Vision, +1 more · 21.4 intel · Free · 624 ms TTFT
     Speed: 125 t/s
  6. nemotron-3-ultra-550b-a55b:free
     Reasoning, Tools · 47.7 intel · Free · 720 ms TTFT
     Speed: 84 t/s
  7. nemotron-3-ultra-550b-a55b
     Reasoning, Tools, JSON · 47.7 intel · $0.500/M · 720 ms TTFT
     Speed: 84 t/s
  8. llama-3.3-nemotron-super-49b-v1.5
     Reasoning, Tools, JSON · 14.6 intel · $0.400/M · 170 ms TTFT
     Speed: 44 t/s
  9. nemotron-nano-9b-v2:free
     Reasoning, Tools, JSON · 13.2 intel · Free · 981 ms TTFT
     Speed: 44 t/s
  10. nemotron-nano-12b-v2-vl:free
     Reasoning, Tools, Vision · 14.9 intel · Free · 2.0 s TTFT
     Speed: 24 t/s
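Output speed and time to first token (TTFT) combine into an approximate end-to-end response time. The sketch below uses a simple model, TTFT plus output tokens divided by output speed, with figures copied from the ranking above. The function names and the constant-speed formula are assumptions made for illustration; real latency also depends on prompt length, provider load, and reasoning tokens.

```python
# (ttft_seconds, output_tokens_per_second), copied from the ranking above.
MODELS = {
    "nemotron-3-super-120b-a12b": (0.739, 238.0),
    "nemotron-3-nano-30b-a3b": (0.651, 159.0),
    "nemotron-3-ultra-550b-a55b": (0.720, 84.0),
    "llama-3.3-nemotron-super-49b-v1.5": (0.170, 44.0),
}


def estimated_seconds(ttft_s, tokens_per_s, output_tokens):
    """Assumed model: wait for the first token, then stream at a constant speed."""
    return ttft_s + output_tokens / tokens_per_s


def fastest_for(output_tokens):
    """Return the model with the lowest estimated time for a reply of this length."""
    return min(
        MODELS,
        key=lambda name: estimated_seconds(*MODELS[name], output_tokens),
    )


for length in (20, 500):
    print(f"{length} output tokens: {fastest_for(length)}")
```

Under this simplified model, a low TTFT can outweigh raw output speed for very short replies, while longer replies favor the models with the highest t/s.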

Frequently asked

What is the fastest NVIDIA model?

The fastest NVIDIA model is Nemotron 3 Super (free) at 238 output tokens per second. The paid Nemotron 3 Super matches it at 238 t/s, and Nemotron 3 Nano 30B A3B (free) is third at 159 t/s.

What's a good alternative to Nemotron 3 Super (free)?

Nemotron 3 Super (238 t/s) is the closest alternative on this metric, followed by Nemotron 3 Nano 30B A3B (free) (159 t/s). See the full ranking above for the tradeoffs.

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency, and per-provider pricing. Nemotron 3 Ultra (free) leads on intelligence. Of the 11 models, 10 qualify for this ranking.
