Fastest NVIDIA Models

Quick answer · Updated June 2026

The fastest NVIDIA model is Nemotron 3 Super (free) at 238 output tokens per second. The paid Nemotron 3 Super ties it at 238 t/s, and Nemotron 3 Nano 30B A3B (free) follows at 159 t/s to round out the top three.

Speed: 238 t/s
Intelligence: 36.0
Input /M: Free
Context: 1M

AI models ranked by output speed (tokens per second, p50). The fastest large language models for low-latency and high-throughput applications.
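As a rough illustration of what ranking by p50 output speed means, the sketch below takes the median of several per-request throughput samples for each model and sorts by it. The model names and sample values are hypothetical placeholders, not measurements from this page, and the site's actual sampling method is not described here.

```python
from statistics import median

# Hypothetical per-request throughput samples (output tokens per second).
# Illustrative values only; they are not measurements from this page.
samples = {
    "model-a": [210.0, 238.0, 251.0, 190.0, 240.0],
    "model-b": [150.0, 159.0, 171.0, 140.0, 162.0],
    "model-c": [80.0, 84.0, 95.0, 60.0, 88.0],
}


def rank_by_p50(samples_by_model):
    """Return (model, p50) pairs sorted from fastest to slowest."""
    p50 = {name: median(values) for name, values in samples_by_model.items()}
    return sorted(p50.items(), key=lambda item: item[1], reverse=True)


ranking = rank_by_p50(samples)
for position, (name, speed) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {speed:.0f} t/s")
```

Using the median rather than the mean keeps a single unusually slow or fast request from moving a model up or down the list.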

1. nemotron-3-super-120b-a12b:free · Reasoning, Tools, JSON · 36.0 intel · Free/M · 739ms ttft · 238 t/s
2. nemotron-3-super-120b-a12b · Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 739ms ttft · 238 t/s
3. nemotron-3-nano-30b-a3b:free · Reasoning, Tools · 13.2 intel · Free/M · 651ms ttft · 159 t/s
4. nemotron-3-nano-30b-a3b · Reasoning, Tools, JSON · 13.2 intel · $0.050/M · 651ms ttft · 159 t/s
5. nemotron-3-nano-omni-30b-a3b-reasoning:free · Reasoning, Tools, Vision, +1 · 21.4 intel · Free/M · 624ms ttft · 125 t/s
6. nemotron-3-ultra-550b-a55b:free · Reasoning, Tools · 47.7 intel · Free/M · 720ms ttft · 84 t/s
7. nemotron-3-ultra-550b-a55b · Reasoning, Tools, JSON · 47.7 intel · $0.500/M · 720ms ttft · 84 t/s
8. llama-3.3-nemotron-super-49b-v1.5 · Reasoning, Tools, JSON · 14.6 intel · $0.400/M · 170ms ttft · 44 t/s
9. nemotron-nano-9b-v2:free · Reasoning, Tools, JSON · 13.2 intel · Free/M · 981ms ttft · 44 t/s
10. nemotron-nano-12b-v2-vl:free · Reasoning, Tools, Vision · 14.9 intel · Free/M · 2.0s ttft · 24 t/s
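Output speed and time to first token (ttft) pull in different directions for short and long replies. The sketch below estimates end-to-end response time under a deliberately simple assumption: total time is ttft plus output tokens divided by output speed. That model is an assumption for illustration, not the site's methodology, and it ignores queueing, reasoning tokens, and provider variance. The ttft and t/s figures are the ones listed in the ranking above; the function name is hypothetical.

```python
def estimated_response_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Estimate total response time as ttft plus streaming time.

    Assumes a constant output speed after the first token, which is a
    simplification made for this example.
    """
    return ttft_ms / 1000.0 + output_tokens / tokens_per_second


# (ttft in ms, output speed in t/s), taken from the ranking entries above.
models = {
    "nemotron-3-super-120b-a12b": (739, 238),
    "llama-3.3-nemotron-super-49b-v1.5": (170, 44),
}

for output_tokens in (20, 500):
    print(f"{output_tokens} output tokens:")
    for name, (ttft_ms, speed) in models.items():
        seconds = estimated_response_seconds(ttft_ms, speed, output_tokens)
        print(f"  {name}: {seconds:.2f} s")
```

Under this simple model, the lower-ttft model finishes first only for very short replies (up to roughly 30 output tokens for this pair); beyond that, the higher output speed dominates.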

Frequently asked

What is the fastest NVIDIA model?

The fastest NVIDIA model is Nemotron 3 Super (free) at 238 output tokens per second. The paid Nemotron 3 Super ties it at 238 t/s, and Nemotron 3 Nano 30B A3B (free) follows at 159 t/s to round out the top three.

What's a good alternative to Nemotron 3 Super (free)?

Nemotron 3 Super (238 t/s) is the closest alternative on this metric, followed by Nemotron 3 Nano 30B A3B (free) (159 t/s). See the full ranking above for the tradeoffs.

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Nemotron 3 Ultra (free). 10 of them qualify for this ranking.
