Lowest-Latency NVIDIA Models

Quick answer · Updated June 2026

Nemotron 3.5 Content Safety (free) has the lowest latency of any NVIDIA model, responding in about 267ms to first token. Llama 3.3 Nemotron Super 49B V1.5 (318ms) and Nemotron 3 Nano Omni (free) (530ms) round out the top three.

267ms latency · 80 t/s speed · Free input /M · 128K context

NVIDIA models ranked by time-to-first-token (p50), lowest first. These are the most responsive large language models for real-time and interactive use cases.
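As a rough illustration of what ranking by p50 time-to-first-token means, the sketch below takes the median of several per-request measurements for each model and sorts by it. The model names and sample values are hypothetical, chosen only for illustration, and this is not necessarily how modelgrep computes its figures.

```python
from statistics import median

# Hypothetical TTFT samples in milliseconds (illustrative values only,
# not measurements from this page).
ttft_samples_ms = {
    "model-a": [250, 270, 265, 900, 260],
    "model-b": [310, 320, 315, 318, 330],
    "model-c": [500, 540, 530, 525, 2000],
}

def p50_ranking(samples):
    """Return (model, p50) pairs sorted from lowest to highest median TTFT."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])

ranking = p50_ranking(ttft_samples_ms)
for rank, (name, value) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {value} ms")
```

Because the median is used, an occasional slow request (such as the 900 ms sample for the hypothetical model-a) does not change a model's position.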

1. nemotron-3.5-content-safety:free · 267ms latency · Reasoning, Vision · Free/M · 80 t/s · 128K ctx
2. llama-3.3-nemotron-super-49b-v1.5 · 318ms latency · Reasoning, Tools, JSON · 14.6 intel · $0.400/M · 45 t/s
3. nemotron-3-nano-omni-30b-a3b-reasoning:free · 530ms latency · Reasoning, Tools, Vision, +1 · 21.4 intel · Free/M · 154 t/s
4. nemotron-3-nano-30b-a3b:free · 639ms latency · Reasoning, Tools · 13.2 intel · Free/M · 166 t/s
5. nemotron-3-nano-30b-a3b · 639ms latency · Reasoning, Tools, JSON · 13.2 intel · $0.050/M · 166 t/s
6. nemotron-3-ultra-550b-a55b:free · 852ms latency · Reasoning, Tools · 47.7 intel · Free/M · 53 t/s
7. nemotron-3-ultra-550b-a55b · 852ms latency · Reasoning, Tools, JSON · 47.7 intel · $0.500/M · 53 t/s
8. nemotron-3-super-120b-a12b:free · 954ms latency · Reasoning, Tools, JSON · 36.0 intel · Free/M · 249 t/s
9. nemotron-3-super-120b-a12b · 954ms latency · Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 249 t/s
10. nemotron-nano-9b-v2:free · 1.0s latency · Reasoning, Tools, JSON · 14.8 intel · Free/M · 41 t/s
11. nemotron-nano-12b-v2-vl:free · 1.5s latency · Reasoning, Tools, Vision · 14.9 intel · Free/M · 25 t/s
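Low time-to-first-token does not always mean the full response arrives soonest, since generation speed matters more as responses get longer. The sketch below uses latency and speed figures from the ranking above and assumes, as a simplification, that total response time is roughly time-to-first-token plus output tokens divided by throughput. The function names and the chosen output lengths are illustrative.

```python
# (ttft_ms, tokens_per_second) for a few models from the ranking above.
models = {
    "nemotron-3.5-content-safety:free": (267, 80),
    "llama-3.3-nemotron-super-49b-v1.5": (318, 45),
    "nemotron-3-nano-30b-a3b": (639, 166),
    "nemotron-3-super-120b-a12b": (954, 249),
}

def estimated_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Assumed model: total time = TTFT + output_tokens / throughput."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

def fastest_for(output_tokens):
    """Name of the model with the lowest estimated total time."""
    return min(
        models,
        key=lambda name: estimated_seconds(*models[name], output_tokens),
    )

for n in (20, 500):
    print(f"{n} output tokens: {fastest_for(n)}")
```

Under this assumption, the lowest-latency model finishes a 20-token reply first, while for a 500-token reply the higher throughput of nemotron-3-super-120b-a12b outweighs its slower start.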

Frequently asked

Which NVIDIA model has the lowest latency?

Nemotron 3.5 Content Safety (free), at about 267ms to first token.

What's a good alternative to Nemotron 3.5 Content Safety (free)?

Llama 3.3 Nemotron Super 49B V1.5 (318ms) is the closest alternative on this metric, followed by Nemotron 3 Nano Omni (free) (530ms). See the full ranking above for the tradeoffs.

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency and per-provider pricing. Nemotron 3 Ultra (free) leads on intelligence. All 11 qualify for this ranking.
