modelgrep

Lowest-Latency NVIDIA Models

Quick answer · Updated June 2026

Nemotron 3.5 Content Safety (free) has the lowest latency of any NVIDIA model, responding in about 267ms to first token. Llama 3.3 Nemotron Super 49B V1.5 (318ms) and Nemotron 3 Nano Omni (free) (530ms) round out the top three.

Latency: 267ms
Speed: 80 t/s
Input price: Free /M
Context: 128K

NVIDIA models ranked by median (p50) time to first token, the delay before the first token of a response arrives. Lower values suit real-time and interactive use cases.
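As a rough illustration of what a p50 time-to-first-token ranking involves, the sketch below takes the median of several latency samples per model and sorts ascending. The model names and sample values are hypothetical placeholders, not modelgrep data, and this is not a description of how modelgrep itself measures latency.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds (illustrative only).
samples_ms = {
    "model-a": [250, 270, 310, 260, 900],
    "model-b": [320, 300, 330, 315, 318],
}

def p50_ttft(samples):
    """Median (p50) time to first token for one model."""
    return median(samples)

def rank_by_p50(samples_by_model):
    """Return (model, p50) pairs sorted from most to least responsive."""
    scored = [(name, p50_ttft(vals)) for name, vals in samples_by_model.items()]
    return sorted(scored, key=lambda pair: pair[1])

ranking = rank_by_p50(samples_ms)
for position, (name, p50) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {p50}ms")
```

The median is used here because a single slow request (the 900ms sample for model-a) barely moves it, whereas it would inflate a mean.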

  1. nemotron-3.5-content-safety:free
    Reasoning, Vision · Free/M · 80 t/s · 128K ctx
    Latency: 267ms
  2. llama-3.3-nemotron-super-49b-v1.5
    Reasoning, Tools, JSON · 14.6 intel · $0.400/M · 45 t/s
    Latency: 318ms
  3. nemotron-3-nano-omni-30b-a3b-reasoning:free
    Reasoning, Tools, Vision, +1 · 21.4 intel · Free/M · 154 t/s
    Latency: 530ms
  4. nemotron-3-nano-30b-a3b:free
    Reasoning, Tools · 13.2 intel · Free/M · 166 t/s
    Latency: 639ms
  5. nemotron-3-nano-30b-a3b
    Reasoning, Tools, JSON · 13.2 intel · $0.050/M · 166 t/s
    Latency: 639ms
  6. nemotron-3-ultra-550b-a55b:free
    Reasoning, Tools · 47.7 intel · Free/M · 53 t/s
    Latency: 852ms
  7. nemotron-3-ultra-550b-a55b
    Reasoning, Tools, JSON · 47.7 intel · $0.500/M · 53 t/s
    Latency: 852ms
  8. nemotron-3-super-120b-a12b:free
    Reasoning, Tools, JSON · 36.0 intel · Free/M · 249 t/s
    Latency: 954ms
  9. nemotron-3-super-120b-a12b
    Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 249 t/s
    Latency: 954ms
  10. nemotron-nano-9b-v2:free
    Reasoning, Tools, JSON · 14.8 intel · Free/M · 41 t/s
    Latency: 1.0s
  11. nemotron-nano-12b-v2-vl:free
    Reasoning, Tools, Vision · 14.9 intel · Free/M · 25 t/s
    Latency: 1.5s

Frequently asked

Which NVIDIA model has the lowest latency?

Nemotron 3.5 Content Safety (free) has the lowest latency of any NVIDIA model, responding in about 267ms to first token. Llama 3.3 Nemotron Super 49B V1.5 (318ms) and Nemotron 3 Nano Omni (free) (530ms) round out the top three.

What's a good alternative to Nemotron 3.5 Content Safety (free)?

Llama 3.3 Nemotron Super 49B V1.5 (318ms) is the closest alternative on this metric, followed by Nemotron 3 Nano Omni (free) (530ms). See the full ranking above for the tradeoffs.
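One way to weigh latency against speed is to estimate the total time for a full response. The sketch below uses the latency and t/s figures listed on this page, and assumes a simplified model in which total time is time to first token plus output tokens divided by a constant generation speed. The function names and the token counts are illustrative, and real responses will vary with load, prompt length, and provider.

```python
def estimated_total_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Simplified estimate: wait for the first token, then stream at a constant rate."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

# (latency in ms, speed in tokens per second), taken from the ranking on this page.
models = {
    "nemotron-3.5-content-safety:free": (267, 80),
    "llama-3.3-nemotron-super-49b-v1.5": (318, 45),
    "nemotron-3-super-120b-a12b": (954, 249),
}

def fastest(models, output_tokens):
    """Name of the model with the lowest estimated total time for a given output length."""
    return min(
        models,
        key=lambda name: estimated_total_seconds(*models[name], output_tokens),
    )

for n in (10, 500):
    print(n, "tokens:", fastest(models, n))
```

Under these assumptions the lowest-latency model wins for very short outputs, while a model with higher latency but much higher t/s can finish long outputs sooner.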

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency, and per-provider pricing, led on intelligence by Nemotron 3 Ultra (free). All 11 qualify for this ranking.
