Nemotron 3.5 Content Safety (free) has the lowest latency of any NVIDIA model, responding in about 267ms to first token. Llama 3.3 Nemotron Super 49B V1.5 (318ms) and Nemotron 3 Nano Omni (free) (530ms) round out the top three.
AI models ranked by median (p50) time to first token: the most responsive large language models for real-time and interactive use cases.
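As a rough illustration of the metric, the sketch below computes a p50 (median) time to first token from a handful of latency samples per model and sorts the models from fastest to slowest. The model names, the sample values, and the `rank_by_p50` helper are hypothetical and chosen only for illustration; they are not modelgrep's data or its actual method.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# Illustrative values only, not real measurements.
samples_ms = {
    "model-a": [250, 267, 300, 260, 410],
    "model-b": [318, 290, 350, 330, 305],
    "model-c": [530, 500, 610, 525, 540],
}


def rank_by_p50(samples):
    """Return (model, p50) pairs sorted by ascending median latency."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50(samples_ms)
for position, (name, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {p50_ms} ms")
```

Using the median rather than the mean keeps a single slow request (such as the 410 ms sample for `model-a`) from distorting a model's position in the ranking.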
Llama 3.3 Nemotron Super 49B V1.5 (318ms) is the closest alternative on this metric, followed by Nemotron 3 Nano Omni (free) (530ms). See the full ranking above for the tradeoffs.
modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency, and per-provider pricing. Nemotron 3 Ultra (free) leads on intelligence. All 11 qualify for this ranking.