modelgrep

Small & Fast NVIDIA Models

Quick answer · Updated June 2026

The fastest small NVIDIA model is Nemotron 3 Super (free), the leader of the efficient tier at 238 tokens/sec with no charge for input tokens. It trades a few points of raw intelligence for speed and cost, which suits high-volume, latency-sensitive work. The paid Nemotron 3 Super (238 t/s) and Nemotron 3 Nano 30B A3B (free) (159 t/s) round out the top three.

Speed: 238 t/s
Intelligence: 36.0
Input price (per million tokens): Free
Context: 1M

Compact, efficient models (the small/mini/flash/haiku tier), ranked by output speed. These trade a little raw intelligence for low cost and high throughput, which is the right tradeoff for chat, classification, extraction and other high-volume work.

  1. nemotron-3-super-120b-a12b:free
    Reasoning, Tools, JSON · 36.0 intel · Free/M · 739ms ttft
    Speed: 238 t/s
  2. nemotron-3-super-120b-a12b
    Reasoning, Tools, JSON · 36.0 intel · $0.090/M · 739ms ttft
    Speed: 238 t/s
  3. nemotron-3-nano-30b-a3b:free
    Reasoning, Tools · 13.2 intel · Free/M · 651ms ttft
    Speed: 159 t/s
  4. nemotron-3-nano-30b-a3b
    Reasoning, Tools, JSON · 13.2 intel · $0.050/M · 651ms ttft
    Speed: 159 t/s
  5. nemotron-3-nano-omni-30b-a3b-reasoning:free
    Reasoning, Tools, Vision, +1 · 21.4 intel · Free/M · 624ms ttft
    Speed: 125 t/s
  6. nemotron-nano-9b-v2:free
    Reasoning, Tools, JSON · 13.2 intel · Free/M · 981ms ttft
    Speed: 44 t/s
  7. nemotron-nano-12b-v2-vl:free
    Reasoning, Tools, Vision · 14.9 intel · Free/M · 2.0s ttft
    Speed: 24 t/s
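
The ordering above can be reproduced with a short sort. The following is a minimal Python sketch, not modelgrep's actual implementation: the `Model` class, the `rank_by_speed` function and the `min_intelligence` parameter are hypothetical names introduced for illustration, and breaking speed ties by lower input price is an assumption inferred from the free variants appearing ahead of their paid counterparts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    model_id: str
    speed_tps: float      # output speed, tokens per second
    intelligence: float   # intelligence score as listed in the ranking
    input_price: float    # USD per million input tokens; 0.0 means free
    ttft_ms: float        # time to first token, milliseconds


# Figures copied from the ranking, deliberately listed out of order.
MODELS = [
    Model("nemotron-nano-12b-v2-vl:free", 24, 14.9, 0.0, 2000),
    Model("nemotron-3-nano-30b-a3b", 159, 13.2, 0.050, 651),
    Model("nemotron-3-super-120b-a12b", 238, 36.0, 0.090, 739),
    Model("nemotron-nano-9b-v2:free", 44, 13.2, 0.0, 981),
    Model("nemotron-3-nano-30b-a3b:free", 159, 13.2, 0.0, 651),
    Model("nemotron-3-super-120b-a12b:free", 238, 36.0, 0.0, 739),
    # Intelligence taken as 21.4 here (assumed reading of this entry's score).
    Model("nemotron-3-nano-omni-30b-a3b-reasoning:free", 125, 21.4, 0.0, 624),
]


def rank_by_speed(models, min_intelligence=0.0):
    """Return models sorted fastest first, optionally filtered by a score floor.

    Ties on speed are broken by lower input price (an assumption, see above).
    """
    eligible = [m for m in models if m.intelligence >= min_intelligence]
    return sorted(eligible, key=lambda m: (-m.speed_tps, m.input_price))


for rank, m in enumerate(rank_by_speed(MODELS), start=1):
    print(f"{rank}. {m.model_id}  {m.speed_tps:g} t/s")
```

Passing a floor such as `rank_by_speed(MODELS, min_intelligence=30)` narrows the list to models that clear that score, which is one way to weigh the speed versus intelligence tradeoff for a given workload.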

Frequently asked

What is the smallest, fastest NVIDIA model?

The fastest small NVIDIA model is Nemotron 3 Super (free), the leader of the efficient tier at 238 tokens/sec with no charge for input tokens. It trades a few points of raw intelligence for speed and cost, which suits high-volume, latency-sensitive work. The paid Nemotron 3 Super (238 t/s) and Nemotron 3 Nano 30B A3B (free) (159 t/s) round out the top three.

What's a good alternative to Nemotron 3 Super (free)?

The paid Nemotron 3 Super (238 t/s) is the closest alternative on output speed, followed by Nemotron 3 Nano 30B A3B (free) (159 t/s). See the full ranking above for the tradeoffs.

How many NVIDIA models are there?

modelgrep tracks 11 NVIDIA models with live benchmarks, speed, latency and per-provider pricing. Nemotron 3 Ultra (free) leads them on intelligence. Seven of the 11 qualify for this ranking.
