modelgrep

Small & Fast OpenAI Models

Quick answer · Updated June 2026

The fastest small OpenAI model is gpt-oss-safeguard-20b, at 542 tokens/sec and $0.075 per million input tokens. Like the rest of the efficient tier, it trades a few points of raw intelligence for speed and cost, which is the right call for high-volume, latency-sensitive work. gpt-oss-20b (344 t/s) and gpt-oss-120b (free) (281 t/s) round out the top three.

Speed: 542 t/s
Input: $0.075 /M
Context: 131K

Compact, efficient models (the small/mini/flash/haiku tier), ranked by output speed. These trade a little raw intelligence for low cost and high throughput, which suits chat, classification, extraction and other high-volume work.

  1. gpt-oss-safeguard-20b
    Reasoning, Tools, JSON · $0.075/M · 261ms ttft · 131K ctx
    Speed: 542 t/s
  2. gpt-oss-20b
    Reasoning, Tools, JSON · 24.5 intel · $0.029/M · 229ms ttft
    Speed: 344 t/s
  3. gpt-oss-120b:free
    Reasoning, Tools · 33.3 intel · Free · 183ms ttft
    Speed: 281 t/s
  4. gpt-oss-120b
    Reasoning, Tools, JSON · 33.3 intel · $0.039/M · 183ms ttft
    Speed: 281 t/s
  5. gpt-5.1-codex-mini
    Reasoning, Tools, JSON, +1 · 38.6 intel · $0.250/M · 2.0s ttft
    Speed: 135 t/s
  6. o3-mini
    Reasoning, Tools, JSON · $1.10/M · 1.9s ttft · 200K ctx
    Speed: 113 t/s
  7. gpt-5-image-mini
    Reasoning, JSON, Vision, +1 · $2.50/M · 8.2s ttft · 400K ctx
    Speed: 106 t/s
  8. o4-mini-high
    Reasoning, Tools, JSON, +1 · $1.10/M · 5.4s ttft · 200K ctx
    Speed: 97 t/s
  9. gpt-5-nano
    Reasoning, Tools, JSON, +1 · 25.9 intel · $0.050/M · 2.2s ttft
    Speed: 91 t/s
  10. gpt-5.4-mini
    Reasoning, Tools, JSON, +1 · 23.3 intel · $0.750/M · 761ms ttft
    Speed: 72 t/s
  11. gpt-4.1-nano
    Tools, JSON, Vision · 13.0 intel · $0.100/M · 747ms ttft
    Speed: 68 t/s
  12. o4-mini
    Reasoning, Tools, JSON, +1 · 33.1 intel · $1.10/M · 2.8s ttft
    Speed: 67 t/s
  13. gpt-5-mini
    Reasoning, Tools, JSON, +1 · 38.9 intel · $0.250/M · 3.2s ttft
    Speed: 66 t/s
  14. gpt-5.4-nano
    Reasoning, Tools, JSON, +1 · 44.0 intel · $0.200/M · 610ms ttft
    Speed: 63 t/s
  15. gpt-4o-mini-2024-07-18
    Tools, JSON, Vision · $0.150/M · 554ms ttft · 128K ctx
    Speed: 47 t/s
  16. gpt-4.1-mini
    Tools, JSON, Vision · 22.9 intel · $0.400/M · 722ms ttft
    Speed: 47 t/s
  17. gpt-4o-mini
    Tools, JSON, Vision · $0.150/M · 549ms ttft · 128K ctx
    Speed: 31 t/s
  18. gpt-audio-mini
    Tools, JSON, Audio · $0.600/M · 637ms ttft · 128K ctx
    Speed: 31 t/s
  19. gpt-4o-mini-search-preview
    JSON · $0.150/M · 2.6s ttft · 128K ctx
    Speed: 28 t/s
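
The ranking above sorts by output speed alone, but for latency-sensitive work the time to first token (ttft) matters too. A rough way to combine the two, sketched here in Python, is to add ttft to the time needed to stream the output. This is a simplification and not how modelgrep computes anything: the function name and the 500-token output length are assumptions for illustration, and the model ignores network overhead and any hidden reasoning tokens. The ttft and t/s figures are copied from the ranking.

```python
def estimated_response_seconds(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough wall-clock estimate: wait for the first token, then stream the rest."""
    return ttft_s + output_tokens / tokens_per_s


# (ttft in seconds, output speed in tokens/sec), taken from the ranking above.
MODELS = {
    "gpt-oss-safeguard-20b": (0.261, 542),
    "gpt-oss-20b": (0.229, 344),
    "gpt-5-mini": (3.2, 66),
}

OUTPUT_TOKENS = 500  # hypothetical response length

for name, (ttft, speed) in MODELS.items():
    seconds = estimated_response_seconds(ttft, speed, OUTPUT_TOKENS)
    print(f"{name}: about {seconds:.2f}s for {OUTPUT_TOKENS} output tokens")
```

For very short outputs the ttft term dominates, so a model with lower ttft can finish first even when its t/s figure is lower.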

Frequently asked

What is the smallest, fastest OpenAI model?

The fastest small OpenAI model is gpt-oss-safeguard-20b, at 542 tokens/sec and $0.075 per million input tokens. Like the rest of the efficient tier, it trades a few points of raw intelligence for speed and cost, which is the right call for high-volume, latency-sensitive work. gpt-oss-20b (344 t/s) and gpt-oss-120b (free) (281 t/s) round out the top three.

What's a good alternative to gpt-oss-safeguard-20b?

gpt-oss-20b (344 t/s) is the closest alternative on output speed, followed by gpt-oss-120b (free) (281 t/s). See the full ranking above for the tradeoffs in price, latency and intelligence.
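
One of those tradeoffs is input price. As a minimal sketch, the per-million input prices listed above can be turned into a cost for a given volume. The monthly token count below is a hypothetical workload, the function name is an assumption, and output-token pricing is not listed on this page, so it is left out.

```python
def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Input-side cost only: tokens scaled by the per-million price."""
    return input_tokens / 1_000_000 * price_per_million_usd


# Input prices in USD per million tokens, taken from the ranking above.
PRICES = {
    "gpt-oss-safeguard-20b": 0.075,
    "gpt-oss-20b": 0.029,
}

MONTHLY_INPUT_TOKENS = 2_000_000_000  # hypothetical workload

for name, price in PRICES.items():
    cost = input_cost_usd(MONTHLY_INPUT_TOKENS, price)
    print(f"{name}: ${cost:,.2f} per month in input tokens")
```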

How many OpenAI models are there?

modelgrep tracks 62 OpenAI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GPT-5.4. Of these, 19 qualify for this ranking.
