modelgrep

Lowest-Latency OpenAI Models

Quick answer · Updated June 2026

gpt-oss-120b (free) has the lowest latency of any OpenAI model tracked here, at about 170ms to first token. The paid gpt-oss-120b listing (also 170ms) and gpt-oss-safeguard-20b (253ms) round out the top three.

170ms Latency
33.3 Intelligence
345 t/s Speed
Free Input /M
131K Context

OpenAI models ranked by median (p50) time-to-first-token, most responsive first. Lower values suit real-time and interactive use cases.

  1. gpt-oss-120b:free
    Reasoning, Tools · 33.3 intel · Free/M · 345 t/s
    170ms latency
  2. gpt-oss-120b
    Reasoning, Tools, JSON · 33.3 intel · $0.039/M · 345 t/s
    170ms latency
  3. gpt-oss-safeguard-20b
    Reasoning, Tools, JSON · $0.075/M · 527 t/s · 131K ctx
    253ms latency
  4. gpt-oss-20b:free
    Reasoning, Tools · 24.5 intel · Free/M · 292 t/s
    253ms latency
  5. gpt-oss-20b
    Reasoning, Tools, JSON · 24.5 intel · $0.029/M · 292 t/s
    253ms latency
  6. gpt-4
    Tools, JSON · $30.00/M · 22 t/s · 8K ctx
    354ms latency
  7. gpt-4o-2024-05-13
    Tools, JSON, Vision · $5.00/M · 33 t/s · 128K ctx
    459ms latency
  8. gpt-3.5-turbo
    Tools, JSON · $0.500/M · 16 t/s · 16K ctx
    480ms latency
  9. gpt-4o
    Tools, JSON, Vision · $2.50/M · 38 t/s · 128K ctx
    500ms latency
  10. gpt-audio-mini
    Tools, JSON, Audio · $0.600/M · 4 t/s · 128K ctx
    522ms latency
  11. gpt-4o-mini
    Tools, JSON, Vision · $0.150/M · 35 t/s · 128K ctx
    538ms latency
  12. gpt-4o-mini-2024-07-18
    Tools, JSON, Vision · $0.150/M · 53 t/s · 128K ctx
    541ms latency
  13. gpt-4o-2024-11-20
    Tools, JSON, Vision · 17.3 intel · $2.50/M · 45 t/s
    577ms latency
  14. gpt-3.5-turbo-16k
    Tools, JSON · $3.00/M · 12 t/s · 16K ctx
    578ms latency
  15. gpt-5-chat
    JSON, Vision · $1.25/M · 37 t/s · 128K ctx
    597ms latency
  16. gpt-3.5-turbo-0613
    Tools, JSON · $1.00/M · 20 t/s · 4K ctx
    650ms latency
  17. gpt-5.4-nano
    Reasoning, Tools, JSON, +1 · 44.0 intel · $0.200/M · 47 t/s
    678ms latency
  18. gpt-4.1-nano
    Tools, JSON, Vision · 13.0 intel · $0.100/M · 54 t/s
    694ms latency
  19. gpt-4.1-mini
    Tools, JSON, Vision · 22.9 intel · $0.400/M · 48 t/s
    699ms latency
  20. gpt-5.4-image-2
    Reasoning, JSON, Vision, +1 · $8.00/M · 33 t/s · 272K ctx
    708ms latency
  21. gpt-4.1
    Tools, JSON, Vision · 26.3 intel · $2.00/M · 45 t/s
    723ms latency
  22. gpt-5.4-mini
    Reasoning, Tools, JSON, +1 · 23.3 intel · $0.750/M · 89 t/s
    811ms latency
  23. gpt-5.4
    Reasoning, Tools, JSON, +1 · 56.8 intel · $2.50/M · 53 t/s
    966ms latency
  24. gpt-chat-latest
    Tools, JSON, Vision · $5.00/M · 65 t/s · 400K ctx
    1.1s latency
  25. gpt-audio
    Tools, JSON, Audio · $2.50/M · 56 t/s · 128K ctx
    1.1s latency
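
The ranking above orders models by p50 (median) time-to-first-token. As a rough illustration of what that means, the sketch below takes several latency samples per model, reduces each to its median, and sorts ascending. The model names and sample values are hypothetical placeholders, not modelgrep measurements, and this is not necessarily how modelgrep computes its figures.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# Illustrative values only; these are not modelgrep's measurements.
samples_ms = {
    "model-c": [350, 360, 354],
    "model-a": [180, 150, 170, 400, 165],
    "model-b": [260, 240, 253, 251, 900],
}

def rank_by_p50(samples):
    """Return (model, p50) pairs sorted from lowest to highest median latency."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])

ranking = rank_by_p50(samples_ms)
for position, (name, latency) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {latency:.0f}ms")
```

Using the median rather than the mean keeps a single slow outlier (such as the 900ms sample for the hypothetical model-b) from dominating a model's score.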

Frequently asked

Which OpenAI model has the lowest latency?

gpt-oss-120b (free) has the lowest latency of any OpenAI model tracked here, at about 170ms to first token. The paid gpt-oss-120b listing (also 170ms) and gpt-oss-safeguard-20b (253ms) round out the top three.

What's a good alternative to gpt-oss-120b (free)?

gpt-oss-120b (170ms) is the closest alternative on this metric, followed by gpt-oss-safeguard-20b (253ms). See the full ranking above for the tradeoffs.
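
One way to weigh those tradeoffs is to combine first-token latency with output speed. The sketch below uses a simplified model, total time ≈ time-to-first-token + output tokens ÷ tokens per second, with the latency and t/s figures from the ranking. The formula is an assumption for illustration: it treats the listed speed as a sustained output rate and ignores network and queueing variance. The function name is hypothetical.

```python
def estimated_response_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Rough end-to-end estimate: first-token latency plus streaming time."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second

# (time-to-first-token in ms, output speed in tokens/s), taken from the ranking.
models = {
    "gpt-oss-120b": (170, 345),
    "gpt-oss-safeguard-20b": (253, 527),
    "gpt-4": (354, 22),
}

def fastest_for(output_tokens):
    """Return the model with the lowest estimated total time for a reply length."""
    return min(
        models,
        key=lambda name: estimated_response_seconds(*models[name], output_tokens),
    )

for length in (10, 500):
    print(f"{length} output tokens: {fastest_for(length)}")
```

Under these assumptions, gpt-oss-120b comes out ahead for a 10-token reply, while gpt-oss-safeguard-20b's higher throughput gives it the lower estimate for a 500-token reply despite its slower first token.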

How many OpenAI models are there?

modelgrep tracks 62 OpenAI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by GPT-5.4. 25 of them qualify for this ranking.
