modelgrep

Lowest-Latency Mistral Models

Quick answer · Updated June 2026

Codestral 2508 has the lowest latency of any Mistral model, with a time to first token of about 136ms. Ministral 3 3B 2512 (191ms) and Ministral 3 8B 2512 (241ms) round out the top three.

Latency: 136ms
Speed: 152 t/s
Input: $0.300/M
Context: 256K

Mistral models ranked by median (p50) time to first token, lowest first. Low time to first token matters most for real-time and interactive use cases.
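To make the p50 metric concrete, here is a minimal sketch of ranking models by the median of their time-to-first-token samples. The model names and sample values are hypothetical, and this is not modelgrep's actual measurement pipeline, only an illustration of how a median-based ranking behaves.

```python
from statistics import median


def p50_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return median(samples_ms)


# Hypothetical samples (ms); "model-a" includes one slow outlier.
samples = {
    "model-a": [120, 131, 136, 140, 410],
    "model-b": [180, 185, 191, 200, 205],
}

# Rank models from lowest to highest median latency.
ranking = sorted(samples, key=lambda name: p50_ms(samples[name]))
```

Because the median ignores the size of outliers, the single 410ms sample does not push "model-a" below "model-b" in this sketch, which is one reason a p50 figure describes the typical request rather than the worst case.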

  1. codestral-2508
    Tools, JSON · $0.300/M · 152 t/s · 256K ctx
    Latency: 136ms
  2. ministral-3b-2512
    Tools, JSON, Vision · 11.2 intel · $0.100/M · 65 t/s
    Latency: 191ms
  3. ministral-8b-2512
    Tools, JSON, Vision · 14.8 intel · $0.150/M · 11 t/s
    Latency: 241ms
  4. voxtral-small-24b-2507
    Tools, JSON, Audio · $0.100/M · 50 t/s · 32K ctx
    Latency: 265ms
  5. ministral-14b-2512
    Tools, JSON, Vision · 16.0 intel · $0.200/M · 59 t/s
    Latency: 271ms
  6. mistral-nemo
    Tools, JSON · $0.020/M · 80 t/s · 131K ctx
    Latency: 273ms
  7. mistral-small-24b-instruct-2501
    JSON · $0.050/M · 40 t/s · 33K ctx
    Latency: 283ms
  8. mistral-saba
    Tools, JSON · $0.200/M · 32 t/s · 33K ctx
    Latency: 288ms
  9. mixtral-8x22b-instruct
    Tools, JSON · $2.00/M · 121 t/s · 66K ctx
    Latency: 321ms
  10. mistral-small-3.2-24b-instruct
    Tools, JSON, Vision · $0.075/M · 84 t/s · 128K ctx
    Latency: 330ms
  11. mistral-small-2603
    Reasoning, Tools, JSON, +1 · 18.6 intel · $0.150/M · 88 t/s
    Latency: 334ms
  12. mistral-small-3.1-24b-instruct
    Vision · $0.351/M · 32 t/s · 128K ctx
    Latency: 494ms
  13. mistral-medium-3
    Tools, JSON, Vision · 18.8 intel · $0.400/M · 30 t/s
    Latency: 553ms
  14. mistral-large
    Tools, JSON · $2.00/M · 37 t/s · 128K ctx
    Latency: 587ms
  15. mistral-large-2407
    Tools, JSON · $2.00/M · 41 t/s · 131K ctx
    Latency: 593ms
  16. mistral-large-2512
    Tools, JSON, Vision · 22.8 intel · $0.500/M · 40 t/s
    Latency: 646ms
  17. devstral-2512
    Tools, JSON · 22.0 intel · $0.400/M · 11 t/s
    Latency: 924ms
  18. mistral-medium-3.1
    Tools, JSON, Vision · 21.3 intel · $0.400/M · 35 t/s
    Latency: 927ms
  19. mistral-medium-3-5
    Reasoning, Tools, JSON, +1 · 39.2 intel · $1.50/M · 34 t/s
    Latency: 2.6s
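Time to first token is only part of how long a full reply takes, since the listed t/s figures also matter once generation starts. As a rough sketch, the snippet below estimates total response time as latency plus output tokens divided by speed. That formula is a simplifying assumption (it ignores prompt length, queueing, and network variance), the function names are illustrative, and the figures are copied from the ranking above.

```python
def estimated_response_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Rough estimate: time to first token plus steady-state generation time."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second


# (latency in ms, speed in tokens/s), taken from the ranking above.
models = {
    "codestral-2508": (136, 152),
    "ministral-3b-2512": (191, 65),
    "ministral-8b-2512": (241, 11),
    "mixtral-8x22b-instruct": (321, 121),
}


def rank_by_estimate(output_tokens):
    """Order models by estimated total time for a reply of the given length."""
    return sorted(
        models,
        key=lambda name: estimated_response_seconds(*models[name], output_tokens),
    )


short_reply = rank_by_estimate(0)    # latency alone decides the order
long_reply = rank_by_estimate(500)   # throughput starts to dominate
```

Under this assumption, a 500-token reply puts mixtral-8x22b-instruct ahead of both Ministral models despite its higher latency, so the latency ranking is most relevant for short, interactive responses.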

Frequently asked

Which Mistral model has the lowest latency?

Codestral 2508 has the lowest latency of any Mistral model, with a time to first token of about 136ms. Ministral 3 3B 2512 (191ms) and Ministral 3 8B 2512 (241ms) round out the top three.

What's a good alternative to Codestral 2508?

Ministral 3 3B 2512 (191ms) is the closest alternative on this metric, followed by Ministral 3 8B 2512 (241ms). See the full ranking above for the tradeoffs.

How many Mistral models are there?

modelgrep tracks 19 Mistral models with live benchmarks, speed, latency, and per-provider pricing. Mistral Medium 3.5 leads on intelligence. All 19 qualify for this ranking.
