modelgrep

Lowest-Latency Sao10K Models

Quick answer · Updated June 2026

Llama 3 8B Lunaris has the lowest latency of any Sao10K model, responding in about 126ms to first token. Llama 3.1 Euryale 70B v2.2 (394ms) and Llama 3.1 70B Hanami x1 (641ms) round out the top three.

Latency: 126ms
Speed: 70 t/s
Input: $0.040 /M
Context: 8K

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.

  1. l3-lunaris-8b
    JSON · $0.040/M · 70 t/s · 8K ctx
    Latency: 126ms
  2. l3.1-euryale-70b
    Tools · JSON · $0.850/M · 50 t/s · 131K ctx
    Latency: 394ms
  3. l3.1-70b-hanami-x1
    $3.00/M · 4 t/s · 16K ctx
    Latency: 641ms
  4. l3.3-euryale-70b
    JSON · $0.650/M · 11 t/s · 131K ctx
    Latency: 2.1s
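
As a rough illustration of how a ranking like the one above could be reproduced, the sketch below sorts the four listed models by their p50 time-to-first-token. The figures are copied from this page; the names `MODELS`, `p50`, and `rank_by_latency` are hypothetical helpers introduced for the example and are not part of any modelgrep API. The 2.1s entry is treated as roughly 2100ms, which is an assumption since the page shows a rounded value.

```python
from statistics import median

# Figures copied from the ranking on this page.
# ttft_ms is the p50 time-to-first-token in milliseconds.
# 2100 is an approximation: the page lists the rounded value "2.1s".
MODELS = [
    {"name": "l3.3-euryale-70b", "ttft_ms": 2100},
    {"name": "l3-lunaris-8b", "ttft_ms": 126},
    {"name": "l3.1-70b-hanami-x1", "ttft_ms": 641},
    {"name": "l3.1-euryale-70b", "ttft_ms": 394},
]


def p50(samples_ms):
    """Return the median (p50) of raw time-to-first-token samples.

    Hypothetical helper: shows how a p50 figure would be derived
    from individual request measurements.
    """
    return median(samples_ms)


def rank_by_latency(models):
    """Order models from lowest to highest p50 time-to-first-token."""
    return sorted(models, key=lambda m: m["ttft_ms"])


ranking = [m["name"] for m in rank_by_latency(MODELS)]

if __name__ == "__main__":
    for position, name in enumerate(ranking, start=1):
        print(position, name)
```

Sorting on the median rather than the mean keeps a single slow request from distorting a model's position, which is presumably why the page reports p50.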

Frequently asked

Which Sao10K model has the lowest latency?

Llama 3 8B Lunaris has the lowest latency of any Sao10K model, responding in about 126ms to first token. Llama 3.1 Euryale 70B v2.2 (394ms) and Llama 3.1 70B Hanami x1 (641ms) round out the top three.

What's a good alternative to Llama 3 8B Lunaris?

Llama 3.1 Euryale 70B v2.2 (394ms) is the closest alternative on this metric, followed by Llama 3.1 70B Hanami x1 (641ms). See the full ranking above for the tradeoffs.

How many Sao10K models are there?

modelgrep tracks 4 Sao10K models with live benchmarks, speed, latency and per-provider pricing. All 4 qualify for this ranking.
