modelgrep

Fastest Sao10K Models

Quick answer · Updated June 2026

The fastest Sao10K model is Llama 3 8B Lunaris at 74 output tokens per second. Llama 3.1 Euryale 70B v2.2 (37 t/s) and Llama 3.3 Euryale 70B (9 t/s) round out the top three.

Speed: 74 t/s
Input: $0.040/M
Context: 8K

Sao10K models ranked by median (p50) output speed in tokens per second, for low-latency and high-throughput applications.

  1. l3-lunaris-8b
    JSON · $0.040/M · 133ms ttft · 8K ctx
    Speed: 74 t/s
  2. l3.1-euryale-70b
    Tools · JSON · $0.850/M · 281ms ttft · 131K ctx
    Speed: 37 t/s
  3. l3.3-euryale-70b
    JSON · $0.650/M · 977ms ttft · 131K ctx
    Speed: 9 t/s
  4. l3.1-70b-hanami-x1
    $3.00/M · 861ms ttft · 16K ctx
    Speed: 5 t/s
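
To see what these figures mean for a single request, here is a rough Python sketch that combines time to first token (ttft) with output speed. It assumes total response time is approximately ttft plus output tokens divided by tokens per second, which is a simplification and not necessarily how modelgrep measures anything. The `Model` class and both function names are hypothetical; only the numbers come from the ranking above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    tokens_per_s: float  # p50 output speed from the ranking
    ttft_ms: float       # time to first token from the ranking


# Figures copied from the ranking; nothing here is measured by this script.
MODELS = [
    Model("l3.3-euryale-70b", 9, 977),
    Model("l3-lunaris-8b", 74, 133),
    Model("l3.1-70b-hanami-x1", 5, 861),
    Model("l3.1-euryale-70b", 37, 281),
]


def rank_by_speed(models):
    """Sort models from fastest to slowest output speed."""
    return sorted(models, key=lambda m: m.tokens_per_s, reverse=True)


def estimated_seconds(model, output_tokens):
    """Assumed model: wait for the first token, then stream at a steady rate."""
    return model.ttft_ms / 1000 + output_tokens / model.tokens_per_s


if __name__ == "__main__":
    for m in rank_by_speed(MODELS):
        print(f"{m.name}: ~{estimated_seconds(m, 500):.1f} s for 500 output tokens")
```

Under this assumption, output speed dominates for long responses, while ttft matters more for short ones.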

Frequently asked

What is the fastest Sao10K model?

The fastest Sao10K model is Llama 3 8B Lunaris at 74 output tokens per second. Llama 3.1 Euryale 70B v2.2 (37 t/s) and Llama 3.3 Euryale 70B (9 t/s) round out the top three.

What's a good alternative to Llama 3 8B Lunaris?

Llama 3.1 Euryale 70B v2.2 (37 t/s) is the closest alternative on this metric, followed by Llama 3.3 Euryale 70B (9 t/s). See the full ranking above for the tradeoffs.

How many Sao10K models are there?

modelgrep tracks 4 Sao10K models with live benchmarks, speed, latency, and per-provider pricing. All 4 qualify for this ranking.
