Lowest-Latency Sao10K Models

Quick answer · Updated June 2026

Llama 3 8B Lunaris has the lowest latency of any Sao10K model, responding in about 126ms to first token. Llama 3.1 Euryale 70B v2.2 (394ms) and Llama 3.1 70B Hanami x1 (641ms) round out the top three.

Latency: 126ms
Speed: 70 t/s
Input: $0.040 /M
Context: 8K

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
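As a rough illustration of what ranking by p50 time-to-first-token involves, here is a minimal sketch. It assumes you already have per-request latency samples for each model; the model names, sample values, and the `rank_by_p50` helper are all hypothetical and do not come from this page's data or from modelgrep's actual method.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# Illustrative values only, not measurements from this page.
ttft_samples_ms = {
    "model-b": [300, 310, 305, 298, 900],
    "model-a": [100, 110, 105, 400, 98],
}


def rank_by_p50(samples):
    """Return (model, p50_ms) pairs sorted from lowest to highest median latency."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50(ttft_samples_ms)
for name, p50_ms in ranking:
    print(f"{name}: {p50_ms}ms")
```

The median is used here because a single slow request (such as the 900ms sample) barely moves it, whereas a mean would be pulled upward.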

Frequently asked

Which Sao10K model has the lowest latency?

Llama 3 8B Lunaris, with a time to first token of about 126ms.

What's a good alternative to Llama 3 8B Lunaris?

Llama 3.1 Euryale 70B v2.2 (394ms) is the closest alternative on this metric, followed by Llama 3.1 70B Hanami x1 (641ms). See the full ranking above for the tradeoffs.

How many Sao10K models are there?

modelgrep tracks 4 Sao10K models with live benchmarks, speed, latency, and per-provider pricing. All 4 qualify for this ranking.
