modelgrep

Lowest-Latency xAI Models

Quick answer · Updated June 2026

Grok 4.3 has the lowest latency of any xAI model, responding in about 675ms to first token. Grok 4.20 (707ms) and Grok Build 0.1 (1.1s) round out the top three.

Latency: 675ms
Intelligence: 53.2
Speed: 127 t/s
Input price: $1.25/M
Context: 1M

xAI models ranked by time to first token (p50), from most to least responsive, for real-time and interactive use cases.

  1. grok-4.3
    Reasoning · Tools · JSON · +1
    53.2 intel · $1.25/M · 127 t/s
    Latency: 675ms
  2. grok-4.20
    Reasoning · Tools · JSON · +1
    29.7 intel · $1.25/M · 78 t/s
    Latency: 707ms
  3. grok-build-0.1
    Reasoning · Tools · JSON · +1
    $1.00/M · 123 t/s · 256K ctx
    Latency: 1.1s
  4. grok-4.20-multi-agent
    Reasoning · JSON · Vision
    $1.25/M · 336 t/s · 2M ctx
    Latency: 11.0s
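
The ranking above orders models by p50 (median) time to first token. As a rough illustration of what that means, the sketch below computes a p50 from raw latency samples and sorts models by it. The model names and sample values are hypothetical, made up for this example; they are not modelgrep's measurements, and this is not necessarily how modelgrep computes its figures.

```python
from statistics import median


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    return median(samples_ms)


def rank_by_latency(samples_by_model):
    """Return (model, p50) pairs sorted by ascending p50 time to first token."""
    p50s = {name: p50_ttft_ms(s) for name, s in samples_by_model.items()}
    return sorted(p50s.items(), key=lambda item: item[1])


# Hypothetical samples for illustration only (assumed values, not real data).
samples = {
    "model-c": [1000, 1100, 1250],
    "model-a": [640, 675, 720],
    "model-b": [690, 707, 3000],  # one slow outlier
}

ranking = rank_by_latency(samples)
for rank, (name, p50) in enumerate(ranking, start=1):
    print(f"{rank}. {name}: {p50} ms")
```

Because the median is used rather than the mean, the single 3000 ms outlier for the hypothetical model-b does not change its position in the ranking.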

Frequently asked

Which xAI model has the lowest latency?

Grok 4.3 has the lowest latency of any xAI model, responding in about 675ms to first token. Grok 4.20 (707ms) and Grok Build 0.1 (1.1s) round out the top three.

What's a good alternative to Grok 4.3?

Grok 4.20 (707ms) is the closest alternative on this metric, followed by Grok Build 0.1 (1.1s). See the full ranking above for the tradeoffs.

How many xAI models are there?

modelgrep tracks 4 xAI models with live benchmarks, speed, latency, and per-provider pricing. Grok 4.3 leads on intelligence. All 4 qualify for this ranking.
