modelgrep

Lowest-Latency Microsoft Models

Quick answer · Updated June 2026

Phi 4 has the lowest latency of any Microsoft model, responding in about 250ms to first token. Phi 4 Mini Instruct (527ms) and WizardLM-2 8x22B (977ms) round out the top three.

Latency: 250ms
Intelligence: 10.4
Speed: 43 t/s
Input: $0.065/M
Context: 16K

AI models ranked by median (p50) time to first token, lowest first: the most responsive large language models for real-time and interactive use cases.

  1. phi-4 · 250ms latency · 10.4 intel · $0.065/M · 43 t/s
  2. phi-4-mini-instruct · 527ms latency · 8.4 intel · $0.080/M · 5 t/s
  3. wizardlm-2-8x22b · 977ms latency · $0.620/M · 12 t/s · 66K ctx
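As a rough illustration of how a ranking like the one above could be produced, the sketch below takes the median (p50) of several time-to-first-token measurements per model and sorts ascending. The raw sample values and the helper name `rank_by_p50_ttft` are hypothetical, chosen only so the medians match the figures listed on this page; it is not modelgrep's actual pipeline.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    p50 = {model: median(values) for model, values in samples_ms.items()}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical raw measurements in milliseconds (illustrative only).
samples_ms = {
    "wizardlm-2-8x22b": [940, 977, 1210],
    "phi-4": [231, 250, 302],
    "phi-4-mini-instruct": [498, 527, 640],
}

ranking = rank_by_p50_ttft(samples_ms)
for position, (model, ttft) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {ttft}ms")
```

Using the median rather than the mean keeps a single slow request from shifting a model's position.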

Frequently asked

Which Microsoft model has the lowest latency?

Phi 4 has the lowest latency of any Microsoft model, responding in about 250ms to first token. Phi 4 Mini Instruct (527ms) and WizardLM-2 8x22B (977ms) round out the top three.

What's a good alternative to Phi 4?

Phi 4 Mini Instruct (527ms) is the closest alternative on this metric, followed by WizardLM-2 8x22B (977ms). See the full ranking above for the tradeoffs.
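One way to reason about the tradeoffs is to estimate total response time instead of first-token latency alone. The sketch below assumes a simple model, not taken from this page, in which total time is roughly time to first token plus output tokens divided by throughput. The function name `estimated_response_seconds` and the 200-token response length are illustrative; the latency and speed figures come from the ranking.

```python
def estimated_response_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Approximate total time: first-token latency plus streaming time.

    Assumes constant throughput once the first token has arrived.
    """
    return ttft_ms / 1000 + output_tokens / tokens_per_second


# (time to first token in ms, speed in tokens/second) from the ranking.
models = {
    "phi-4": (250, 43),
    "phi-4-mini-instruct": (527, 5),
    "wizardlm-2-8x22b": (977, 12),
}

OUTPUT_TOKENS = 200  # hypothetical response length

for name, (ttft_ms, tps) in models.items():
    seconds = estimated_response_seconds(ttft_ms, tps, OUTPUT_TOKENS)
    print(f"{name}: about {seconds:.1f}s for {OUTPUT_TOKENS} tokens")
```

Under this assumption, a model with higher first-token latency but higher throughput can still finish a long response sooner, so the latency ranking is most relevant for short, interactive replies.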

How many Microsoft models are there?

modelgrep tracks 3 Microsoft models with live benchmarks, speed, latency, and per-provider pricing, led on intelligence by Phi 4. All 3 qualify for this ranking.
