modelgrep

Fastest Microsoft Models

Quick answer · Updated June 2026

The fastest Microsoft model is Phi 4 at 55 output tokens per second. Phi 4 Mini Instruct (18 t/s) and WizardLM-2 8x22B (12 t/s) round out the top three.

Phi 4: 55 t/s speed · 10.4 intelligence · $0.065/M input tokens · 16K context

Microsoft models ranked by median (p50) output speed in tokens per second. Faster models suit low-latency and high-throughput applications.
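Because the ranking uses the median (p50) rather than the mean, a few unusually slow or fast requests do not move a model's position much. The sketch below shows that idea with Python's standard `statistics.median`. The model names and per-request samples are hypothetical placeholders, not modelgrep measurements, and modelgrep's actual sampling method is not described on this page.

```python
from statistics import median

# Hypothetical per-request throughput samples (output tokens/second).
# Illustrative numbers only; not real benchmark data.
samples = {
    "model-a": [50.0, 55.0, 61.0],
    "model-b": [17.0, 18.0, 20.0],
    "model-c": [11.0, 12.0, 14.0],
}


def rank_by_p50(samples: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (model, p50 speed) pairs sorted fastest first."""
    p50 = {name: median(vals) for name, vals in samples.items()}
    return sorted(p50.items(), key=lambda kv: kv[1], reverse=True)


for rank, (name, speed) in enumerate(rank_by_p50(samples), start=1):
    print(f"{rank}. {name}: {speed:g} t/s")
```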

  1. phi-4: 55 t/s · 10.4 intelligence · $0.065/M · 258 ms TTFT · JSON
  2. phi-4-mini-instruct: 18 t/s · 8.4 intelligence · $0.080/M · 379 ms TTFT · JSON
  3. wizardlm-2-8x22b: 12 t/s · $0.620/M · 967 ms TTFT · 66K context · JSON
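Output speed and time to first token (TTFT) together determine how long a response takes. A rough estimate is TTFT plus the output length divided by the streaming speed. The sketch below applies that to the figures listed above. It assumes a constant decode rate after the first token and ignores network and queueing variance, so treat the results as ballpark numbers rather than guarantees.

```python
# Figures from the ranking above: (p50 output speed in t/s, TTFT in ms).
MODELS = {
    "phi-4": (55.0, 258.0),
    "phi-4-mini-instruct": (18.0, 379.0),
    "wizardlm-2-8x22b": (12.0, 967.0),
}


def estimate_seconds(speed_tps: float, ttft_ms: float, output_tokens: int) -> float:
    """Rough wall-clock time: wait for the first token, then stream
    output_tokens at a constant rate (a simplifying assumption)."""
    return ttft_ms / 1000.0 + output_tokens / speed_tps


for name, (speed, ttft) in MODELS.items():
    print(f"{name}: ~{estimate_seconds(speed, ttft, 500):.1f} s for 500 tokens")
```

For a 500-token reply this gives roughly 9.3 s, 28.2 s and 42.6 s respectively. At that length the speed gap matters far more than the TTFT gap.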

Frequently asked

What is the fastest Microsoft model?

The fastest Microsoft model is Phi 4 at 55 output tokens per second. Phi 4 Mini Instruct (18 t/s) and WizardLM-2 8x22B (12 t/s) round out the top three.

What's a good alternative to Phi 4?

Phi 4 Mini Instruct (18 t/s) is the closest alternative on this metric, followed by WizardLM-2 8x22B (12 t/s). See the full ranking above for the tradeoffs.

How many Microsoft models are there?

modelgrep tracks 3 Microsoft models with live benchmarks, speed, latency and per-provider pricing; Phi 4 leads on intelligence. All three qualify for this ranking.
