modelgrep

Lowest-Latency Meta Models

Quick answer · Updated June 2026

Llama Guard 4 12B has the lowest latency of any Meta model, responding in about 120ms to first token. Llama 3.1 8B Instruct (143ms) and Llama 3.2 11B Vision Instruct (164ms) round out the top three.

Latency: 120ms
Speed: 18 t/s
Input: $0.180/M
Context: 164K

Meta models ranked by median (p50) time to first token, from most to least responsive, for real-time and interactive use cases.

  1. llama-guard-4-12b
    JSON, Vision · $0.180/M · 18 t/s · 164K ctx
    Latency: 120ms
  2. llama-3.1-8b-instruct
    Tools, JSON · 11.8 intel · $0.020/M · 145 t/s
    Latency: 143ms
  3. llama-3.2-11b-vision-instruct
    JSON, Vision · 8.7 intel · $0.345/M · 35 t/s
    Latency: 164ms
  4. llama-3.2-3b-instruct:free
    Free · 102 t/s · 131K ctx
    Latency: 223ms
  5. llama-3.2-3b-instruct
    $0.051/M · 102 t/s · 131K ctx
    Latency: 223ms
  6. llama-3.3-70b-instruct:free
    Tools · 14.5 intel · Free · 115 t/s
    Latency: 244ms
  7. llama-3.3-70b-instruct
    Tools, JSON · 14.5 intel · $0.100/M · 115 t/s
    Latency: 244ms
  8. llama-4-scout
    Tools, JSON, Vision · 13.5 intel · $0.100/M · 130 t/s
    Latency: 249ms
  9. llama-4-maverick
    Tools, JSON, Vision · 18.4 intel · $0.150/M · 72 t/s
    Latency: 303ms
  10. llama-3.1-70b-instruct
    Tools, JSON · 12.5 intel · $0.400/M · 28 t/s
    Latency: 303ms
  11. llama-3.2-1b-instruct
    6.3 intel · $0.027/M · 169 t/s
    Latency: 332ms
  12. llama-3-8b-instruct
    6.4 intel · $0.140/M · 63 t/s
    Latency: 660ms
  13. llama-3-70b-instruct
    JSON · 8.9 intel · $0.510/M · 18 t/s
    Latency: 1.3s
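
The ranking above orders models by p50 (median) time to first token. As a rough illustration of what that ordering involves, here is a minimal Python sketch. The sample values and the names `samples_ms` and `rank_by_p50` are hypothetical, chosen for illustration; they are not modelgrep's data or code, and the page does not describe how its measurements are collected.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# Illustrative values only, not measurements from this page.
samples_ms = {
    "model-a": [118, 120, 125, 119, 410],
    "model-b": [140, 143, 150, 139, 141],
    "model-c": [160, 170, 164, 158, 900],
}

def rank_by_p50(samples):
    """Return (model, p50_ms) pairs sorted from lowest to highest median."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])

ranking = rank_by_p50(samples_ms)
for position, (name, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {p50_ms} ms")
```

Using the median rather than the mean keeps a single slow request (such as the 410 ms and 900 ms outliers in the sample data) from shifting a model's position.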

Frequently asked

Which Meta model has the lowest latency?

Llama Guard 4 12B has the lowest latency of any Meta model, responding in about 120ms to first token. Llama 3.1 8B Instruct (143ms) and Llama 3.2 11B Vision Instruct (164ms) round out the top three.

What's a good alternative to Llama Guard 4 12B?

Llama 3.1 8B Instruct (143ms) is the closest alternative on this metric, followed by Llama 3.2 11B Vision Instruct (164ms). See the full ranking above for the tradeoffs.
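
One tradeoff worth making concrete is latency against output speed: a model that starts fastest may not finish fastest. The sketch below uses the latency and t/s figures listed in the ranking and assumes, as a simplification, that total response time is roughly time to first token plus output tokens divided by output speed. The function names are hypothetical, and real timings also depend on provider, load, and prompt length.

```python
# Time to first token (seconds) and output speed (tokens/second),
# copied from the top three entries of the ranking on this page.
models = {
    "llama-guard-4-12b": (0.120, 18),
    "llama-3.1-8b-instruct": (0.143, 145),
    "llama-3.2-11b-vision-instruct": (0.164, 35),
}

def estimated_total_seconds(ttft_s, tokens_per_s, output_tokens):
    """Assumed model: wait for the first token, then stream at a steady rate."""
    return ttft_s + output_tokens / tokens_per_s

def fastest_for(output_tokens):
    """Name of the model with the lowest estimated total time."""
    return min(
        models,
        key=lambda name: estimated_total_seconds(*models[name], output_tokens),
    )

for name, (ttft_s, tokens_per_s) in models.items():
    total = estimated_total_seconds(ttft_s, tokens_per_s, 200)
    print(f"{name}: about {total:.2f}s for 200 output tokens")
```

Under this assumption, Llama 3.1 8B Instruct finishes a 200-token reply well ahead of Llama Guard 4 12B despite the 23ms later start, because it generates at 145 t/s rather than 18 t/s.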

How many Meta models are there?

modelgrep tracks 13 Meta models with live benchmarks, speed, latency, and per-provider pricing. Llama 4 Maverick leads on intelligence, and all 13 qualify for this ranking.
