Lowest-Latency Meta Models

Quick answer · Updated June 2026

Llama Guard 4 12B has the lowest latency of any Meta model, responding in about 120ms to first token. Llama 3.1 8B Instruct (143ms) and Llama 3.2 11B Vision Instruct (164ms) round out the top three.

Latency: 120ms

Speed: 18 t/s

Input: $0.180 /M

Context: 164K

AI models ranked by median (p50) time to first token. These are the most responsive large language models for real-time and interactive use cases.
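As a rough illustration of what a p50 time-to-first-token ranking involves, the sketch below takes the median of each model's latency samples and sorts from lowest to highest. The function names and the sample values are hypothetical, chosen only so that the medians match the figures quoted above. This is not modelgrep's actual measurement method.

```python
from statistics import median


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return median(samples_ms)


def rank_by_latency(samples_by_model):
    """Return (model, p50) pairs sorted from lowest to highest p50 latency."""
    scored = {name: p50_ttft_ms(s) for name, s in samples_by_model.items()}
    return sorted(scored.items(), key=lambda item: item[1])


# Hypothetical samples (assumption): invented so the medians equal 120, 143, 164.
samples = {
    "Llama 3.2 11B Vision Instruct": [150, 164, 210],
    "Llama Guard 4 12B": [110, 120, 300],
    "Llama 3.1 8B Instruct": [130, 143, 160],
}

ranking = rank_by_latency(samples)
for name, p50 in ranking:
    print(f"{name}: {p50}ms")
```

The median is used rather than the mean so that a single slow request (such as the 300ms sample here) does not change a model's position.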

Frequently asked

Which Meta model has the lowest latency?

Llama Guard 4 12B has the lowest latency of any Meta model, responding in about 120ms to first token. Llama 3.1 8B Instruct (143ms) and Llama 3.2 11B Vision Instruct (164ms) round out the top three.

What's a good alternative to Llama Guard 4 12B?

Llama 3.1 8B Instruct (143ms) is the closest alternative on this metric, followed by Llama 3.2 11B Vision Instruct (164ms). See the full ranking above for the tradeoffs.

How many Meta models are there?

modelgrep tracks 13 Meta models with live benchmarks, speed, latency, and per-provider pricing. Llama 4 Maverick leads on intelligence. All 13 qualify for this ranking.
