Llama Guard 4 12B has the lowest latency of any Meta model, with a time to first token of about 120ms. Llama 3.1 8B Instruct (143ms) and Llama 3.2 11B Vision Instruct (164ms) round out the top three.
AI models ranked by median (p50) time to first token: the most responsive large language models for real-time and interactive use cases.
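As a rough illustration of what a p50 ranking involves, the sketch below takes per-request time-to-first-token samples for each model, computes the median, and sorts from fastest to slowest. The model names, sample values, and the `rank_by_p50_ttft` helper are all hypothetical; this is not how modelgrep necessarily collects or aggregates its measurements.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Return (model, p50_ms) pairs sorted from fastest to slowest."""
    # p50 is the median: half of the requests saw a first token sooner than this.
    p50 = {model: median(values) for model, values in samples_ms.items()}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical TTFT samples in milliseconds, made up for illustration only.
samples = {
    "model-b": [140, 143, 150],
    "model-a": [110, 120, 400],
    "model-c": [160, 164, 170],
}

ranking = rank_by_p50_ttft(samples)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms}ms")
```

Using the median rather than the mean keeps a single slow request (the 400ms sample for the hypothetical `model-a`) from changing that model's position in the ranking.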
Llama 3.1 8B Instruct (143ms) is the closest alternative on this metric, followed by Llama 3.2 11B Vision Instruct (164ms). See the full ranking above for the tradeoffs.
modelgrep tracks 13 Meta models with live benchmarks, speed, latency, and per-provider pricing, led on intelligence by Llama 4 Maverick. All 13 qualify for this ranking.