The fastest Meta model is Llama 3.2 1B Instruct at 169 output tokens per second. Llama 3.1 8B Instruct (145 t/s) and Llama 4 Scout (130 t/s) round out the top three.
AI models ranked by output speed (tokens per second, p50). The fastest large language models for low-latency and high-throughput applications.
Llama 3.1 8B Instruct (145 t/s) is the closest alternative to Llama 3.2 1B Instruct on output speed, followed by Llama 4 Scout (130 t/s). See the full ranking above for the tradeoffs.
modelgrep tracks 13 Meta models with live benchmarks, speed, latency, and per-provider pricing; Llama 4 Maverick leads them on intelligence. All 13 qualify for this ranking.