Lowest-Latency LLMs

Match · Updated July 2026

AI models ranked by median (p50) time-to-first-token. These are the most responsive large language models for real-time and interactive use cases.
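
As a rough illustration of what ranking by p50 time-to-first-token means, the sketch below takes the median of several latency samples per model and sorts ascending. The model names and millisecond values are made-up placeholders rather than benchmark results, and the function name `rank_by_p50_ttft` is hypothetical. How this page actually collects or aggregates its measurements is not stated, so this is only one plausible reading.

```python
from statistics import median

# Hypothetical TTFT samples in milliseconds.
# Placeholder values for illustration only, NOT benchmark results.
samples_ms = {
    "model-a": [220.0, 180.0, 950.0, 210.0, 205.0],
    "model-b": [310.0, 305.0, 298.0, 320.0, 300.0],
    "model-c": [150.0, 900.0, 160.0, 155.0, 1200.0],
}


def rank_by_p50_ttft(samples):
    """Return (model, p50) pairs sorted by ascending median TTFT."""
    # Skip models with no samples: median() raises on empty input.
    p50 = {model: median(values) for model, values in samples.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50_ttft(samples_ms)
for position, (model, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {model}: p50 TTFT {p50_ms:.0f} ms")
```

The median is robust to occasional slow outliers, so a model with a few very slow responses (like the placeholder `model-c`) can still rank first on p50 even though its tail latency is poor.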

Benchmark data for this ranking is temporarily unavailable. Check back shortly.

By maker

OpenAI · Qwen · Google · Anthropic · Mistral · DeepSeek · Z.ai · NVIDIA

All rankings

Small & Fast LLMs · Best Local LLMs · Smartest LLMs · Best LLMs for Coding · Best LLMs for Design & Frontend · Fastest LLMs · Cheapest LLMs · Best Free LLMs · Best Reasoning LLMs · Best Vision LLMs · Best LLMs for Agents · Best Open-Source LLMs · Longest-Context LLMs · Best LLMs for Writing · Best LLMs for Math & Science · Best LLMs for RAG · Best LLMs for SQL & Data Analysis · Best LLMs for Roleplay · Best Uncensored LLMs