Lowest-Latency Arcee AI Models

Quick answer · Updated June 2026

Trinity Large Thinking has the lowest latency of any Arcee AI model, responding in about 525ms to first token. Trinity Mini (741ms) is next.

Latency: 525ms
Intelligence: 31.9
Speed: 273 t/s
Input: $0.220/M
Context: 262K

Arcee AI models ranked by median (p50) time to first token, lowest first. These are the most responsive models for real-time and interactive use cases.

1. trinity-large-thinking · 525ms latency
   Reasoning · Tools · JSON · 31.9 intel · $0.220/M · 273 t/s
2. trinity-mini · 741ms latency
   Reasoning · Tools · JSON · $0.045/M · 36 t/s · 131K ctx
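
As a rough illustration of how a p50 time-to-first-token ranking like the one above could be produced, the sketch below takes the median of several latency samples per model and sorts ascending. The model names and sample values are hypothetical placeholders, not modelgrep measurements, and the page does not describe how its own figures are collected.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# Illustrative values only; these are not modelgrep measurements.
samples_ms = {
    "model-b": [700, 741, 760, 735, 1200],
    "model-a": [510, 525, 540, 498, 900],
}


def rank_by_p50(samples):
    """Return (model, p50_ms) pairs sorted by median latency, lowest first."""
    p50 = {name: median(values) for name, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50(samples_ms)
for position, (name, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {name}: {p50_ms}ms")
```

Using the median rather than the mean keeps a single slow outlier (the 900ms and 1200ms samples here) from shifting a model's position.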

Frequently asked

Which Arcee AI model has the lowest latency?

Trinity Large Thinking has the lowest latency of any Arcee AI model, responding in about 525ms to first token. Trinity Mini (741ms) is next.

What's a good alternative to Trinity Large Thinking?

Trinity Mini (741ms) is the closest alternative on this metric. See the full ranking above for the tradeoffs.
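
One way to reason about the tradeoff is to combine latency with throughput. The sketch below assumes a simple model in which total response time is roughly time to first token plus output length divided by tokens per second. That model, the function name, and the 500-token output length are assumptions for illustration; real timings vary by provider and load, and reasoning models may spend extra time on tokens that are not shown.

```python
def estimated_response_seconds(ttft_ms, tokens_per_second, output_tokens):
    """Rough total time: wait for the first token, then stream the output."""
    return ttft_ms / 1000 + output_tokens / tokens_per_second


# Latency and speed figures are the ones listed in the ranking;
# the 500-token output length is an arbitrary assumption.
OUTPUT_TOKENS = 500
large = estimated_response_seconds(525, 273, OUTPUT_TOKENS)
mini = estimated_response_seconds(741, 36, OUTPUT_TOKENS)

print(f"trinity-large-thinking: ~{large:.1f}s")
print(f"trinity-mini: ~{mini:.1f}s")
```

Under these assumptions the gap in streaming speed matters far more than the gap in first-token latency once responses run to hundreds of tokens, while the listed input price favors Trinity Mini.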

How many Arcee AI models are there?

modelgrep tracks 4 Arcee AI models with live benchmarks, speed, latency and per-provider pricing, led on intelligence by Trinity Large Thinking. 2 of them qualify for this ranking.
