modelgrep

Lowest-Latency Arcee AI Models

Quick answer · Updated June 2026

Trinity Large Thinking has the lowest latency of any Arcee AI model, responding in about 525ms to first token. Trinity Mini (741ms) is next.

Latency: 525ms
Intelligence: 31.9
Speed: 273 t/s
Input: $0.220/M
Context: 262K

Arcee AI models ranked by median (p50) time-to-first-token, from fastest to slowest. These are the most responsive options for real-time and interactive use cases.

  1. trinity-large-thinking
    Reasoning, Tools, JSON · 31.9 intel · $0.220/M · 273 t/s
    Latency: 525ms
  2. trinity-mini
    Reasoning, Tools, JSON · $0.045/M · 36 t/s · 131K ctx
    Latency: 741ms
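
A p50 ranking like the one above can be sketched in a few lines. This is only an illustration of the idea, not how modelgrep computes its figures: the model names, the sample values, and the `rank_by_p50_ttft` helper are all hypothetical, and the sketch assumes each model has a list of measured time-to-first-token samples in milliseconds.

```python
from statistics import median

# Hypothetical time-to-first-token samples in milliseconds.
# These values are invented for illustration; they are not modelgrep data.
samples_ms = {
    "model-b": [700.0, 750.0, 760.0, 735.0, 802.0],
    "model-a": [505.0, 520.0, 540.0, 498.0, 610.0],
}


def rank_by_p50_ttft(samples):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    p50 = {model: median(values) for model, values in samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50_ttft(samples_ms)
for position, (model, p50_ms) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {p50_ms:.0f}ms")
```

Using the median rather than the mean keeps a single slow request (such as the 610ms sample) from distorting a model's position.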

Frequently asked

Which Arcee AI model has the lowest latency?

Trinity Large Thinking has the lowest latency of any Arcee AI model, responding in about 525ms to first token. Trinity Mini (741ms) is next.

What's a good alternative to Trinity Large Thinking?

Trinity Mini (741ms) is the closest alternative on this metric. See the full ranking above for the tradeoffs.

How many Arcee AI models are there?

modelgrep tracks 4 Arcee AI models with live benchmarks, speed, latency, and per-provider pricing; Trinity Large Thinking leads on intelligence. Only 2 of the 4 qualify for this ranking.
