Trinity Large Thinking has the lowest latency of any Arcee AI model, responding in about 525ms to first token. Trinity Mini (741ms) is next.
Models are ranked by median (p50) time to first token, which highlights the most responsive large language models for real-time and interactive use cases.
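As a rough illustration of what a p50 ranking means, the sketch below takes the median of per-request time-to-first-token samples for each model and sorts the results in ascending order. It is not modelgrep's actual pipeline: the function name `rank_by_p50_ttft`, the model names, and the sample values are all hypothetical.

```python
from statistics import median

# Hypothetical per-request time-to-first-token samples in milliseconds.
# These values are illustrative only, not measured data.
samples_ms = {
    "model-a": [510, 525, 540, 900],
    "model-b": [700, 741, 760],
}

def rank_by_p50_ttft(samples):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    # Skip models with no samples, since a median is undefined for them.
    p50 = {model: median(values) for model, values in samples.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])

ranking = rank_by_p50_ttft(samples_ms)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms} ms")
```

Using the median rather than the mean keeps a single slow request (the 900 ms sample above) from dominating a model's score.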
Trinity Mini (741ms) is the closest alternative on this metric. See the full ranking above for the tradeoffs.
modelgrep tracks 4 Arcee AI models with live benchmarks, speed, latency, and per-provider pricing; Trinity Large Thinking leads them on intelligence. Two of the four qualify for this ranking.