Rocinante 12B has the lowest latency of any TheDrummer model, responding in about 308ms to first token. UnslopNemo 12B (756ms) and Skyfall 36B V2 (819ms) round out the top three.
AI models ranked by median (p50) time-to-first-token: the most responsive large language models for real-time and interactive use cases.
UnslopNemo 12B (756ms) is the closest alternative on this metric, followed by Skyfall 36B V2 (819ms). See the full ranking above for the tradeoffs.
modelgrep tracks 4 TheDrummer models with live benchmarks, speed, latency, and per-provider pricing. All 4 qualify for this ranking.