Lowest-Latency TheDrummer Models

Quick answer · Updated June 2026

Rocinante 12B has the lowest latency of any TheDrummer model, with a time to first token of about 308 ms. UnslopNemo 12B (756 ms) and Skyfall 36B V2 (819 ms) round out the top three.

Latency: 308 ms

Speed: 85 t/s

Input: $0.170 /M

Context: 33K

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
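As a rough illustration of what a p50 ranking involves, the sketch below takes the median time-to-first-token per model and sorts ascending. The sample lists and the function name `rank_by_p50_ttft` are hypothetical, chosen only so that the medians match the figures quoted on this page; this is not modelgrep's actual measurement pipeline.

```python
from statistics import median


def rank_by_p50_ttft(samples_ms):
    """Return (model, p50_ms) pairs sorted from lowest to highest median TTFT."""
    p50 = {model: median(values) for model, values in samples_ms.items()}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical per-request TTFT samples in milliseconds (illustrative only).
samples_ms = {
    "Skyfall 36B V2": [790, 819, 905],
    "Rocinante 12B": [295, 308, 410],
    "UnslopNemo 12B": [700, 756, 1200],
}

ranking = rank_by_p50_ttft(samples_ms)
for model, p50_ms in ranking:
    print(f"{model}: {p50_ms} ms")
```

Using the median rather than the mean keeps a single slow request (such as the 1200 ms sample above) from shifting a model's position in the ranking.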

Frequently asked

Which TheDrummer model has the lowest latency?

Rocinante 12B has the lowest latency of any TheDrummer model, responding in about 308ms to first token. UnslopNemo 12B (756ms) and Skyfall 36B V2 (819ms) round out the top three.

What's a good alternative to Rocinante 12B?

UnslopNemo 12B (756ms) is the closest alternative on this metric, followed by Skyfall 36B V2 (819ms). See the full ranking above for the tradeoffs.

How many TheDrummer models are there?

modelgrep tracks 4 TheDrummer models with live benchmarks, speed, latency, and per-provider pricing. All 4 qualify for this ranking.
