modelgrep

Lowest-Latency TheDrummer Models

Quick answer · Updated June 2026

Rocinante 12B has the lowest latency of any TheDrummer model, with a median time to first token of about 308ms. UnslopNemo 12B (756ms) and Skyfall 36B V2 (819ms) round out the top three.

Rocinante 12B at a glance:
308ms latency
85 t/s speed
$0.170 input per 1M tokens
33K context

TheDrummer models ranked by median (p50) time to first token, lowest first. Low time to first token matters most for real-time and interactive use cases.
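The ranking itself is straightforward: take the median of each model's time-to-first-token samples, then sort ascending. The sketch below assumes you already have per-request TTFT samples in milliseconds. The helper names `p50` and `rank_by_latency` and the sample list are hypothetical illustrations, not a modelgrep API; the ranked values are the p50 figures listed on this page, with 1.4s entered as 1400 ms.

```python
from statistics import median


def p50(samples_ms: list[float]) -> float:
    """Median (50th percentile) of time-to-first-token samples, in ms."""
    return median(samples_ms)


def rank_by_latency(p50_ms: dict[str, float]) -> list[tuple[str, float]]:
    """Sort models from lowest to highest median TTFT."""
    return sorted(p50_ms.items(), key=lambda item: item[1])


# Hypothetical samples: one slow outlier (900 ms) barely moves the median.
example_p50 = p50([300, 310, 295, 900, 305])

# p50 TTFT values as listed in the ranking (1.4s is a rounded display value).
published = {
    "cydonia-24b-v4.1": 1400.0,
    "skyfall-36b-v2": 819.0,
    "rocinante-12b": 308.0,
    "unslopnemo-12b": 756.0,
}

for rank, (model, ms) in enumerate(rank_by_latency(published), start=1):
    print(f"{rank}. {model}: {ms:.0f} ms")
```

Using the median rather than the mean keeps a single slow request from dragging a model down the ranking; in the example above the 900 ms outlier leaves the p50 at 305 ms.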

  1. rocinante-12b
    Latency: 308ms
    Tools · JSON · $0.170/M input · 85 t/s · 33K ctx
  2. unslopnemo-12b
    Latency: 756ms
    Tools · JSON · $0.400/M input · 58 t/s · 33K ctx
  3. skyfall-36b-v2
    Latency: 819ms
    $0.550/M input · 21 t/s · 33K ctx
  4. cydonia-24b-v4.1
    Latency: 1.4s
    $0.300/M input · 11 t/s · 131K ctx
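Latency is only one column: for longer replies, output speed can dominate total wait time. The back-of-envelope sketch below estimates wall-clock time as TTFT plus output tokens divided by tokens per second, and input cost from the listed per-million price. It assumes constant generation speed after the first token and ignores network and provider variance; `ModelStats`, `est_response_s`, and `est_input_cost_usd` are hypothetical names. Only input pricing is listed here, so the cost estimate covers prompt tokens only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelStats:
    name: str
    ttft_s: float           # p50 time to first token, in seconds
    tokens_per_s: float     # listed output speed
    input_usd_per_m: float  # listed input price per 1M tokens


def est_response_s(m: ModelStats, output_tokens: int) -> float:
    """Rough wall-clock estimate: TTFT plus steady-state generation time."""
    return m.ttft_s + output_tokens / m.tokens_per_s


def est_input_cost_usd(m: ModelStats, prompt_tokens: int) -> float:
    """Input-side cost only; output pricing is not part of this ranking."""
    return prompt_tokens * m.input_usd_per_m / 1_000_000


# Figures from the ranking above.
MODELS = [
    ModelStats("rocinante-12b", 0.308, 85, 0.170),
    ModelStats("unslopnemo-12b", 0.756, 58, 0.400),
    ModelStats("skyfall-36b-v2", 0.819, 21, 0.550),
    ModelStats("cydonia-24b-v4.1", 1.4, 11, 0.300),
]

for m in MODELS:
    print(f"{m.name}: ~{est_response_s(m, 500):.1f}s for 500 output tokens, "
          f"${est_input_cost_usd(m, 2_000):.5f} for a 2K-token prompt")
```

Under these assumptions a 500-token reply takes roughly 6s on Rocinante 12B but over 45s on Cydonia 24B v4.1, whose main advantage in this list is its 131K context window.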

Frequently asked

Which TheDrummer model has the lowest latency?

Rocinante 12B has the lowest latency of any TheDrummer model, with a median time to first token of about 308ms. UnslopNemo 12B (756ms) and Skyfall 36B V2 (819ms) round out the top three.

What's a good alternative to Rocinante 12B?

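UnslopNemo 12B (756ms) is the closest alternative on this metric, followed by Skyfall 36B V2 (819ms). Both are slower to first token, generate fewer tokens per second, and cost more per million input tokens; see the full ranking above for the tradeoffs.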

How many TheDrummer models are there?

modelgrep tracks 4 TheDrummer models with live benchmarks, speed, latency and per-provider pricing. All 4 qualify for this ranking.
