modelgrep

Lowest-Latency Cohere Models

Quick answer · Updated June 2026

Command R7B (12-2024) has the lowest latency of any Cohere model, with a time to first token of about 220ms. Command R+ (08-2024), at 280ms, and Command A, at 286ms, round out the top three.

Latency: 220ms · Speed: 55 t/s · Input: $0.037/M · Context: 128K

Cohere models ranked by median (p50) time to first token, lowest first. Lower values mean a more responsive model for real-time and interactive use cases.

  1. command-r7b-12-2024
    JSON · $0.037/M · 55 t/s · 128K ctx
    Latency: 220ms
  2. command-r-plus-08-2024
    Tools · JSON · $2.50/M · 18 t/s · 128K ctx
    Latency: 280ms
  3. command-a
    JSON · $2.50/M · 43 t/s · 256K ctx
    Latency: 286ms
  4. command-r-08-2024
    Tools · JSON · $0.150/M · 35 t/s · 128K ctx
    Latency: 290ms
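
The ranking above orders models by time to first token only, but each row also lists output speed. As a rough sketch of that tradeoff, the snippet below estimates end-to-end time for a reply as first-token latency plus output tokens divided by speed. The function names and the 200-token reply length are assumptions chosen for illustration, and the additive timing model is a simplification, not modelgrep's methodology.

```python
# Figures copied from the ranking above: (time to first token in ms, output speed in tokens/s).
MODELS = {
    "command-r7b-12-2024": (220, 55),
    "command-r-plus-08-2024": (280, 18),
    "command-a": (286, 43),
    "command-r-08-2024": (290, 35),
}


def estimated_total_seconds(ttft_ms, tokens_per_s, output_tokens):
    """Assumed model: first-token latency plus generation at a steady rate."""
    return ttft_ms / 1000 + output_tokens / tokens_per_s


def rank_by_total_time(output_tokens):
    """Return (model, seconds) pairs sorted from fastest to slowest."""
    totals = {
        name: estimated_total_seconds(ttft, speed, output_tokens)
        for name, (ttft, speed) in MODELS.items()
    }
    return sorted(totals.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # 200 output tokens is a hypothetical reply length.
    for name, seconds in rank_by_total_time(output_tokens=200):
        print(f"{name}: {seconds:.2f}s")
```

Under this assumption, command-r7b-12-2024 stays first for a 200-token reply, while command-r-plus-08-2024 drops from second to last because of its 18 t/s output speed.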

Frequently asked

Which Cohere model has the lowest latency?

Command R7B (12-2024), at about 220ms to first token. Command R+ (08-2024) and Command A follow at 280ms and 286ms.

What's a good alternative to Command R7B (12-2024)?

Command R+ (08-2024), at 280ms, is the closest alternative on this metric, followed by Command A at 286ms. See the full ranking above for the tradeoffs.

How many Cohere models are there?

modelgrep tracks 4 Cohere models with live benchmarks, speed, latency, and per-provider pricing. All 4 qualify for this ranking.
