modelgrep

Lowest-Latency AI21 Models

Quick answer · Updated June 2026

Jamba Large 1.7 has the lowest latency of any AI21 model, responding in about 717ms to first token.

Latency: 717ms
Intelligence: 10.9
Speed: 15 t/s
Input: $2.00/M
Context: 256K

AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.

  1. jamba-large-1.7
    Tools · JSON · 10.9 intel · $2.00/M · 15 t/s
    Latency: 717ms
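
For readers who want to reproduce a ranking like this from their own measurements, the following is a minimal sketch of ordering models by p50 (median) time-to-first-token. It is not modelgrep's actual method: the function names `p50_ttft_ms` and `rank_by_latency`, the model names, and the sample values are all hypothetical and chosen only for illustration.

```python
from statistics import median


def p50_ttft_ms(samples_ms):
    """Return the median (p50) of time-to-first-token samples, in milliseconds."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    return median(samples_ms)


def rank_by_latency(samples_by_model):
    """Return (model, p50) pairs sorted by p50 time-to-first-token, lowest first."""
    p50 = {name: p50_ttft_ms(s) for name, s in samples_by_model.items()}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical samples in milliseconds, not real measurements.
example = {
    "model-a": [700.0, 720.0, 950.0],
    "model-b": [400.0, 410.0, 2000.0],
}
ranking = rank_by_latency(example)
for name, latency in ranking:
    print(f"{name}: {latency:.0f}ms")
```

Using the median rather than the mean keeps a single slow request (such as the 2000ms sample for `model-b`) from dominating a model's position in the ranking.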

Frequently asked

Which AI21 model has the lowest latency?

Jamba Large 1.7 has the lowest latency of any AI21 model, responding in about 717ms to first token.

How many AI21 models are there?

modelgrep tracks 1 AI21 model, Jamba Large 1.7, with live benchmarks, speed, latency and per-provider pricing. It qualifies for this ranking.
