modelgrep

Lowest-Latency Perplexity Models

Quick answer · Updated June 2026

Sonar has the lowest latency of any Perplexity model, responding in about 1.6s to first token. Sonar Pro (2.2s) and Sonar Pro Search (4.1s) round out the top three.

Latency: 1.6s
Speed: 43 t/s
Input: $1.00/M
Context: 127K

Perplexity models ranked by median (p50) time to first token, most responsive first. Lower latency suits real-time and interactive use cases.

  1. sonar
    Vision · $1.00/M · 43 t/s · 127K ctx
    Latency: 1.6s
  2. sonar-pro
    Vision · $3.00/M · 81 t/s · 200K ctx
    Latency: 2.2s
  3. sonar-pro-search
    Reasoning · JSON · Vision · $3.00/M · 55 t/s · 200K ctx
    Latency: 4.1s
  4. sonar-reasoning-pro
    Reasoning · Vision · $2.00/M · 25 t/s · 128K ctx
    Latency: 15.9s
  5. sonar-deep-research
    Reasoning · $2.00/M · 29 t/s · 128K ctx
    Latency: 38.2s
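
The ranking is ordered by p50 (median) time to first token. As a rough illustration of what that means, the sketch below takes the median of several latency samples per model and sorts the results. The model names and sample values are hypothetical placeholders, not modelgrep measurements, and the function name `rank_by_p50` is an assumption made for this example.

```python
from statistics import median

# Hypothetical time-to-first-token samples in seconds (illustrative only,
# not real measurements).
samples = {
    "model-a": [1.4, 1.6, 1.9, 1.5, 2.8],
    "model-b": [2.0, 2.2, 2.1, 3.5, 2.4],
    "model-c": [4.3, 3.9, 4.1, 4.0, 6.2],
}


def rank_by_p50(ttft_samples):
    """Return (model, p50) pairs sorted from lowest to highest median latency."""
    p50 = {model: median(values) for model, values in ttft_samples.items()}
    return sorted(p50.items(), key=lambda item: item[1])


ranking = rank_by_p50(samples)
for position, (model, latency) in enumerate(ranking, start=1):
    print(f"{position}. {model}: {latency:.1f}s")
```

Using the median rather than the mean keeps a single slow outlier (such as the 2.8s sample for `model-a`) from shifting a model's position.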

Frequently asked

Which Perplexity model has the lowest latency?

Sonar has the lowest latency of any Perplexity model, responding in about 1.6s to first token. Sonar Pro (2.2s) and Sonar Pro Search (4.1s) round out the top three.

What's a good alternative to Sonar?

Sonar Pro (2.2s) is the closest alternative on this metric, followed by Sonar Pro Search (4.1s). See the full ranking above for the tradeoffs.
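
One way to weigh the tradeoff is to combine latency with output speed. The sketch below assumes a deliberately simple model, total time ≈ time to first token + output tokens ÷ tokens per second, which ignores network variance and search overhead. The latency and speed figures are copied from the ranking above; the function names are hypothetical.

```python
# (time to first token in seconds, output speed in tokens per second),
# copied from the ranking on this page.
MODELS = {
    "sonar": (1.6, 43),
    "sonar-pro": (2.2, 81),
    "sonar-pro-search": (4.1, 55),
}


def estimated_total_seconds(model, output_tokens):
    """Estimate end-to-end time under the simple linear assumption above."""
    ttft, tokens_per_second = MODELS[model]
    return ttft + output_tokens / tokens_per_second


def break_even_tokens(low_latency_model, high_speed_model):
    """Output length at which the faster-streaming model catches up."""
    ttft_a, speed_a = MODELS[low_latency_model]
    ttft_b, speed_b = MODELS[high_speed_model]
    return (ttft_b - ttft_a) / (1 / speed_a - 1 / speed_b)


for tokens in (20, 500):
    for name in ("sonar", "sonar-pro"):
        print(f"{name}, {tokens} tokens: {estimated_total_seconds(name, tokens):.1f}s")
print(f"break-even: {break_even_tokens('sonar', 'sonar-pro'):.0f} tokens")
```

Under these assumptions Sonar finishes first only for very short replies. The crossover sits at roughly 55 output tokens, beyond which Sonar Pro's higher speed outweighs its slower start.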

How many Perplexity models are there?

modelgrep tracks 5 Perplexity models with live benchmarks, speed, latency and per-provider pricing. All 5 qualify for this ranking.
