Hunyuan A13B Instruct has the lowest latency of any Tencent model, with a time to first token of about 1.1s. Hy3 preview (4.4s) is next.
AI models ranked by time-to-first-token (p50). The most responsive large language models for real-time and interactive use cases.
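As a rough illustration of what a p50 ranking involves, the sketch below takes the median time-to-first-token per model and sorts fastest first. How these figures are actually measured is not described here; the function name `rank_by_ttft_p50`, the model labels, and the sample values are all hypothetical.

```python
from statistics import median


def rank_by_ttft_p50(samples):
    """Return (model, p50 seconds) pairs sorted fastest first.

    `samples` maps a model name to a list of time-to-first-token
    measurements in seconds. Models with no measurements are skipped.
    """
    p50 = {model: median(values) for model, values in samples.items() if values}
    return sorted(p50.items(), key=lambda item: item[1])


# Hypothetical measurements, invented for illustration only.
samples = {
    "model-a": [4.1, 4.4, 5.0],
    "model-b": [0.9, 1.1, 1.6],
}

ranking = rank_by_ttft_p50(samples)
for model, seconds in ranking:
    print(f"{model}: {seconds:.1f}s")
```

The median is used rather than the mean so that a few slow outlier requests do not dominate a model's position.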
Hy3 preview (4.4s) is the closest alternative on this metric. See the full ranking above for the tradeoffs.
modelgrep tracks two Tencent models with live benchmarks, speed, latency and per-provider pricing. Hy3 preview leads them on intelligence, and both qualify for this ranking.