modelgrep

Reasoning models

LLMs that 'think' before answering — spending extra tokens on hidden chain-of-thought to solve harder problems at the cost of latency and price.

Reasoning models generate internal chains of thought before producing a final answer. This extra "thinking" dramatically improves performance on math, science, debugging and multi-step planning — the same base capability scores much higher when allowed to reason.

The trade-offs are real: thinking tokens are billed as output (so costs rise), time-to-first-visible-token grows from milliseconds to seconds, and for simple tasks the extra deliberation adds nothing. Many models now expose a reasoning-effort dial so you can tune the trade-off per request.

Use reasoning models for hard, high-value problems; use fast standard models for everyday completion, extraction and chat.