Temperature

A sampling setting (typically 0–2) that controls randomness: low values make output near-deterministic and focused, high values make it varied and creative.

Temperature scales how sharply a model favors its most likely next token. Near 0 it picks the top choice almost every time, which makes output repeatable and focused: ideal for extraction, classification, code, and anything with a single right answer. Higher values flatten the probability distribution over next tokens, adding variety that is useful for brainstorming and creative writing.
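To make the sharpening and flattening concrete, here is a minimal sketch of the standard way temperature is applied: the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The function name, the example logits, and the choice to treat a temperature of 0 as "always pick the top token" are assumptions for illustration; real inference stacks may handle these details differently.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities, scaled by temperature.

    Illustrative helper (not from any particular library).
    """
    if temperature < 0:
        raise ValueError("temperature must be non-negative")
    if temperature == 0:
        # Assumption: treat 0 as greedy decoding (all mass on the top score).
        top = logits.index(max(logits))
        return [1.0 if i == top else 0.0 for i in range(len(logits))]
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Made-up scores for three candidate tokens.
logits = [2.0, 1.0, 0.0]
low = softmax_with_temperature(logits, 0.2)   # sharply favors the first token
high = softmax_with_temperature(logits, 2.0)  # spreads probability more evenly
```

With these made-up logits, the low setting puts over 99% of the probability on the top token, while the high setting leaves it with roughly half.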

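There's no universally "correct" value; it depends on the task. A rule of thumb: 0–0.3 for structured or factual work, 0.7–1.0 for open-ended generation. If you need reproducible outputs (tests, evals, caching), pin it low, keeping in mind that a low temperature makes repeated runs more consistent but does not by itself guarantee identical output.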
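The rule of thumb above could be encoded as a small lookup, sketched here. The helper name, the task labels, and the specific defaults (0.2 and 0.8, picked from inside the suggested ranges) are hypothetical choices, not values prescribed by any provider.

```python
# Hypothetical defaults chosen from within the rule-of-thumb ranges.
STRUCTURED_TASKS = {"extraction", "classification", "code", "factual_qa"}
OPEN_ENDED_TASKS = {"brainstorming", "creative_writing"}

def suggest_temperature(task, reproducible=False):
    """Return a starting temperature for a task label (illustrative only)."""
    if reproducible:
        return 0.0  # pin low for tests, evals, and caching
    if task in STRUCTURED_TASKS:
        return 0.2  # inside the 0-0.3 range for structured or factual work
    if task in OPEN_ENDED_TASKS:
        return 0.8  # inside the 0.7-1.0 range for open-ended generation
    raise ValueError(f"unknown task label: {task!r}")
```

Treat the returned value as a starting point to tune per task and per model rather than a fixed answer.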

Temperature is not model-independent: the same setting behaves differently across models, and reasoning models often manage their own internal sampling, which makes the dial less impactful for them.
