A sampling setting (typically 0–2) that controls randomness: low is deterministic and focused, high is varied and creative.
Temperature scales how sharply a model favors its most likely next token. Near 0 it picks the top choice almost every time — repeatable, focused, ideal for extraction, classification, code and anything with a single right answer. Higher values flatten the distribution, adding variety useful for brainstorming and creative writing.
There's no universally "correct" value — it's per-task. A rule of thumb: 0–0.3 for structured or factual work, 0.7–1.0 for open-ended generation. If you need reproducible outputs (tests, evals, caching), pin it low.
Temperature is independent of the model: the same setting behaves differently across models, and reasoning models often manage their own internal sampling, making the dial less impactful for them.