Context window

The maximum amount of text (measured in tokens) an LLM can consider at once — its working memory for a single request.

A model's context window is the total number of tokens it can process in one request. Your prompt, any documents you include, the conversation history, and the model's own output all share this budget, so a long input leaves less room for the reply. At roughly 0.75 English words per token, a 128K-token window fits about 300 pages of text; a 1M-token window fits a small codebase.

Bigger isn't automatically better. Models often use information placed in the middle of a long context less reliably than information near its beginning or end (the "lost in the middle" effect). Long prompts also cost more, because you pay per input token, and latency, especially time to first token, grows with input size. The practical question is whether the model can actually use its full window, not just accept it.
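To make the per-token pricing point concrete, here is a small cost sketch. The prices are hypothetical placeholders in USD per million tokens, not any provider's actual rates, and `request_cost` is an illustrative helper; the point is only that input cost scales linearly with how much context you send.

```python
# Hypothetical prices in USD per million tokens; real prices vary by
# provider and model, and output tokens are often priced higher than input.
PRICE_IN_PER_M = 3.00
PRICE_OUT_PER_M = 15.00


def request_cost(input_tokens: int, output_tokens: int,
                 price_in_per_m: float = PRICE_IN_PER_M,
                 price_out_per_m: float = PRICE_OUT_PER_M) -> float:
    """Estimated cost of one request under simple per-token pricing."""
    return (input_tokens * price_in_per_m
            + output_tokens * price_out_per_m) / 1_000_000


if __name__ == "__main__":
    focused = request_cost(20_000, 1_000)   # a trimmed, relevant prompt
    stuffed = request_cost(900_000, 1_000)  # most of a 1M window filled
    print(f"focused: ${focused:.3f}  stuffed: ${stuffed:.3f}")
```

Under these assumed prices the focused request costs about $0.075 and the stuffed one about $2.715, roughly 36 times as much for the same short answer.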

When choosing a model, match the window to the job: simple chat needs 8–32K, document analysis 100K+, and whole-repo coding or multi-document research benefits from 200K to 1M.
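Matching the window to the job can be expressed as "pick the smallest window that covers the expected input plus output, with some margin." The sketch below assumes a hypothetical catalog (the model names and sizes are invented for illustration) and a 20% headroom factor, which is an arbitrary choice meant to keep requests away from the edge of the window, where the paragraph above notes that usable capacity can fall short of the advertised limit.

```python
# Hypothetical catalog: model names and window sizes are illustrative only.
CATALOG = {
    "small-chat-model": 32_000,
    "doc-model": 128_000,
    "long-context-model": 1_000_000,
}


def pick_model(needed_tokens: int, headroom: float = 0.2,
               catalog: dict[str, int] = CATALOG) -> str:
    """Return the smallest-window model that fits the need plus headroom."""
    required = needed_tokens * (1 + headroom)
    # Try windows from smallest to largest; smaller is usually cheaper and faster.
    for name, window in sorted(catalog.items(), key=lambda kv: kv[1]):
        if window >= required:
            return name
    raise ValueError(f"no model in the catalog fits {needed_tokens} tokens")


if __name__ == "__main__":
    for need in (10_000, 100_000, 200_000):
        print(need, "->", pick_model(need))
```

A short chat maps to the small model, a 100K-token document to the 128K model, and a 200K-token repository to the 1M model; a request that exceeds every window raises an error instead of silently truncating.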
