A context window is the maximum number of tokens a large language model can attend to in a single forward pass, comprising the prompt, any retrieved content, prior conversation turns and the model's own generated output.
In practice
Context window size sets a hard limit on how much instruction, source material and conversational history a model can consider when producing a response. It is measured in tokens, not words, and is consumed by both input and output, so every token spent on the prompt is a token unavailable for the answer. Common pitfalls include assuming that a larger window guarantees reliable reasoning over content placed deep in a long prompt, and conflating the context window with long-term memory, which requires retrieval or fine-tuning rather than a single prompt.
A bank building a retrieval-augmented assistant for credit analysts gives each query a fixed token budget, split across the retrieved policy excerpts, the chat history and the response, so that the prompt and the generated answer together fit inside the model's context window.
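A budget of this kind can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production design: `count_tokens` is a hypothetical stand-in that counts whitespace-separated words (a real system would use the tokenizer of the model being called), and the function names, parameter names and the order in which space is allocated are all illustrative choices.

```python
def count_tokens(text: str) -> int:
    # Hypothetical stand-in: real token counts come from the model's
    # own tokenizer and will differ from a whitespace word count.
    return len(text.split())


def fit_to_budget(items: list[str], budget: int) -> tuple[list[str], int]:
    """Keep items in order until the next one would exceed the budget."""
    kept, used = [], 0
    for item in items:
        cost = count_tokens(item)
        if used + cost > budget:
            break
        kept.append(item)
        used += cost
    return kept, used


def assemble_prompt(
    instructions: str,
    excerpts: list[str],       # retrieved passages, most relevant first
    history: list[str],        # chat turns, oldest first
    context_window: int,       # total tokens the model can attend to
    max_output_tokens: int,    # space reserved for the response
    history_budget: int,       # cap on tokens spent on chat history
) -> tuple[str, int]:
    # Input and output share the window, so reserve the output first.
    input_budget = context_window - max_output_tokens
    used = count_tokens(instructions)
    if used > input_budget:
        raise ValueError("instructions alone exceed the input budget")

    # Keep the most recent turns that fit within the history cap.
    recent, history_used = fit_to_budget(
        list(reversed(history)), min(history_budget, input_budget - used)
    )
    used += history_used

    # Spend whatever input budget remains on retrieved excerpts.
    kept_excerpts, excerpts_used = fit_to_budget(excerpts, input_budget - used)
    used += excerpts_used

    # Restore chronological order for the turns that were kept.
    parts = [instructions, *kept_excerpts, *reversed(recent)]
    return "\n\n".join(parts), used


prompt, used = assemble_prompt(
    instructions="Answer using policy excerpts only.",
    excerpts=["a b c d", "e f g h i j", "k l"],
    history=["old turn one two", "new turn"],
    context_window=20,
    max_output_tokens=5,
    history_budget=3,
)
```

Reserving the output allowance before filling the prompt reflects the point that the window is consumed by both input and output. Dropping the oldest turns and the lowest-ranked excerpts first is one possible policy; summarising older history instead would be another.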
This definition is maintained by Moweb partners and used in live client engagements. To discuss how the context window applies to your estate, or to challenge a working definition, speak to a partner.