Context Window
The context window is the maximum amount of input a language model can consider at once when generating an output. It includes the prompt, system instructions, conversation history, and any retrieved documents. When the limit is reached, older content may be truncated or summarized, which can affect accuracy, continuity, and the model’s ability to follow long, detailed requirements.
How It Works
A context window is measured in tokens, not characters or words. Tokens are chunks of text (often parts of words), so the same paragraph can consume different token counts depending on language, formatting, and punctuation. The model processes everything inside the window as a single working set: instructions, user messages, tool outputs, and any pasted or retrieved content all compete for the same limited space.
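Exact counts depend on the model's tokenizer, so for precise numbers you should use the provider's own tokenizer library. As a rough sketch of the idea, a common heuristic for English text is about four characters per token (an assumption, not a property of any specific model):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English.

    Real counts depend on the model's tokenizer; this heuristic only
    illustrates that longer, denser text consumes more of the window.
    """
    return max(1, len(text) // 4)

def conversation_tokens(messages: list[dict]) -> int:
    """Everything in the prompt competes for the same window."""
    return sum(estimate_tokens(m["content"]) for m in messages)
```

The same paragraph can tokenize very differently across languages and formatting, which is why a character-based estimate is only a starting point.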
When inputs exceed the window, an application must decide what to keep. Common strategies include dropping the oldest messages, summarizing earlier turns, or using retrieval-augmented generation (RAG) to fetch only the most relevant snippets from a database. The effective window also includes the model’s output: if you request a long answer, fewer tokens remain for input, which can force more aggressive truncation and reduce the model’s ability to reference earlier details.
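The "drop the oldest messages" strategy, including a reserved budget for the model's output, can be sketched as follows. The message shape, budget numbers, and character-based token counter are illustrative assumptions, not any particular API:

```python
def fit_to_window(messages, window=8000, reserve_output=1000,
                  count=lambda m: len(m["content"]) // 4):
    """Keep the system prompt and the newest messages that fit.

    `window` is the total token budget; `reserve_output` holds back
    space for the model's reply, so a longer requested answer leaves
    less room for input. `count` is a rough stand-in for a tokenizer.
    """
    budget = window - reserve_output
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m) for m in system)
    kept = []
    # Walk newest-to-oldest so the most recent turns survive.
    for m in reversed(rest):
        cost = count(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Summarization-based strategies follow the same outline, except that instead of dropping old turns outright, they replace them with a shorter summary message before refitting.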
Why It Matters for Web Hosting
If you host AI features such as chat support, content drafting, or code assistance, the context window affects infrastructure and product choices. Larger windows can improve long-form coherence and instruction-following, but they increase request payload sizes, memory use, and latency, and they may require more careful logging and data retention controls. When comparing hosting plans, consider limits on request size, CPU/RAM headroom, and whether your stack can support summarization or RAG to stay within context constraints.
Common Use Cases
- Multi-turn chatbots that must remember user preferences and prior decisions
- Long-form content generation where outlines, sources, and drafts must fit together
- Code assistants that need multiple files, error logs, and build output in one session
- RAG applications that inject retrieved documentation into the prompt
- Compliance or support workflows that summarize older conversation history to preserve key facts
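The RAG pattern from the list above can be sketched with a toy retriever. Production systems rank documents by embedding similarity against a vector database; the keyword-overlap scoring here is a simplified stand-in, and the sample documents are invented:

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query.

    Injecting only the top matches into the prompt keeps the rest of
    the context window free for instructions and conversation history.
    """
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

docs = [
    "Reset your password from the account settings page.",
    "Billing invoices are emailed on the first of each month.",
    "Two-factor authentication adds a second login step.",
]
context = retrieve("how do I reset my password", docs, top_k=1)
```

The retrieved snippets would then be prepended to the prompt, trading a small amount of window space for grounding that the model could not otherwise fit.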