Context Window
The context window is the maximum amount of input a language model can consider at once when generating an output. It includes the prompt, system instructions, conversation history, and any retrieved documents. When the limit is reached, older content may be truncated or summarized, which can affect accuracy, continuity, and the model’s ability to follow long, detailed requirements.
How It Works
A context window is measured in tokens, not characters or words. Tokens are chunks of text (often parts of words), so the same paragraph can consume different token counts depending on language, formatting, and punctuation. The model processes everything inside the window as a single working set: instructions, user messages, tool outputs, and any pasted or retrieved content all compete for the same limited space.
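Exact counts depend on the model's tokenizer, so for precise numbers you should use the provider's own tokenizer library. As a rough sketch of the idea, a common heuristic for English text is about four characters per token (an assumption, not a property of any specific model):

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: ~4 characters per token for English.

    Real counts depend on the model's tokenizer; this heuristic only
    illustrates that longer, denser text consumes more of the window.
    """
    return max(1, len(text) // 4)

def conversation_tokens(messages: list[dict]) -> int:
    """Everything in the prompt competes for the same window."""
    return sum(estimate_tokens(m["content"]) for m in messages)
```

The same paragraph can tokenize very differently across languages and formatting, which is why a character-based estimate is only a starting point.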
When inputs exceed the window, an application must decide what to keep. Common strategies include dropping the oldest messages, summarizing earlier turns, or using retrieval-augmented generation (RAG) to fetch only the most relevant snippets from a database. The effective window also includes the model’s output: if you request a long answer, fewer tokens remain for input, which can force more aggressive truncation and reduce the model’s ability to reference earlier details.
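The "drop the oldest messages" strategy, including a reserved budget for the model's output, can be sketched as follows. The message shape, budget numbers, and character-based token counter are illustrative assumptions, not any particular API:

```python
def fit_to_window(messages, window=8000, reserve_output=1000,
                  count=lambda m: len(m["content"]) // 4):
    """Keep the system prompt and the newest messages that fit.

    `window` is the total token budget; `reserve_output` holds back
    space for the model's reply, so a longer requested answer leaves
    less room for input. `count` is a rough stand-in for a tokenizer.
    """
    budget = window - reserve_output
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    used = sum(count(m) for m in system)
    kept = []
    # Walk newest-to-oldest so the most recent turns survive.
    for m in reversed(rest):
        cost = count(m)
        if used + cost > budget:
            break
        kept.append(m)
        used += cost
    return system + list(reversed(kept))
```

Summarization-based strategies follow the same outline, except that instead of dropping old turns outright, they replace them with a shorter summary message before refitting.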
Why It Matters for Web Hosting
If you host AI features such as chat support, content drafting, or code assistance, the context window affects infrastructure and product choices. Larger windows can improve long-form coherence and instruction-following, but they increase request payload sizes, memory use, and latency, and they may require more careful logging and data retention controls. When comparing hosting plans, consider limits on request size, CPU/RAM headroom, and whether your stack can support summarization or RAG to stay within context constraints.
Common Use Cases
- Multi-turn chatbots that must remember user preferences and prior decisions
- Long-form content generation where outlines, sources, and drafts must fit together
- Code assistants that need multiple files, error logs, and build output in one session
- RAG applications that inject retrieved documentation into the prompt
- Compliance or support workflows that summarize older conversation history to preserve key facts
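The RAG pattern from the list above can be sketched with a toy retriever. Production systems rank documents by embedding similarity against a vector database; the keyword-overlap scoring here is a simplified stand-in, and the sample documents are invented:

```python
def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Toy retrieval: rank documents by word overlap with the query.

    Injecting only the top matches into the prompt keeps the rest of
    the context window free for instructions and conversation history.
    """
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:top_k]

docs = [
    "Reset your password from the account settings page.",
    "Billing invoices are emailed on the first of each month.",
    "Two-factor authentication adds a second login step.",
]
context = retrieve("how do I reset my password", docs, top_k=1)
```

The retrieved snippets would then be prepended to the prompt, trading a small amount of window space for grounding that the model could not otherwise fit.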