Context Rot and AI Hallucinations: Why Your AI Gets Worse the Longer You Use It
There's a moment most people who work with AI will recognise. You're deep into a long session — building something complex, iterating through ideas, going back and forth — and gradually, almost imperceptibly, the quality of the responses starts to slip. The model starts repeating suggestions you've already dismissed. It forgets decisions you made an hour ago. Then it confidently tells you something that's simply wrong.
This isn't a glitch. It has a name: context rot. And understanding it is one of the most useful things you can learn if you work with AI seriously.
What Is Context Rot?
Every large language model (LLM) works with what's called a context window — a finite block of information the model can "see" at any one time. Every message you send, every response the model gives, every file you paste in — it all accumulates in that window.
Context rot is what happens as that window fills up. Performance degrades not because the model has changed, but because the ratio of useful signal to accumulated noise keeps dropping. The model attends less to your original instructions and more to the clutter around them — old error messages, discarded approaches, outdated versions of the same code or document.
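The dynamic is easy to sketch. Below is a rough, illustrative Python model of a context window filling up; the word-count heuristic in `estimate_tokens` and the 200,000-token limit are stand-ins for demonstration, not how any real tokenizer or model behaves:

```python
# Illustrative only: a session's context as a budget that every message,
# response, and pasted file draws down. Token counts here are crude
# word-based estimates, not output from a real tokenizer.
CONTEXT_LIMIT = 200_000  # assumed window size for the sketch


def estimate_tokens(text: str) -> int:
    # Rough heuristic: roughly 0.75 words per token.
    return max(1, round(len(text.split()) / 0.75))


class Session:
    def __init__(self, limit: int = CONTEXT_LIMIT):
        self.limit = limit
        self.used = 0

    def add(self, text: str) -> None:
        # Everything accumulates: old errors and dead ends included.
        self.used += estimate_tokens(text)

    @property
    def remaining(self) -> int:
        return max(0, self.limit - self.used)


s = Session()
s.add("Build the billing service to these requirements ...")  # signal
for _ in range(50):
    s.add("Here is another failed stack trace " * 200)  # accumulated noise
print(s.used, "tokens used,", s.remaining, "remaining")
```

The point of the sketch is that nothing is ever subtracted: the discarded material keeps occupying the same budget as the instructions that matter.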
Crucially, this isn't a failure of capacity. Research by Chroma tested 18 leading frontier models and found that every single one showed performance degradation as context length increased. Not some. Not most. All of them. And the degradation starts well before the model hits its token limit. A model with a 200,000-token context window can begin producing unreliable output at around 130,000 tokens. Bigger windows delay the problem — they don't solve it.
The Early Days: When Context Windows Were Small
Those of us who started using AI tools early experienced this in a far more brutal form.
When the first capable models became widely available, context windows were tiny by today's standards — a few thousand tokens. Anything complex would fall apart quickly. I experienced this directly when working on detailed process documents. The moment the conversation reached a certain length, the quality of reasoning collapsed. The model would lose track of decisions made earlier, contradict itself, or simply drift away from the original brief.
The workaround was painstaking: break every large task into small, self-contained topics, and start a fresh chat for each one. You'd have to re-educate the model at the beginning of every new session — restating the context, the goals, the constraints — just to get it back to the starting point. It was effective, but it was exhausting, and it made complex, multi-part work enormously slow.
This wasn't a weakness in the approach. It was the only approach that worked. And it taught something important: a focused session almost always outperforms a long one.
What Hallucinations Have to Do With It
AI hallucinations — where a model confidently produces something factually wrong or entirely fabricated — are a related but distinct phenomenon. Context rot doesn't cause them, but it does make them worse.
As context grows and the signal-to-noise ratio drops, the model becomes less reliable at distinguishing what it actually knows from what it's interpolating or generating to fill a gap. The dangerous part isn't the obvious nonsense. It's the plausible-sounding wrong answer — the one that's coherent, detailed, and delivered with exactly the same confidence as a correct response.
This is particularly problematic in longer sessions involving complex reasoning, document creation, or multi-step problem solving. The model isn't malfunctioning when it does this. It's doing exactly what it's designed to do — generating the most probable next token — but with a context that's increasingly noisy and contradictory. The output reflects the quality of the input.
The Signal-to-Noise Problem
Here's a useful way to think about it. Imagine every piece of information in a session as a token you're spending. The original task definition, your requirements, the key decisions — these are high-value tokens. But every side conversation, every failed attempt, every discarded direction also costs tokens. And once spent, they don't disappear. They sit in the context window competing for the model's attention.
A debugging rabbit hole that takes an hour to resolve might consume 40% of your usable context — and by the time you've found the fix, the model has lost track of what the original task even was.
The pattern repeats in agentic and coding contexts too. An AI coding agent reading files, following dead ends, and backtracking through search results accumulates noise at speed. Of 20,000 tokens consumed in a moderately complex coding task, only around 500 might be the genuinely relevant signal.
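That signal-to-noise ratio can be made concrete with a small ledger. This is a sketch using the rough figures from the text (500 signal tokens out of 20,000); the entry sizes and tags are illustrative, not measurements:

```python
# Sketch: tag each chunk of context as signal or noise and compute the
# ratio. Numbers mirror the article's rough example, nothing more.
from dataclasses import dataclass


@dataclass
class Entry:
    tokens: int
    is_signal: bool


def signal_ratio(ledger: list[Entry]) -> float:
    total = sum(e.tokens for e in ledger)
    if total == 0:
        return 0.0
    return sum(e.tokens for e in ledger if e.is_signal) / total


ledger = [Entry(500, True)]             # the task brief: genuine signal
ledger += [Entry(1_500, False)] * 13    # dead ends, stack traces, retries
print(f"{signal_ratio(ledger):.1%}")    # 500 of 20,000 tokens -> 2.5%
```

Seen this way, "the model got confused" is just what a 2.5% signal ratio looks like from the outside.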
What Good Practice Looks Like Now
Working with AI well means managing context deliberately. Some principles that hold up across tools and models:
Start fresh often. A new session isn't admitting defeat — it's removing noise. If the model is repeating itself, forgetting decisions, or producing inconsistent output, start a new conversation. Bring forward only what's essential.
Scope sessions tightly. One topic, one task, one logical unit of work per session. The moment a session becomes a general workspace for everything, its quality degrades.
Externalise what matters. Decisions, requirements, and important context belong in files — not buried in conversation history. In tools like Claude Code, a well-maintained CLAUDE.md file means a fresh session can start informed. Sessions end. Files persist.
Summarise before you continue. When moving between sessions on the same topic, write a brief summary of where you got to — what was decided, what the next step is — and use that as the opening context for the new session.
Treat context like a budget. Every token has a cost. Use them on the things that matter.
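The "externalise" and "summarise" habits above can even be scripted. Here's a minimal sketch that appends each decision and next step to a persistent notes file before you close a session; the filename `SESSION_NOTES.md` is an arbitrary choice for illustration, not a convention any tool requires:

```python
# Sketch: persist key decisions outside the conversation so a fresh
# session can start informed. SESSION_NOTES.md is a made-up filename.
from datetime import date
from pathlib import Path

NOTES = Path("SESSION_NOTES.md")


def record_decision(decision: str, next_step: str) -> None:
    # Append, never overwrite: the file is the durable memory,
    # the chat session is disposable.
    with NOTES.open("a", encoding="utf-8") as f:
        f.write(f"## {date.today().isoformat()}\n")
        f.write(f"- Decided: {decision}\n")
        f.write(f"- Next: {next_step}\n\n")


record_decision(
    "Use PostgreSQL for the billing store",
    "Draft the schema migration in a fresh session",
)
print(NOTES.read_text(encoding="utf-8"))
```

Paste the latest entry at the top of the next session and you've carried forward the signal while leaving the noise behind.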
Why This Matters Beyond the Technicalities
Understanding context rot changes how you think about working with AI. It reframes "the AI got confused" as a predictable, manageable property of the technology — not an arbitrary failure. That shift in understanding leads to better workflows, less frustration, and significantly better output.
The models have improved enormously. Context windows are vastly larger than they were even two years ago, and active research continues into reducing the impact of context rot at the architectural level. But the fundamental dynamic hasn't changed: focused, well-scoped sessions produce better results than long, sprawling ones.
The lesson from the early days still applies. It's just easier to forget it now that the windows are bigger.
Posted by Envision8 · envision8.com