"Drift" gets used loosely in AI coding discussions — sometimes to mean model degradation, sometimes to mean prompt forgetting, sometimes to mean code quality regression. None of those are what this page is about. AI agent drift names a specific, structural failure mode: the gap between the architectural decisions a team has made and the architectural choices an agent makes in any given session, when those decisions were never made queryable.
It is not a model failure. It is not an agent failure. It is a missing layer.
What AI agent drift actually means
An agent starts a session. Its context is whatever the user supplies plus whatever the agent retrieves from the workspace. It has no inherited memory of decisions made in previous sessions, by previous agents, by other developers on the team. If the codebase contains the repository pattern in one service and a direct-DAO pattern in another, the agent has no way to know which one the team has decided to standardize on — unless that decision is encoded somewhere the agent will query, in a form the agent can use.
So the agent infers. It looks at what's around, picks a pattern, and proposes code consistent with what it sees. If it sees both patterns, it picks one — sometimes the right one, sometimes the wrong one. Across hundreds of sessions, that inference produces a probability distribution of pattern choices. The expected value of that distribution is not the team's decision. It is the agent's best guess given local samples.
AI agent drift is the divergence between what the team has decided and what the agent — left to its own inference — actually produces. It is not a moral failure of the agent. The agent is doing exactly what it is designed to do: write code that fits the context it sees. The structural problem is that "the context it sees" is not the same thing as "the decisions the team has made."
No single agent action is drift. Drift is the statistical signature of many agent actions taken without a shared source of truth. It is what you observe in aggregate, not in any individual PR.
Why agent drift is structural, not behavioral
It is tempting to think of agent drift as something to be fixed by better-trained agents, longer context windows, or more thorough CLAUDE.md files. None of those address the structural cause.
- Better-trained agents can encode general best practices but cannot know your team's architectural decisions. A model that scores higher on every coding benchmark still has no awareness of your team's ADR-0017 on retry budgets.
- Longer context windows let agents read more — but reading is not the same as querying authoritative decisions. The agent still has to infer which of the patterns it read is the canonical one. A 1M-token window does not include the team's decision register unless that register exists and is structured.
- Better CLAUDE.md files document decisions to humans and to the agent's reading layer, but they are advisory: a wall of prose competing with the rest of the context window for the model's attention. They are read; they are not enforced.
The structural cause of agent drift is that architectural decisions are not currently a queryable substrate. There is no schema, no API, no scope-aware retrieval, no precedence model. The agent has no way to ask "what decision applies to this file?" and receive a deterministic answer. The decisions exist — in heads, in wikis, in old ADRs — but they are not in a form that closes the inference gap. As long as the gap is open, drift will accumulate at the rate the team generates code.
Drift across agents, sessions, and providers
The drift problem is amplified by three independent axes of agent multiplicity. Each axis multiplies the surface area of the problem.
| Axis | What changes | Drift impact |
|---|---|---|
| Sessions | Same developer, new context window | Decisions made in chat A are invisible to chat B |
| Agents | Claude Code, Cursor, Copilot side-by-side | Each has its own context surface and rule format |
| Providers | Different model families, training cutoffs | Different priors, different defaults, different code style |
| Developers | Each developer's local AI workflow | No shared substrate — each session is private |
| Time | Context resets, conversation compaction | Even within a single session, mid-session memory loss |
The takeaway: drift is not a bug in any one of these axes. It is the predictable outcome when none of them share state. The cure is shared state — externally compiled, externally enforced, available to all agents on all axes.
The common misread: confusing drift with hallucination
Engineers new to the problem sometimes describe agent drift as "the agent hallucinating an architecture." That framing is wrong in a way that matters. The agent is not hallucinating. It is faithfully producing code that fits the context it can see. The architecture it produces is a plausible inference from a partial sample.
Hallucination is a model failing to be grounded in its inputs. Drift is a model being perfectly grounded — in the wrong substrate. The substrate is the codebase plus the prompt; the decisions live elsewhere, often nowhere queryable. Treating drift as hallucination leads teams to invest in groundedness improvements (better RAG, more documentation in prompts) that don't address the missing decision layer.
The fix for drift is not making the agent better. It is giving the agent something to query — a compiled, deterministic, scope-aware decision corpus that any agent, in any session, with any context window, can consult before generating code.
Drift as a measurable signal
Drift is not just a narrative — it is observable. A governance system that intercepts agent tool use at the pre-generation hook can record, for every proposal, which decisions applied, which were respected, and which were violated. Aggregated, these records form a drift telemetry surface:
- Drift rate per decision — which decisions are violated most often? This is a signal that the decision is unclear, too narrow, or has a missing scope.
- Drift rate per agent — does one agent runtime systematically miss certain constraint types? Useful for tuning each agent's integration.
- Drift rate per region of the codebase — are violations clustering in legacy code, in new services, in specific paths? Helps target where the governance scope needs to be tightened.
- Drift over time — is the rate trending up or down after a new ADR is compiled? The first quantitative measure of whether a decision is taking effect.
None of this telemetry is available when governance is process-based. You cannot graph the rate at which reviewers caught a violation that should have been caught by a constraint, because the constraint doesn't exist in a measurable form. Encoding decisions for governance is the precondition for measuring drift. Measuring drift is the precondition for closing the loop on it.
What stops agent drift
Three properties of the governance layer, taken together, are what stop drift:
- Externalized decisions — the decisions live outside any single agent's context, in a compiled corpus that every agent reads from. No agent has to remember; every agent has to look.
- Pre-generation enforcement — the decisions are consulted before the agent writes, not after. Drift is prevented at the point of intent, not corrected in review.
- Deterministic answers — the same query against the same corpus returns the same constraint, every time, for every agent. The governance answer is not a guess and is not session-dependent.
This is the precise role of governance infrastructure: to be the substrate that makes "what does the team decide here?" a queryable question. Once the question is queryable, drift is no longer the default behavior — it is the exception, surfaced and measurable.
Related concepts
- Architectural drift — the codebase-side effect that AI agent drift produces. One looks at the agent; the other looks at the code.
- Multi-agent continuity — the property that decisions persist across agents. Continuity is what stops drift at the boundary between sessions and providers.
- Governance before generation — the enforcement posture that addresses drift at the point of intent rather than at the point of review.