Rule files were the right first answer
Every AI coding tool eventually ships a rule file. Cursor has Rules. Claude Code reads CLAUDE.md. Windsurf, Copilot, and a dozen others have their own variants. The instinct behind all of them is correct: an agent begins each task with no memory of your conventions, so you give it a document that says “this is how we build things here.”
For a small repository with a handful of conventions, this works well. The agent reads the file, follows the patterns, and produces code that looks like yours. Rule files solve the cold-start problem — the gap between a generic model and a project-specific one. That is a genuine and useful job.
A rule file is not documentation. It is an always-on prompt prefix. Every token in it is prepended to every request, whether or not it is relevant to the task at hand.
What a rule file actually is
It helps to be precise about the mechanism. A rule file is not consulted when needed; it is loaded unconditionally. When the agent edits a CSS file, your database-migration conventions are still in the context window. When it writes a test, your API-versioning policy is still there. The model is asked to hold every rule you have ever written in working memory and apply only the ones that matter — for every single call.
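As a rough illustration of that mechanism, here is a minimal sketch of unconditional loading. The `build_prompt` helper and the rule text are hypothetical and do not reflect any particular tool's internals.

```python
# Hypothetical rule file contents; the sections and wording are illustrative only.
RULE_FILE = """\
## Database migrations
Never edit a migration that has already shipped.
## API versioning
Breaking changes require a new version prefix.
## CSS
Use design tokens instead of raw hex colours.
"""


def build_prompt(task: str, rule_file: str = RULE_FILE) -> str:
    """Prepend the whole rule file to the task, whether or not it is relevant."""
    return f"{rule_file}\n# Task\n{task}"


css_prompt = build_prompt("Fix the button padding in styles/button.css")
test_prompt = build_prompt("Add a unit test for the date parser")

# Both prompts carry the migration and versioning rules, although neither
# task touches a migration or an API route.
print("Database migrations" in css_prompt)  # True
print("API versioning" in test_prompt)      # True
```

Nothing in `build_prompt` looks at the task before deciding what to include, which is the whole point: relevance is left to the model.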
That design has a predictable failure curve. It is fine at ten rules and degrades steadily toward a hundred. The degradation is not a bug in any particular tool. It is structural, and it shows up along three axes.
Three axes where static instructions stop scaling
1. Token budget
A rule file competes for the same context window as the code being edited, the retrieved files, the conversation history, and the agent’s own scratch work. Every convention you add is a permanent tax on every request. Teams respond by trimming the file — which means deleting governance to make room for work. The instruction surface and the working surface are zero-sum, and the rules lose.
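The arithmetic behind that tax is simple enough to sketch. The function below is illustrative, and the window size, rule-file size, and request count are invented numbers, not measurements from any tool.

```python
def rule_file_tax(context_window: int, rule_tokens: int, requests: int) -> dict:
    """Return what an always-on rule file costs for a given workload."""
    return {
        # Tokens left for code, retrieved files, history, and scratch work.
        "left_for_work": context_window - rule_tokens,
        # Fraction of every request spent on rules before any work starts.
        "share_of_window": rule_tokens / context_window,
        # Rule tokens sent in total, since the file rides along on every call.
        "rule_tokens_sent": rule_tokens * requests,
    }


# Invented numbers, for illustration only.
tax = rule_file_tax(context_window=32_000, rule_tokens=4_000, requests=200)
print(tax)
```

With these assumed figures, an eighth of the window is spoken for on every call. Plug in your own window and file size to see where your project sits.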
2. Precedence
When two instructions conflict (say, “prefer composition” near the top and “match the surrounding file’s inheritance pattern” near a code sample), a static file has no deterministic way to resolve which wins. Precedence becomes a function of recency, position in the prompt, and however the model happens to weigh competing instructions on that call. As the file grows, the number of latent conflicts grows faster than the number of rules, because each new rule can collide with every rule already there, and the resolution gets less predictable.
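For contrast, here is a minimal sketch of what deterministic precedence could look like once rules are records rather than lines in a prompt. The `Rule` fields, the priority values, and the tie-breaking order are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    id: str
    text: str
    priority: int     # higher wins; assigned by a person, not by prompt position
    scope_depth: int  # tie-breaker: a more specific scope beats a broader one


def resolve(conflicting: list[Rule]) -> Rule:
    """Pick a winner by (priority, scope_depth, id), never by list order."""
    return max(conflicting, key=lambda r: (r.priority, r.scope_depth, r.id))


# Hypothetical records for the two conflicting instructions.
composition = Rule("R-composition", "Prefer composition.", priority=10, scope_depth=0)
inheritance = Rule("R-inherit", "Match the surrounding file's inheritance pattern.",
                   priority=5, scope_depth=2)

winner_a = resolve([composition, inheritance])
winner_b = resolve([inheritance, composition])  # same rules, opposite order
print(winner_a.id, winner_b.id)  # R-composition R-composition
```

The outcome does not change when the rules are reordered, which is exactly the property a flat file cannot offer.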
3. Scope
A rule that should apply only to src/payments/ is, in a flat rule file, applied to the whole repository or to nothing. There is no clean way to say “this invariant holds here and not there.” You either over-apply the rule (and fight false positives) or under-specify it (and let the real violations through).
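The difference between a flat rule and a scoped one can be sketched with a glob match. This is an assumption about how scoping might be expressed, using the standard-library `fnmatch` module; the paths other than `src/payments/` are made up.

```python
from fnmatch import fnmatchcase


def applies(scope: str, path: str) -> bool:
    """True if the rule's scope pattern covers the path.

    Note that fnmatch's "*" also matches "/" characters, so
    "src/payments/*" covers nested directories as well.
    """
    return fnmatchcase(path, scope)


FLAT_SCOPE = "*"                 # what a flat rule file effectively gives you
PAYMENTS_SCOPE = "src/payments/*"

# Hypothetical paths for illustration.
paths = [
    "src/payments/charge.ts",
    "src/payments/refunds/void.ts",
    "src/ui/button.css",
]

flat_hits = [p for p in paths if applies(FLAT_SCOPE, p)]
scoped_hits = [p for p in paths if applies(PAYMENTS_SCOPE, p)]

print(len(flat_hits))  # 3
print(scoped_hits)     # ['src/payments/charge.ts', 'src/payments/refunds/void.ts']
```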
The failure mode is not that the agent forgets the rule. It is that the rule was present but diluted, out-ranked, or applied in the wrong place. More tokens do not fix any of those.
Retrieval memory: load the relevant subset at the moment it matters
The alternative is to stop treating architectural intent as a preamble and start treating it as a queryable store. Instead of pasting every convention into every prompt, you record decisions as structured records and retrieve only the ones relevant to the change in front of the agent.
When the agent is about to touch src/payments/charge.ts, the system surfaces the decisions that govern payments — the idempotency rule, the forbidden direct-charge path, the required audit log — and nothing about your CSS conventions. The relevant subset is small, sharp, and scoped. Token budget stops being zero-sum because you are not carrying the whole rulebook on every call. Precedence becomes explicit, because a retrieval system can rank and resolve. Scope becomes native, because records are keyed to the surfaces they apply to.
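A minimal sketch of that retrieval step follows. The `Decision` shape, the record IDs, the exact constraint wording, the rationales, and the non-payments scopes are hypothetical; only the three payments rules and the CSS conventions are named in the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    id: str
    scope: str       # path prefix the decision governs
    constraint: str
    rationale: str


# Hypothetical store; the wording of each record is illustrative.
STORE = [
    Decision("PAY-001", "src/payments/", "Charges must carry an idempotency key.",
             "Retries must not double-charge."),
    Decision("PAY-002", "src/payments/", "No direct-charge path.",
             "Charges go through one audited route."),
    Decision("PAY-003", "src/payments/", "Every charge writes an audit log entry.",
             "Charges must be traceable."),
    Decision("CSS-001", "src/ui/", "Use design tokens, not raw hex colours.",
             "Consistent theming."),
    Decision("API-001", "src/api/", "Breaking changes need a new version prefix.",
             "Existing clients keep working."),
]


def retrieve(path: str, store: list[Decision] = STORE) -> list[Decision]:
    """Return only the decisions whose scope covers the path, most specific first."""
    hits = [d for d in store if path.startswith(d.scope)]
    return sorted(hits, key=lambda d: (-len(d.scope), d.id))


relevant = retrieve("src/payments/charge.ts")
print([d.id for d in relevant])          # ['PAY-001', 'PAY-002', 'PAY-003']
print(len(relevant), "of", len(STORE))   # 3 of 5
```

A real store would likely need richer matching than a path prefix, but even this toy version loads three records instead of five and leaves the CSS rule out entirely.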
| Dimension | Static rule file | Retrieval memory |
|---|---|---|
| Loading | Whole file, every request | Relevant subset, on demand |
| Token cost | Grows with every rule | Bounded by the task |
| Precedence | Position and recency | Explicit, deterministic ranking |
| Scope | Whole repo or nothing | Keyed to files and surfaces |
| Enforcement | Advisory — model may ignore | Checkable before the change lands |
This is not RAG, and it is not a bigger context window
Two objections come up immediately. The first: “this is just RAG.” It is not. Retrieving documentation chunks by semantic similarity surfaces prose for the model to read; it does not produce a decision with a known scope, a precedence rank, and a verdict. Architectural retrieval returns constraints, not paragraphs. Retrieval is a transport mechanism, not a memory model.
The second: “longer context windows make this moot.” They do not. A million-token window lets you paste a bigger rule file; it does not make the rules out-rank a closer, more recent instruction, and it does not give a flat document scope. Bigger windows raise the ceiling on the static approach without changing its shape. CLAUDE.md stops scaling for reasons that more context does not address.
What changes in practice
The shift from rule files to retrieval memory changes the daily mechanics of AI-assisted work in a few concrete ways:
- Decisions get recorded once — as structured records with a scope, a rationale, and the constraint they imply — instead of re-pasted into every tool’s rule file.
- The agent carries less — only the subset relevant to the current change, leaving the window for the actual work.
- Conflicts resolve deterministically — precedence is a property of the store, not of where a line happens to sit in a prompt.
- Enforcement becomes possible — a retrieved constraint can be checked against the proposed change, which a pasted paragraph never can.
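The last point is the easiest to make concrete. Below is a minimal sketch of checking a retrieved constraint against the lines a change adds. It assumes a constraint can be expressed as a forbidden regular expression, which will not hold for every rule, and the `processor.charge` call it looks for is a made-up API.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Constraint:
    id: str
    scope: str      # path prefix the constraint governs
    forbidden: str  # regex that must not appear in added lines
    message: str


def check(path: str, added_lines: list[str], constraints: list[Constraint]) -> list[str]:
    """Return one violation message per added line that breaks an in-scope constraint."""
    violations = []
    for c in constraints:
        if not path.startswith(c.scope):
            continue  # out of scope: the constraint does not apply here
        pattern = re.compile(c.forbidden)
        for n, line in enumerate(added_lines, start=1):
            if pattern.search(line):
                violations.append(f"{c.id} at added line {n}: {c.message}")
    return violations


# Hypothetical constraint: `processor.charge(` stands in for a direct-charge call.
no_direct_charge = Constraint(
    id="PAY-002",
    scope="src/payments/",
    forbidden=r"\bprocessor\.charge\(",
    message="direct charge path is forbidden",
)

added = [
    "const key = newIdempotencyKey();",
    "await processor.charge(card, amount);",
]

print(check("src/payments/charge.ts", added, [no_direct_charge]))
# ['PAY-002 at added line 2: direct charge path is forbidden']
print(check("src/ui/checkout.ts", added, [no_direct_charge]))
# []
```

Because the check returns a verdict rather than a suggestion, it can run before the change lands, for instance as a pre-merge step.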
Rule files and retrieval are not rivals
This is not an argument to delete your CLAUDE.md or your Cursor Rules. They remain the right tool for a short, stable set of always-true conventions — the cold-start primer. The point is narrower: the instruction surface is the wrong place to put governance that has to scale, resolve conflicts, and be enforced. For a deeper comparison of where each fits, see Mneme vs Cursor Rules and Mneme vs CLAUDE.md.
Static instructions tell the model your rules. Retrieval memory brings the right rule to the right change — and lets you enforce the ones that must hold.