What teams mean by “RAG for coding memory”
The pattern is now familiar: index a repository’s ADRs, design docs, code snippets, prior commits, and engineering wiki pages into a vector store. On each turn of the coding agent, retrieve the top-k most relevant chunks and inject them into the model’s context. The goal is persistent project memory: the agent “remembers” what your team has decided, even across new sessions.
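The retrieve-and-inject loop described above can be sketched in a few lines. This is a minimal illustration, not any particular vector store's API: `embed` is a toy bag-of-words stand-in for a real embedding model, and `CHUNKS`, `retrieve`, and `build_prompt` are hypothetical names with invented sample data.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy stand-in for an embedding model: lowercase bag-of-words counts.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(task: str, chunks: list[str], k: int = 2) -> list[str]:
    # Rank every chunk by similarity to the task and keep the k best.
    query = embed(task)
    ranked = sorted(chunks, key=lambda c: cosine(query, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(task: str, chunks: list[str], k: int = 2) -> str:
    # Inject the retrieved chunks ahead of the task text.
    context = "\n".join(f"- {c}" for c in retrieve(task, chunks, k))
    return f"Project context:\n{context}\n\nTask: {task}"

# Hypothetical knowledge base; the entries are invented for illustration.
CHUNKS = [
    "ADR: the payments service uses idempotency keys for retries",
    "Wiki: how to run the linter locally",
    "Design doc: retry policy for the payments client uses exponential backoff",
]

if __name__ == "__main__":
    print(build_prompt("add retries to the payments client", CHUNKS))
```

A production setup would swap `embed` for a learned embedding and the sorted scan for an index, but the shape of the loop (rank, cut at k, inject) stays the same.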
It is a genuine upgrade over a single CLAUDE.md or a flat rules file. The vector store scales beyond a context window. The retrieval is task-relevant, not blanket. And for a wide class of coding work — understanding unfamiliar code, surfacing examples, finding related implementations — it does what it claims.
- Scales to large knowledge bases without saturating the context window
- Surfaces task-relevant context instead of injecting every rule into every session
- Works with the infrastructure teams already have: pgvector, Pinecone, Chroma, LanceDB
- Improves average-case code quality by giving the model concrete examples and prior art
- Captures tacit knowledge that lives in design docs and Slack threads
Where RAG-as-coding-memory breaks is at the seam between two different memory problems. Contextual memory is what the agent should know to do good work. Decision memory is what the agent must obey every time, even when the topic isn’t the focus of the current task.
Contextual memory is approximate. Recall quality scales with embedding quality. Best for examples, prior implementations, design discussions, narrative documentation. The model decides how to apply what is surfaced.
Decision memory is exact. Recall must be deterministic regardless of phrasing. Best for architectural decisions, scope rules, supersessions. The system, not the model, resolves which decision applies and how.
The framing that resolves the debate: RAG is good memory for things the agent can use. It is poor memory for things the agent must obey. Architectural decisions are the second category.
Where they differ as memory layers
| Dimension | RAG coding memory | Mneme HQ |
|---|---|---|
| Recall guarantee | Probabilistic top-k | Deterministic scope lookup |
| Scope handling | Implicit in embeddings | Explicit per decision (file, dir, repo) |
| Precedence resolution | Model interprets context order | Precedence engine resolves |
| Cross-session continuity | Re-retrieved per session, embedding-dependent | Same verdict every session, every agent |
| Supersession history | Old and new versions may both surface | Superseded decisions explicitly tracked |
| Override semantics | None — model silently overrides | Override is a decision record with rationale |
| Audit trail | Which chunks retrieved, not which applied | Every enforcement event is traceable |
| Multi-agent recall | Different models retrieve and apply differently | Same enforcement contract for every agent |
Three coding-memory failure modes
The reliability gap shows up in concrete ways once a team scales an AI coding agent past a handful of sessions. These are the patterns that drive teams from pure-RAG memory to a typed decision layer.
1. Silent forgetting
An architectural decision is indexed. It exists in the vector store. The agent edits a file the decision applies to — but the current task description doesn’t embed close to the decision’s text. The decision is not in the top-k. The agent generates code that violates it, and nothing notices.
This is the most common failure: not that the rule is missing, but that retrieval ranking demoted it. The fix is not better embeddings. It is recall that doesn’t depend on phrasing similarity.
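A small sketch makes the ranking problem concrete. Everything here is assumed for illustration: the documents are invented, `similarity` is a crude token-overlap score standing in for embedding distance, and `decisions_for` is a hypothetical path-based lookup. Note that Python's `fnmatchcase` lets `*` match across `/`, so `**` behaves like `*` in this toy.

```python
import re
from fnmatch import fnmatchcase

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def similarity(a: str, b: str) -> float:
    # Jaccard overlap of tokens; a crude proxy for embedding similarity.
    ta, tb = tokens(a), tokens(b)
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

# Invented sample corpus: one decision with a scope, two narrative chunks.
DOCS = [
    {"id": "repo-layer", "scope": "services/billing/**",
     "text": "Decision: all database access goes through the repository layer"},
    {"id": "rounding-guide", "scope": None,
     "text": "Guide: rounding helper for invoice totals"},
    {"id": "invoice-snippet", "scope": None,
     "text": "Snippet: invoice total calculation with tax"},
]

def top_k(task: str, k: int = 2) -> list[dict]:
    # Similarity ranking: depends entirely on how the task is phrased.
    return sorted(DOCS, key=lambda d: similarity(task, d["text"]), reverse=True)[:k]

def decisions_for(path: str) -> list[dict]:
    # Scope lookup: depends only on which file is being edited.
    return [d for d in DOCS if d["scope"] and fnmatchcase(path, d["scope"])]

if __name__ == "__main__":
    task = "fix rounding in invoice total"
    print([d["id"] for d in top_k(task)])
    print([d["id"] for d in decisions_for("services/billing/invoice.py")])
```

The task shares no vocabulary with the decision, so the decision never reaches the top-k, while the path lookup returns it regardless of how the task is worded.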
2. Embedding decay
A decision is rephrased over time as ADRs are edited, summarized, or rewritten. Each rephrasing shifts its embedding. Tasks that used to retrieve it cleanly now miss it. Worse: an older, superseded version whose wording sits closer to the task re-enters the top-k while the current version doesn’t. The agent obeys the wrong version of a decision the team thought it had updated.
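The stale-version case can be illustrated with two invented records. In this sketch `overlap` is a simple shared-token count standing in for embedding similarity, and the `superseded_by` field is a hypothetical way to record supersession explicitly; none of it reflects a specific product's schema.

```python
import re

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap(task: str, text: str) -> int:
    # Shared-token count; a crude proxy for similarity ranking.
    return len(tokens(task) & tokens(text))

# Invented records: v1 was replaced by v2, and v2 was worded differently.
RECORDS = [
    {"id": "session-v1", "superseded_by": "session-v2",
     "text": "Use a Redis cache for session tokens in the auth service"},
    {"id": "session-v2", "superseded_by": None,
     "text": "Session state is kept in signed cookies with no server-side store"},
]

def best_match(task: str, records: list[dict]) -> dict:
    # What a similarity search surfaces first.
    return max(records, key=lambda r: overlap(task, r["text"]))

def current(records: list[dict]) -> list[dict]:
    # What an explicit supersession field yields: only live decisions.
    return [r for r in records if r["superseded_by"] is None]

if __name__ == "__main__":
    task = "add redis cache lookup for session tokens"
    print(best_match(task, RECORDS)["id"])
    print([r["id"] for r in current(RECORDS)])
```

Similarity favors the superseded record because the task happens to echo its wording; filtering on the supersession field ignores wording altogether.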
3. Scope collision
Two decisions overlap. One is repo-wide; the other applies only to a specific service. Both retrieve. The model picks based on context order, recency in the window, or token similarity, not authority. The narrower, more specific decision can lose to the broader one simply because it embedded slightly worse. Vector search has no native concept of scope precedence.
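For contrast, here is what an explicit precedence rule might look like. The policy shown (the matching scope with the most literal path segments wins) is an assumption made for this sketch, as are the `Decision` class, `specificity`, and `resolve`; a real precedence engine may use different rules.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

@dataclass(frozen=True)
class Decision:
    id: str
    scope: str  # glob; with fnmatchcase, "*" also matches "/"
    rule: str

def specificity(scope: str) -> int:
    # Count path segments that contain no wildcard.
    return sum(1 for seg in scope.split("/") if seg and "*" not in seg)

def resolve(path: str, decisions: list[Decision]) -> Optional[Decision]:
    # Among decisions whose scope matches the path, the narrowest wins.
    matching = [d for d in decisions if fnmatchcase(path, d.scope)]
    return max(matching, key=lambda d: specificity(d.scope), default=None)

# Invented decisions for illustration.
DECISIONS = [
    Decision("logging-repo", "**", "use the shared structured logger"),
    Decision("logging-auth", "services/auth/**", "never log request bodies"),
]

if __name__ == "__main__":
    print(resolve("services/auth/login.py", DECISIONS))
    print(resolve("services/billing/invoice.py", DECISIONS))
```

The outcome depends only on the path and the declared scopes, so the service-level decision wins inside `services/auth/` no matter how either decision is worded.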
Each of these is structural. They are not solvable by upgrading the embedding model, the chunking strategy, or the reranker. They are the cost of using probabilistic retrieval as a substitute for typed lookup.
Why typed decision memory works for code
Code is structured. Files have paths. Modules have boundaries. Architectural decisions have explicit scope — this applies to services/auth/**, that applies to the whole repo. Vector search flattens all of this into similarity space and asks the model to reconstruct it from context.
Mneme treats decisions as typed records, not text. Each decision has explicit scope (file glob, directory, repo), a precedence position (override, supersedes), and a machine-evaluable predicate. When an AI coding agent attempts an Edit or Write, Mneme looks up which decisions apply by scope match, resolves any precedence conflicts deterministically, and either lets the edit through or blocks it with the specific decision that fired.
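The paragraph above describes a record shape and an edit-time check. The following is a rough sketch of that idea under stated assumptions: `Decision`, `Verdict`, and `check_edit` are hypothetical names invented here, the predicate is a plain Python callable, and none of this is Mneme's actual schema or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional

@dataclass(frozen=True)
class Decision:
    id: str
    scope: str                         # file glob, directory glob, or "**" for the repo
    violates: Callable[[str], bool]    # machine-evaluable predicate over the new content
    supersedes: Optional[str] = None   # id of the decision this one replaces

@dataclass(frozen=True)
class Verdict:
    allowed: bool
    decision_id: Optional[str] = None  # the decision that fired, if any

def check_edit(path: str, new_content: str, decisions: list[Decision]) -> Verdict:
    # Superseded decisions never fire, whatever their wording.
    superseded = {d.supersedes for d in decisions if d.supersedes}
    for d in decisions:
        if d.id in superseded:
            continue
        # A decision applies by scope match, then its predicate is evaluated.
        if fnmatchcase(path, d.scope) and d.violates(new_content):
            return Verdict(allowed=False, decision_id=d.id)
    return Verdict(allowed=True)

# Invented example: the auth service moved from one HTTP policy to another.
DECISIONS = [
    Decision("http-v1", "services/auth/**", lambda s: "import urllib" in s),
    Decision("http-v2", "services/auth/**", lambda s: "import requests" in s,
             supersedes="http-v1"),
]

if __name__ == "__main__":
    print(check_edit("services/auth/client.py", "import requests\n", DECISIONS))
    print(check_edit("web/app.py", "import requests\n", DECISIONS))
```

Because the check is a pure function of the path, the content, and the decision set, the same edit yields the same verdict on every call.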
The retrieval is exact. Same file, same edit, same verdict — every time, every agent, every session. There is no embedding drift because there is no embedding. There is no top-k miss because there is no ranking step. The decision either applies (by scope) or it doesn’t.
This is not a richer RAG. It is a different memory primitive: structured lookup against a typed corpus, designed for the property RAG cannot offer — deterministic recall of decisions that must hold every time.
Using RAG and Mneme together
The right architecture treats RAG and Mneme as complementary memory layers rather than competing approaches. Each handles the memory problem it is suited to.
- Index narrative context with RAG. Design docs, prior implementations, related code, engineering discussions. These are the things the agent should be able to find when relevant — and they tolerate approximate recall.
- Encode decisions in Mneme. Architectural decisions, scope rules, supersessions, language and dependency policies. These need to apply every time, regardless of whether the current task description happens to mention them.
- Use RAG output to improve generation quality. Surface examples, document excerpts, prior art into the model’s context window at task time.
- Use Mneme to enforce decision recall at edit time. Hook into Edit and Write operations. When the model produces an edit that violates an architectural decision, Mneme blocks it before it reaches the codebase.
- Track overrides as first-class decision records. When a team intentionally diverges from a decision, the override is itself recorded with rationale and scope. RAG can surface the override discussion; Mneme governs the new boundary.
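The division of labor in the list above can be sketched as a single agent turn: approximate retrieval feeds generation, then a deterministic gate checks the resulting edit. All names here (`Decision`, `Override`, `gate`, `run_turn`) are hypothetical, and `retrieve` and `generate` are caller-supplied stand-ins for a RAG layer and a model call.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional, Sequence

@dataclass(frozen=True)
class Decision:
    id: str
    scope: str
    violates: Callable[[str], bool]

@dataclass(frozen=True)
class Override:
    decision_id: str   # the decision being intentionally diverged from
    scope: str         # where the divergence is permitted
    rationale: str     # recorded reason, kept with the override

def gate(path: str, edit: str, decisions: Sequence[Decision],
         overrides: Sequence[Override] = ()) -> Optional[str]:
    # Return the id of the decision that blocks this edit, or None.
    for d in decisions:
        if not (fnmatchcase(path, d.scope) and d.violates(edit)):
            continue
        exempt = any(o.decision_id == d.id and fnmatchcase(path, o.scope)
                     for o in overrides)
        if not exempt:
            return d.id
    return None

def run_turn(task: str, path: str,
             retrieve: Callable[[str], list],
             generate: Callable[[str, list], str],
             decisions: Sequence[Decision],
             overrides: Sequence[Override] = ()) -> dict:
    context = retrieve(task)           # contextual memory: approximate recall
    edit = generate(task, context)     # the model uses what was surfaced
    blocked_by = gate(path, edit, decisions, overrides)  # decision memory: exact
    return {"edit": edit, "applied": blocked_by is None, "blocked_by": blocked_by}
```

In this sketch retrieval quality only affects how good the generated edit is; whether the edit lands is decided by the gate, and an override is just another record the gate consults.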
In this architecture, RAG is the memory of what we have written. Mneme is the memory of what we have decided. A coding agent that uses both has stronger context and harder compliance guarantees than either provides alone — and the failure modes of RAG-as-decision-memory disappear, because RAG is no longer being asked to do the work it was never built for.