What teams mean by “RAG for coding memory”

The pattern is now familiar: index a repository’s ADRs, design docs, code snippets, prior commits, and engineering wiki pages into a vector store. At each coding agent turn, retrieve the top-k most relevant chunks and inject them into the model’s context. The goal is persistent project memory — the agent “remembers” what your team has decided, even across new sessions.

It is a genuine upgrade over a single CLAUDE.md or a flat rules file. The vector store scales beyond a context window. The retrieval is task-relevant, not blanket. And for a wide class of coding work — understanding unfamiliar code, surfacing examples, finding related implementations — it does what it claims.

  • Scales to large knowledge bases without saturating the context window
  • Surfaces task-relevant context instead of injecting every rule into every session
  • Works with the infrastructure teams already have: pgvector, Pinecone, Chroma, LanceDB
  • Improves average-case code quality by giving the model concrete examples and prior art
  • Captures tacit knowledge that lives in design docs and Slack threads
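The retrieve-and-inject loop described above can be sketched in a few lines. This is a minimal illustration, not any vector store's API: `embed` is a toy bag-of-words stand-in for a real embedding model, and `CHUNKS`, `retrieve_top_k`, and `build_prompt` are hypothetical names introduced here.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve_top_k(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Rank every chunk by similarity to the query and keep the best k."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(task: str, chunks: list[str], k: int = 2) -> str:
    """Inject the retrieved chunks into the context ahead of the task."""
    context = "\n".join(f"- {c}" for c in retrieve_top_k(task, chunks, k))
    return f"Project context:\n{context}\n\nTask: {task}"

# Hypothetical indexed chunks (ADR, design doc, wiki page).
CHUNKS = [
    "ADR 12: the payments service retries failed webhooks with exponential backoff",
    "Design doc: the search service caches query results in Redis",
    "Wiki: how to request a new staging environment",
]

best = retrieve_top_k("add retries to the payments webhooks handler", CHUNKS, k=1)
```

A production setup would replace `embed` and the sort with a real embedding model and an approximate nearest-neighbor index, but the shape of the loop is the same: rank by similarity, cut at k, inject.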

Where RAG-as-coding-memory breaks is at the seam between two different memory problems. Contextual memory is what the agent should know to do good work. Decision memory is what the agent must obey every time, even when the topic isn’t the focus of the current task.

Contextual memory
RAG’s home turf

Approximate. Recall quality scales with embedding quality. Best for examples, prior implementations, design discussions, and narrative documentation. The model decides how to apply what is surfaced.

Decision memory
Where a typed corpus wins

Exact. Recall must be deterministic regardless of phrasing. Best for architectural decisions, scope rules, supersessions. The system — not the model — resolves which decision applies and how.

The framing that resolves the debate: RAG is good memory for things the agent can use. It is poor memory for things the agent must obey. Architectural decisions are the second category.

Where they differ as memory layers

Dimension | RAG coding memory | Mneme HQ
Recall guarantee | Probabilistic top-k | Deterministic scope lookup
Scope handling | Implicit in embeddings | Explicit per decision (file, dir, repo)
Precedence resolution | Model interprets context order | Precedence engine resolves
Cross-session continuity | Re-retrieved per session, embedding-dependent | Same verdict every session, every agent
Supersession history | Old and new versions may both surface | Superseded decisions explicitly tracked
Override semantics | None: model silently overrides | Override is a decision record with rationale
Audit trail | Which chunks retrieved, not which applied | Every enforcement event is traceable
Multi-agent recall | Different models retrieve and apply differently | Same enforcement contract for every agent

Three coding-memory failure modes

The reliability gap shows up in concrete ways once a team scales an AI coding agent past a handful of sessions. These are the patterns that drive teams from pure-RAG memory to a typed decision layer.

1. Silent forgetting

An architectural decision is indexed. It exists in the vector store. The agent edits a file the decision applies to — but the current task description doesn’t embed close to the decision’s text. The decision is not in the top-k. The agent generates code that violates it, and nothing notices.

This is the most common failure: not that the rule is missing, but that retrieval ranking demoted it. The fix is not better embeddings. It is recall that doesn’t depend on phrasing similarity.
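A small sketch makes the miss concrete. Everything here is illustrative: `similarity` is a toy word-count cosine standing in for embedding similarity, and the corpus entries, paths, and helper names are hypothetical. Scope matching uses Python's `fnmatchcase`, whose `*` also matches path separators, so `services/auth/**` covers everything under that directory.

```python
import math
from collections import Counter
from fnmatch import fnmatchcase

def similarity(a: str, b: str) -> float:
    """Toy stand-in for embedding similarity: cosine over word counts."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

# Hypothetical corpus of (scope glob, text). Only the first entry is a binding decision.
CORPUS = [
    ("services/auth/**", "Decision: session tokens must be signed with the shared KMS key"),
    ("docs/**", "Guide: how to rename a helper function and fix the failing unit test"),
    ("docs/**", "Note: unit test naming conventions for helper modules"),
]

def top_k(task: str, k: int = 2) -> list[str]:
    """Similarity-ranked retrieval: depends on how the task is phrased."""
    ranked = sorted(CORPUS, key=lambda entry: similarity(task, entry[1]), reverse=True)
    return [text for _, text in ranked[:k]]

def by_scope(path: str) -> list[str]:
    """Scope lookup: depends only on which file is being edited."""
    return [text for scope, text in CORPUS if fnmatchcase(path, scope)]

task = "rename the helper function and fix the failing unit test"
edited_file = "services/auth/session.py"

retrieved = top_k(task)            # the decision is ranked third, so k=2 drops it
applicable = by_scope(edited_file)  # the decision is found by path alone
```

The decision is in the index in both cases. Only the scope lookup returns it, because the task text shares almost nothing with the decision's wording.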

2. Embedding decay

A decision is rephrased over time as ADRs are edited, summarized, or rewritten. Each rephrasing shifts its embedding. Tasks that used to retrieve it cleanly now miss. Worse: an older, superseded version with stronger keyword overlap re-enters the top-k while the current version doesn’t. The agent obeys the wrong version of a decision the team thought it had updated.
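By contrast, explicit supersession tracking can be pictured as a link from each decision to whatever replaced it. The sketch below is an assumption about how such tracking could look, not a description of any product's data model; `DecisionVersion`, `current_version`, and the ADR ids are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class DecisionVersion:
    id: str
    text: str
    superseded_by: Optional[str] = None  # id of the replacing decision, if any

def current_version(decision_id: str, records: dict[str, DecisionVersion]) -> DecisionVersion:
    """Follow supersession links until reaching a decision that nothing replaces."""
    seen: set[str] = set()
    record = records[decision_id]
    while record.superseded_by is not None:
        if record.id in seen:
            raise ValueError(f"supersession cycle at {record.id}")
        seen.add(record.id)
        record = records[record.superseded_by]
    return record

# Hypothetical records: ADR-7 was replaced by ADR-19.
RECORDS = {
    "ADR-7": DecisionVersion("ADR-7", "Use REST between internal services", superseded_by="ADR-19"),
    "ADR-19": DecisionVersion("ADR-19", "Use gRPC between internal services"),
}

resolved = current_version("ADR-7", RECORDS)
```

However the old text is phrased, a lookup that starts from it always lands on the current version, because the link is data rather than similarity.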

3. Scope collision

Two decisions overlap. One is repo-wide; the other applies only to a specific service. Both retrieve. The model picks based on context order, recency in the window, or token similarity — not authority. The narrower, more specific decision loses to the broader one because it embedded slightly worse. Vector search has no native concept of scope precedence.
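For comparison, here is what an explicit precedence rule might look like. The rule used, "the narrowest applicable scope wins", is an assumption chosen for illustration; the `SPECIFICITY` ordering, `ScopedDecision`, and the sample decisions are hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional

# Assumed ordering for this sketch: narrower scope kinds outrank broader ones.
SPECIFICITY = {"repo": 0, "dir": 1, "file": 2}

@dataclass(frozen=True)
class ScopedDecision:
    id: str
    kind: str      # "repo", "dir", or "file"
    pattern: str   # glob matched against the edited path
    rule: str

def governing_decision(path: str, decisions: list[ScopedDecision]) -> Optional[ScopedDecision]:
    """Return the applicable decision with the narrowest scope, or None."""
    applicable = [d for d in decisions if fnmatchcase(path, d.pattern)]
    if not applicable:
        return None
    return max(applicable, key=lambda d: SPECIFICITY[d.kind])

DECISIONS = [
    ScopedDecision("D-1", "repo", "*", "Use the shared HTTP client"),
    ScopedDecision("D-2", "dir", "services/billing/*", "Use the billing SDK client"),
]

billing = governing_decision("services/billing/invoice.py", DECISIONS)
search = governing_decision("services/search/index.py", DECISIONS)
```

Both decisions match the billing file, and the directory-scoped one governs because the ordering says so, not because it happened to rank higher in a similarity search.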

Each of these is structural. They are not solvable by upgrading the embedding model, the chunking strategy, or the reranker. They are the cost of using probabilistic retrieval as a substitute for typed lookup.

Why typed decision memory works for code

Code is structured. Files have paths. Modules have boundaries. Architectural decisions have explicit scope — this applies to services/auth/**, that applies to the whole repo. Vector search flattens all of this into similarity space and asks the model to reconstruct it from context.

Mneme treats decisions as typed records, not text. Each decision has explicit scope (file glob, directory, repo), a precedence position (override, supersedes), and a machine-evaluable predicate. When an AI coding agent attempts an Edit or Write, Mneme looks up which decisions apply by scope match, resolves any precedence conflicts deterministically, and either lets the edit through or blocks it with the specific decision that fired.

The retrieval is exact. Same file, same edit, same verdict — every time, every agent, every session. There is no embedding drift because there is no embedding. There is no top-k miss because there is no ranking step. The decision either applies (by scope) or it doesn’t.
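The idea of a typed record with a scope and a machine-evaluable predicate can be sketched as follows. This is not Mneme's actual schema or API; `Decision`, `Verdict`, `check_edit`, and the sample rule are hypothetical names used to show the shape of a scope-matched, deterministic check.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Callable, Optional

@dataclass(frozen=True)
class Decision:
    id: str
    scope: str                       # glob over file paths
    violates: Callable[[str], bool]  # predicate over the proposed file content
    rationale: str

@dataclass(frozen=True)
class Verdict:
    allowed: bool
    decision_id: Optional[str] = None  # the decision that fired, if blocked

def check_edit(path: str, new_content: str, decisions: list[Decision]) -> Verdict:
    """Block the edit if any in-scope decision's predicate reports a violation."""
    for d in decisions:
        if fnmatchcase(path, d.scope) and d.violates(new_content):
            return Verdict(False, d.id)
    return Verdict(True)

# Hypothetical decision: no pickle inside the auth service.
DECISIONS = [
    Decision(
        id="AUTH-3",
        scope="services/auth/*",
        violates=lambda src: "import pickle" in src,
        rationale="No pickle in auth code; use JSON for serialization",
    ),
]

in_scope = check_edit("services/auth/session.py", "import pickle\n", DECISIONS)
out_of_scope = check_edit("services/search/index.py", "import pickle\n", DECISIONS)
```

There is no ranking step anywhere in `check_edit`: the same path and content always yield the same verdict, which is the property the paragraph above describes.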

This is not a richer RAG. It is a different memory primitive: structured lookup against a typed corpus, designed for the property RAG cannot offer — deterministic recall of decisions that must hold every time.

Using RAG and Mneme together

The right architecture treats RAG and Mneme as complementary memory layers rather than competing approaches. Each handles the memory problem it is suited to.

  1. Index narrative context with RAG. Design docs, prior implementations, related code, engineering discussions. These are the things the agent should be able to find when relevant — and they tolerate approximate recall.
  2. Encode decisions in Mneme. Architectural decisions, scope rules, supersessions, language and dependency policies. These need to apply every time, regardless of whether the current task description happens to mention them.
  3. Use RAG output to improve generation quality. Surface examples, document excerpts, prior art into the model’s context window at task time.
  4. Use Mneme to enforce decision recall at edit time. Hook into Edit and Write operations. When the model produces an edit that violates an architectural decision, Mneme blocks it before it reaches the codebase.
  5. Track overrides as first-class decision records. When a team intentionally diverges from a decision, the override is itself recorded with rationale and scope. RAG can surface the override discussion; Mneme governs the new boundary.

In this architecture, RAG is the memory of what we have written. Mneme is the memory of what we have decided. A coding agent that uses both has stronger context and harder compliance guarantees than either provides alone — and the failure modes of RAG-as-decision-memory disappear, because RAG is no longer being asked to do the work it was never built for.
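The layering in steps 1 through 4 can be sketched as a single agent turn. The function and stub names below (`handle_task`, `fake_retrieve`, `fake_generate`) are hypothetical, and the stubs stand in for a real vector store and model call; the sketch only shows where each layer sits.

```python
from fnmatch import fnmatchcase

def handle_task(task, path, retrieve, generate, decisions):
    """Run one agent turn: RAG informs generation, typed decisions gate the edit."""
    context = retrieve(task)            # approximate recall: examples, prior art
    proposed = generate(task, context)  # the model drafts the new file content
    for decision_id, scope, violates in decisions:  # exact recall: scope match
        if fnmatchcase(path, scope) and violates(proposed):
            return {"applied": False, "blocked_by": decision_id}
    return {"applied": True, "content": proposed}

# Stubs standing in for a vector store and a model call.
def fake_retrieve(task):
    return ["Prior art: services/search uses requests.Session for outbound calls"]

def fake_generate(task, context):
    return "import requests\nrequests.get('https://example.com')\n"

# Hypothetical decision: the auth service may not import requests directly.
DECISIONS = [
    ("NET-2", "services/auth/*", lambda src: "import requests" in src),
]

blocked = handle_task("add a health check", "services/auth/health.py",
                      fake_retrieve, fake_generate, DECISIONS)
allowed = handle_task("add a health check", "services/search/health.py",
                      fake_retrieve, fake_generate, DECISIONS)
```

The same generated edit is blocked in one directory and applied in another. Retrieval influenced what was generated; only the scope check decided whether it landed.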

Frequently asked questions

Is RAG a good memory layer for AI coding agents?
RAG works well as contextual memory: surfacing relevant code snippets, documentation, prior commits, and historical examples to improve generation quality. It is weaker as decision memory: when an architectural decision must apply every time a certain file or pattern is touched, top-k retrieval can miss the rule, rank it below noisier neighbors, or surface a superseded version. Contextual memory is approximate by nature; decision memory has to be exact.

Why does RAG miss architectural decisions for coding agents?
Three structural reasons. First, embedding similarity tracks phrasing: a decision worded differently from the current task may not be retrieved. Second, scope is not native to vector search: a decision that applies only to one directory may surface for unrelated files, or fail to surface for the files it governs. Third, precedence is invisible to retrieval: when two decisions overlap, the model picks based on context order, not authority. Each of these can be mitigated with better indexing, but none can be eliminated that way, because the recall is still probabilistic.

Can I plug my existing ADR RAG into Mneme HQ?
Yes. Mneme imports ADRs and architectural decision records into a typed decision corpus with scope rules and precedence semantics. Your existing ADR collection remains the human-readable source. Mneme adds the machine-evaluable layer on top: each decision becomes a constraint with explicit scope, supersession history, and an enforcement contract. You keep RAG for narrative retrieval; Mneme handles the per-edit recall.

Should we use both RAG and Mneme for coding memory?
Yes, at different layers. RAG is the right tool for contextual recall: documentation, design discussions, prior implementations. Mneme is the right tool for decision recall: which architectural decisions apply to this file, in what order, with what scope. Most teams settle on a layered architecture where RAG informs generation and Mneme enforces decisions. The two operate on different memory problems.