Three years ago, the AI coding question for engineering leaders was "Which assistant should we standardize on?" That question is dead. The realistic answer in 2026 is "all of them, in different parts of the stack, often by the same engineer in the same week." A backend engineer pairs with Claude Code in the terminal, opens Cursor for a frontend touch-up, lets Copilot autocomplete inside the JetBrains IDE, and reviews a PR opened by a Claude Agent SDK bot that the platform team wrote.

Each of these tools has its own idea of what "memory" means. CLAUDE.md for Claude Code. .cursor/rules/ for Cursor. .github/copilot-instructions.md for Copilot. Custom system prompts for whatever the platform team built. The instructions overlap. They drift. They contradict each other. And nobody owns the union.

This is the multi-tool reality that architectural governance has to survive. It will not survive in any single tool's native memory format.

The fragmentation that already happened

Engineering orgs do not adopt AI coding tools the way they adopt CI runners. There is no "we standardized on Jenkins" moment. Adoption is bottom-up, per-engineer, per-task, and increasingly per-stage of the SDLC. The result is a stack that looks like this in most teams running AI seriously:

  • Interactive coding (Claude Code, Cursor, Windsurf). Engineer-driven sessions in the editor or terminal. Memory lives in CLAUDE.md, .cursor/rules, .windsurfrules.
  • Inline completion (GitHub Copilot, Codeium, Tabnine). Always-on, sub-second completions. Limited or no project memory, and often configured per-IDE, not per-repo.
  • CI / async agents (Claude Code on the web, Copilot Workspace, custom SDK bots). Agents that open PRs, triage issues, and run reviews, often with a different system prompt than what the engineer sees locally.
  • Domain-specific agents (migration bots, codemod agents, security scanners). Built in-house on Claude Agent SDK, LangGraph, or AutoGen. Each carries its own prompt, its own scope, its own opinion.

Each tool was designed to be excellent in isolation. None of them was designed to share a single canonical view of the architectural decisions the codebase is supposed to obey.

[Figure: per-tool memory formats. Claude Code → CLAUDE.md; Cursor → .cursor/rules/*.mdc; GitHub Copilot → .github/copilot-instructions.md; Windsurf / Codex → AGENTS.md / .windsurfrules; custom SDK agents → bespoke system prompts. One codebase, five formats, five sources of truth, one set of architectural rules to enforce.]
Fig 1 · Per-tool memory formats fan out; the codebase they govern does not.

Why each tool's native memory is a silo

The natural first instinct is to copy the same rules into every tool's preferred format. Write the architecture in CLAUDE.md, then mirror it into .cursor/rules/, then again into .github/copilot-instructions.md. Some teams script this with a shared markdown file and a generator. It looks clean for a sprint. Then it breaks for the same reasons every duplicated source of truth breaks.
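The generator pattern is simple enough to sketch. The following is a minimal, hypothetical version: the per-tool target paths are the real conventions each tool reads, but the canonical `rules.md` source name and the single-rule-file layout for Cursor are illustrative assumptions.

```python
from pathlib import Path

# Per-tool target paths are the real conventions; treating Cursor's rules
# directory as a single file is a simplification for this sketch.
TARGETS = [
    "CLAUDE.md",                        # Claude Code
    ".github/copilot-instructions.md",  # GitHub Copilot
    ".cursor/rules/architecture.mdc",   # Cursor (one rule file of possibly many)
]

def fan_out(source: Path, repo_root: Path) -> None:
    """Copy one canonical rules file into every tool's expected location."""
    rules = source.read_text()
    for target in TARGETS:
        path = repo_root / target
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(rules)
```

The script is trivial; the failure mode is organizational. The moment someone edits CLAUDE.md directly instead of the canonical source, the copies diverge, and the next generator run silently re-creates stale state.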

Why per-tool memory fails as governance

  1. Format mismatch is not just syntactic. Claude Code reads CLAUDE.md as a single context block. Cursor splits rules by glob and applies them only when matching files are touched. Copilot truncates aggressively. Two of those three will silently drop a rule that the third enforces. The same English sentence behaves differently in each tool.
  2. Drift compounds per tool. An engineer updates CLAUDE.md after a refactor. Nobody updates the Cursor rule file. The web agent that opens PRs is still running with last quarter's system prompt. Three tools, three different snapshots of "what the architecture is," and reviewers cannot tell which one any given diff was generated against.
  3. Precedence is per-tool, not per-decision. When two rules conflict, Cursor resolves it by file glob, Claude Code by ordering in CLAUDE.md, Copilot by whatever the model attended to. None of them knows that ADR-014 supersedes ADR-007 for the payments service specifically. Precedence is an architectural fact, not a tool feature.
  4. No shared enforcement seam. Even if every tool reads the rules correctly, compliance is still advisory inside each one. There is no shared point at which generated code from any agent, whether interactive, async, or in-CI, has to pass the same governance check before being accepted. The seams between tools are where violations land.
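Per-decision precedence is straightforward once decisions carry it as data. A minimal sketch, assuming a structured store whose records have hypothetical `scope` (a path glob) and `supersedes` fields, using the ADR numbers from the point above:

```python
from fnmatch import fnmatch

# Hypothetical decision records; "scope" and "supersedes" are exactly the
# fields that no per-tool rule file carries as structured data.
DECISIONS = [
    {"id": "ADR-007", "scope": "**",
     "rule": "all external calls go through the gateway"},
    {"id": "ADR-014", "scope": "services/payments/**",
     "rule": "payments may call the PSP directly", "supersedes": "ADR-007"},
]

def applicable(path: str) -> list[dict]:
    """Decisions in force for a path, after resolving supersedes links."""
    in_scope = [d for d in DECISIONS if fnmatch(path, d["scope"])]
    superseded = {d.get("supersedes") for d in in_scope}
    return [d for d in in_scope if d["id"] not in superseded]
```

The resolution is the same for every agent that queries the store, which is the point: "ADR-014 wins inside payments" stops being something each tool approximates differently and becomes a computed fact.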

The seam problem

Heterogeneous tooling does not just multiply the per-tool failure modes. It creates a new class of failure that exists only at the boundaries between tools.

An engineer prototypes a service in Cursor with a relaxed rule about external API calls because they were experimenting. They push. A Claude Agent SDK bot picks up the branch in CI and refactors it, generating code against the stricter system prompt the platform team wrote. The bot's diff and the engineer's diff disagree on the architectural rule. Reviewers see two patches that look reasonable in isolation and cannot tell which agent was operating under which assumption.

This is not a hypothetical. Every team running both interactive AI in the editor and async AI in CI has seen versions of it. The patches are individually sensible. The collision is structural.

The seam between agents is where governance has to live. Inside each tool there is no leverage: any instruction can be ignored by the model, any text block can be misread. Between tools, there is a natural enforcement point: the moment generated code is written to disk or proposed as a diff. That moment is the same regardless of which agent produced it.
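What checking at that moment looks like in practice is a hook that inspects the diff itself, not the agent. A minimal sketch, assuming a pre-commit hook and an invented example rule (no direct imports of the raw DB driver):

```python
import re
import subprocess

# Invented example rule: application code must not import the raw DB driver.
FORBIDDEN = re.compile(r"^\+.*\bimport psycopg2\b", re.MULTILINE)

def violations(diff: str) -> list[str]:
    """Lines added by this diff that break the rule, whoever authored them."""
    return FORBIDDEN.findall(diff)

def check_staged() -> list[str]:
    """Hook entry point: runs on the staged diff, so the check is identical
    whether a human, Claude Code, Cursor, or a CI bot produced the change."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return violations(diff)
```

Because the check reads the diff rather than any tool's memory file, adding a sixth agent to the stack adds zero new enforcement surface.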

How other categories solved this

The pattern of "many vendor-specific tools, one shared substrate underneath" is not new. Every prior infrastructure category that started fragmented has resolved through a recognizable sequence: vendor proliferation, then a community-led specification, then broad adoption that left vendors free to differentiate above the line.

  • Container runtimes. Docker dominated, but rkt, LXC, and others fragmented the ecosystem. The Linux Foundation's Open Container Initiative standardized image and runtime formats, and every major runtime today implements them. Docker did not lose; it became one OCI-compliant implementation among several.
  • Tracing and observability. Vendor APMs (DataDog, New Relic, Dynatrace) all ran their own instrumentation libraries. OpenTelemetry, a CNCF project, gave the industry a single instrumentation standard. Vendors compete on the backend; the wire format is shared.
  • IDE language tooling. Every IDE shipped its own language integration. Microsoft's Language Server Protocol defined a common interface so one language server could power VS Code, JetBrains, Vim, Emacs, and more. The fragmentation collapsed onto a shared protocol.
  • Identity. Per-vendor SSO gave way to OAuth 2.0 and OpenID Connect. Today no enterprise considers a tool that does not speak them.

The phases are consistent across categories. Tools proliferate. Format incompatibility creates real operational pain. A community-led specification forms — usually under a foundation. Vendors implement the spec. Differentiation moves up the stack.

[Figure: fragmentation → convergence → standard → adoption. Containers: Docker and rkt → OCI draft → OCI 1.0 (2017) → Kubernetes-native. Tracing: vendor APMs → OpenTracing → OTel (2019) → CNCF graduated. IDE language tooling: per-IDE plugins → LSP draft → LSP 3.0 (2017) → universal. AI coding agents: per-tool rule files → AGENTS.md / MCP / CAISI → "you are here." Historical pattern: vendor proliferation → community spec → foundation stewardship → default everywhere.]
Fig 2 · Standardization phases for prior infrastructure categories, with AI coding agents mapped onto the same axis.

If the AI coding category follows the same arc (and the early signals suggest it will), the practical question for engineering leaders is not whether a shared format will arrive, but what to do during the years before it lands. The answer is the same one teams used during every prior cycle: build above the eventual standard, not inside any one vendor's format.

Where the standards landscape stands today

Two community-led efforts and one government-led one are currently the most credible foundations for a future cross-tool agent governance standard. None are finalized. All are worth tracking.

NIST's AI Agent Standards Initiative. The Center for AI Standards and Innovation (CAISI) at NIST announced the AI Agent Standards Initiative in February 2026, with the stated goal of helping AI agents "interoperate smoothly across the digital ecosystem." A request for information on securing AI agent systems closed for public comment on March 9, 2026, and the NCCoE concept paper on AI agent identity and authorization proposes adapting existing identity standards (OAuth 2.0, OIDC) to non-human agent identities. The current scope is identity, authorization, and security — not output-policy enforcement directly — but it establishes the regulatory frame inside which governance protocols will eventually be evaluated.

The Model Context Protocol. MCP is an open, JSON-RPC-based protocol for exposing context, tools, and resources to AI clients. It does not specify a governance format, but it is increasingly the substrate over which a governance store can be made queryable to any compliant agent. A decision corpus exposed as an MCP server is consumable by every MCP-aware client without per-tool integration.
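Because MCP is JSON-RPC 2.0 underneath, the shape of "a decision store any client can query" is easy to illustrate. The sketch below shows the request/response shape only; the `decisions/query` method name and the store contents are invented, and a real MCP server would expose this through the protocol's actual tool and resource messages rather than a custom method.

```python
import json

# Hypothetical in-memory decision store; in MCP this would sit behind a
# server exposing the lookup as a tool, but the JSON-RPC shape is the point.
STORE = {
    "ADR-007": "all external calls go through the gateway",
    "ADR-014": "payments may call the PSP directly",
}

def handle(request_json: str) -> str:
    """Answer one JSON-RPC 2.0 call; 'decisions/query' is an invented method."""
    req = json.loads(request_json)
    if req.get("method") == "decisions/query":
        adr = req["params"]["id"]
        result = {"id": adr, "rule": STORE.get(adr)}
        return json.dumps({"jsonrpc": "2.0", "id": req["id"], "result": result})
    return json.dumps({"jsonrpc": "2.0", "id": req.get("id"),
                       "error": {"code": -32601, "message": "method not found"}})
```

The value is on the client side: every MCP-aware tool already speaks this wire format, so the store needs one server, not one integration per agent.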

AGENTS.md. The AGENTS.md format — adopted across Codex, Cursor, Aider, Factory, Gemini CLI, Zed, and others, and stewarded by the Agentic AI Foundation under the Linux Foundation — is the closest thing to a shared per-repo instruction format that already works across vendors. OpenAI's Codex documentation treats it as the canonical instruction file. As a markdown convention, AGENTS.md cannot resolve precedence between conflicting decisions or enforce anything at the hook layer, but it is a credible baseline for the static-context portion of the problem and a likely component of any eventual full standard.

Mneme HQ tracks all three. Our standards landscape page covers how the project's design aligns with the direction these efforts are taking, and where we plan to engage.

What governance has to look like to survive heterogeneity

If the goal is that the same architectural decision is enforced whether the code came from Claude Code, Cursor, Copilot, or a custom SDK agent, then the governance layer cannot be inside any of those tools. It has to be a separate artifact that each tool defers to.

  • Per-tool memory: tool-coupled, duplicated, advisory. Each agent has its own rule file in its own format. Compliance depends on the agent reading correctly. Drift between tools is silent. The seams are unguarded.
  • Shared governance memory: tool-agnostic, single source, enforced. Decisions live in one structured store. Every agent, whether interactive, async, third-party, or in-house, queries it before generating, and a hook checks output against it before code lands.

Concretely, an enforcement layer that survives heterogeneous agents has four properties:

  • Tool-agnostic representation. Decisions are stored in a structured format that is not coupled to any one assistant's prompt convention. Markdown is an export, not the source of truth.
  • One canonical store, many readers. Claude Code, Cursor, Copilot, and custom agents all read from the same store. Updating an architectural decision once is sufficient; there is no fan-out duplication to keep in sync.
  • Pre-generation injection. The relevant decisions for the current task are surfaced into whatever agent is running, in a format that agent can consume. The decisions are scoped, not dumped wholesale.
  • Post-generation enforcement at the seam. Generated diffs — from any agent — are checked against the same governance store before they are accepted. The enforcement point is the file write, the commit, or the PR, not the model.
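The third property, scoped pre-generation injection, reduces to a filter plus a render step. A minimal sketch, assuming store entries with a hypothetical `scope` glob, that turns "the relevant decisions for the files this task touches" into a plain-text block any agent can take as context:

```python
from fnmatch import fnmatch

# Hypothetical store entries; "scope" is a glob over repository paths.
DECISIONS = [
    {"id": "ADR-003", "scope": "services/api/**",
     "rule": "handlers stay free of SQL"},
    {"id": "ADR-009", "scope": "web/**",
     "rule": "no direct fetch outside the client layer"},
]

def context_for(files: list[str]) -> str:
    """Render only the decisions whose scope matches the files the agent
    is about to touch; everything else stays out of the context window."""
    hits = [d for d in DECISIONS
            if any(fnmatch(f, d["scope"]) for f in files)]
    return "\n".join(f"{d['id']}: {d['rule']}" for d in hits)
```

Scoping is what keeps this workable at scale: a store with hundreds of decisions injects three of them into a payments task, not the whole corpus.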

This is the layering that makes heterogeneity safe. Each agent can keep its own strengths. The architecture is enforced by infrastructure that does not care which agent emitted the code.

Lock-in is the second cost

The seam problem is the operational cost of running multiple AI coding agents against one codebase: drift, format mismatch, no shared enforcement. There is a second cost that accrues more quietly and tends to surface only when a team tries to swap a tool. It is vendor lock-in, and for any codebase expected to live more than a year or two it is the more expensive of the two.

A team that builds its architectural memory inside one tool's native format — CLAUDE.md, .cursor/rules, .github/copilot-instructions.md, a vendor's proprietary memory product — is structurally betting that the chosen tool will still be the right tool for the lifetime of the codebase. The AI coding category does not behave that way. GitHub Copilot was the default in 2022. Cursor took the interactive-coding lead through 2024. Claude Code's terminal- and web-native model has been reshaping the picture through 2025 and 2026. Windsurf, Codex, and a steady drip of new agents continue to enter. The leading tool has changed roughly every eighteen months, and nothing in the market suggests the cycle is slowing.

Codebases outlive tooling fashion. A service written in 2023 is still in production in 2026, governed by decisions that were correct then and may need refinement now. If those decisions live inside a tool the team no longer uses, the cost of switching is not just retraining engineers. It is rewriting the architectural memory itself, in whatever format the new tool prefers, hoping nothing is lost in the translation. The codebase's architectural truth becomes hostage to whichever vendor was dominant the year it was first written.

The same agnostic layer that solves the seam problem also solves the lock-in problem. A team running Claude Code for backend, Cursor for frontend, and a custom SDK agent in CI can — if the memory layer lives outside any of them — switch any one of those tools tomorrow without touching the architectural truth. The tools become interchangeable; the governance does not.

This is the second reason the governance layer has to be tool-independent. The first reason is operational. This one is structural. Owning the architectural truth, in a format the team controls, is the only durable position when the tools underneath are themselves changing every cycle.

The category framing

The conversation about "the best AI coding tool" is the wrong conversation for any team large enough to have an architecture worth defending. There is no best tool — there is a portfolio, and the portfolio is going to grow. The question that matters is whether your governance layer is coupled to any one of them.

If it is, every new agent the team adopts becomes a new place for architectural decisions to drift. If it isn't, new agents are net additive: more capability, same enforcement, no extra coordination cost.

That is the infrastructure problem Mneme HQ is built around. A single decision store, queryable by any agent, enforced at the seam where every agent eventually has to write to disk.

FAQ

Is this only a problem for large engineering orgs?
No, but the pain compounds with team size. A two-engineer team using both Cursor and Claude Code can manually keep their rule files aligned. By the time a team has multiple repositories, several engineers, and any async agent (a CI bot, a PR reviewer agent, a migration script), the cost of maintaining duplicated per-tool memory exceeds the cost of running a shared store.
Doesn't AGENTS.md already solve this?
AGENTS.md solves the “static instructions in a portable format” problem. It is a meaningful step forward, and Mneme treats it as a first-class export target. What it does not solve is precedence between conflicting decisions, scope-aware injection (only the relevant decisions for this task), or enforcement before generation. A markdown file is read; it is not queried, and it is not an enforcement point. See agents.md for the spec.
How does MCP fit in?
MCP is the substrate, not the standard for governance content. Exposing a decision store as an MCP server makes it queryable by any MCP-aware client (Claude Code, Cursor's MCP support, custom agents) without per-tool integration. MCP solves the connectivity half of the problem; the governance schema, precedence rules, and enforcement layer still need to live somewhere. See the MCP specification.
What does NIST's work mean for engineering teams today?
In the near term, very little operational impact — the initiative is in the RFI / concept-paper / listening-session phase, not the published-standard phase. In the medium term, expect agent identity and authorization patterns to be the first to harden, followed by audit and behavioral controls. Teams in regulated industries (healthcare, financial services, public sector) should track the work; everyone else can treat it as a multi-year tailwind for the “governance is infrastructure” thesis. The initiative landing page is at nist.gov/caisi/ai-agent-standards-initiative.
What if our org standardizes on one AI tool?
In practice, no engineering org of meaningful size has standardized on one tool. Inline completion (Copilot), interactive coding (Cursor or Claude Code), and async PR-opening agents (Claude Code on the web, Copilot Workspace, custom SDK bots) tend to live alongside each other because each is best-in-class at a different task. Even teams that try to standardize discover that procurement is one decision and what engineers actually use is another.
Will the eventual standard make Mneme redundant?
No, in the same way that OCI did not make Docker redundant and OAuth did not make Okta redundant. A standard defines the wire format and the contract; vendors and open-source projects implement it. Mneme's design intent is to align with whatever cross-tool format emerges (likely some superset of AGENTS.md, MCP, and a NIST-influenced audit envelope) and to be the structured-decision-store implementation that consumes and exports it.
How does an agnostic layer protect against tool churn?
The leading AI coding tool has changed roughly every eighteen months — Copilot dominant in 2022, Cursor through 2024, Claude Code reshaping things in 2025–26, with Windsurf and Codex behind. A team whose architectural memory lives inside one tool's native format pays a rewrite cost every time it migrates. A team whose memory lives in an agnostic layer can swap the tool underneath without touching the governance — same decisions, same enforcement, different agent. Tool independence is operational hygiene and structural insurance at the same time.