Three years ago, the AI coding question for engineering leaders was "which assistant should we standardize on." That question is dead. The realistic answer in 2026 is "all of them, in different parts of the stack, often by the same engineer in the same week." A backend engineer pairs with Claude Code in the terminal, opens Cursor for a frontend touchup, lets Copilot autocomplete inside the JetBrains IDE, and reviews a PR opened by a Claude Agent SDK bot that the platform team wrote.
Each of these tools has its own idea of what "memory" means. CLAUDE.md for Claude Code. .cursor/rules/ for Cursor. .github/copilot-instructions.md for Copilot. Custom system prompts for whatever the platform team built. The instructions overlap. They drift. They contradict each other. And nobody owns the union.
This is the multi-tool reality that architectural governance has to survive. It will not survive in any single tool's native memory format.
The fragmentation that already happened
Engineering orgs do not adopt AI coding tools the way they adopt CI runners. There is no "we standardized on Jenkins" moment. Adoption is bottom-up, per-engineer, per-task, and increasingly per-stage of the SDLC. In most teams running AI seriously, the result is the stack described above: a terminal agent, an IDE assistant, inline autocomplete, and custom bots running in CI.
Each tool was designed to be excellent in isolation. None of them was designed to share a single canonical view of the architectural decisions the codebase is supposed to obey.
Why each tool's native memory is a silo
The natural first instinct is to copy the same rules into every tool's preferred format. Write the architecture in CLAUDE.md, then mirror it into .cursor/rules/, then again into .github/copilot-instructions.md. Some teams script this with a shared markdown file and a generator. It looks clean for a sprint. Then it breaks for the same reasons every duplicated source of truth breaks.
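A minimal sketch of that generator pattern makes the fragility concrete. The canonical file name (`ARCHITECTURE.md`) is a hypothetical choice; the three target paths are the real locations each tool reads:

```python
from pathlib import Path

# Hypothetical canonical source; the tool-specific targets below are the
# real paths each assistant reads from.
CANONICAL = Path("ARCHITECTURE.md")
TARGETS = [
    Path("CLAUDE.md"),                        # Claude Code
    Path(".cursor/rules/architecture.mdc"),   # Cursor
    Path(".github/copilot-instructions.md"),  # GitHub Copilot
]

def fan_out(root: Path) -> list[Path]:
    """Copy the canonical rules into every tool-specific location."""
    source = (root / CANONICAL).read_text()
    written = []
    for target in TARGETS:
        dest = root / target
        dest.parent.mkdir(parents=True, exist_ok=True)
        # Every copy is now a divergence waiting to happen.
        dest.write_text(source)
        written.append(dest)
    return written
```

The failure mode is visible in the code itself: nothing stops an engineer from editing one of the three copies directly, and the next fan-out silently overwrites the edit.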
The seam problem
Heterogeneous tooling does not just multiply the per-tool failure modes. It creates a new class of failure that exists only at the boundaries between tools.
An engineer prototypes a service in Cursor with a relaxed rule about external API calls because they were experimenting. They push. A Claude Agent SDK bot picks up the branch in CI and refactors it, generating code against the stricter system prompt the platform team wrote. The bot's diff and the engineer's diff disagree on the architectural rule. Reviewers see two patches that look reasonable in isolation and cannot tell which agent was operating under which assumption.
This is not a hypothetical. Every team running both interactive AI in the editor and async AI in CI has seen versions of it. The patches are individually sensible. The collision is structural.
The seam between agents is where governance has to live. Inside each tool there is no leverage: any instruction can be ignored by the model, any text block misread. Between them, there is a natural enforcement point: the moment generated code is written to disk or proposed as a diff. That moment is the same regardless of which agent produced it.
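One concrete shape for that enforcement point, sketched under the assumption of a small hypothetical rule list and a plain unified diff, is a check that runs on the proposed change no matter which agent emitted it:

```python
import re

# Hypothetical rules; a real governance store would be structured,
# versioned, and queried rather than hard-coded.
FORBIDDEN_IN_DIFF = [
    (re.compile(r"^\+.*requests\.(get|post)\("),
     "handlers must not call external APIs directly"),
]

def check_diff(diff_text: str) -> list[str]:
    """Scan only the added lines ('+' prefix) of a unified diff.

    Returns violation messages; an empty list means the diff passes.
    """
    violations = []
    for line in diff_text.splitlines():
        for pattern, message in FORBIDDEN_IN_DIFF:
            if pattern.match(line):
                violations.append(f"{message}: {line[1:].strip()}")
    return violations
```

Wired into a pre-commit hook or a PR check, the same function gates a diff from Claude Code, Cursor, or a CI bot identically, which is the whole point of enforcing at the seam.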
How other categories solved this
The pattern of "many vendor-specific tools, one shared substrate underneath" is not new. Every prior infrastructure category that started fragmented has resolved through a recognizable sequence: vendor proliferation, then a community-led specification, then broad adoption that left vendors free to differentiate above the line.
- Container runtimes. Docker dominated, but rkt, LXC, and others fragmented the ecosystem. The Linux Foundation's Open Container Initiative standardized image and runtime formats, and every major runtime today implements them. Docker did not lose; it became one OCI-compliant implementation among several.
- Tracing and observability. Vendor APMs (Datadog, New Relic, Dynatrace) each ran their own instrumentation libraries. OpenTelemetry, a CNCF project, gave the industry a single instrumentation standard. Vendors compete on the backend; the wire format is shared.
- IDE language tooling. Every IDE shipped its own language integration. Microsoft's Language Server Protocol defined a common interface so one language server could power VS Code, JetBrains, Vim, Emacs, and more. The fragmentation collapsed onto a shared protocol.
- Identity. Per-vendor SSO gave way to OAuth 2.0 and OpenID Connect. Today no enterprise considers a tool that does not speak them.
The phases are consistent across categories. Tools proliferate. Format incompatibility creates real operational pain. A community-led specification forms — usually under a foundation. Vendors implement the spec. Differentiation moves up the stack.
If the AI coding category follows the same arc — and the early signals say it is — the practical question for engineering leaders is not whether a shared format will arrive, but what to do during the years before it lands. The answer is the same one teams used during every prior cycle: build above the eventual standard, not inside any one vendor's format.
Where the standards landscape stands today
Two community-led efforts and one government-led one are currently the most credible foundations for a future cross-tool agent governance standard. None are finalized. All are worth tracking.
NIST's AI Agent Standards Initiative. The Center for AI Standards and Innovation (CAISI) at NIST announced the AI Agent Standards Initiative in February 2026, with the stated goal of helping AI agents "interoperate smoothly across the digital ecosystem." A request for information on securing AI agent systems closed for public comment on March 9, 2026, and the NCCoE concept paper on AI agent identity and authorization proposes adapting existing identity standards (OAuth 2.0, OIDC) to non-human agent identities. The current scope is identity, authorization, and security — not output-policy enforcement directly — but it establishes the regulatory frame inside which governance protocols will eventually be evaluated.
The Model Context Protocol. MCP is an open, JSON-RPC-based protocol for exposing context, tools, and resources to AI clients. It does not specify a governance format, but it is increasingly the substrate over which a governance store can be made queryable to any compliant agent. A decision corpus exposed as an MCP server is consumable by every MCP-aware client without per-tool integration.
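At the wire level, that queryability is ordinary JSON-RPC. As a sketch, here is the message an MCP client would send to read a decision resource from such a server; `resources/read` is a standard MCP method, while the `mneme://` URI scheme is purely a hypothetical example:

```python
import json

def mcp_resource_read(request_id: int, uri: str) -> str:
    """Build the JSON-RPC 2.0 message an MCP client sends to read a resource."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "resources/read",  # standard MCP resource-read method
        "params": {"uri": uri},      # hypothetical decision-corpus URI below
    })

msg = mcp_resource_read(1, "mneme://decisions/db-access-policy")
```

Any MCP-aware client, regardless of vendor, can issue this request. That is what "consumable without per-tool integration" means in practice.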
AGENTS.md. The AGENTS.md format — adopted across Codex, Cursor, Aider, Factory, Gemini CLI, Zed, and others, and stewarded by the Agentic AI Foundation under the Linux Foundation — is the closest thing to a shared per-repo instruction format that already works across vendors. OpenAI's Codex documentation treats it as the canonical instruction file. As a markdown convention, AGENTS.md cannot resolve precedence between conflicting decisions or enforce anything at the hook layer, but it is a credible baseline for the static-context portion of the problem and a likely component of any eventual full standard.
Mneme HQ tracks all three. Our standards landscape page covers how the project's design aligns with the direction these efforts are taking, and where we plan to engage.
What governance has to look like to survive heterogeneity
If the goal is that the same architectural decision is enforced whether the code came from Claude Code, Cursor, Copilot, or a custom SDK agent, then the governance layer cannot be inside any of those tools. It has to be a separate artifact that each tool defers to.
Concretely, an enforcement layer that survives heterogeneous agents has four properties:
- Tool-agnostic representation. Decisions are stored in a structured format that is not coupled to any one assistant's prompt convention. Markdown is an export, not the source of truth.
- One canonical store, many readers. Claude Code, Cursor, Copilot, and custom agents all read from the same canonical store. Updating an architectural decision once is sufficient; there is no fan-out duplication to keep in sync.
- Pre-generation injection. The relevant decisions for the current task are surfaced into whatever agent is running, in a format that agent can consume. The decisions are scoped, not dumped wholesale.
- Post-generation enforcement at the seam. Generated diffs — from any agent — are checked against the same governance store before they are accepted. The enforcement point is the file write, the commit, or the PR, not the model.
This is the layering that makes heterogeneity safe. Each agent can keep its own strengths. The architecture is enforced by infrastructure that does not care which agent emitted the code.
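The first three properties can be sketched in a few lines. The record fields and decision IDs below are illustrative assumptions, not a proposed schema; the point is that the store is plain structured data, the markdown is rendered from it, and the query is scoped to the file being worked on:

```python
from dataclasses import dataclass, field
from fnmatch import fnmatch

@dataclass
class Decision:
    """One architectural decision in a tool-agnostic form (illustrative fields)."""
    id: str
    rule: str
    applies_to: list[str] = field(default_factory=list)  # path globs

# The canonical store: one place, many readers.
STORE = [
    Decision("ADR-012", "API handlers must not open database connections",
             applies_to=["api/*"]),
    Decision("ADR-019", "Frontend components must not call internal services",
             applies_to=["web/src/*"]),
]

def decisions_for(path: str) -> list[Decision]:
    """Pre-generation injection: surface only the decisions scoped to this file.

    Note: fnmatch's '*' also crosses '/' boundaries, so 'api/*' covers
    nested paths like 'api/users/handlers.py'.
    """
    return [d for d in STORE if any(fnmatch(path, g) for g in d.applies_to)]

def render_for_agent(path: str) -> str:
    """Markdown is an export, not the source of truth."""
    return "\n".join(f"- [{d.id}] {d.rule}" for d in decisions_for(path))
```

The fourth property, post-generation enforcement, is the same `decisions_for` lookup run against the paths a diff touches, at commit or PR time, regardless of which agent produced the diff.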
Lock-in is the second cost
The seam problem is the operational cost of running multiple AI coding agents against one codebase: drift, format mismatch, no shared enforcement. There is a second cost that accrues more quietly and tends to surface only when a team tries to swap a tool. It is vendor lock-in, and for any codebase expected to live more than a year or two it is the more expensive of the two.
A team that builds its architectural memory inside one tool's native format — CLAUDE.md, .cursor/rules/, .github/copilot-instructions.md, a vendor's proprietary memory product — is structurally betting that the chosen tool will still be the right tool for the lifetime of the codebase. The AI coding category does not behave that way. GitHub Copilot was the default in 2022. Cursor took the interactive-coding lead through 2024. Claude Code's terminal- and web-native model has been reshaping the picture through 2025 and 2026. Windsurf, Codex, and a steady drip of new agents continue to enter. The leading tool has changed roughly every eighteen months, and nothing in the market suggests the cycle is slowing.
Codebases outlive tooling fashion. A service written in 2023 is still in production in 2026, governed by decisions that were correct then and may need refinement now. If those decisions live inside a tool the team no longer uses, the cost of switching is not just retraining engineers. It is rewriting the architectural memory itself, in whatever format the new tool prefers, hoping nothing is lost in the translation. The codebase's architectural truth becomes hostage to whichever vendor was dominant the year it was first written.
The same agnostic layer that solves the seam problem also solves the lock-in problem. A team running Claude Code for backend, Cursor for frontend, and a custom SDK agent in CI can — if the memory layer lives outside any of them — switch any one of those tools tomorrow without touching the architectural truth. The tools become interchangeable; the governance does not.
This is the second reason the governance layer has to be tool-independent. The first reason is operational. This one is structural. Owning the architectural truth, in a format the team controls, is the only durable position when the tools underneath are themselves changing every cycle.
The category framing
The conversation about "the best AI coding tool" is the wrong conversation for any team large enough to have an architecture worth defending. There is no best tool — there is a portfolio, and the portfolio is going to grow. The question that matters is whether your governance layer is coupled to any one of them.
If it is, every new agent the team adopts becomes a new place for architectural decisions to drift. If it isn't, new agents are net additive: more capability, same enforcement, no extra coordination cost.
That is the infrastructure problem Mneme HQ is built around. A single decision store, queryable by any agent, enforced at the seam where every agent eventually has to write to disk.