Most of the conversation about "AI coding governance" in 2026 is still about the wrong layer. Prompt rules. CLAUDE.md. .cursor/rules. RAG over an ADR folder. A reviewer agent on PRs. Policy docs in a wiki. Every one of these answers the question "how do we tell the model what we want?" — and not one of them answers the prior question: "when two of the things we want disagree, which one wins?"

That second question is not a corner case. It is the central question of any governance system that has to operate at the scale of a real codebase, with real exceptions, written by real teams over real years. And it is the question almost every tool in the category is currently dodging.

The missing layer has a name. It is precedence semantics. It is what makes governance deterministic, reviewable, and durable. Without it, "AI coding governance" is just a polite name for whichever rule the model happened to attend to this morning.

The collision nobody owns

Consider a perfectly ordinary situation. A team's data layer is governed by ADR-014: "All persistent data access goes through the repository pattern. No service may call an ORM session directly." The decision is sound. It has been the law of the codebase for a year. Every reviewer can recite it.

Eight months later, the payments team writes ADR-022: "The payments service may invoke the Stripe SDK directly inside an idempotency-key boundary. The repository abstraction does not compose with Stripe's at-most-once semantics, and the ledger requires raw call ordering." Also sound. Also reviewed. Also accepted.

These two decisions overlap. Specifically, they overlap on the question "how does the payments service write a charge to durable storage?" Both apply. They disagree.

Now four things happen at once:

  • An engineer in Cursor opens the payments service and writes a charge handler. Cursor reads .cursor/rules, sees ADR-014 because it was indexed first, and proposes a repository-based implementation.
  • An engineer in Claude Code, with a different CLAUDE.md, sees ADR-022 highlighted and writes a direct-SDK implementation. Both diffs are reviewed in isolation; both get approved.
  • An async PR bot built on the Agent SDK refactors an adjacent file and rewrites the call site to obey ADR-014, because that is what its system prompt encodes.
  • Six months later, a new engineer reads the codebase, sees both patterns in production, and asks the obvious question: "which one are we supposed to follow?" Nobody can answer without re-reading both ADRs and re-litigating the conflict from scratch.

None of those four actors is misbehaving. The codebase has two correct decisions and no mechanism to say which one wins where. That is the precedence problem, and it is exactly the problem that no prompt rule, no retrieval system, and no review process will ever resolve, because none of them is designed to resolve anything — they are designed to surface.

[Fig 1 diagram · two overlapping decisions, ADR-014 (repository pattern, all services) and ADR-022 (payments may call the Stripe SDK directly), fed through four resolution layers: prompt rules / CLAUDE.md (file ordering, attention · non-deterministic), RAG over ADRs (retrieval score, chunking · non-deterministic), PR review (whichever reviewer caught it · non-deterministic), and a precedence engine (scope · supersedes · status, resolved before generation · one answer, reproducible).]
Fig 1 · Without precedence semantics, the same overlapping decisions resolve differently across every layer. With them, the answer is one value, computed before the model is asked.

Why current systems can't resolve it

It is tempting to think that the conflict in the example above is a content problem — that if the team had written ADR-022 more carefully, or pasted it higher in CLAUDE.md, the right answer would emerge. It is not a content problem. It is a structural one. Every governance substrate currently in widespread use is fundamentally a retrieval substrate, and retrieval has no opinion on conflict.

How today's substrates handle the same overlap
01 · Prompt rules resolve by attention
CLAUDE.md, .cursor/rules, .github/copilot-instructions.md all hand the model a block of text. When two rules disagree, the model picks one based on whichever it attended to more strongly under this temperature, this context length, this surrounding code. Reorder the file and the answer changes. That is not governance — it is a coin flip with extra steps.

02 · RAG resolves by retrieval score
Indexing the ADR folder and retrieving the top k chunks per query feels rigorous. It is not. When two ADRs both score highly for "payments writes a charge," whichever the embedder happens to rank higher gets injected. Re-embed the corpus and the resolution can flip silently. The architecture is now a function of the vector index.

03 · PR review resolves by whoever was looking
If the reviewer who happens to be assigned remembers ADR-022, the diff lands. If a different reviewer is assigned next time, the next diff lands the other way. The codebase ends up with both patterns and no record of which decision governed which file. The conflict is resolved by social process, and social process does not scale across async agents.

04 · Policy docs resolve nothing at all
The wiki page is updated, sometimes. The Notion entry is written, occasionally. Neither of them is on the path between a model and a diff. They are referenced when a human looks them up — which is exactly when the resolution is already most expensive.

The common failure underneath all four is the same: none of them has an opinion on which rule applies when several do. They surface rules. They do not resolve between rules. The category has confused "the model has read the constraint" with "the constraint has been applied," and those are not the same thing.

What precedence semantics actually is

Precedence semantics is the small body of rules a governance system uses to answer one question, every time, the same way: "given the current task, the current file, the current scope, and the full set of architectural decisions, which decision actually applies here, and which loses?"

The answer has to be deterministic. Not "the model probably picks the right one." Not "the reviewer usually catches it." Deterministic, in the engineering sense: same inputs in, same answer out, every time, regardless of which agent or model or temperature is on the other end.

It also has to be reviewable. The resolution has to be explainable as a chain of named facts — this scope was more specific, that decision supersedes the earlier one, this status retired the override — not as an opaque embedding score. A reviewer has to be able to read the resolution and either agree with it or, when it is wrong, point at the rule that produced the wrong answer and propose a fix.

Governance is not memory retrieval. Governance is deterministic conflict resolution over architectural constraints. Retrieval is one input to that resolution — not a substitute for it.

The five axes of resolution

A useful precedence engine resolves over a small, finite set of axes. Five of them carry almost all the weight in real codebases. None of them is novel on its own — what is missing in the AI-coding-governance category is the recognition that all five have to compose, and that the order in which they compose has to be explicit.

Resolution axes · in priority order

  • Status · Is the decision in force at all? A deprecated ADR loses to any accepted ADR that touches the same scope, even if the deprecated one was more specific.
  • Supersedes · Does this decision explicitly retire an older one? ADR-031 carries supersedes: ADR-014. Wherever they overlap, ADR-014 is treated as if it were deprecated.
  • Scope specificity · Whose scope is narrower? ADR-022 (services/payments/**) beats ADR-014 (services/**) inside its narrower scope, by the same logic that makes a per-route override beat a global config.
  • Priority · When scopes are equal, who is authoritative? A security-class ADR (priority: critical) wins over an ergonomics-class ADR (priority: normal) at the same scope. Priorities are declared, not inferred.
  • Temporal · If everything else ties, the newer decision wins. Two equal-priority, equal-scope, neither-supersedes-the-other decisions resolve by acceptance date. This axis is the tiebreaker, not the primary driver.

Two things matter about that table more than the contents of any one row. First: the axes are evaluated in a declared order, not improvised per query. Second: each axis is a fact carried by the decision itself, not a heuristic about the decision. status, supersedes, scope, priority, accepted_at — these are properties an ADR declares, not inferences a model has to make. Once declared, resolution is a finite computation, not a guess.

This is why precedence semantics is the missing layer. Every term in the table is already familiar to anyone who has written ADRs for more than a year. What is new is the claim that resolving over them is a system responsibility, not a reviewer's job to do in their head.
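Because each axis is a declared fact, the whole resolution fits in one pure function. What follows is a minimal sketch, not a real engine: the field names (status, supersedes, scope, priority, accepted_at), the acceptance dates, and the glob-counting specificity heuristic are all assumptions for illustration — a production engine would need a proper scope matcher.

```python
from dataclasses import dataclass
from datetime import date
from fnmatch import fnmatch

@dataclass(frozen=True)
class Decision:
    id: str
    scope: str                      # glob over paths, e.g. "services/**"
    status: str                     # "proposed" | "accepted" | "deprecated"
    priority: int                   # declared, not inferred; higher wins
    accepted_at: date
    supersedes: frozenset = frozenset()

def specificity(scope: str) -> int:
    # Crude proxy: a narrower glob has more path segments.
    return scope.count("/")

def resolve(decisions: list, path: str) -> Decision:
    """Deterministically pick the single decision governing `path`."""
    candidates = [d for d in decisions if fnmatch(path, d.scope)]
    # Axis 1 · status: only decisions in force compete.
    candidates = [d for d in candidates if d.status == "accepted"]
    # Axis 2 · supersedes: drop anything a surviving candidate retires.
    retired = {sid for d in candidates for sid in d.supersedes}
    candidates = [d for d in candidates if d.id not in retired]
    # Axes 3-5 · scope specificity, then declared priority, then recency.
    return max(candidates,
               key=lambda d: (specificity(d.scope), d.priority, d.accepted_at))

# Hypothetical dates; only their relative order matters.
adr_014 = Decision("ADR-014", "services/**", "accepted", 1, date(2025, 1, 10))
adr_022 = Decision("ADR-022", "services/payments/**", "accepted", 1, date(2025, 9, 3))

resolve([adr_014, adr_022], "services/payments/charge.py")  # → adr_022
resolve([adr_014, adr_022], "services/search/query.py")     # → adr_014
```

Same inputs, same answer, in every agent: the model never sees the conflict, only the winner.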

Governance is deterministic conflict resolution

Once precedence semantics is named, the whole category reframes.

The retrieval framing · "surface the right rules to the model." Governance is about getting the relevant constraints into context. The model takes it from there. Conflict resolution is whatever the model produces, hopefully on most runs.

The precedence framing · "resolve which rule applies before the model runs." Governance is about computing, deterministically, the single constraint that governs this scope. The model receives that constraint, not the conflict. The same inputs produce the same answer in every agent.

The retrieval framing makes the model the conflict-resolution engine. That is exactly the wrong place to put it. Models are excellent at code synthesis under a constraint and unreliable at choosing between constraints. The precedence framing keeps the model out of the resolution and lets it do what it is good at — generating code that conforms to a constraint that has already been chosen.

Said differently: the architectural truth of a codebase is not allowed to depend on which agent ran the query. If it does, the codebase does not have an architecture — it has a sample.

Governance is a compiler problem

The frame that follows from all of this is that an AI-era governance system is shaped like a compiler.

The input is the same kind of thing teams already write: ADRs, design documents, exceptions, and the relationships between them. The output is a single, queryable, scope-aware representation that every agent — interactive, async, in CI — can ask the same question of and get the same answer. Between the two is a pipeline: normalize, resolve, compile, enforce, trace.

[Fig 2 diagram · the pipeline, declared facts → one resolved constraint per scope: 01 Normalize (ADRs → canonical facts) · 02 Resolve (precedence over the five axes, the missing layer) · 03 Compile (one constraint per scope) · 04 Enforce (pre-gen inject, post-gen check) · 05 Trace ("this diff applied ADR-022 at ...") · decisions are the source, resolution is a computation, enforcement is a hook, every step is auditable.]
Fig 2 · The governance compiler. Stage 02 is where precedence semantics lives — the layer the current category mostly does not have.

Each stage in that pipeline already has a recognizable analog in software engineering. Normalization is what a parser does. Resolution is what a type checker or a constraint solver does. Compilation is what an intermediate representation is for. Enforcement is what a pre-commit hook or a CI gate is for. Traceability is what build provenance is for. What is new in 2026 is that all five together describe a layer above the agent — not a feature inside any single one.
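Concretely, the back half of the pipeline needs nothing exotic: a checked-in artifact plus a lookup. The sketch below assumes stage 02 has already run; the artifact layout, scope strings, and constraint text are all invented, and scopes are listed most-specific first because precedence was resolved at compile time.

```python
import json
from fnmatch import fnmatch

# Stage 03's hypothetical output: one governing decision per scope,
# most specific scope first, conflicts already resolved.
COMPILED = json.loads("""
{
  "services/payments/**": {"decision": "ADR-022",
      "constraint": "call the Stripe SDK directly inside an idempotency-key boundary"},
  "services/**": {"decision": "ADR-014",
      "constraint": "all persistent data access goes through the repository pattern"}
}
""")

def governs(path: str) -> dict:
    # Stage 04's query: every agent (IDE, async bot, CI) asks this and
    # gets the same answer; first matching scope wins by construction.
    for scope, entry in COMPILED.items():
        if fnmatch(path, scope):
            return {"scope": scope, **entry}
    return {"scope": None, "decision": None, "constraint": None}

# Stage 05's trace: a fact the diff can carry.
hit = governs("services/payments/charge.py")
trace = f"this diff applied {hit['decision']} at {hit['scope']}"
```

The point of the artifact is that the lookup is boring: no embeddings, no ranking, no model in the loop.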

This is what makes the governance-as-compiler framing useful, and not just a metaphor. Compilers are deterministic by construction. Their outputs are reproducible. Their decisions are explainable. Their failures are localizable. Every property AI coding governance currently lacks is a property a compiler-shaped governance layer would have by default.

What this changes for engineering leaders

For an engineering leader looking at the category in 2026, the practical implication is that the question to ask of any "AI governance" pitch is no longer "does it read our rules?" It is: "what does it do when two of our rules disagree?"

If the answer is "the model figures it out," the system is a retrieval layer with a governance label on it. If the answer involves embeddings, ranking scores, or "we tune the prompt," the system is doing statistics, not governance. If the answer involves a declared resolution order over status, supersedes, scope, priority, and time, the system is doing the actual job.

That distinction matters more than the surface-level tool category. A team can run Claude Code, Cursor, Copilot, Windsurf, and three in-house SDK agents on the same codebase — as most engineering orgs already do — and still have a coherent architecture, but only if the layer underneath has a deterministic answer to "which decision applies here." Otherwise the codebase ends up as a sample of whichever agent ran last.

The category is moving up the stack. "AI coding tools" is becoming a portfolio decision. "How those tools resolve architectural conflict" is becoming the infrastructure decision underneath it. Precedence semantics is the name for that infrastructure layer.

The engineering teams that win the next cycle will not be the ones that picked the best assistant. They will be the ones whose architecture survived having a portfolio of assistants — because the layer that decided what the rules were did not live inside any of them.

FAQ

Isn't precedence just rule ordering?
No. Rule ordering is a single axis — the rule that comes last wins. Precedence semantics is multi-dimensional: scope specificity, supersedes relationships, explicit priority, decision status, and temporal resolution all interact. A more recent decision can be overridden by an older one with narrower scope. A higher-priority decision can be retired by status. Ordering cannot express any of that.
Why can't an LLM just resolve the conflict at generation time?
An LLM resolving a conflict is, by definition, non-deterministic. The same conflict with the same inputs can resolve differently on different runs, in different agents, at different temperatures. Architectural decisions need a single answer that every agent and every reviewer can reproduce. That requires the resolution to happen before the model sees the constraints, not inside the model's reasoning.
How is this different from policy-as-code?
Policy-as-code (OPA, Conftest, similar) is a downstream evaluator: given a policy and an artifact, return allow or deny. Precedence semantics sits upstream — it answers which policy actually applies when several overlap, before evaluation runs. The two compose. Precedence resolution decides what the rule is; policy-as-code decides whether the artifact obeys it.
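A toy sketch of that composition, with invented rule names and diff fields; the evaluator only gestures at what OPA or Conftest actually do, it is not their real API.

```python
# Upstream: precedence has already decided which rule governs each path.
RESOLVED = {
    "services/payments/charge.py": "direct-sdk",   # ADR-022 won here
    "services/search/query.py": "repository",      # ADR-014 won here
}

def evaluate(rule: str, diff: dict) -> str:
    """Downstream allow/deny; the rule is an input, never re-litigated."""
    if rule == "direct-sdk":
        return "allow" if diff.get("inside_idempotency_boundary") else "deny"
    if rule == "repository":
        return "allow" if diff.get("uses_repository") else "deny"
    return "deny"

def gate(path: str, diff: dict) -> str:
    # Compose: precedence resolution picks the rule, policy evaluation applies it.
    return evaluate(RESOLVED.get(path, "repository"), diff)
```

Swap either half independently: the resolver does not care how policies are evaluated, and the evaluator does not care how the governing rule was chosen.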
What does this require from how we write ADRs?
Less than it sounds. ADRs continue to be written as decisions with rationale. What changes is that they declare three additional facts in structured form: their scope (paths, services, layers), their relationship to prior decisions (supersedes, refines, exception-to), and their status (proposed, accepted, deprecated). Those three facts are what a precedence engine resolves over.
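As a sketch, the structured header for something like ADR-022 might carry those three facts as follows. The field names and enum values here are hypothetical, not a standard; any frontmatter format that declares the same facts would do.

```python
# Hypothetical structured header for ADR-022; the prose decision and
# rationale live alongside it, unchanged.
ADR_022 = {
    "id": "ADR-022",
    "status": "accepted",                        # proposed | accepted | deprecated
    "scope": ["services/payments/**"],           # paths, services, or layers
    "relations": {"exception-to": ["ADR-014"]},  # supersedes | refines | exception-to
}

def is_resolvable(adr: dict) -> bool:
    # A precedence engine only needs these three declared facts.
    return all(key in adr for key in ("status", "scope", "relations"))
```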
Does this only matter for large codebases?
It matters as soon as you have two decisions whose scopes overlap. That happens earlier than most teams expect — usually around the time the first cross-service exception is granted. Small teams feel it as “we keep relitigating this in code review.” Large teams feel it as silent drift between services.