
Architectural drift prevention — the AI SDLC entropy demo.

AI accelerates entropy. Each agent-produced change is locally reasonable. The system as a whole still drifts because each change silently relaxes an invariant the team had already encoded. Without a governance layer, drift compounds faster than review can absorb it. This page walks the failure mode end-to-end and shows what changes when the first divergence is blocked upstream.

The scenario

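A small platform team maintains a single-binary Python service. Its architectural truth is documented in a decision corpus, and the decision that matters here is ADR-001: no external database. Caching and persistence go through the service's existing JSON-backed storage.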

Three agents touch the codebase over the course of a week. Each one produces a reasonable-looking PR. Review is busy. By Friday the architecture has silently forked.

Without a governance layer

Drift propagation
1 · Monday · Agent A

Introduces Redis "for caching"

Asked to "speed up the user lookup endpoint." Adds redis-py, wires a cache layer, ships a 60-line PR. Locally reasonable. Silently violates ADR-001.

2 · Monday · Reviewer

PR lands. ADR-001 not reread.

Reviewer focused on the cache invalidation logic. The decision corpus is a separate file no one opens during review. Approved in 4 minutes.

3 · Wednesday · Agent B

Builds infrastructure around the new Redis dependency

Asked to "add session storage." Discovers Redis is now in the codebase and uses it. Adds connection pooling, retry logic, a health check. The divergence is now load-bearing.

4 · Thursday · Agent C

Adds infra YAML for Redis

Asked to "make the staging deploy reproducible." Generates a Redis container, a service definition, a backup policy. None of this should exist per ADR-001. None of it is flagged.

5 · Friday · Architect

Notices. Too late.

Architecture has silently forked. Rolling back means three PRs, a deploy revert, and a conversation about why the decision corpus exists. The team eats the cost.

This is the failure mode. No single agent did anything outrageous. Each PR was reviewed by a human. Each change was locally reasonable. The drift emerged from the composition of changes, none of which individually triggered a review escalation.

With Mneme's governance layer

Upstream block + convergence
1 · Monday · Agent A · pre-generation

Hook intercepts the proposed change

Before the diff lands, the Claude Code / Cursor hook scores the proposal against project_memory.json and surfaces ADR-001 into the model's context. The agent reroutes to extending the existing JSON cache instead.

trace · PreToolUse hook · matched ADR-001 · injected 3 constraints
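The hook step can be pictured as scoring the proposed change against each decision and rendering the matches into the model's context. A minimal sketch, assuming a hypothetical corpus shape (decisions with id, title, triggers and constraints fields) and a naive keyword-overlap score; the real project_memory.json schema and Mneme's scoring are not described on this page and may differ.

```python
import re

# Hypothetical decision corpus, shaped the way project_memory.json *might* be.
# Field names (id, title, triggers, constraints) are assumptions for illustration.
DECISIONS = [
    {
        "id": "ADR-001",
        "title": "no external db",
        "triggers": ["redis", "postgres", "database", "cache"],
        "constraints": [
            "Do not add an external database or cache server.",
            "Extend the existing JSON cache for caching needs.",
            "Persist data through the JSON-backed storage layer.",
        ],
    },
]


def tokenize(text: str) -> set[str]:
    """Lowercase word tokens; a deliberately naive stand-in for real scoring."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def score(proposal: str, decision: dict) -> float:
    """Fraction of a decision's trigger words that appear in the proposal."""
    tokens = tokenize(proposal)
    hits = sum(1 for t in decision["triggers"] if t in tokens)
    return hits / len(decision["triggers"])


def build_context(proposal: str, decisions: list[dict],
                  threshold: float = 0.25) -> tuple[list[str], str]:
    """Return matched decision ids and the constraint block to inject."""
    matched = [d for d in decisions if score(proposal, d) >= threshold]
    lines = [f"[{d['id']}] {c}" for d in matched for c in d["constraints"]]
    return [d["id"] for d in matched], "\n".join(lines)


if __name__ == "__main__":
    ids, context = build_context("Speed up user lookup with a redis cache", DECISIONS)
    n = len(context.splitlines()) if context else 0
    print(f"matched {', '.join(ids)} · injected {n} constraints")
```

The point of injecting constraints before generation is that the model never has to "discover" ADR-001 by failing a check; the decision is already part of what it generates from.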

2 · Monday · CI · post-generation

mneme check fires WARN on residual signal

A leftover "import redis" line survives in a draft commit. The CI gate emits a structured WARN with the decision id, so review sees the violation in the PR comments, not buried in a 200-line diff.

WARN [ADR-001] no external db — trigger: redis
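On the CI side, the essential move is scanning only the lines a change adds and attaching the decision id to anything that matches. A minimal sketch, assuming a hypothetical rule table that maps ADR-001 to forbidden module names and a message format that only loosely mirrors the WARN line above; the actual mneme check rules and output format are not documented on this page.

```python
import re
from dataclasses import dataclass

# Hypothetical rule table: decision id -> (title, forbidden top-level modules).
FORBIDDEN = {"ADR-001": ("no external db", {"redis", "psycopg2", "pymongo"})}

# Matches an added line such as "+import redis" or "+from redis import Redis".
IMPORT_RE = re.compile(r"^\+\s*(?:import|from)\s+([A-Za-z_]\w*)")


@dataclass
class Finding:
    level: str
    decision_id: str
    title: str
    trigger: str

    def __str__(self) -> str:
        return f"{self.level} [{self.decision_id}] {self.title} (trigger: {self.trigger})"


def check_diff(diff: str) -> list[Finding]:
    """Scan the added ('+') lines of a unified diff for forbidden imports."""
    findings = []
    for line in diff.splitlines():
        if line.startswith("+++"):  # file header, not added content
            continue
        m = IMPORT_RE.match(line)
        if not m:
            continue
        module = m.group(1)
        for decision_id, (title, modules) in FORBIDDEN.items():
            if module in modules:
                findings.append(Finding("WARN", decision_id, title, module))
    return findings


DRAFT = """\
--- a/app/users.py
+++ b/app/users.py
@@ -1,2 +1,3 @@
+import redis
 import json
 from app.cache import JsonCache
"""

if __name__ == "__main__":
    for f in check_diff(DRAFT):
        print(f)  # WARN [ADR-001] no external db (trigger: redis)
```

Because each finding carries a decision id rather than a free-text complaint, the PR comment can link straight back to the ADR the reviewer would otherwise have to remember to open.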

3 · Monday · Agent A · retry

Retry converges within constraints

The hook re-runs with the violated decisions explicit in the prompt. The agent regenerates against the JSON cache abstraction. The new diff passes.

PASS · storage_violation_count = 0 · retries = 1
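The retry step in isolation is a loop: generate, check, and if anything is violated, regenerate with the violated decision ids appended to the prompt. A minimal sketch, assuming a scripted stand-in for the agent (consistent with the demo's no-LLM design) and a toy rule for ADR-001; the function names and prompt wording are hypothetical.

```python
from typing import Callable


def scripted_agent(prompt: str) -> str:
    """Hypothetical scripted agent: compliant only once ADR-001 is in the prompt."""
    if "ADR-001" in prompt:
        return "+from app.cache import JsonCache\n+cache = JsonCache('users.json')\n"
    return "+import redis\n+cache = redis.Redis()\n"


def storage_violations(diff: str) -> list[str]:
    """Toy rule: any added redis import violates ADR-001."""
    added = [line[1:].strip() for line in diff.splitlines() if line.startswith("+")]
    bad = any(line.startswith(("import redis", "from redis")) for line in added)
    return ["ADR-001"] if bad else []


def generate_with_retry(task: str, agent: Callable[[str], str],
                        max_retries: int = 3) -> tuple[str, int, int]:
    """Return (verdict, storage_violation_count, retries)."""
    prompt = task
    violated: list[str] = []
    for attempt in range(max_retries + 1):
        diff = agent(prompt)
        violated = storage_violations(diff)
        if not violated:
            return "PASS", 0, attempt
        # Re-run with the violated decisions made explicit in the prompt.
        prompt = f"{task}\nViolated decisions: {', '.join(violated)}. Regenerate within them."
    return "FAIL", len(violated), max_retries


if __name__ == "__main__":
    verdict, count, retries = generate_with_retry(
        "Speed up the user lookup endpoint.", scripted_agent)
    print(f"{verdict} · storage_violation_count = {count} · retries = {retries}")
```

Bounding the loop with max_retries matters: an agent that cannot converge should surface as a FAIL for a human, not spin indefinitely.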

4 · Wednesday · Agent B

Session storage built on the correct primitive

Asked to "add session storage." The hook surfaces ADR-001 and ADR-004. Output uses the Repository abstraction over JSON. Compliant by construction.

PASS · ADR-001, ADR-004 satisfied
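A Repository abstraction over JSON can be quite small. The sketch below assumes a single JSON file as the backing store and hypothetical class names (JsonRepository, SessionStore); it illustrates the shape of a compliant session store, not the code in the Mneme example project.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonRepository:
    """Key-value repository persisted to one JSON file (no external database)."""

    def __init__(self, path: Path) -> None:
        self.path = Path(path)

    def _load(self) -> dict[str, Any]:
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text(encoding="utf-8"))

    def _save(self, data: dict[str, Any]) -> None:
        # Write a temp file in the same directory, then replace the target,
        # so a crash mid-write never leaves truncated JSON in place.
        fd, tmp = tempfile.mkstemp(dir=str(self.path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            json.dump(data, f)
        os.replace(tmp, self.path)

    def get(self, key: str) -> Optional[Any]:
        return self._load().get(key)

    def put(self, key: str, value: Any) -> None:
        data = self._load()
        data[key] = value
        self._save(data)

    def delete(self, key: str) -> None:
        data = self._load()
        if key in data:
            del data[key]
            self._save(data)


class SessionStore:
    """Session storage built on the repository, not on a new backing service."""

    def __init__(self, repo: JsonRepository) -> None:
        self.repo = repo

    def create(self, session_id: str, user_id: str) -> None:
        self.repo.put(f"session:{session_id}", {"user_id": user_id})

    def user_for(self, session_id: str) -> Optional[str]:
        record = self.repo.get(f"session:{session_id}")
        return record["user_id"] if record else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = SessionStore(JsonRepository(Path(d) / "sessions.json"))
        store.create("abc", "user-42")
        print(store.user_for("abc"))  # user-42
```

The replace-on-write guards against half-written files, but this repository is not safe for concurrent writers; a real service would need a lock or a single writing process.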

5 · Friday · Architect

No drift to clean up

The decision corpus stayed authoritative all week. Three PRs landed. None of them silently relaxed an invariant. The architect spends Friday on the next ADR, not on remediation.

Same agents. Same prompts. Same model. The only difference is that ADR-001 reaches the model before generation, and the verdict reaches CI after it. Drift gets blocked at the first divergence, not the third.

Why this is the AI SDLC entropy problem

Human-authored drift is bounded by human throughput. A team produces maybe ten meaningful PRs a day. An architect can read all of them. A reviewer can flag the one that smells wrong.

Agent-authored drift is not bounded by human throughput. The same architect now sees ten PRs per hour. Each one was generated by an agent that pattern-matched against training data, not against the project's decision corpus. Most of them are fine. The one that isn't is invisible because it looks like all the others.

This is why review-based governance starts to fail at AI velocity. The reviewer is doing the right thing — reading the diff — but the failure mode has moved upstream of the diff. The decisions that matter are now in the agent's context window, not in a comment thread. Governance has to be wherever the decisions are made.

Two paths through the same week

Without governance

Drift compounds
  • 3 PRs land that contradict ADR-001
  • Each one looks locally reasonable
  • Reviewer load: 12 PRs that week
  • Drift detected post-deploy
  • Remediation: revert + ADR rewrite + meeting

With Mneme

System converges
  • First divergence blocked pre-generation
  • Retry converges within 1 round
  • Downstream agents see the corrected codebase
  • CI gate enforces on every PR
  • Architect time freed for the next decision

Run the reproducible example

The repo ships a Python walkthrough that simulates the timeline above against the Mneme pipeline. Three sequential "agents" produce diff hunks. The same Mneme conflict detector and enforcer that drive the editor hook and the CI gate evaluate each step:

git clone https://github.com/TheoV823/mneme
cd mneme/examples/architectural-drift
python run.py

The script prints the without-governance timeline first (drift propagation), then the with-governance timeline (upstream block and retry convergence). Same decision corpus on disk for both runs. The script does not call any LLM; the architectural divergences are scripted so that the enforcement output is deterministic and reproducible.
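The two-pass structure can be sketched in miniature. This is not run.py; it is a hypothetical model assuming scripted hunks, a token-based rule for ADR-001, and downstream agents that reuse whatever dependency they find in the codebase. It shows why drift propagates once the first divergent hunk lands, and why later agents stay compliant once that hunk is blocked.

```python
import re
from dataclasses import dataclass

# Hypothetical rule: the token "redis" anywhere in a hunk violates ADR-001.
FORBIDDEN = {"redis": "ADR-001"}


@dataclass(frozen=True)
class Step:
    day: str
    agent: str
    seeds_drift: bool     # does this agent diverge on its own?
    drifting_hunk: str    # output when it diverges or copies existing drift
    compliant_hunk: str   # output when the codebase and context are clean


STEPS = [
    Step("Monday", "Agent A", True, "+import redis", "+from app.cache import JsonCache"),
    Step("Wednesday", "Agent B", False, "+pool = redis.ConnectionPool()",
         "+sessions = JsonRepository('sessions.json')"),
    Step("Thursday", "Agent C", False, "+  image: redis:7", "+  # no extra services"),
]


def violations(hunk: str) -> list[str]:
    tokens = re.findall(r"[a-z0-9_]+", hunk.lower())
    return sorted({FORBIDDEN[t] for t in tokens if t in FORBIDDEN})


def run(governed: bool) -> list[str]:
    codebase: list[str] = []
    log: list[str] = []
    for step in STEPS:
        # Downstream agents copy what they find: drift propagates once it lands.
        drifting = step.seeds_drift or any(violations(h) for h in codebase)
        hunk = step.drifting_hunk if drifting else step.compliant_hunk
        retries = 0
        if governed and violations(hunk):
            log.append(f"{step.day} · {step.agent} · WARN {', '.join(violations(hunk))}")
            hunk, retries = step.compliant_hunk, 1  # retry with the decision in context
        codebase.append(hunk)
        status = "DRIFT" if violations(hunk) else "PASS"
        log.append(f"{step.day} · {step.agent} · {status} · retries = {retries}")
    return log


if __name__ == "__main__":
    for governed in (False, True):
        print("with governance" if governed else "without governance")
        for line in run(governed):
            print("  " + line)
```

Nothing in run() calls a model, so repeated runs produce identical logs, which is the property that makes the enforcement output reproducible.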

What this proves. The runnable example does not claim to demonstrate sophisticated autonomous agents. It demonstrates that the governance layer stays coherent across multiple proposed changes. That is the proof surface that matters for this category, and it is small enough to be deterministic.

Where the governance lives

The enforcement output above is generated by the same three artifacts that drive every other Mneme integration:

The drift demo is what happens when those three artifacts are missing. The convergent timeline is what happens when they are wired up.