Constraint Decay Is Why Coding Agents Need Architectural Governance

Coding agents are getting better at producing code. That is no longer the interesting part. The more important question is whether the code they produce still belongs inside the system they are modifying. A new arXiv paper gives the failure mode a name — constraint decay — and quantifies it: capable agent configurations lose around 30 points in assertion pass rates as structural requirements accumulate.

By Theo Valmis · May 2026

A new paper, Constraint Decay: The Fragility of LLM Agents in Backend Code Generation, gives a useful name to a failure mode many engineering teams are already starting to feel. The authors study how LLM agents perform when backend generation tasks require not only functional correctness, but also adherence to structural constraints such as architectural patterns, databases, object-relational mappings, and framework conventions. Their finding is direct: agents perform well under loose specifications, but degrade as structural requirements accumulate. The paper evaluates 80 greenfield generation tasks and 20 feature-implementation tasks across eight web frameworks, using both behavioral tests and static verifiers. Capable configurations lose around 30 points on average in assertion pass rates from baseline to fully specified tasks.

That phenomenon is what the authors call constraint decay.

It is an important phrase because it separates two different problems.

The first problem is obvious: the generated code does not work.

The second problem is more dangerous: the generated code works, but violates the structure of the system.

It bypasses the intended data layer.
It ignores the ORM convention.
It places logic in the wrong boundary.
It follows the endpoint contract but not the architectural contract.
It satisfies a test while weakening the codebase.

That distinction matters.

Functional correctness tells you whether the output behaves as expected. Structural correctness tells you whether the output preserves the system it was supposed to extend.
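To make the distinction concrete, here is a minimal sketch in Python. The `UserRepository` class, the `users` table, and both handler functions are hypothetical names invented for this illustration; none of them come from the paper. Both handlers return the same result, so a behavioral test cannot tell them apart, but only one of them goes through the intended data layer.

```python
import sqlite3


class UserRepository:
    """Hypothetical approved data-access layer for the `users` table."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def active_names(self) -> list[str]:
        rows = self._conn.execute(
            "SELECT name FROM users WHERE active = 1 ORDER BY name"
        ).fetchall()
        return [name for (name,) in rows]


def list_active_users_via_repository(repo: UserRepository) -> list[str]:
    """Handler that respects the boundary: it only talks to the repository."""
    return repo.active_names()


def list_active_users_bypassing_repository(conn: sqlite3.Connection) -> list[str]:
    """Handler that works but reaches past the repository into raw SQL."""
    rows = conn.execute(
        "SELECT name FROM users WHERE active = 1 ORDER BY name"
    ).fetchall()
    return [row[0] for row in rows]


def make_demo_connection() -> sqlite3.Connection:
    """Build an in-memory database with a few illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, active INTEGER)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ada", 1), ("bob", 0), ("cy", 1)],
    )
    return conn


if __name__ == "__main__":
    conn = make_demo_connection()
    repo = UserRepository(conn)
    # Same behavior, different structure: an equality test passes either way.
    print(list_active_users_via_repository(repo))
    print(list_active_users_bypassing_repository(conn))
```

An assertion on the returned list passes for both handlers. Only a check that looks at how the handler obtained its data can separate them.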

The problem is not that the agent cannot write code. The problem is that it cannot reliably preserve the rules that make the code belong to this system.

Constraint decay becomes architectural drift

Constraint decay is the local failure mode.

Architectural drift is the accumulated consequence.

A single agent-generated change that ignores an ORM pattern may look harmless. A single shortcut around a service boundary may even pass review if the behavior is correct. But when agents are producing more code, more frequently, across more surfaces, those small structural violations compound.

Over time, the system starts to diverge from its intended architecture.

The issue is not that the team forgot its architecture. The issue is that the architecture was never made executable at the point where agents were generating code.

Failure mode	What broke	Where it shows up
Functional failure	The code does not work	Tests, runtime errors
Constraint decay	The code works but ignores structural rules	Per-PR — if anyone looks for it
Architectural drift	Decay accumulated across the codebase	Months later, in rework and incidents

This is the core governance gap in AI-assisted development.

Teams already have decisions. They have ADRs, conventions, code review norms, database boundaries, framework preferences, and hard-won lessons about what not to do. But most of those constraints remain written for humans. They live in documents, comments, onboarding conversations, and senior engineers’ heads.

Coding agents do not reliably preserve that kind of context.

They need executable boundaries.

Tests are necessary, but not sufficient

One of the most useful parts of the paper is its evaluation design. The authors use both end-to-end behavioral tests and static verifiers. That separation is critical. Behavioral tests evaluate whether the generated application works. Static verifiers evaluate whether the code satisfies structural requirements.

That maps directly to the infrastructure gap emerging around coding agents.

Tests validate behavior. Governance validates intent.

A test can tell you whether an endpoint returns the right response. It may not tell you whether the implementation used the approved repository pattern. It may not detect that a dependency crossed the wrong layer. It may not know that the team has an ADR prohibiting a certain storage path, framework shortcut, or migration pattern.
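As a rough illustration of what a structural check can look like, the sketch below uses Python's standard `ast` module to flag handler code that imports a database driver directly. It is not the paper's verifier. The rule set `FORBIDDEN_IN_HANDLERS`, the `repositories` module, and both sample handlers are assumptions made up for the example.

```python
import ast

# Hypothetical rule: handler code must not import database drivers directly.
FORBIDDEN_IN_HANDLERS = {"sqlite3", "psycopg2"}


def forbidden_imports(source: str, forbidden: set[str]) -> list[tuple[int, str]]:
    """Return (line, module) pairs for every forbidden import in `source`."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            modules = [node.module or ""]
        else:
            continue
        for module in modules:
            # Compare on the top-level package so `sqlite3.dbapi2` also matches.
            if module.split(".")[0] in forbidden:
                found.append((node.lineno, module))
    return sorted(found)


# The samples are only parsed, never executed, so `repositories` need not exist.
COMPLIANT_HANDLER = """
from repositories import UserRepository

def list_users(repo: UserRepository):
    return repo.active_names()
"""

VIOLATING_HANDLER = """
import sqlite3

def list_users():
    conn = sqlite3.connect("app.db")
    return conn.execute("SELECT name FROM users").fetchall()
"""

if __name__ == "__main__":
    print(forbidden_imports(COMPLIANT_HANDLER, FORBIDDEN_IN_HANDLERS))
    print(forbidden_imports(VIOLATING_HANDLER, FORBIDDEN_IN_HANDLERS))
```

The check never runs the handler. It reads the source and reports where the rule is broken, which is exactly the information a behavioral test does not produce.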

In traditional development, senior engineers often caught these issues during review. That worked when code volume was human-paced.

AI changes the economics.

If agents increase the volume of generated code, and structural validation remains concentrated at PR review, then the review queue becomes the governance layer by accident. Senior engineers become constraint recovery systems.

That is not scalable.

Backend systems expose the problem faster

The paper’s backend focus is especially useful because backend systems make structural decay harder to hide.

Frontend demos can often look plausible while hiding poor structure. Backend systems have more explicit architectural commitments: data access patterns, schema constraints, framework conventions, service boundaries, API contracts, and runtime behavior.

The authors find significant variation across frameworks. Agents do better in minimal, explicit environments such as Flask and worse in more convention-heavy environments such as FastAPI and Django. They also identify data-layer defects, including incorrect query composition and ORM runtime violations, as leading causes of failure.

That is the part engineering leaders should pay attention to.

The more a system depends on conventions, implicit architecture, and layered data access, the more fragile agent-generated code becomes without governance.

This is not only a prompt problem.

You can put more instructions in the prompt. You can ask the agent to be careful. You can paste the architecture into context. Those things may help, but they do not create a reliable enforcement layer.

Instructions are not invariants.

The next layer is architectural governance

The AI coding stack is still heavily focused on generation.

Better models.
Better IDEs.
Better autocomplete.
Better agent loops.
Better test generation.
Better PR summaries.

All of that matters.

But as generation improves, the bottleneck shifts.

The question becomes: who preserves the architecture?

That is where architectural governance becomes infrastructure. Not governance in the abstract enterprise-policy sense. Governance as executable technical constraints inside the development workflow.

A governance layer should be able to answer questions like:

Does this change violate an architectural decision?
Did the agent introduce a dependency across a forbidden boundary?
Did it bypass the approved data access pattern?
Did it modify a surface that should be out of scope?
Can we trace the violation back to the decision it contradicts?
Can this check run before the PR review queue?

That is the shift from documentation to verification contracts.
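One way to picture a verification contract is a decision stored as data, plus a check that reports which decision a change contradicts. The sketch below is an assumption-heavy illustration: `BoundaryRule`, `Violation`, the layer names, the identifier `ADR-0007`, and the rule that a module's first dotted segment names its layer are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundaryRule:
    """Hypothetical decision as data: `source_layer` must not import `target_layer`."""
    decision_id: str
    source_layer: str
    target_layer: str
    rationale: str


@dataclass(frozen=True)
class Violation:
    decision_id: str
    importer: str
    imported: str
    rationale: str


def layer_of(module: str) -> str:
    """Assume the first dotted segment of a module path names its layer."""
    return module.split(".")[0]


def check_boundaries(
    imports: list[tuple[str, str]], rules: list[BoundaryRule]
) -> list[Violation]:
    """Compare observed (importer, imported) edges against the boundary rules."""
    violations = []
    for importer, imported in imports:
        for rule in rules:
            crosses = (
                layer_of(importer) == rule.source_layer
                and layer_of(imported) == rule.target_layer
            )
            if crosses:
                violations.append(
                    Violation(rule.decision_id, importer, imported, rule.rationale)
                )
    return violations


RULES = [
    BoundaryRule("ADR-0007", "api", "storage", "API handlers go through the service layer."),
]

# Import edges an agent-generated change might introduce (illustrative data).
OBSERVED_IMPORTS = [
    ("api.users", "services.users"),
    ("services.users", "storage.user_repo"),
    ("api.reports", "storage.report_repo"),
]

if __name__ == "__main__":
    for v in check_boundaries(OBSERVED_IMPORTS, RULES):
        print(f"{v.importer} -> {v.imported} violates {v.decision_id}: {v.rationale}")
```

Because each violation carries the `decision_id` and rationale of the rule it broke, the report points back to the decision rather than only to the offending line.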

Architecture cannot remain passive context when agents are actively generating code. It has to become enforceable.

Where Mneme fits

This is the category Mneme is being built for: architectural governance for AI-assisted development.

Mneme is not trying to be another semantic memory layer or generic RAG system. The goal is to make architectural decisions enforceable across the places where AI-assisted development happens.

That means turning decisions, ADRs, and project constraints into checks that can run before generation, during agent workflows, and in CI.
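As a generic illustration of a check that can run at more than one of those points, the sketch below tests whether a set of changed paths stays inside an allowed scope and returns a process exit code. It does not represent Mneme's interface, which this article does not describe. `ALLOWED_PATTERNS`, `out_of_scope`, `gate`, and the sample paths are hypothetical.

```python
import sys
from fnmatch import fnmatchcase

# Hypothetical constraint: paths an agent task is allowed to modify.
ALLOWED_PATTERNS = ["services/billing/*", "tests/billing/*"]


def out_of_scope(changed_paths: list[str], allowed: list[str]) -> list[str]:
    """Return the changed paths that match none of the allowed patterns."""
    return [
        path
        for path in changed_paths
        if not any(fnmatchcase(path, pattern) for pattern in allowed)
    ]


def gate(changed_paths: list[str], allowed: list[str]) -> int:
    """Return an exit code: 0 if every change is in scope, 1 otherwise."""
    offenders = out_of_scope(changed_paths, allowed)
    for path in offenders:
        print(f"out of scope: {path}", file=sys.stderr)
    return 1 if offenders else 0


if __name__ == "__main__":
    # Illustrative change set; in practice the paths would come from the
    # agent's planned edits or from the diff under review.
    changed = ["services/billing/invoice.py", "migrations/0042_drop_column.py"]
    sys.exit(gate(changed, ALLOWED_PATTERNS))
```

The function only needs a list of paths, so the same logic could be applied to a planned change before generation, to an agent's working tree mid-task, or to a diff in CI.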

A coding assistant can generate.

A test suite can validate behavior.

A PR reviewer can still apply judgment.

But the architectural invariants should not depend entirely on late human review. They should be available as executable guardrails.

That is why constraint decay is such a useful research framing. It gives language to the exact failure mode Mneme is designed to address.

Constraint decay is what happens when agents lose structural fidelity.
Architectural drift is what happens when that decay compounds across a codebase.
Architectural governance is the missing control layer.

Generation needs boundaries

The answer is not to slow coding agents down.

The answer is to give them better boundaries.

As AI-assisted development becomes normal, teams will not only ask whether agents can produce more code. They will ask whether agents can preserve the architecture, respect the decisions already made, and avoid shifting structural risk into review queues.

That is the next maturity step.

The future of AI-assisted development will not be won by generation alone. It will be won by generation plus governance.

Frequently asked questions

What is constraint decay in LLM coding agents?

Constraint decay is the failure mode in which an LLM coding agent's adherence to structural requirements weakens as more of those requirements are added to the task. The code may still pass behavioral tests while violating architectural patterns, ORM rules, framework conventions, or data-layer boundaries.

How is constraint decay different from a normal coding bug?

A normal coding bug means the code does not work. Constraint decay means the code works but ignores the structure of the system it was supposed to extend — wrong layer, wrong access pattern, wrong dependency. The behavior is correct; the architecture is not.

How is constraint decay related to architectural drift?

Constraint decay is the local, per-change failure mode. Architectural drift is what happens when those local failures compound across many agent-generated changes over time. Decay is the mechanism; drift is the accumulated system-level consequence.

Why are tests not enough to catch constraint decay?

Tests validate behavior. They confirm that an endpoint returns the right response. They do not in general confirm that the implementation used the approved repository pattern, respected a service boundary, or complied with an ADR. Catching constraint decay requires static verification of structure alongside behavioral tests.

What is architectural governance for AI-assisted development?

Architectural governance turns architectural decisions, ADRs, and project constraints into executable checks that run before generation, during agent workflows, and in CI — not policy documents that depend on senior reviewers spotting violations after the fact.