Agent-first IDEs change the unit of work
The classic IDE assumed the unit of work was a line, a function, a file. Tooling helped the developer write each one faster. Agent-first IDEs — Google Antigravity, Cursor’s agent modes, Claude Code in autonomous mode — assume the unit of work is a task. The developer specifies an outcome; the agent does the multi-step work to achieve it.
That is a categorical shift in what the human is doing inside the editor. Less typing, more delegating.
Delegated tasks need shared constraints
When a human writes the code, the architecture lives in their head. They remember the ADR (architecture decision record), they know which dependency was deprecated, and they recognize a pattern as a copy of one the team has explicitly retired. The architecture is enforced by human judgment, applied per line.
When an agent does the work, the architecture has to live somewhere the agent can access it — and somewhere the system can enforce it without depending on the human watching each step. Delegation needs an external constraint substrate.
Telling the agent the rule is not enough. The agent needs to be unable to act outside it.
Natural language instructions decay across long-running workflows
Rules embedded in system prompts are followed probabilistically, not guaranteed. They tend to hold for the first turn, often hold for the next few, and become less reliable as the context grows. Long-running agent workflows in agent-first IDEs (multi-hour tasks, multi-file edits, multi-tool sessions) are exactly the regime where natural-language instructions stop being reliable.
The architecture cannot be a paragraph the agent might or might not remember. It has to be a contract the system enforces.
Review catches symptoms after generation
PR review catches what slipped through. It does not prevent the agent from making the choice in the first place. At human pace, that asymmetry was tolerable: reviewers had time to read carefully and push back. At agent pace, review queues fill faster than they can be processed, and reviewers shift from acting as quality gates to working through triage queues.
Catching architectural violations downstream of execution is incident response, not infrastructure.
Invariants need to be encoded, retrieved, and enforced
The right shape is three composing pieces:
- Encoded — the rule exists as a machine-evaluable artifact, not a paragraph. ADRs compiled into constraint records.
- Retrieved — the relevant constraints are surfaced into the agent’s task context deterministically. Same task, same set of applicable rules.
- Enforced — the check produces a binary verdict before the change can be committed. Same input, same verdict, every time.
That is what makes an invariant an invariant: it holds. Not most of the time. Not when the agent remembers. Always.
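As a rough illustration of how the three pieces compose, here is a minimal Python sketch. Everything in it is an assumption made for the example: the `Constraint` record shape, the ADR identifiers, the path globs, and the `retrieve` and `enforce` helpers are hypothetical and do not describe any particular tool's format or API.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Constraint:
    """Hypothetical machine-evaluable record compiled from one ADR."""
    adr_id: str            # provenance: which ADR this rule came from
    applies_to: str        # glob over repository paths the rule covers
    forbidden_import: str  # module name the ADR forbids on those paths


# Encoded: rules live as data, not as a paragraph in a prompt.
CONSTRAINTS = (
    Constraint("ADR-0012", "services/*", "legacy_orm"),
    Constraint("ADR-0007", "services/billing/*", "requests"),
)


def retrieve(path: str) -> list[Constraint]:
    """Retrieved: pure function of the path, returned in a stable order."""
    matches = [c for c in CONSTRAINTS if fnmatchcase(path, c.applies_to)]
    return sorted(matches, key=lambda c: c.adr_id)


def enforce(path: str, imports: set[str]) -> tuple[str, list[str]]:
    """Enforced: binary verdict plus the ADR ids that were violated."""
    violated = [c.adr_id for c in retrieve(path) if c.forbidden_import in imports]
    return ("FAIL" if violated else "PASS", violated)
```

Under these assumptions, `enforce("services/billing/api.py", {"requests", "json"})` returns `("FAIL", ["ADR-0007"])`, and the same call returns the same verdict no matter how many times or where it runs, because nothing in the path from rule to verdict depends on model output.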
Mneme turns ADRs into executable governance context
Mneme is built around exactly this three-step pattern. The ADR corpus is compiled into a deterministic governance layer. Retrieval is rule-based and reproducible. Enforcement runs in hooks, at pre-commit, and in CI, using the same compiled constraints regardless of which agent or IDE produced the change.
For agent-first IDEs specifically, this matters because the agent is no longer a tool inside the editor — it is the active participant. The architecture has to be a thing the agent operates inside, not a thing the developer holds in their head.
Agent-first IDEs increase autonomous execution. Architectural governance keeps that execution aligned.
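One way to picture "the same compiled constraints" at every enforcement point is a content fingerprint over a canonical serialization. The sketch below is a generic illustration under assumed names (`compile_constraints`, `fingerprint`, and the record fields are hypothetical); it is not a description of how Mneme compiles or stores its constraints.

```python
import hashlib
import json


def compile_constraints(records: list[dict]) -> bytes:
    """Serialize records canonically so input order cannot change the bytes."""
    ordered = sorted(records, key=lambda r: r["adr_id"])
    # sort_keys and fixed separators keep the byte output stable across runs.
    return json.dumps(ordered, sort_keys=True, separators=(",", ":")).encode("utf-8")


def fingerprint(compiled: bytes) -> str:
    """SHA-256 hex digest identifying one exact compiled rule set."""
    return hashlib.sha256(compiled).hexdigest()


# Hypothetical records, listed in two different orders.
records_a = [
    {"adr_id": "ADR-0012", "forbid": "legacy_orm"},
    {"adr_id": "ADR-0007", "forbid": "requests"},
]
records_b = list(reversed(records_a))

same_rules = fingerprint(compile_constraints(records_a)) == fingerprint(
    compile_constraints(records_b)
)
```

If a hook, a pre-commit check, and a CI job each log the fingerprint they evaluated against, a mismatch shows immediately that two stages were enforcing different rule sets.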
What this looks like in practice
Concretely, with invariants in place:
- The agent receives the task, plus the architectural decisions that apply to the surfaces it’s touching.
- If it tries to introduce a forbidden dependency, the check returns FAIL with provenance back to the ADR that forbids it.
- If it crosses a service boundary, the same enforcement triggers, regardless of which agent or IDE generated the change.
- The reviewer sees the verdict alongside the diff — not just “what did the agent do” but “was the agent allowed to do it.”
That is the difference between a delegated task with rails and a delegated task without them.