PRs were designed for a different era
Pull requests became the standard collaboration surface for modern software teams because they gave humans a place to inspect change before it merged. They answer human questions well:
- What changed?
- Why was it changed?
- Who reviewed it?
- What discussion happened around it?
But agentic development introduces a new class of questions PRs were never designed to answer:
- Which architectural decision allowed this change?
- Which invariant did the agent think it was following?
- What context was retrieved before generation?
- Which policy was checked before implementation?
- Was the change inside the intended scope?
- Did the agent override, ignore, or reinterpret an existing constraint?
Traditional PRs assume a human wrote the code, understands the context, and can explain the trade-offs. Agentic development breaks that assumption. A PR may now include code produced across multiple prompts, tools, files, sessions, and inferred requirements. The reviewer sees the diff, but not necessarily the decision path, policy context, or provenance chain behind it.
Human-readable PRs are necessary but insufficient
The answer is not to replace the PR description with JSON. Human-readable summaries still matter. Reviewers need narrative, trade-offs, and intent — the parts of a change that resist structured fields.
But narrative alone does not scale when autonomous systems are generating, editing, and submitting changes at agent velocity.
Human-readable PRs explain. Machine-readable PRs allow verification.
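One way to keep both registers in a single PR is to leave the prose description untouched and embed a metadata block that CI can parse. The sketch below assumes a hypothetical convention: JSON inside a single HTML comment that begins with `governance`. The convention and the `split_pr_body` helper are illustrative and not part of any existing tool.

```python
import json
import re
from typing import Optional, Tuple

# Hypothetical convention: the metadata is JSON inside one HTML comment that
# starts with "governance". GitHub does not display HTML comments in rendered
# Markdown, so reviewers see only the prose.
_BLOCK = re.compile(r"<!--\s*governance\b(.*?)-->", re.DOTALL)


def split_pr_body(body: str) -> Tuple[str, Optional[dict]]:
    """Split a PR body into (human-readable summary, parsed metadata or None)."""
    match = _BLOCK.search(body)
    if match is None:
        return body.strip(), None
    metadata = json.loads(match.group(1))  # json.loads tolerates surrounding whitespace
    summary = (body[: match.start()] + body[match.end():]).strip()
    return summary, metadata


if __name__ == "__main__":
    body = """This PR updates the authentication flow to support SSO.

<!-- governance
{"change_intent": "Add SSO support to authentication flow",
 "decisions": ["ADR-004"]}
-->
"""
    summary, metadata = split_pr_body(body)
    print(summary)                # narrative for human reviewers
    print(metadata["decisions"])  # contract for CI
```

Ordinary template comments such as `<!-- Describe your change -->` are left alone, because the pattern requires the `governance` keyword.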
What a machine-readable PR would include
A machine-readable PR does not need to be complex. It only needs to expose a small set of structured fields:
- Change intent — a one-line statement of purpose
- Scope boundary — allowed and forbidden paths
- Files and systems touched
- Related issue, ADR, requirement, or decision
- Architectural decisions retrieved before generation
- Invariants checked
- Policy verdicts
- Known exceptions
- Agent and tool provenance
- CI governance result
- Remediation constraints
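The fields above map naturally onto a small typed record. A minimal sketch using Python dataclasses; the class and attribute names are hypothetical and simply mirror the list, not a published schema. The example values reuse identifiers from this post, while `app/auth/sso.py` is an invented file name.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Scope:
    allowed_paths: List[str]                          # globs the change may touch
    forbidden_paths: List[str] = field(default_factory=list)


@dataclass
class Verdict:
    status: str                                       # "pass", "warn", or "fail"
    reason: str = ""


@dataclass
class GovernanceMetadata:
    change_intent: str                                # one-line statement of purpose
    scope: Scope                                      # scope boundary
    files_touched: List[str]                          # files and systems touched
    related: List[str]                                # issue, ADR, requirement IDs
    decisions_retrieved: List[str]                    # retrieved before generation
    invariants_checked: List[str]
    policy_verdicts: Dict[str, Verdict]               # policy ID -> verdict
    provenance: Dict[str, str]                        # agent, tool, source issue
    known_exceptions: List[str] = field(default_factory=list)
    ci_result: Optional[Verdict] = None               # assumed to be set by CI, not the agent
    remediation_constraints: List[str] = field(default_factory=list)


example = GovernanceMetadata(
    change_intent="Add SSO support to authentication flow",
    scope=Scope(["app/auth/**", "app/session/**"], ["app/billing/**"]),
    files_touched=["app/auth/sso.py"],
    related=["ENG-142", "ADR-004"],
    decisions_retrieved=["ADR-004"],
    invariants_checked=["Do not bypass token rotation"],
    policy_verdicts={"AUTH-002": Verdict("pass")},
    provenance={"generated_by": "coding-agent", "source_issue": "ENG-142"},
)
```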
Concretely, the difference is between two registers of the same change. A human reviewer might read:
This PR updates the authentication flow to support SSO.
A governance system needs to know:
This PR modifies auth/session boundaries, touches ADR-004, is allowed under policy AUTH-002, must not bypass token rotation, and passed the relevant invariant checks.
Both descriptions are correct. They are written for different readers.
PRs become verification surfaces
In traditional development, a PR is mostly a review document. In agentic development, the PR becomes a verification surface.
A verification surface is a place where intent, policy, provenance, and system behavior can be checked before a change is accepted. CI already treats PRs as execution surfaces: tests run, linters execute, security scans trigger. The missing layer is architectural and governance verification.
We have machine-readable test results, security findings, and dependency alerts. We do not yet have a standard machine-readable contract for architectural intent.
Why comments and checklists are not enough
Most teams try to solve this with PR templates. Link the ticket, describe the change, add screenshots, confirm tests, note risks.
Templates are weak governance:
- They are often optional.
- They are written in prose.
- They cannot reliably be interpreted by CI.
- They do not prove which constraints were retrieved before generation.
- They do not create a durable contract between architectural decisions and code change.
A checklist asks the author to remember the rule. A machine-readable governance contract lets the system verify whether the rule was applied.
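Take the scope boundary as an example. A template can ask the author to confirm that the change stays inside the auth module; a contract lets CI check it. A minimal sketch, assuming allowed and forbidden paths are glob patterns such as `app/auth/**`, with Python's `fnmatch` standing in for whatever glob semantics a real governance tool would define. The `scope_violations` helper is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def scope_violations(
    changed_files: List[str],
    allowed_paths: List[str],
    forbidden_paths: List[str],
) -> List[Tuple[str, str]]:
    """Return (path, reason) pairs for changed files outside the declared scope.

    fnmatch's "*" also matches "/", so "app/auth/**" covers every file under
    app/auth/. This is simpler than full gitignore-style globbing.
    """
    violations = []
    for path in changed_files:
        # Forbidden paths are checked first, so they win over allowed ones.
        if any(fnmatchcase(path, pattern) for pattern in forbidden_paths):
            violations.append((path, "forbidden path"))
        elif not any(fnmatchcase(path, pattern) for pattern in allowed_paths):
            violations.append((path, "outside allowed paths"))
    return violations


changed = ["app/auth/sso.py", "app/billing/invoice.py", "docs/sso.md"]
print(scope_violations(changed, ["app/auth/**", "app/session/**"], ["app/billing/**"]))
```

Because forbidden patterns are evaluated first, a file that matches both lists is reported as forbidden rather than silently allowed.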
The provenance problem
As agents become more autonomous, the question is not only "What changed?" It is also "Where did the change come from?"
- Was the change generated from an issue?
- A prompt?
- A stale architectural decision?
- A local convention?
- A copied pattern from another part of the codebase?
- A tool-generated migration?
Without provenance, reviewers can inspect the output but cannot easily inspect the reasoning trail or policy context that produced it. Machine-readable PR metadata gives governance systems a place to attach decision provenance, policy provenance, and enforcement provenance — so the chain from architectural intent to merged code is durable, not just remembered.
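Once provenance is recorded as data, some of these questions become mechanical. A minimal sketch, assuming metadata with the field names used in this post's example block (`provenance.source_issue`, `provenance.generated_by`, `provenance.retrieved_context`, and a `decisions` list carrying a `status`). The two rules it applies, that every cited decision must have been retrieved before generation and must still be accepted, are illustrative choices rather than a standard.

```python
from typing import List


def provenance_findings(metadata: dict) -> List[str]:
    """List provenance gaps in a governance metadata dict (empty list = none found)."""
    findings = []
    provenance = metadata.get("provenance", {})
    retrieved = set(provenance.get("retrieved_context", []))

    # Where did the change come from, and what produced it?
    if not provenance.get("source_issue"):
        findings.append("no source issue recorded")
    if not provenance.get("generated_by"):
        findings.append("no agent or tool recorded")

    for decision in metadata.get("decisions", []):
        decision_id = decision["id"]
        # A decision cited in the PR should have been in the agent's context.
        if decision_id not in retrieved:
            findings.append(f"{decision_id} cited but not retrieved before generation")
        # A superseded or deprecated decision is a stale source of intent.
        status = decision.get("status")
        if status != "accepted":
            findings.append(f"{decision_id} has status {status!r}, not 'accepted'")
    return findings
```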
The risk of agentic PR spam
As agents get better at producing plausible code and plausible PR descriptions, review queues risk filling with changes that look complete but are structurally under-specified.
The problem will not be that agents cannot write summaries. They will write very convincing summaries.
The problem is that summaries are not contracts.
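A CI gate can make that distinction enforceable: however persuasive the description, a PR whose metadata lacks the contract fields does not reach review. A minimal sketch; the required-field list is an assumption that a real repository policy would define for itself, and the check deliberately never reads the prose.

```python
from typing import List, Optional

# Hypothetical minimum contract; a real repository policy would define its own list.
REQUIRED_FIELDS = ("change_intent", "scope", "decisions", "invariants", "provenance")


def missing_contract_fields(metadata: Optional[dict]) -> List[str]:
    """Return required fields that are absent or empty. The summary is never consulted."""
    if metadata is None:  # a summary with no metadata block at all
        return list(REQUIRED_FIELDS)
    return [name for name in REQUIRED_FIELDS if not metadata.get(name)]


# A polished summary with partial metadata still fails the gate.
print(missing_contract_fields({"change_intent": "Add SSO support", "scope": {"allowed_paths": ["app/auth/**"]}}))
```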
What good looks like
An ideal future workflow runs in seven steps:
1. Agent starts work.
2. Governance layer retrieves relevant architectural decisions.
3. Agent generates code within those constraints.
4. PR is opened with a human-readable summary.
5. PR also includes machine-readable governance metadata.
6. CI verifies the metadata against repo-native policy.
7. Reviewers see not only tests and diffs, but architectural verdicts, provenance, and scope boundaries.
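The retrieval step in this workflow is the one most teams lack today. A minimal sketch of one way it could work, assuming each repo-native decision declares the paths it governs and that the agent passes in the files it plans to touch. The `DECISION_INDEX` structure, its `governs` key, and ADR-001 are hypothetical; only ADR-004 comes from this post's example.

```python
from fnmatch import fnmatchcase
from typing import Dict, List

# Hypothetical repo-native decision index. ADR-001 and the "governs" key are
# invented for illustration; ADR-004 comes from the example in this post.
DECISION_INDEX: List[Dict] = [
    {"id": "ADR-004", "title": "Session tokens must rotate on privilege changes",
     "status": "accepted", "governs": ["app/auth/**", "app/session/**"]},
    {"id": "ADR-001", "title": "Sessions are stored server-side only",
     "status": "superseded", "governs": ["app/session/**"]},
]


def relevant_decisions(planned_files: List[str], index: List[Dict] = DECISION_INDEX) -> List[Dict]:
    """Return accepted decisions whose governed globs match any planned file."""
    return [
        decision
        for decision in index
        if decision["status"] == "accepted"  # skip superseded (stale) decisions
        and any(
            fnmatchcase(path, pattern)
            for path in planned_files
            for pattern in decision["governs"]
        )
    ]


print([d["id"] for d in relevant_decisions(["app/session/store.py"])])
```

The returned decisions would be placed in the agent's context before generation and recorded as `retrieved_context`, which is what lets a later CI check confirm they were actually consulted.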
A minimal example of the metadata block that step 5 would produce:
```yaml
governance:
  change_intent: "Add SSO support to authentication flow"
  scope:
    allowed_paths:
      - "app/auth/**"
      - "app/session/**"
    forbidden_paths:
      - "app/billing/**"
  decisions:
    - id: "ADR-004"
      title: "Session tokens must rotate on privilege changes"
      status: "accepted"
  invariants:
    - "Do not bypass token rotation"
    - "Do not introduce direct provider dependency outside auth boundary"
  provenance:
    source_issue: "ENG-142"
    generated_by: "coding-agent"
    retrieved_context:
      - "ADR-004"
      - "AUTH-002"
  verdict:
    status: "warn"
    reason: "SSO provider dependency requires boundary review"
```
The human still reviews the code. But the system has already made the architectural contract visible — what was supposed to be true, what was checked, what came back as a warning, and what reasoning trail produced the change.
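The `verdict` in the example is a single status summarizing several checks. A minimal sketch of one possible aggregation rule, assuming the most severe check wins and that `warn` is surfaced to reviewers without failing CI unless a repository opts in through a hypothetical `fail_on_warn` flag. The individual check results below are illustrative inputs, not output from a real tool.

```python
from typing import Dict, Iterable

SEVERITY = {"pass": 0, "warn": 1, "fail": 2}


def aggregate(verdicts: Iterable[Dict[str, str]]) -> Dict[str, str]:
    """Combine per-check verdicts; the most severe status (first one on ties) wins."""
    worst = {"status": "pass", "reason": "no checks reported a problem"}
    for verdict in verdicts:
        if SEVERITY[verdict["status"]] > SEVERITY[worst["status"]]:
            worst = verdict
    return worst


def ci_exit_code(verdict: Dict[str, str], fail_on_warn: bool = False) -> int:
    """Map the aggregate verdict to a process exit code (non-zero fails the CI job)."""
    if verdict["status"] == "fail":
        return 1
    if verdict["status"] == "warn" and fail_on_warn:
        return 1
    return 0


checks = [
    {"status": "pass", "reason": "scope boundary respected"},
    {"status": "pass", "reason": "token rotation invariant holds"},
    {"status": "warn", "reason": "SSO provider dependency requires boundary review"},
]
result = aggregate(checks)
print(result["status"], "-", result["reason"])
print("exit code:", ci_exit_code(result))
```

Keeping `warn` non-blocking by default matches the example above: the change can proceed, but the boundary review is visible to the human reviewer instead of buried in prose.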
How this relates to Mneme
This is the direction Mneme is built around: architectural governance before and during generation, with repo-native decisions that can be retrieved, enforced, and surfaced in CI.
The point is not to make developers write more process documents. The point is to make architectural intent available to the systems now producing and reviewing code. Mneme treats ADRs, constraints, and architectural decisions as enforceable context rather than passive documentation. Machine-readable PRs are one natural surface where that governance can be propagated.
Conclusion: human-readable and machine-verifiable
The future PR is not less human. It is human-readable and machine-verifiable.
Humans still need the story: why the change matters, what trade-offs were made, what risk remains.
Machines need the contract: which decisions apply, which invariants were checked, what provenance exists, and whether the change is allowed.
That is the next frontier for agentic software development: not just faster code generation, but structured verification of the changes agents produce.