The naive model of agent output is one file at a time, all of it source code. That model has never been right and is increasingly wrong as agents take on more autonomous work. A single PR from a long-running agent typically writes to a dozen surfaces, only a handful of which are .py, .ts, or .go files. Each non-code surface carries organizational intent. Each one can drift. Most do not appear in code review.

Execution surfaces is the inventory of where agent output lands. It is the prerequisite for talking honestly about governance coverage: you cannot govern what you have not enumerated.
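
A rough way to see how much of a change set is not source code is to bucket the changed paths by surface category. The sketch below assumes a hypothetical repository layout; every pattern (`.agent/`, `deploy/`, `migrations/`, and so on) is an illustrative assumption, not a standard. It only sees file-backed surfaces: branch names, PR metadata, and outbound side effects never appear in a path list.

```python
from collections import Counter
from fnmatch import fnmatch

# Hypothetical path patterns, checked in order; first match wins.
# A real team would substitute its own repository layout.
SURFACE_PATTERNS = [
    (".github/workflows/*", "infrastructure"),
    ("deploy/*", "infrastructure"),
    (".agent/*", "agent-produced"),
    ("migrations/*", "source"),
    ("docs/*", "documentation"),
    ("*.md", "documentation"),
    ("*.py", "source"),
    ("*.ts", "source"),
    ("*.go", "source"),
]


def classify_path(path: str) -> str:
    """Return the surface category for one changed path."""
    for pattern, category in SURFACE_PATTERNS:
        # fnmatch's "*" also matches "/" so "*.py" covers nested files.
        if fnmatch(path, pattern):
            return category
    return "unclassified"


def summarize_changes(paths):
    """Count changed paths per surface category."""
    return Counter(classify_path(p) for p in paths)
```

Paths that land in "unclassified" are the interesting output: they are surfaces the inventory has not named yet.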

The inventory

A reasonable taxonomy of execution surfaces for a modern autonomous engineering workflow:

Source · tracked
Code and configuration
  • Application source files
  • Schema and migration files
  • Feature flags and config files
  • Generated client/server stubs
  • Type definitions and contracts
Process · meta
Branches, commits, PRs
  • Branch names and namespaces
  • Commit messages and trailers
  • PR titles and descriptions
  • PR labels, reviewers, assignees
  • Tag policy and release labels
Infrastructure
CI, deploy, runtime config
  • CI workflow files
  • Build and test configuration
  • Deployment manifests
  • Secret references and IAM rules
  • Container, runtime, scaling configs
Documentation
Docs, runbooks, ADRs
  • READMEs and module docs
  • Runbooks and on-call notes
  • ADR drafts and architectural notes
  • Inline comments and docstrings
  • Release notes and changelogs
Agent-produced
Plans, memory, traces
  • Session plans and task lists
  • Memory files and progress logs
  • Tool traces and execution records
  • Inter-agent handoff artifacts
  • Generated workflow definitions
External
Outbound side effects
  • Issue and ticket updates
  • Chat and notification messages
  • Calls to external APIs and queues
  • Webhook payloads
  • Status pages and dashboards

This is not exhaustive. A team running its own taxonomy will add and remove categories. The shape is what matters: there are categories of execution surface, each category has multiple surfaces, and each surface has its own conventions and constraints. Governance has to know which surfaces exist before it can decide which to cover.
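
The taxonomy above can be written down as data so that other tooling can read it. This is a minimal sketch: the category and surface names are copied from the list above, while the dictionary layout and the helper names (`all_surfaces`, `category_of`) are assumptions made for illustration.

```python
# Category and surface names follow the taxonomy above.
# The dict layout and helper names are illustrative assumptions.
INVENTORY = {
    "source": [
        "application source files",
        "schema and migration files",
        "feature flags and config files",
        "generated client/server stubs",
        "type definitions and contracts",
    ],
    "process": [
        "branch names and namespaces",
        "commit messages and trailers",
        "PR titles and descriptions",
        "PR labels, reviewers, assignees",
        "tag policy and release labels",
    ],
    "infrastructure": [
        "CI workflow files",
        "build and test configuration",
        "deployment manifests",
        "secret references and IAM rules",
        "container, runtime, scaling configs",
    ],
    "documentation": [
        "READMEs and module docs",
        "runbooks and on-call notes",
        "ADR drafts and architectural notes",
        "inline comments and docstrings",
        "release notes and changelogs",
    ],
    "agent-produced": [
        "session plans and task lists",
        "memory files and progress logs",
        "tool traces and execution records",
        "inter-agent handoff artifacts",
        "generated workflow definitions",
    ],
    "external": [
        "issue and ticket updates",
        "chat and notification messages",
        "calls to external APIs and queues",
        "webhook payloads",
        "status pages and dashboards",
    ],
}


def all_surfaces(inventory):
    """Flatten the inventory into (category, surface) pairs."""
    return [(cat, s) for cat, surfaces in inventory.items() for s in surfaces]


def category_of(surface, inventory):
    """Return the category that lists a surface, or None if it is unmapped."""
    for category, surfaces in inventory.items():
        if surface in surfaces:
            return category
    return None
```

A team adapting this would edit the entries rather than the helpers; a `None` from `category_of` means the surface has not been enumerated.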

The coverage gap

Most teams that have built any governance at all have built it for one category: source code. The other categories operate on convention alone — "we write branch names this way," "we structure PR titles like this," "our CI workflows follow this pattern." When the only thing writing those artifacts was humans, that worked. When agents start producing them at velocity, convention stops being self-enforcing.

Typical coverage · before vs after autonomous agents
  • Source code: covered · tests, lint, review
  • Branches & PRs: partial · CODEOWNERS, naming check
  • CI & deploy: partial · usually convention-only
  • Generated docs: gap · rarely reviewed
  • Agent artifacts: gap · treated as ephemeral
  • External effects: gap · outside the repository
Bars are illustrative: a team's actual coverage map is the first artifact a serious agent governance program produces.

The pattern is consistent across teams. Source code is governed; the surrounding execution surfaces are mostly not. As long as humans write the surrounding artifacts, "mostly not" is acceptable. As soon as autonomous agents produce most of them, the same gaps in the coverage map turn into structural drift.
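
A coverage map of this kind can be derived mechanically once the inventory and the set of enforced surfaces are both written down. The sketch below is one possible shape, not a prescribed format: the trimmed inventory, the `ENFORCED` set, and the three-way covered/partial/gap rule are assumptions chosen to mirror the labels in the chart above.

```python
# A trimmed, hypothetical inventory: category -> surfaces.
EXAMPLE_INVENTORY = {
    "source": ["application source", "migrations"],
    "process": ["branch names", "commit messages", "PR titles"],
    "agent-produced": ["session plans", "memory files"],
}

# Surfaces that have an automated, blocking check (assumed for illustration).
ENFORCED = {"application source", "migrations", "branch names"}


def coverage_status(surfaces, enforced):
    """Classify one category as 'covered', 'partial', or 'gap'."""
    covered = [s for s in surfaces if s in enforced]
    if surfaces and len(covered) == len(surfaces):
        return "covered"
    if covered:
        return "partial"
    return "gap"


def coverage_map(inventory, enforced):
    """Map each category in the inventory to its coverage status."""
    return {
        category: coverage_status(surfaces, enforced)
        for category, surfaces in inventory.items()
    }
```

With the example data, `coverage_map(EXAMPLE_INVENTORY, ENFORCED)` reports source as covered, process as partial, and agent-produced as a gap.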

Why autonomous agents make this acute

Humans who break a branch-naming convention leave fingerprints. The branch shows up oddly in tooling, the PR title looks off, a teammate notices in standup. The convention is held in place by a thousand small social signals.

Autonomous agents do not provide those signals. They produce artifacts that look right: idiomatic commit messages, plausible branch names, well-formed YAML, syntactically correct ADR drafts. The artifacts that look most right are exactly the ones least likely to be reviewed — review attention is finite and naturally drifts to where something obviously looks wrong.

The result is a category of drift that is almost invisible per-PR and corrosive in aggregate: a slow rotation of every convention not explicitly enforced into whatever pattern the model finds most likely. Across enough autonomous work, the team's conventions converge to the model's defaults rather than the team's intent.

The execution surfaces with the lowest review coverage are the ones where agent output drifts first. Not because the agent is malicious, but because no signal corrects the drift.
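
One way to supply the missing signal is to turn a convention into an explicit check. The sketch below assumes two hypothetical conventions, neither taken from any real team: branches are named `agent/<TICKET>-<slug>`, and every commit carries an `Agent-Run-Id` trailer. The trailer parsing is a deliberate simplification of what git itself accepts.

```python
import re

# Hypothetical convention: agent/<TICKET>-<kebab-case-slug>
BRANCH_RE = re.compile(r"agent/[A-Z]+-\d+-[a-z0-9]+(?:-[a-z0-9]+)*")

# Hypothetical required commit trailer.
REQUIRED_TRAILERS = ("Agent-Run-Id",)


def check_branch_name(name):
    """Return a list of violations for a branch name (empty if it conforms)."""
    if BRANCH_RE.fullmatch(name):
        return []
    return [f"branch {name!r} does not match agent/<TICKET>-<slug>"]


def parse_trailers(message):
    """Parse 'Key: value' lines from the final paragraph of a commit message.

    Simplified: the final paragraph counts as trailers only if every line
    has the form 'Key: value' with no spaces in the key.
    """
    paragraphs = message.strip().split("\n\n")
    if len(paragraphs) < 2:
        return {}
    trailers = {}
    for line in paragraphs[-1].splitlines():
        key, sep, value = line.partition(": ")
        if not sep or " " in key:
            return {}
        trailers[key] = value.strip()
    return trailers


def check_commit_message(message):
    """Return a list of missing required trailers."""
    trailers = parse_trailers(message)
    return [f"missing trailer {t}" for t in REQUIRED_TRAILERS if t not in trailers]
```

A plausible-looking branch such as `fix-login` passes human review without comment; `check_branch_name("fix-login")` returns a violation every time.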

How execution surfaces relate to governance propagation

Execution surfaces is the inventory. Governance propagation is the act of applying governance across that inventory. They are complementary, not redundant: the inventory tells you what surfaces exist; propagation is the discipline of ensuring each surface has the constraints it needs.

A serious governance program starts by mapping the inventory of execution surfaces for its workflows. The map is the input to propagation planning. Without the inventory, propagation defaults to whatever surfaces happen to be visible — usually source code — and the other categories drift silently.

The architectural framing

The reason this concept matters at the infrastructure level, and not just as a checklist, is the asymmetry between how artifacts are produced and how they are read. Agents produce all the surfaces at once, in a single run. Humans and downstream systems read them in different contexts, at different times, with different attention budgets. The branch name is read by tooling. The PR description is read by reviewers and by future archaeologists. The commit message is read by anyone digging through git log and by release-note generators. The CI config is read by the build system and by anyone debugging a failure. Each reader carries its own expectations.

Governance has to enforce expectations across all those reader contexts. That cannot happen if the governance system only knows about source code. Each execution surface needs its own constraint surface, and the inventory is the artifact that makes that mapping explicit.

Governance is shaped like its inventory. The execution-surfaces map determines what governance can cover; whatever is missing from the map is ungoverned by default.
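
The idea that each surface needs its own constraint, and that anything missing from the map is ungoverned by default, can be sketched as a small registry. Everything here is hypothetical: the surface keys, the two example rules (a 72-character PR title limit and a ban on `continue-on-error: true` in CI workflows), and the `govern_run` name are assumptions for illustration only.

```python
from typing import Callable, Dict, List

# A validator takes one artifact's content and returns its violations.
Validator = Callable[[str], List[str]]


def check_pr_title(title: str) -> List[str]:
    # Hypothetical rule: keep titles short enough for changelog tooling.
    if len(title) <= 72:
        return []
    return ["PR title longer than 72 characters"]


def check_ci_workflow(text: str) -> List[str]:
    # Hypothetical rule: steps may not silently ignore failures.
    if "continue-on-error: true" in text:
        return ["CI workflow allows a step to fail silently"]
    return []


# The registry is the map: one constraint per known surface.
CONSTRAINTS: Dict[str, Validator] = {
    "pr_title": check_pr_title,
    "ci_workflow": check_ci_workflow,
}


def govern_run(artifacts: Dict[str, str]) -> Dict[str, List[str]]:
    """Check every artifact from one agent run against its surface's constraint.

    Surfaces with no registered constraint are reported, not skipped.
    """
    report: Dict[str, List[str]] = {}
    for surface, content in artifacts.items():
        validator = CONSTRAINTS.get(surface)
        if validator is None:
            report[surface] = ["ungoverned: no constraint registered"]
        else:
            report[surface] = validator(content)
    return report
```

The design choice worth noting is the default branch: an unknown surface produces a finding instead of passing silently, so gaps in the map show up in the report.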

The strategic point

Autonomous engineering raises the value of structural inventories. Source code was the obvious surface when humans typed every artifact and conventions held themselves together socially. As agents take on more autonomous work, the question of "what surfaces does my workflow touch?" becomes the question of "what surfaces is my governance covering?" The answer to the first question used to be obvious. The answer to the second question requires the first to be written down.

That is what makes execution surfaces a concept rather than a list: it is the structural artifact that governance has to be planned against. Teams that map their execution surfaces can talk meaningfully about coverage. Teams that have not mapped them cannot.
