Why artifacts matter

Agent-first IDEs such as Google Antigravity surface what agents do as inspectable Artifacts: plans, diffs, screenshots, browser recordings, task lists, and verification outputs. That is real progress. Before Artifacts, a reviewer often had little more than the final diff to examine, and a diff says nothing about why the agent took the path it took.

Artifacts make agent work auditable. They are necessary for trust at scale.

What artifacts can show

A good artifact stream can answer:

  • What the agent planned
  • What commands it executed
  • What files it changed
  • What the browser saw at the end
  • What verification tests it ran and their outputs

That is enough to reconstruct the run, identify failure modes, and reproduce or roll back the result. It is downstream observability for an autonomous agent.

What artifacts cannot prevent

Artifacts cannot prevent the agent from:

  • Introducing a forbidden dependency
  • Bypassing an approved abstraction
  • Violating an ADR the team has explicitly accepted
  • Picking a deprecated pattern because it exists nearby
  • Choosing a path that contradicts another agent’s work in the next workspace

By the time the artifact exists, the choice has already been made.

A screenshot can prove the agent opened the browser. It cannot prove the agent respected your architecture.

The difference between provenance and policy

Artifact provenance is a record of what happened: the trail. Architectural governance is policy: a binding constraint on what is allowed. The two are complementary, not substitutes.

You can have artifacts without policy: every agent run is auditable, but nothing prevents drift. You can have policy without artifacts: violations are blocked, but reviewers cannot reconstruct why a particular run did what it did. The robust pattern is both: policy that constrains, artifacts that document.

Why agent systems need both

Three reasons the combination matters:

  • Pre-generation constraints: the governance layer says no before the work is done, which is cheaper than rework.
  • Post-execution traceability: when something does slip through, the artifact trail makes the failure debuggable.
  • Verdicts as provenance: the governance verdict itself becomes part of the artifact stream, so reviewers see not just what the agent did but which constraints applied to that work.
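A minimal sketch of how the two layers can share one run, in Python. It assumes an agent step can be described as a plain dictionary and that each policy is a function returning an allowed flag, a reason, and the identifier of the rule it enforces. `ArtifactLog`, `run_step`, `no_requests`, and the rule ID "ADR-0007" are hypothetical names for illustration, not the API of Antigravity or any other tool.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ArtifactLog:
    """Append-only record of a run; a hypothetical stand-in for an artifact stream."""
    entries: list[dict] = field(default_factory=list)

    def record(self, kind: str, **data) -> None:
        self.entries.append({"kind": kind, **data})


# A policy inspects a proposed action and returns (allowed, reason, rule_id).
Policy = Callable[[dict], tuple[bool, str, str]]


def run_step(action: dict, policies: list[Policy],
             execute: Callable[[dict], str], log: ArtifactLog) -> bool:
    """Check an action against every policy before executing it."""
    log.record("plan", action=action)

    # Every verdict is logged, pass or fail, so reviewers can see
    # which constraints applied to this step.
    verdicts = [policy(action) for policy in policies]
    for allowed, reason, rule_id in verdicts:
        log.record("verdict", allowed=allowed, reason=reason, rule=rule_id)

    # Pre-generation constraint: one failing policy stops the step
    # before any work is done.
    if not all(allowed for allowed, _, _ in verdicts):
        return False

    # Post-execution traceability: the result joins the same trail.
    log.record("result", output=execute(action))
    return True


def no_requests(action: dict) -> tuple[bool, str, str]:
    """Hypothetical rule: new HTTP code must use a shared client, not `requests`."""
    if action.get("type") == "add_dependency" and action.get("name") == "requests":
        return False, "use the shared HTTP client", "ADR-0007"
    return True, "ok", "ADR-0007"
```

In this sketch every policy runs and is recorded before anything executes, so a blocked step leaves a plan and its verdicts in the log but no result, while an allowed step leaves all three. Evaluating all policies rather than stopping at the first failure is a design choice: it costs a little more per step, but the reviewer sees every constraint that applied, not only the first one that fired.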

Mneme’s role

Mneme turns architectural decisions into checks, not just documentation. The check produces a structured verdict (PASS, WARN, or FAIL) with a provenance trace back to the originating ADR. That verdict becomes a first-class artifact in its own right: the policy layer's contribution to the agent's evidence trail.
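The shape of such a verdict can be sketched in Python. This is not Mneme's API or implementation; it is a hypothetical illustration that assumes each accepted ADR compiles into a rule listing forbidden and deprecated dependencies, with a forbidden match producing FAIL, a deprecated match WARN, and anything else PASS. The names `Rule`, `CheckResult`, `check_dependencies`, and `overall`, the ADR identifiers, and the package choices are all made up for the example.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "PASS"
    WARN = "WARN"
    FAIL = "FAIL"


# Severity order used to pick the overall verdict: the worst result wins.
SEVERITY = {Verdict.PASS: 0, Verdict.WARN: 1, Verdict.FAIL: 2}


@dataclass(frozen=True)
class Rule:
    """A check compiled from one accepted ADR (hypothetical shape)."""
    adr_id: str                                # illustrative identifier
    description: str
    forbidden: frozenset[str]                  # dependencies the ADR rules out
    deprecated: frozenset[str] = frozenset()   # allowed, but flagged


@dataclass(frozen=True)
class CheckResult:
    verdict: Verdict
    adr_id: str   # provenance: the ADR this verdict traces back to
    reason: str


def check_dependencies(proposed: set[str], rules: list[Rule]) -> list[CheckResult]:
    """Evaluate a proposed set of dependencies against every rule."""
    results = []
    for rule in rules:
        banned = sorted(proposed & rule.forbidden)
        stale = sorted(proposed & rule.deprecated)
        if banned:
            results.append(CheckResult(Verdict.FAIL, rule.adr_id,
                                       f"forbidden: {', '.join(banned)}"))
        elif stale:
            results.append(CheckResult(Verdict.WARN, rule.adr_id,
                                       f"deprecated: {', '.join(stale)}"))
        else:
            results.append(CheckResult(Verdict.PASS, rule.adr_id, rule.description))
    return results


def overall(results: list[CheckResult]) -> Verdict:
    """Collapse per-rule results into one verdict; no rules means PASS."""
    return max((r.verdict for r in results), key=SEVERITY.__getitem__,
               default=Verdict.PASS)


rules = [
    Rule("ADR-0007", "HTTP calls go through the shared client",
         frozenset({"requests"}), frozenset({"urllib3"})),
    Rule("ADR-0012", "No ORM in the domain layer", frozenset({"sqlalchemy"})),
]
results = check_dependencies({"httpx", "urllib3"}, rules)
print(overall(results).value)  # → WARN
```

Because each `CheckResult` carries its `adr_id`, the verdict can be appended to the artifact stream as-is, and a reviewer reading a FAIL can go straight to the decision that produced it.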

The artifact stream tells you what happened. The governance verdict tells you whether it was supposed to happen. Both belong in the post-execution view, and the policy layer’s output belongs at the front of it.

Conclusion

Reviewability is not enforcement. Build both: provenance so humans can inspect, and policy so the system can constrain. Either one alone leaves a hole that only the other can cover.