Why AI-generated code needs a CI gate

Code review evolved as a human-throughput process. One author writes; one or two reviewers read. The economics work because the production rate matches the review rate.

Agent-assisted development breaks that ratio. A single engineer running Claude Code, Cursor, Copilot, or a multi-agent harness can produce more reviewable diff in an afternoon than a reviewer can deeply assess in a week. The natural failure mode is not malicious code but shallow approval: reviews that scan for surface signals because the volume leaves no time for architectural depth.

CI governance is the deterministic check that does not depend on reviewer attention. It runs on every diff, every time, against an explicit corpus of architectural decisions. Whatever the agent generates, whichever model produced it, however many parallel sessions ran — the gate is the same.

The argument, expanded: for the case that review and governance are different layers and that one cannot substitute for the other, see "Review is not governance" and "AI code review does not scale linearly".

What CI governance catches that prompt engineering misses

Prompt-side context (system prompts, CLAUDE.md, .cursorrules) is upstream of generation. It works when the agent reads it, attends to it, and respects it. CI governance is downstream of generation and is independent of all three.

  • Drift from agents you don't control. Contractor laptops, agentic workflows triggered from issues, autonomous PR bots — whatever produced the diff still has to clear the gate.
  • Hallucinated dependencies. The agent imports a package nobody approved; the gate sees an import that violates the dependency policy and blocks the merge.
  • Bypassed architectural boundaries. "Just this one place" cross-module access that prompt context didn't catch because the diff looked locally reasonable. The gate evaluates against the structural corpus, not the local read.
  • Decisions made after the agent's training cut-off. A model that knows nothing about your team's ADR-027 from last month still cannot land code that violates it, because the corpus is read at check time, not at training time.

The two-layer model

Mneme is designed to run as two enforcement layers, not one:

  1. Pre-generation. The Claude Code PreToolUse hook intercepts Edit, Write, and MultiEdit calls before anything is written to disk. Cursor Rules deliver the same corpus to Cursor sessions. Most violations never get written.
  2. Post-generation, pre-merge. mneme check runs in CI against every PR diff. Anything that slipped past the first layer — or originated outside an enforced session — is caught here.

The two layers share the same decision corpus (.mneme/project_memory.json). There is no per-tool duplication. A decision recorded once is enforced in both places.
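The idea of one corpus feeding both layers can be sketched in Python as follows. This is a toy under stated assumptions, not Mneme's code: the real schema of .mneme/project_memory.json is not described here, so the `decisions` / `forbidden_import` fields, the `legacy_orm` module, and all function names are hypothetical.

```python
import json

# Hypothetical corpus contents; the real file schema is an assumption.
CORPUS_JSON = """
{"decisions": [
  {"id": "ADR-027", "forbidden_import": "legacy_orm"}
]}
"""


def load_decisions(corpus_text: str) -> list[dict]:
    """Parse the shared corpus; both layers read the same data."""
    return json.loads(corpus_text)["decisions"]


def violated(decisions: list[dict], code: str) -> list[str]:
    """Return ids of decisions whose forbidden import appears in `code`."""
    return [
        d["id"]
        for d in decisions
        if f"import {d['forbidden_import']}" in code
    ]


def allow_edit(decisions: list[dict], proposed_code: str) -> bool:
    """Layer 1 (pre-generation): refuse an edit before it reaches disk."""
    return not violated(decisions, proposed_code)


def check_diff(decisions: list[dict], diff_text: str) -> list[str]:
    """Layer 2 (CI): report violations found in a PR diff."""
    return violated(decisions, diff_text)
```

Both entry points delegate to the same `violated` function over the same parsed decisions, which is the property the text describes: a decision recorded once is enforced in both places.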

What gets enforced at the CI boundary

  • ADRs. Architectural Decision Records, parsed into structured decisions and enforced as constraints, not as docs.
  • Forbidden dependencies. Packages, modules, or patterns the team has explicitly ruled out (with rationale and supersedes history).
  • Approved architectural patterns. Layering rules, allowed transitive imports, framework-specific structure.
  • Path and boundary rules. "No direct database access from controllers", "auth code stays in /security/", etc.
  • Pattern-level invariants. Things that must be true across the codebase (e.g. error-handling convention, naming, logger usage).
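One way to picture a decision "enforced as a constraint, not as a doc" is as a small record with a scope, a rule, a rationale, and a supersedes link. The Python sketch below is an assumption-laden illustration, not Mneme's data model: the `Decision` fields, the ADR ids, and the `app/controllers/*` layout are all hypothetical, and the caller is assumed to supply the list of modules a file imports.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass(frozen=True)
class Decision:
    id: str
    path_glob: str         # files the rule applies to
    forbidden_prefix: str  # import prefix ruled out for those files
    rationale: str
    supersedes: Optional[str] = None  # id of the decision this one replaces


def active(decisions: list[Decision]) -> list[Decision]:
    """Drop decisions that another decision supersedes."""
    superseded = {d.supersedes for d in decisions if d.supersedes}
    return [d for d in decisions if d.id not in superseded]


def boundary_violations(
    decisions: list[Decision], path: str, imports: list[str]
) -> list[str]:
    """Return ids of active decisions that `path` violates via `imports`."""
    hits = []
    for d in active(decisions):
        if not fnmatchcase(path, d.path_glob):
            continue
        prefix = d.forbidden_prefix
        if any(m == prefix or m.startswith(prefix + ".") for m in imports):
            hits.append(d.id)
    return hits


DECISIONS = [
    Decision("ADR-001", "app/controllers/*", "sqlite3", "Hypothetical early rule."),
    Decision("ADR-002", "app/controllers/*", "app.db",
             "No direct database access from controllers.", supersedes="ADR-001"),
]
print(boundary_violations(DECISIONS, "app/controllers/users.py", ["app.db.session"]))
# ['ADR-002']
```

Keeping the superseded record in the list, rather than deleting it, preserves the history while ensuring only the current rule is enforced.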

Setup

GitHub Actions

# .github/workflows/mneme.yml
name: Mneme governance
on:
  pull_request:
    branches: [main]   # run on PRs that target main

jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0   # full history instead of the default shallow clone
      - uses: actions/setup-python@v5
        with:
          python-version: '3.11'
      - run: pip install mneme
      # warn mode reports violations without blocking the merge
      - run: mneme check --mode warn

See GitHub Actions AI Governance for the full integration page.

GitLab CI

# .gitlab-ci.yml
mneme-check:
  image: python:3.11
  rules:
    # run only in merge request pipelines
    - if: $CI_PIPELINE_SOURCE == "merge_request_event"
  script:
    - pip install mneme
    # warn mode reports violations without blocking the merge
    - mneme check --mode warn

See GitLab CI integration for the full page.

Adoption sequence: warn before strict

The recommended rollout is staged, so the team learns where the corpus is sharp and where it is approximate before the gate starts blocking merges.

  1. Warn mode in CI. Run mneme check --mode warn on every PR. Violations surface as comments; merges proceed. Use this phase to harden the decision corpus against real diffs.
  2. Strict mode on the most stable repos first. Promote to --mode strict on services where the architectural corpus has settled. Other repos stay in warn.
  3. Pre-generation enforcement. Once the corpus is trusted, wire the Claude Code hook so violations fail fast in the editor, not at PR time.

Why warn-then-strict. A corpus is only as useful as it is true. The warn phase is where ADRs that should be enforced but were never written down get surfaced and codified. Skipping it tends to produce a strict gate that the team works around — the worst of both layers.
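The difference between the two modes in the rollout above can be sketched as a mapping from check results to a CI exit code. This is a minimal illustration that assumes the usual CI convention (a non-zero exit fails the job); the actual exit codes of mneme check are not stated here, and `gate_exit_code` is a hypothetical name.

```python
def gate_exit_code(violations: list[str], mode: str) -> int:
    """Map check results to a process exit code for the CI job."""
    if mode not in ("warn", "strict"):
        raise ValueError(f"unknown mode: {mode!r}")
    # warn: always succeed so the merge can proceed.
    # strict: fail the job when at least one violation was found.
    return 1 if (mode == "strict" and violations) else 0
```

Under this reading, promoting a repo from warn to strict changes nothing about what is detected, only whether a detection stops the merge.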

FAQ

Why isn't code review enough for AI-generated code?
Code review is a human-throughput process; AI generation is not. As agent-produced volume scales, manual review degrades into shallow approval. CI governance is the deterministic check that does not depend on reviewer attention: it runs on every diff, every time, against an explicit corpus of architectural decisions.

Does this replace pre-generation enforcement?
No. Pre-generation enforcement (via Claude Code hooks or Cursor Rules) catches most violations before code is written. CI governance is the second layer, catching what slipped past: diffs from agents without hook integration, edits made outside an enforced session, or human edits that violate ADRs.

Will it block all my PRs in warn mode?
No. mneme check --mode warn reports violations as PR comments without blocking the merge. Teams use this to baseline existing drift before promoting to strict mode on the repos where the architectural corpus is stable.