Mneme enforces architectural decisions at generation time. It retrieves the relevant decisions for the current edit, evaluates the staged change against them with a precedence engine, and returns a PASS, WARN, or FAIL verdict before the code reaches review.

How it works

Human-readable decisions, deterministic enforcement.

Mneme lets teams author architectural decisions in plain language, then enforces them as structured rules at the moment an AI agent is about to generate code — before the output exists.

The problem with prose decisions

Architecture Decision Records and team conventions live in documents. Engineers write them carefully. But an AI coding agent working three weeks later does not reliably recall that ADR-007 forbids direct BigQuery access from frontend routes, or that the team standardized on Pub/Sub and retired Celery.

Prose is for humans. It communicates intent, rationale, and trade-offs. But prose alone cannot produce a verdict. It cannot be checked at generation time. It cannot tell an agent to stop.

Mneme solves that gap without replacing the prose. ADRs remain the human artifact. Mneme maintains a parallel structured representation that enforcement can reason against.

The enforcement flow

Every check Mneme runs follows the same path:

01. Author decisions in human-readable form. Write ADRs, policy documents, or team conventions as you normally would. Mneme provides a structured schema alongside them: each rule gets an id, a title, constraints, and the anti-patterns that should block generation.
02. Retrieve the relevant rules for the current task. When a developer or agent starts a task, Mneme surfaces the decisions most likely to apply. The retrieval is deterministic: the same task description and the same memory surface the same rules every time.
03. Check the prompt against the retrieved rules. Before the model generates output, Mneme evaluates the prompt against the constraints and anti-patterns in the retrieved decisions. The evaluation produces a structured verdict: PASS, WARN, or FAIL.
04. Record an auditable trace. Every verdict records which rule matched, which term in the prompt triggered it, and why that rule was surfaced. A human can reconstruct any verdict from the artifacts: there is no hidden scoring and no black box.
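Steps 01 and 02 can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Mneme's implementation: the `Rule` field names follow the schema described above, while the `retrieve` function, the word-overlap score, and the `ADR-PUBSUB` id are hypothetical. The point it shows is how a retrieval step can be made repeatable: a fixed tokenizer, an integer score, and a tie-break on rule id, so the result does not depend on input order.

```python
from dataclasses import dataclass
import re


@dataclass(frozen=True)
class Rule:
    """Hypothetical structured form of one decision (field names assumed)."""
    id: str
    title: str
    constraints: tuple
    anti_patterns: tuple


def tokens(text):
    """Lowercase alphanumeric word tokens; simple and repeatable."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(task, memory, limit=3):
    """Return (rule, score) pairs for rules sharing words with the task.

    The score is a plain word-overlap count (an assumption, not Mneme's
    scoring). Ties are broken by rule id, so ordering never depends on
    the order in which rules were loaded.
    """
    task_words = tokens(task)
    scored = []
    for rule in memory:
        rule_text = " ".join((rule.title, *rule.constraints, *rule.anti_patterns))
        score = len(task_words & tokens(rule_text))
        if score > 0:
            scored.append((rule, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0].id))
    return scored[:limit]


MEMORY = [
    Rule("ADR-007", "No direct BigQuery access from frontend routes",
         ("frontend routes must not query BigQuery directly",), ("bigquery",)),
    # Hypothetical id; the convention itself is the Pub/Sub example above.
    Rule("ADR-PUBSUB", "Pub/Sub is the standard queue; Celery is retired",
         ("use Pub/Sub for background work",), ("celery",)),
]

task = "Add a Celery worker for background email jobs"
first = retrieve(task, MEMORY)
second = retrieve(task, list(reversed(MEMORY)))  # same memory, different load order

for rule, score in first:
    print(rule.id, score)
```

Because the score is recorded next to each surfaced rule, the "why was this rule surfaced" part of the trace in step 04 comes directly from the retrieval output.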

A concrete example

The team has decided: no second LLM provider in v1. Anthropic SDK only.

From decision to verdict

The decision. The team has recorded: "Do not introduce a provider-abstraction layer like litellm. The only LLM provider is Anthropic."

↓

The structured rule. Mneme holds this as a structured constraint: the anti-pattern is litellm; the constraint is no second LLM provider.

↓

The prompt. An agent's task reads: "Add litellm as the provider abstraction layer so we can swap models later."

↓

The verdict. FAIL: the anti-pattern matched before generation. The decision that fired, the term that triggered it, and the rule text are all recorded in the output.

The key distinction. This check runs before the model generates a single line of code. The violation is caught at the prompt boundary, not in a code review after the output already exists. That is the difference between governance and audit.
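The litellm example can be reproduced with a minimal prompt check. This sketch assumes that an anti-pattern match is a whole-word, case-insensitive match and that it maps to FAIL; the `Verdict` fields, the `check_prompt` function, and the rule id are hypothetical names. WARN is left out because the text above does not say when it applies.

```python
from dataclasses import dataclass
from typing import Optional
import re


@dataclass(frozen=True)
class Verdict:
    status: str                  # "PASS" or "FAIL" in this sketch
    rule_id: Optional[str]       # the decision that fired, if any
    matched_term: Optional[str]  # the prompt term that triggered it
    rule_text: Optional[str]     # the recorded decision text


def check_prompt(prompt, rule_id, rule_text, anti_patterns):
    """Return FAIL if any anti-pattern appears as a whole word in the prompt."""
    for term in sorted(anti_patterns):  # fixed order keeps the result repeatable
        pattern = rf"\b{re.escape(term)}\b"
        if re.search(pattern, prompt, flags=re.IGNORECASE):
            return Verdict("FAIL", rule_id, term, rule_text)
    return Verdict("PASS", None, None, None)


RULE_TEXT = ("Do not introduce a provider-abstraction layer like litellm. "
             "The only LLM provider is Anthropic.")

verdict = check_prompt(
    "Add litellm as the provider abstraction layer so we can swap models later.",
    rule_id="no-second-llm-provider",  # hypothetical id
    rule_text=RULE_TEXT,
    anti_patterns=["litellm"],
)
clean = check_prompt(
    "Call the Anthropic SDK directly to summarize the ticket.",
    rule_id="no-second-llm-provider",
    rule_text=RULE_TEXT,
    anti_patterns=["litellm"],
)

print(verdict.status, verdict.rule_id, verdict.matched_term)
print(clean.status)
```

The check only reads the prompt, so it can run before any model call is made. The returned `Verdict` carries the three items the example says are recorded: the decision, the triggering term, and the rule text.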

Why deterministic enforcement matters

Enforcement that varies between runs is not governance — it is suggestion. Mneme is built around the principle that the same decision, the same task, and the same memory must produce the same verdict, every time, in every environment.

This determinism is what makes governance auditable. When a CI step fails or an agent is blocked, the verdict is reconstructible: the rule that matched is recorded, the term that triggered it is recorded, and the score that surfaced that rule is recorded. There is nothing probabilistic to investigate.

It also makes regressions visible. Any change to the enforcement layer that would alter a verdict is detectable against the frozen benchmark suite — nothing can drift silently.
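One way to picture regression detection against a frozen suite is a comparison of current verdicts with stored expected verdicts. The sketch below is illustrative only: the two cases, the `find_drift` helper, and both check functions are hypothetical stand-ins, not Mneme's benchmark or its enforcement layer.

```python
# Hypothetical frozen cases: (prompt, expected verdict). Not Mneme's real suite.
FROZEN_CASES = [
    ("Add litellm as the provider abstraction layer", "FAIL"),
    ("Call the Anthropic SDK directly", "PASS"),
]


def baseline_check(prompt):
    """Stand-in enforcement layer: case-insensitive anti-pattern match."""
    return "FAIL" if "litellm" in prompt.lower() else "PASS"


def changed_check(prompt):
    """Stand-in for a modified layer that accidentally became case-sensitive."""
    return "FAIL" if "LiteLLM" in prompt else "PASS"


def find_drift(check, cases):
    """List every case whose current verdict differs from the frozen one."""
    drift = []
    for prompt, expected in cases:
        actual = check(prompt)
        if actual != expected:
            drift.append((prompt, expected, actual))
    return drift


baseline_drift = find_drift(baseline_check, FROZEN_CASES)
changed_drift = find_drift(changed_check, FROZEN_CASES)

print("baseline drift:", baseline_drift)
print("changed drift:", changed_drift)
```

This comparison is only meaningful because the checks are deterministic: with a check that varied between runs, a mismatch could not be attributed to a code change.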

For deeper detail on the retrieval mechanics, the benchmark methodology, and the Layer 1 charter, see the architecture doc in the source repository.

Reference: Governance violights
