The relationship
Claude Agent SDK manages agent loops: tool selection, message passing, streaming, multi-turn context, subagent handoffs. These are execution concerns. Mneme does not touch any of them.
What Mneme does is sit at the boundary between your agent's intentions and your codebase. Before a file write lands, Mneme checks the proposed content against your recorded decisions. After a workflow completes, Mneme can verify the output against the full invariant set. The relationship is infrastructure-shaped: the SDK runs agents; Mneme is what the agents answer to architecturally.
Execution layer
- Agent loop and tool dispatch
- Multi-turn context and streaming
- Subagent orchestration and handoffs
- Tool schema definition and calling
- Retry and error handling within the loop
- Session lifecycle management
Enforcement layer
- Architectural decision corpus (project_memory.json)
- Pre-execution governance checks
- Deterministic keyword retrieval (K=3, no embeddings)
- Structured PASS / WARN / FAIL verdicts with decision IDs
- Post-execution verification against invariant set
- CI-gateable enforcement trace output
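The corpus file itself is not specified on this page; a decision record in project_memory.json might look like the following sketch, where the field names (id, rationale, keywords) are illustrative, inferred from the verdict fields referenced elsewhere on this page:

```json
{
  "decisions": [
    {
      "id": "ADR-014",
      "rationale": "Storage layer must use SQLite; no client-server database dependencies.",
      "keywords": ["storage", "sqlite", "database"]
    }
  ]
}
```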
Mneme is not "another agent." It does not call the model. It does not generate text. It scores proposed changes against a pre-registered decision corpus and returns a verdict. The agents answer to it; it does not answer to the agents.
Pre-execution governance
The most direct integration point is a governance hook you call before your agent writes any file. Wrap the SDK's tool execution callback to intercept file-write operations, run mneme check against the proposed content, and allow or abort based on the verdict.
Below is a complete Python hook that intercepts a write_file tool call before it executes:
import subprocess, json, sys
from anthropic import Anthropic

client = Anthropic()

def mneme_check(file_path: str, content: str) -> dict:
    """Run `mneme check` on proposed content; fail open on any error."""
    try:
        result = subprocess.run(
            ["mneme", "check", "--file", file_path, "--mode", "warn"],
            input=content,
            capture_output=True,
            text=True,
            timeout=10,
        )
        return json.loads(result.stdout)
    except Exception:
        # Fail open: missing binary, timeout, or bad JSON must not
        # lock the agent workflow. Hard-blocking lives in CI.
        return {"verdict": "PASS", "decisions": []}
def governance_hook(tool_name: str, tool_input: dict) -> dict | None:
    """Return an error tool_result dict to block a write, or None to allow it."""
    if tool_name != "write_file":
        return None
    path = tool_input.get("path", "")
    content = tool_input.get("content", "")
    verdict = mneme_check(path, content)
    if verdict.get("verdict") == "FAIL":
        # Guard against an empty decisions list before indexing.
        decision = (verdict.get("decisions") or [{}])[0]
        return {
            "type": "tool_result",
            "is_error": True,
            "content": (
                f"[mneme] Governance violation: {decision.get('id', 'unknown')}. "
                f"Rationale: {decision.get('rationale', '')}. "
                f"Revise the approach to comply with this decision."
            ),
        }
    return None
tools = [
    {
        "name": "write_file",
        "description": "Write content to a file path.",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {"type": "string"},
                "content": {"type": "string"},
            },
            "required": ["path", "content"],
        },
    }
]

messages = [{"role": "user", "content": "Implement the storage module."}]

while True:
    response = client.messages.create(
        model="claude-opus-4-5",
        max_tokens=4096,
        tools=tools,
        messages=messages,
    )
    messages.append({"role": "assistant", "content": response.content})
    if response.stop_reason == "end_turn":
        break
    tool_results = []
    for block in response.content:
        if block.type != "tool_use":
            continue
        blocked = governance_hook(block.name, block.input)
        if blocked:
            # blocked already carries "type": "tool_result" and is_error.
            tool_results.append({"tool_use_id": block.id, **blocked})
        else:
            # execute_tool is your own dispatcher for allowed tool calls.
            result_content = execute_tool(block.name, block.input)
            tool_results.append({"type": "tool_result", "tool_use_id": block.id, "content": result_content})
    messages.append({"role": "user", "content": tool_results})
The hook intercepts at the tool_use block boundary, before execute_tool is called. A FAIL verdict returns an error tool result that re-enters the agent loop — the agent reads the decision id and rationale, revises its approach, and retries without human intervention. A PASS or WARN verdict allows execution to proceed normally.
Fail-open by design. If mneme is not found on $PATH, if the JSON parse fails, or if the subprocess times out (10 s hard limit), the hook returns None and the tool call is allowed. A broken governance layer should not lock an agent workflow. Hard-blocking lives in CI.
A sample governance trace from the pre-execution check, printed to stderr alongside your agent log:
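The exact trace format depends on your mneme version; an illustrative shape, using only fields this page already names (file, mode, verdict, decision IDs, scores), with invented values:

```json
{
  "file": "src/storage.py",
  "mode": "warn",
  "verdict": "FAIL",
  "decisions": [
    {
      "id": "ADR-014",
      "score": 2,
      "rationale": "Storage layer must use SQLite; no client-server database dependencies."
    }
  ]
}
```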
Post-execution verification
Pre-execution checks stop violations before they land. Post-execution verification catches what slipped through — partial matches, composite violations, or changes that look compliant per-file but drift when read as a whole. Run a verification pass after your agent workflow completes:
import subprocess, json, pathlib, sys

def verify_outputs(changed_files: list[str]) -> list[dict]:
    """Re-check every changed file in strict mode; collect violations."""
    violations = []
    for path in changed_files:
        content = pathlib.Path(path).read_text()
        result = subprocess.run(
            ["mneme", "check", "--file", path, "--mode", "strict"],
            input=content,
            capture_output=True,
            text=True,
            timeout=10,
        )
        try:
            verdict = json.loads(result.stdout)
            if verdict.get("verdict") in ("FAIL", "WARN"):
                violations.append({"file": path, **verdict})
        except Exception:
            pass
    return violations

violations = verify_outputs(agent_output_files)
if any(v["verdict"] == "FAIL" for v in violations):
    print(json.dumps(violations, indent=2), file=sys.stderr)
    sys.exit(1)
The post-execution pass uses --mode strict, which treats WARN as FAIL. Running --mode warn in the pre-execution hook (so the agent is not blocked mid-loop on ambiguous signals) and --mode strict in post-execution verification gives you a two-stage gate: guide the agent while it works, then enforce cleanly when it finishes.
A post-execution enforcement trace:
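Again illustrative rather than normative; strict mode promotes WARN to FAIL, so a WARN-level match surfaces as a failing entry. The path, ID, and score below are invented for illustration:

```json
[
  {
    "file": "src/api/client.py",
    "mode": "strict",
    "verdict": "FAIL",
    "decisions": [
      {
        "id": "ADR-009",
        "score": 2,
        "rationale": "All external calls go through the gateway client."
      }
    ]
  }
]
```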
Enforcement traces in long-running workflows
For autonomous agent workflows that run unattended — scheduled coding tasks, remediation loops, multi-step refactors — the enforcement trace is your primary audit artifact. A governance check that returns PASS is not just a green light; it is a timestamped record that the proposed change was evaluated against the current corpus and found compliant. This matters in three contexts:
- Audit trail. When a long-running workflow produces a diff, you need to know which decisions were in scope at the time the change was generated, not just whether the final output passes today's corpus. The structured trace output (JSON per check, with decision IDs and scores) gives you this.
- CI gates. A governance trace written to a file or stdout can be consumed by your CI pipeline. If any file in the agent's output has a FAIL verdict in the trace, the pipeline fails. This is not a soft advisory — it is a hard gate.
- Remediation loop signal. For autonomous loops where the agent is expected to self-correct, the decision id in a FAIL verdict is the signal. The agent reads which decision was violated, retrieves the rationale from the corpus, and reformulates the change. The loop terminates when the post-execution pass is clean, not when the agent decides it is done.
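The remediation loop described above can be sketched as a small driver. Here run_agent_step and verify_outputs are placeholders for your own SDK agent loop and the strict post-execution check; the function and its return shape are illustrative, not part of the Mneme or SDK APIs:

```python
def remediation_loop(run_agent_step, verify_outputs, max_rounds=5):
    """Drive an agent until the strict post-execution pass is clean.

    run_agent_step(feedback) -> list of changed file paths
    verify_outputs(files)    -> list of FAIL violation dicts
    """
    feedback = None
    for round_no in range(1, max_rounds + 1):
        changed = run_agent_step(feedback)
        violations = verify_outputs(changed)
        if not violations:
            # Terminates when the verification pass is clean,
            # not when the agent decides it is done.
            return {"clean": True, "rounds": round_no}
        # The violated decision IDs are the self-correction signal.
        feedback = [v["decisions"][0]["id"] for v in violations]
    return {"clean": False, "rounds": max_rounds}
```

The round cap matters in unattended workflows: an agent that cannot satisfy the corpus should surface a failing trace, not loop forever.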
Deterministic retrieval matters here. Mneme uses keyword scoring, not embeddings. The same query against the same corpus returns the same top-K decisions every time. This means enforcement traces are reproducible: if you re-run a check against the same content and corpus, you get the same verdict. Embedding-based retrieval cannot make this guarantee.
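Mneme's internal scorer is not reproduced on this page, but the reproducibility property is easy to see in a minimal sketch: keyword-overlap scoring with a stable tie-break is a pure function of query and corpus, so the top-K result never varies between runs. The function name and corpus fields below are illustrative:

```python
def top_k_decisions(query: str, corpus: list[dict], k: int = 3) -> list[str]:
    """Score decisions by keyword overlap with the query; fully deterministic."""
    q = set(query.lower().split())
    scored = [
        (len(q & set(d["text"].lower().split())), d["id"])
        for d in corpus
    ]
    # Sort by score descending, then by ID for a stable tie-break.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [decision_id for _, decision_id in scored[:k]]
```

Contrast with embedding retrieval, where model version drift or nondeterministic ANN indexes can reorder results for the same query.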
CI integration
Add a governance gate step in your agent workflow's CI configuration. The step runs after your agent produces output and before the PR is opened or the deploy proceeds:
name: agent-governance-gate
on: [push, pull_request]

jobs:
  governance:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 2  # so `git diff HEAD~1` has a parent commit to diff against
      - name: Install Mneme
        run: pip install mneme
      - name: Run agent workflow
        run: python scripts/agent_workflow.py
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
      - name: Governance gate
        shell: bash
        run: |
          mneme check \
            --files-changed $(git diff --name-only HEAD~1) \
            --mode strict \
            --output-format json \
            --trace-file governance-trace.json
      - name: Upload governance trace
        if: always()
        uses: actions/upload-artifact@v4
        with:
          name: governance-trace
          path: governance-trace.json
The --mode strict flag means any WARN in the trace causes a non-zero exit and fails the step. The --trace-file flag writes the structured JSON trace as a CI artifact, giving you a per-run record of which decisions were evaluated and what the verdicts were. Use --mode warn during development iterations to surface signals without blocking merges.
For the GitHub Actions integration with finer-grained control over which file patterns trigger governance checks, see the GitHub Actions integration page.
FAQ
Is Mneme itself an agent inside the SDK?
No. Mneme never calls the model and never generates text. It is a deterministic checker that scores proposed changes against the pre-registered decision corpus and returns a verdict. The agents answer to it, not the other way around.
Does the governance hook add meaningful latency to agent tool calls?
Rarely. The check is a local subprocess doing deterministic keyword scoring over the corpus, with no network round-trip and no embedding lookup, and the hook enforces a 10-second hard timeout after which it fails open and the tool call proceeds.
What if my agent uses non-file tools — API calls, shell commands?
Extend the hook to run mneme check against the proposed operation text. The check is most precise when the proposed change involves identifiable architectural surface (file paths, dependency names, module patterns). For generic API calls, you can still inject relevant decisions as context at the start of the agent loop so the agent reasons from the corpus rather than from training defaults. Use mneme context --query "api call to external service" to retrieve the relevant decisions for injection.
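One way to do that injection, sketched under assumptions: you have already parsed the retrieved decisions into dicts with id and rationale fields, and decisions_preamble is a hypothetical helper, not a Mneme or SDK API:

```python
def decisions_preamble(decisions: list[dict]) -> str:
    """Format retrieved decisions into a system-prompt preamble
    so the agent reasons from the corpus, not training defaults."""
    lines = ["Honor these architectural decisions:"]
    for d in decisions:
        lines.append(f"- [{d['id']}] {d['rationale']}")
    return "\n".join(lines)

# Prepend the result to the system prompt before starting the agent loop,
# e.g. system=decisions_preamble(parsed) in client.messages.create(...).
```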