The short answer. Generative AI software engineering is now a seven-layer stack: foundation models, context and retrieval, agent runtime, tooling and execution, governance and architectural control, validation and evaluation, and human oversight. Layers 1–3 are crowded with well-funded vendors. Layer 4 is consolidating around MCP. Layer 5 — governance — is structurally underbuilt because the problem only becomes felt at the scale where layers 1–4 have already done their work. That is the wedge.
Why a stack frame at all
Workflows describe how one team uses tools today. Stacks describe the layers any serious organization eventually has to operate, regardless of which vendors it picks. A stack frame makes it easier to see which layers are crowded with capital, which are underbuilt, and where the architectural risk concentrates. It is also how engineering leaders actually think when they plan a 24-month bet: layer by layer, with explicit owners and explicit failure modes.
The seven-layer frame below is a reference, not a product taxonomy. It will read as familiar to anyone who has shipped against agentic coding tooling in the last year — the layers are the ones the field has converged on, even if the labels have not.
The seven layers
Layer 1: Raw reasoning and generation
OpenAI, Anthropic, Google, Meta, DeepSeek, Mistral, xAI
Purpose. Raw reasoning and generation capability. The substrate every higher layer depends on. The frontier moves quarterly; the API surface is increasingly commoditized behind OpenAI-compatible endpoints.
Layer 2: Provide relevant context to models
RAG pipelines, vector databases, embeddings, semantic search, memory systems
Purpose. Surface the right tokens at the right time so the model can answer with grounded context. Effective for documentation lookup; structurally insufficient for authoritative constraint enforcement, which requires precedence and exactness rather than nearest-neighbor recall.
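The difference between recall and enforcement can be made concrete. A minimal sketch, with a toy similarity function and illustrative rule records (everything here is an assumption for the example, not any vendor's implementation): retrieval always returns the *closest* entry, while a governance lookup must return the exact applicable rule or nothing.

```python
def nearest_neighbor(query_vec, corpus):
    """RAG-style retrieval: always returns the closest entry,
    even when nothing in the corpus actually applies."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm = lambda v: sum(x * x for x in v) ** 0.5
        return dot / (norm(a) * norm(b))
    return max(corpus, key=lambda doc: cos(query_vec, doc["vec"]))

def authoritative_lookup(scope, rules):
    """Governance-style lookup: exact scope match plus precedence.
    Returns None instead of a 'close enough' rule."""
    applicable = [r for r in rules if r["scope"] == scope]
    if not applicable:
        return None  # fail closed rather than guess
    return max(applicable, key=lambda r: r["precedence"])

corpus = [{"vec": [1.0, 0.0], "text": "notes on caching"}]
rules = [
    {"scope": "payments", "precedence": 1, "rule": "pin dependency majors"},
    {"scope": "payments", "precedence": 2, "rule": "no direct DB writes"},
]

# Nearest-neighbor happily returns *something* for an orthogonal query...
assert nearest_neighbor([0.0, 1.0], corpus)["text"] == "notes on caching"
# ...while the lookup returns nothing for an unknown scope,
# and the highest-precedence rule for a known one.
assert authoritative_lookup("frontend", rules) is None
assert authoritative_lookup("payments", rules)["rule"] == "no direct DB writes"
```

The failure mode is structural: similarity search has no concept of "no rule applies here," which is exactly the answer a constraint system must be able to give.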
Layer 3: Coordinate tools, steps, and agents
LangGraph, CrewAI, AutoGen, OpenAI Agents SDK, Claude Agent SDK, Claude Code, Cursor Agent
Purpose. Plan multi-step work, route between tools, coordinate sub-agents, persist intermediate state. The layer where "agentic" actually happens. Crowded with frameworks — the runtime question is settling, but the choice still matters because each runtime exposes a different seam to the layer above.
Layer 4: Take actions in real systems
MCP servers, REST & gRPC APIs, databases, shells, browsers, CI/CD, cloud infrastructure
Purpose. Give the agent hands. Reading is cheap; writing is where blast radius lives. Model Context Protocol has emerged as the connective tissue, with the major coding tools converging on a shared interface for tool exposure. The execution surface is wider than most teams realize once the agents start running unattended.
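The read/write asymmetry can be sketched as a toy dispatch layer that treats write-class tools as privileged. This is a hedged illustration, not any runtime's actual API; the tool names, the `unattended` flag, and the allow-list are all assumptions for the example.

```python
READ_TOOLS = {"grep", "read_file", "list_dir"}
WRITE_TOOLS = {"write_file", "run_shell", "deploy"}

def execute(tool, args, *, unattended, approved_writes=frozenset()):
    """Dispatch a tool call; block write-class tools when the agent
    runs unattended unless that tool was explicitly approved."""
    if tool in READ_TOOLS:
        return f"ran {tool}"  # reads are cheap
    if tool in WRITE_TOOLS:
        if unattended and tool not in approved_writes:
            raise PermissionError(f"{tool} blocked in unattended mode")
        return f"ran {tool}"  # writes carry the blast radius
    raise ValueError(f"unknown tool: {tool}")

assert execute("grep", {}, unattended=True) == "ran grep"
assert execute("run_shell", {}, unattended=True,
               approved_writes={"run_shell"}) == "ran run_shell"
```

The point of the sketch is the shape of the surface: as agents run unattended, every entry in the write set is an entry in the blast radius, which is why the execution layer needs an explicit boundary rather than an implicit one.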
Layer 5: The Mneme category
Decision corpora, precedence engines, pre-generation enforcement, override discipline, cross-tool governance
Purpose. Make the agent answer to the project's existing decisions before it generates.
- Enforce architectural decisions across heterogeneous tools.
- Prevent silent drift between PRs, branches, and engineers.
- Inject structured constraints into context before generation.
- Preserve decision continuity as models, agents, and codebases churn.
- Validate outputs against governance rules at the seam, not in review.
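The bullets above can be sketched end to end: structured decision records rendered into a constraint block that a runtime would prepend to the agent's context before generation. The field names, statuses, and rendering here are assumptions for illustration, not a real schema.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    id: str
    status: str      # "active" or "superseded"
    scope: str       # "org" or a service name
    constraint: str

def context_block(decisions, scope):
    """Render active decisions for a scope (plus org-wide ones) as a
    block injected into the agent's context before it generates."""
    active = [d for d in decisions
              if d.status == "active" and d.scope in (scope, "org")]
    return "MUST FOLLOW:\n" + "\n".join(
        f"[{d.id}] {d.constraint}" for d in active)

decisions = [
    Decision("ADR-014", "active", "org", "pin dependency majors"),
    Decision("ADR-021", "superseded", "payments", "use REST between services"),
    Decision("ADR-030", "active", "payments", "use gRPC between services"),
]
block = context_block(decisions, "payments")
assert "ADR-030" in block and "ADR-021" not in block
```

Because the records are structured, the superseded decision is filtered out mechanically — the continuity and drift-prevention jobs fall out of the data model rather than depending on the model remembering anything.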
Layer 6: Measure correctness and adherence
Benchmarks, policy tests, regression suites, eval harnesses, observability, tracing
Purpose. Quantify reliability, correctness, and governance adherence after the fact. Eval answers "did the system do the right thing?"; observability answers "what did it actually do?". Both presume layers below them are stable enough to measure.
Layer 7: Organizational accountability
Code review, architecture review, security review, approvals, escalation paths
Purpose. The accountability boundary. Humans approve, escalate, and own the consequences. Linear in throughput by design; this is the layer where the AI throughput delta becomes a queueing problem.
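The queueing claim is back-of-envelope arithmetic. A minimal sketch with illustrative numbers — the rates are assumptions, the dynamic is not:

```python
def review_backlog(arrivals_per_day, reviews_per_day, days):
    """PRs waiting for human review after `days`, assuming a flat
    human review capacity and a constant arrival rate."""
    backlog = 0
    for _ in range(days):
        backlog = max(0, backlog + arrivals_per_day - reviews_per_day)
    return backlog

# Pre-AI: arrivals match capacity, so the queue stays empty.
assert review_backlog(10, 10, 30) == 0
# 3x generation throughput, same reviewers: 600 waiting PRs in a month.
assert review_backlog(30, 10, 30) == 600
```

Any sustained arrival rate above review capacity grows the queue without bound; no amount of reviewer diligence changes the slope, which is why the fix has to land upstream of this layer.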
Each layer has a purpose, an emerging interface, and at least one credible vendor or pattern. They are not optional in any serious deployment. Even teams that "only use Cursor" are operating implicitly across all seven — they have just not separated the layers, which is why drift becomes structural rather than measurable.
The throughput vs. governance gap
The cleanest framing for the entire stack is a single asymmetry: AI coding increased generation throughput, but governance and review did not scale at the same rate. Layers 1–4 are throughput layers — they make more code happen, faster. Layer 7 is a human layer that scales linearly with people. Layers 5 and 6 are the only places where the asymmetry can be closed, and only one of them — layer 6 — is well-resourced.
The implication is direct: as long as layers 1–4 keep accelerating, the only sustainable response on the control side is to push enforcement earlier. That is the structural argument for layer 5.
Layer 5, in detail
Layer 5 is the layer that operates on the agent before it generates. It is not eval (that is layer 6) and not review (that is layer 7). It is the layer that holds the project's accumulated decisions — ADRs, dependency policies, service boundaries, naming conventions, security invariants — in a structured, queryable form that the agent has to traverse before producing output.
The five jobs of layer 5, in concrete terms:
- Enforce architectural decisions. A decision the team made six months ago is treated as a hard constraint, not a passage in a CLAUDE.md the model is asked to respect. Prompt engineering is not governance; the difference is whether the constraint is enforceable at the seam.
- Prevent drift. Two engineers prompting the same agent should not get architecturally divergent answers. The decision corpus is the shared anchor that keeps generations consistent across people, branches, and time.
- Inject constraints before generation. Hooks at the file-write seam, system-prompt augmentation at session start, and structured precedence resolution when decisions conflict. The point is that the agent does not need to "remember" the decision — the decision is presented to it.
- Preserve decision continuity. Models change. Agents change. Tools change. The decision corpus persists. Continuity across this churn is the compounding asset.
- Validate outputs against governance rules. Pre-merge gates that read structured artifacts (not freeform prose) and fail closed. Less "AI reviewed this" and more "the deterministic rule that always runs has cleared this."
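The last job can be sketched directly: a fail-closed gate that reads a structured artifact emitted at generation time (recording which decisions were consulted) and blocks the merge when the artifact is missing, unreadable, or incomplete. The artifact format and rule IDs are assumptions for the sketch.

```python
import json

def gate(artifact_json, required_rules):
    """Pre-merge gate: pass only if the generation artifact parses
    and shows every required decision was consulted. Anything
    unreadable blocks the merge (fail closed)."""
    try:
        artifact = json.loads(artifact_json)
        consulted = set(artifact["decisions_consulted"])
    except (json.JSONDecodeError, KeyError, TypeError):
        return False  # garbled or missing artifact -> blocked
    return required_rules <= consulted

assert gate('{"decisions_consulted": ["ADR-014", "SEC-002"]}', {"ADR-014"})
assert not gate('not json', {"ADR-014"})                 # unreadable -> blocked
assert not gate('{"decisions_consulted": []}', {"SEC-002"})
```

The design choice worth noticing is the `except` branch: a prose-based "AI reviewed this" check degrades to a pass when its input is malformed, whereas a deterministic gate degrades to a block.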
Why layers 1–3 are crowded
The capital allocation in this cycle has been straightforward: the further down the stack a layer sits, the more it looks like classic infrastructure, and the more it has attracted infrastructure-scale investment. Foundation models (layer 1) absorbed the bulk. Context and retrieval (layer 2) took the second wave, with a vector-database boom that has since consolidated. Agent runtimes (layer 3) are the current frontier — the SDK and framework wars are happening here in 2026.
Layer 4 is settling on MCP as the default execution interface, which is why the long tail of "API connector" startups has compressed in the past year. Layer 6 has its own established ecosystem (Braintrust, Langfuse, OpenAI Evals, OSS harnesses). Layer 7 has existed for fifty years.
Layer 5 is the gap. It is uncrowded for a structural reason: governance only becomes a felt problem at the scale where layers 1–4 have already done their work. Early adopters can get away with prompt-engineered style guides and CLAUDE.md files. Later adopters — the ones running multiple coding agents across multiple repos — cannot. The pull is now arriving, which is why the category is forming. Heterogeneous-agent governance is the felt version of the problem.
What a serious layer 5 looks like
A serious layer 5 is not a config file and not a prompt. It is an addressable, structured decision corpus with a precedence engine on top, accessed through hooks the agent runtimes and editors all defer to. Concretely, the things it has to do:
- Structured decisions. Decisions encoded as records with status, scope, supersession history — not paragraphs in a markdown file. Queryable, not summarizable.
- Precedence resolution. When org policy and team override and per-PR exception conflict, a deterministic rule decides which one wins. Real teams hit this in week three.
- Pre-generation hooks. Enforcement at the seam where the agent writes — SessionStart, PreToolUse, file-write hooks. Structurally different from "we put it in the system prompt."
- Tool-agnostic surface. The same decision corpus is consulted by Claude Code in the terminal, by the Cursor agent, by the Copilot extension, and by the SDK bot opening PRs in CI. No re-encoding per tool.
- Override discipline. When a rule is weakened, the override is itself a tracked decision. An untracked override is a silent merge.
- Auditability. The system can answer "which decisions applied to this generation, and why?" after the fact — for retrospectives, for security review, for regressions.
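The precedence requirement from the list above can be sketched as a deterministic resolver: the most specific level wins, with ties broken by the newest revision. The level ordering and record shape are assumptions for the example, not a prescribed schema.

```python
LEVELS = {"org": 0, "team": 1, "pr-exception": 2}  # more specific wins

def resolve(candidates):
    """Deterministic resolution for conflicting decisions on one topic:
    the most specific level wins; ties go to the newest revision."""
    return max(candidates, key=lambda d: (LEVELS[d["level"]], d["revision"]))

conflict = [
    {"level": "org",          "revision": 1, "rule": "use Postgres"},
    {"level": "team",         "revision": 3, "rule": "use Postgres + pgbouncer"},
    {"level": "pr-exception", "revision": 1, "rule": "SQLite for this spike"},
]
assert resolve(conflict)["rule"] == "SQLite for this spike"
```

What matters is not the particular ordering but that it is a total, deterministic one: two engineers resolving the same conflict on different days, through different tools, get the same answer — which is the drift-prevention property the layer exists to provide.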
The wedge. Almost everyone is competing in layers 1–3. Very few are building layer 5 seriously. The teams that ship a credible governance layer in 2026–2027 will be the reference for how engineering organizations operate AI coding once the building work is no longer the bottleneck.
The strategic point
Layers 1 through 3 are competitive markets with deep-pocketed incumbents. Building there means competing with frontier labs and venture-funded framework teams on their home ground. Layer 4 is consolidating around an open standard. Layer 6 has known patterns. Layer 7 cannot be vendored away.
Layer 5 is the layer where the product surface is still being defined, the customer pull is now arriving, and almost no one in the dominant ecosystems has shipped a coherent answer. That is the strategically scarce layer of this stack — and the one that determines, more than any other, whether AI-assisted engineering produces durable systems or expensive drift.
This is the frame the rest of the Mneme insights catalogue extends. Why RAG fails for governance covers why layer 2 cannot substitute for layer 5. Why prompt memory fails at scale covers why CLAUDE.md is not layer 5. Why code review cannot scale covers the layer-7 ceiling. Heterogeneous-agent governance covers the cross-tool surface layer 5 has to defend.