Does AI-Generated Code Increase Bugs and Rework? What the Faros 2026 Data Shows

AI Is Generating More Code Than Ever

The productivity case for AI coding tools is real and measurable. In Faros AI’s analysis of two years of engineering telemetry across 22,000 developers and more than 4,000 teams, moving from the lowest to the highest periods of AI adoption raised epics completed per developer by 66%, task throughput per developer by 33.7%, and pull-request merge rate per developer by 16.2%. Teams are shipping more, and faster.

But throughput measures output, not outcomes. The same dataset shows what arrives alongside the extra output, and it complicates the story considerably.

What the Faros 2026 Data Shows

Faros calls the pattern acceleration whiplash: output accelerates while the work needed to keep that output safe accelerates faster. The quality signals are the part most coverage skips.

Signal	Change under high AI adoption
Bugs per developer	+54% (vs +9% the prior year)
Code churn	+861%
Incident-to-PR ratio	+242.7%
Monthly incidents	+57.9%
PRs merged with no review at all	31.3%
Median time in code review	+441.5%

The headline is the first row. Bugs per developer rose 54% in the 2026 dataset, against a 9% rise the year before — the relationship between AI adoption and defects is not just present, it is steepening. The blunt answer to “does AI-generated code increase bugs?” is, on this evidence, yes — and increasingly so.

Churn Is Often Rework

The most striking number is code churn — the share of recently merged lines that get deleted or rewritten in the same quarter — up 861% under high AI adoption. Faros’s own first-listed explanation is rework: developers accept AI-generated code quickly, then come back to replace it when it proves insufficient in practice. Faros notes other contributors too, including newly feasible large-scale refactors, so the number is strongly suggestive of rework rather than proof that all of it is rework.
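To make the definition concrete, here is a minimal sketch of a churn share computed the way the paragraph above describes it: of the lines merged in a quarter, the fraction deleted or rewritten in that same quarter. This is an illustration, not Faros's actual calculation, which this article does not spell out. The `LineRecord` shape and the `churn_share` and `quarter_of` names are assumptions made up for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class LineRecord:
    """One merged line of code (hypothetical record shape for illustration)."""
    merged_on: date
    # Date the line was deleted or rewritten; None if it is still in place.
    replaced_on: Optional[date] = None


def quarter_of(d: date) -> tuple[int, int]:
    """Map a date to a (year, quarter) pair, with quarter in 1..4."""
    return (d.year, (d.month - 1) // 3 + 1)


def churn_share(lines: list[LineRecord], year: int, quarter: int) -> float:
    """Share of lines merged in the given quarter that were deleted or
    rewritten within that same quarter."""
    target = (year, quarter)
    merged = [ln for ln in lines if quarter_of(ln.merged_on) == target]
    if not merged:
        return 0.0
    churned = [
        ln for ln in merged
        if ln.replaced_on is not None and quarter_of(ln.replaced_on) == target
    ]
    return len(churned) / len(merged)
```

With four lines merged in Q1, two of them replaced in Q1 and one replaced in April, `churn_share(lines, 2026, 1)` returns 0.5: the April replacement falls outside the quarter and is not counted. A metric like this cannot tell rework from a deliberate refactor, which is why the number is suggestive rather than conclusive.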

For an engineering leader, the reframe matters. If Faros's leading explanation holds, a large fraction of that churn is teams correcting AI changes after the fact, and a meaningful slice of that is plausibly correction of changes that were locally fine but violated a decision the system depended on. Rework is often where architectural drift finally shows up on a dashboard.

Why Traditional Quality Controls Miss This

The reason the defects and rework slip through is that the controls teams already run were not built to catch them. Tests verify correctness. Linters verify style. Security scanners verify vulnerabilities. None of them verify whether a change conforms to the team’s architectural decisions.

Every existing gate answers “does it work?” None answers “is this how our system is supposed to be built?” That second question is the one AI-generated volume makes urgent, and the one most pipelines have no automated way to answer.

We have itemized that control gap in detail in our piece on the verification tax of AI coding agents. The Faros data is what that gap looks like once you measure it across thousands of teams instead of one.

Architectural Drift at Scale

Concretely, the drift looks mundane. A new service reaches the database directly instead of going through the agreed boundary. A second authentication pattern appears next to the approved one. Three separate reimplementations of a utility the team has already approved accumulate because three agents each solved the problem locally. A decision the team made last quarter is quietly forgotten. Every one of those changes can pass review. The architecture still degrades, one reasonable-looking diff at a time.

Faros adds an uncomfortable detail: organizations with mature DevOps practices experienced the same downstream deterioration as everyone else, which cuts against the comfortable assumption that strong foundations simply amplify AI’s benefits. The erosion is structural, not a sign of a weak engineering culture — which is why “just review more carefully” does not scale as an answer.

After AI, You Approve More Than You Write

The deeper shift the data implies is about authorship. Before AI, humans wrote most of the changes they were accountable for. After AI, humans approve far more changes than they personally write. When the bottleneck moves from authorship to approval, the binding problem stops being generation and becomes governance: not “can we produce this change?” but “does this change respect the decisions we already made?”

Faros frames the imperative as pushing quality back to the point of authorship, before code ever reaches review. That is the right instinct. The open question is what mechanism does it — and a human reading every AI diff is not a mechanism that scales alongside a 33.7% throughput increase.

The Emerging Layer Is Architectural Governance

The layers of the AI delivery stack are filling in: generation, then review, then security validation. The Faros data points to the next one. Architectural governance — recording the team’s decisions as executable constraints, retrieving them at generation time, and checking each proposed change against them deterministically before it merges — is the control the existing gates leave out.
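As a rough illustration of what “decisions as executable constraints” could mean, the sketch below records one decision as data (modules under `services/` must not import a raw database driver directly) and checks a proposed file against it deterministically using Python's standard `ast` module. Everything named here is hypothetical: the `ImportRule` shape, the `ADR-007` identifier, the `rawdb` and `data_access` module names, and the `check_change` function are assumptions for the example, not a description of any particular product or of Faros's recommendations.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportRule:
    """One recorded architectural decision (hypothetical shape)."""
    decision_id: str        # identifier of the decision record
    applies_to_prefix: str  # file path prefix the rule governs
    forbidden_import: str   # module that must not be imported directly
    reason: str


def imported_modules(source: str) -> set[str]:
    """Collect module names imported anywhere in the source text.
    Relative-import levels are ignored in this simplified sketch."""
    found: set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module)
    return found


def check_change(path: str, source: str, rules: list[ImportRule]) -> list[str]:
    """Return one message per rule violation; an empty list means the file passes."""
    violations: list[str] = []
    modules = sorted(imported_modules(source))
    for rule in rules:
        if not path.startswith(rule.applies_to_prefix):
            continue
        for mod in modules:
            if mod == rule.forbidden_import or mod.startswith(rule.forbidden_import + "."):
                violations.append(
                    f"{path}: {rule.decision_id} violated by import of {mod} ({rule.reason})"
                )
    return violations
```

Given a rule such as `ImportRule("ADR-007", "services/", "rawdb", "go through data_access")`, a file at `services/billing.py` containing `import rawdb` produces one violation, while the same import inside `data_access/` passes. The point of the design is that the check is a pure function of the change and the recorded rules, so it gives the same answer every time, whether it runs when the code is generated or as a gate before merge.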

AI acceleration is real, and worth keeping. The question is no longer whether teams can generate more code. It is whether they can preserve their engineering decisions while generating it. That is what turns throughput back into progress — and what keeps the 861% churn from being the standing price of the 33.7% speedup. For the broader read on the same report, see our companion piece on the acceleration whiplash and the governance gap.
