Why shouldn't you treat AI agents like employees?

A randomized experiment run through the BCG Henderson Institute and reported in Harvard Business Review found that humanizing AI agents shifts accountability away from individuals, makes oversight more deferential, degrades review quality, and can damage professional identity. It also does not meaningfully increase adoption. The employee metaphor imports expectations that do not hold: an agent has no incentives, no consequences, and no accountability of its own.

How does this apply to AI coding agents?

Teams onboard coding agents like junior engineers: a CLAUDE.md as the onboarding doc, conventions to follow on trust, PRs reviewed as if a person wrote them. But the agent does not learn across sessions, carries no accountability, and faces no consequence for drift. Trusting it like an employee substitutes hope for control. The replacement is deterministic governance: constraints enforced before generation, independent of the agent's good intentions.

If agents aren't employees, what mental model should engineering teams use?

Govern agents like infrastructure, not staff. Infrastructure is constrained by policy that holds regardless of intent, instrumented so every action is traceable, and never assumed to be accountable for itself. Accountability stays with the humans who set the constraints. Enforcement provenance, where every verdict traces to a recorded decision, keeps that accountability legible.

Doesn't code review keep agents accountable?

The study's warning is precisely that humanized framing degrades review quality and makes reviewers more deferential. Reviewing agent output as if a trusted colleague produced it invites the rubber stamp. Governance moves the binding check before generation, where a violation is blocked deterministically rather than relying on a human catching it later under exactly the conditions the research shows are compromised.

Why You Shouldn’t Treat AI Agents Like Employees: The Coding-Agent Corollary

In May 2026, Matthew Kropp, Julie Bedard, Emma Wiles, and Megan Hsu published “Why You Shouldn’t Treat AI Agents Like Employees” in Harvard Business Review, drawing on a randomized experiment run through the BCG Henderson Institute. The finding is counterintuitive enough to be worth stating up front: humanizing an AI agent (giving it a name, a persona, the framing of a colleague) does not meaningfully increase people’s willingness to adopt it, and it carries real costs.

The study identifies four. Humanizing shifts accountability away from individuals. It makes decision-making more deferential, so people escalate to the agent rather than direct it. It degrades the quality of review and oversight. And it erodes professional identity and trust. None of these is offset by the adoption benefit leaders usually assume humanization buys, because that benefit largely does not materialize.

The authors’ conclusion is not anti-AI. It is about framing. The challenge is not whether to adopt agents, but how to integrate them into workflows in ways that preserve accountability, maintain quality, and let people work effectively alongside them. The employee metaphor fails that test, because it imports a set of expectations — judgment, growth, accountability, consequence — that an agent does not satisfy.

This is a management finding. It is also, almost word for word, a description of how most engineering teams currently relate to their coding agents.

We already onboard agents like junior engineers

Look at the vocabulary that has grown up around AI coding tools. We “onboard” an agent with a CLAUDE.md, the way we’d hand a new hire a setup doc. We talk about “trusting” the agent with larger tasks as it “proves itself.” We “coach” it through prompts. We “review its PRs” the way we’d review a teammate’s. The mental model is unmistakably the junior employee: capable, improvable, and — this is the load-bearing assumption — accountable for its work.

That last assumption is where the metaphor breaks, and the HBR research names exactly the failure. An employee is accountable because they have things an agent does not: continuity, incentives, reputation, and consequences. A junior engineer who repeatedly violates the architecture learns, or is managed out. The accountability is real because the feedback loop is real.

A coding agent has none of that. It does not remember yesterday’s correction unless you re-inject it. It bears no consequence for shipping a violation. It has no reputation to protect and no incentive to protect it. When you “trust” it the way you trust an employee, you are extending a relationship that only one party is structurally capable of holding up.

The four harms, in an engineering org

Each consequence the study identifies has a precise, recognizable form inside a codebase.

HBR’s four harms, as they show up in AI-assisted engineering

Accountability shifts away from people

“The agent generated it” becomes a place to put responsibility that no human has to hold. When a violation ships, the framing of an autonomous colleague diffuses ownership — but the agent cannot be accountable, so accountability simply evaporates instead of landing on the engineer who merged it.

Oversight becomes deferential

The study finds humanized framing makes people defer. In review, that is the engineer who assumes the agent “knows the codebase” and waves through a change they would have interrogated from a human contributor. Deference to a system that cannot earn it is how violations pass review.

Review quality degrades

Volume plus trust produces the rubber stamp. As agents generate more and larger changes, treating each PR as a trusted colleague’s work — rather than unverified output requiring enforcement — pushes oversight quality down exactly as throughput goes up.

Professional identity erodes

When engineers are recast as managers of an “AI teammate” rather than owners of the architecture, the sense of authorship and responsibility that drives careful engineering weakens. The architecture stops being something a person stands behind.

Notice that every one of these is an accountability failure, and every one is made worse by the same thing: treating an unverified, non-accountable system as if it were a trusted, accountable one. The study’s remedy is to preserve accountability by design. In engineering, “by design” has a specific meaning.

The replacement for trust is enforcement

If you cannot trust an agent the way you trust an employee, what do you do instead? You do what you do with any powerful system that has no judgment of its own: you constrain it, and you instrument it.

That is the difference between managing staff and governing infrastructure. You do not “trust” a database to respect a foreign-key constraint; you define the constraint and the database enforces it regardless of what any query intends. You do not “coach” a CI pipeline into running the tests; the pipeline runs them deterministically. Accountability for those systems stays where it belongs — with the humans who set the policy — precisely because the policy is enforced rather than hoped for.
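The foreign-key point can be made concrete with a minimal sketch using Python's built-in sqlite3 module. The table names and the helper try_insert_order are hypothetical, chosen only for illustration. The database rejects the invalid row no matter what the caller intended, and nothing in the calling code has to be trusted.

```python
import sqlite3


def try_insert_order(conn, customer_id):
    """Attempt to insert an order; return True only if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The constraint rejected the row; the caller's intent played no part.
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.commit()

accepted = try_insert_order(conn, 1)    # customer 1 exists
rejected = try_insert_order(conn, 999)  # customer 999 does not exist
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
```

The policy lives in the schema, which a person wrote and owns. The query that violates it is simply refused, which is the relationship the rest of this section proposes for agents.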

Agent as employee: trust, then review. Onboard with a doc. Extend trust as it “proves itself.” Review output as a colleague’s. Accountability is assumed to sit with the agent, so when something slips, it sits nowhere.

Agent as governed infrastructure: constrain, then verify. Encode decisions as policy. Enforce them before generation, independent of intent. Trace every verdict to a recorded decision. Accountability stays explicitly with the humans who set the constraints.

This is what architectural governance does to the agent relationship. Constraints are enforced before generation, so a violation is blocked rather than caught. Enforcement is deterministic, so it does not depend on the agent’s good intentions or the reviewer’s alertness on a Friday afternoon. And every enforced verdict carries provenance — it traces back to a specific decision a human recorded — so accountability remains legible and lands on a person, not on a persona.
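One way to picture those three properties is the sketch below. It is an illustration under stated assumptions, not a description of any particular governance product: the Decision, Rule, and Verdict types, the check_planned_import function, the "ADR-007" identifier, and the owner name are all hypothetical. It also assumes the agent's intended change can be described as a planned import between layers before any code is generated.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Decision:
    """A recorded architectural decision with a named human owner (hypothetical shape)."""
    decision_id: str
    owner: str
    summary: str


@dataclass(frozen=True)
class Rule:
    """A constraint derived from a decision: source_layer may not import forbidden_layer."""
    decision: Decision
    source_layer: str
    forbidden_layer: str


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    decision: Optional[Decision]  # provenance: the decision behind a block, if any


def check_planned_import(rules, source_layer, target_layer):
    """Evaluate a planned change before generation; the same input always gives the same verdict."""
    for rule in rules:
        if rule.source_layer == source_layer and rule.forbidden_layer == target_layer:
            return Verdict(allowed=False, decision=rule.decision)
    return Verdict(allowed=True, decision=None)


# Hypothetical decision record and the rule derived from it.
adr_007 = Decision("ADR-007", "jane.doe", "UI code must not access the database directly")
rules = [Rule(adr_007, source_layer="ui", forbidden_layer="database")]

blocked = check_planned_import(rules, "ui", "database")
permitted = check_planned_import(rules, "ui", "services")
```

A blocked verdict carries the decision that produced it, so the question "who is accountable for this rule?" resolves to a person named in the record rather than to the agent.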

You can delegate the work to an agent. You cannot delegate the accountability, because the agent has nothing to be accountable with. Governance is what keeps the accountability attached to a human after the work is delegated.

The HBR finding, applied to the coding agent

Why this is not just a wording problem

It would be easy to read the HBR piece as a plea to stop giving agents cute names. It is more than that, and so is its engineering corollary. The humanized framing is not only cosmetic; it changes behavior, and the behavior it changes is oversight. People defer more, review less carefully, and let accountability slide — the study measured these as effects, not opinions.

In an engineering org running agents at volume, those behavioral effects are the difference between a codebase that stays coherent and one that drifts. You cannot fix a behavioral oversight gap with a reminder to be more careful, any more than you can fix the convergence of probabilistic output with a stronger prompt. You fix it by moving the binding check out of the place the research shows is compromised — deferential human review of a trusted-seeming colleague — and into a place that does not depend on human vigilance at all: deterministic enforcement before the code is ever written.

Agent as employee → Agent as governed infrastructure
Onboard with a doc → Encode decisions as enforceable policy
Extend trust over time → Apply constraints on every run, from the first
Review like a colleague’s PR → Block violations before generation
Hold the agent accountable → Keep accountability with humans via provenance

What engineering leaders should take from this

The HBR research is a warning against a metaphor, and the metaphor is already running in most AI-assisted engineering teams. The takeaways translate cleanly:

Drop the employee mental model. The agent is not a junior engineer who will learn and be accountable. Treating it as one relocates accountability onto something that cannot carry it.
Replace trust with enforcement. Trust is the right currency for people. For a system with no judgment of its own, the right currency is constraints that hold regardless of intent.
Move the binding check before generation. The study shows humanized framing degrades review. Do not rely on the compromised step; enforce earlier.
Keep accountability legible. Provenance — every verdict traceable to a recorded human decision — is how responsibility stays with a person after the work is handed to a machine.

Kropp, Bedard, Wiles, and Hsu gave leaders a precise reason to stop anthropomorphizing their agents: it does not help adoption and it quietly dismantles oversight. For engineering teams, the constructive half of that finding is the more important one. You do not need the agent to be trustworthy. You need your governance to make trustworthiness beside the point.
