In May 2026, Matthew Kropp, Julie Bedard, Emma Wiles, and Megan Hsu published “Why You Shouldn’t Treat AI Agents Like Employees” in Harvard Business Review, drawing on a randomized experiment run through the BCG Henderson Institute. The finding is counterintuitive enough to be worth stating up front: humanizing an AI agent (giving it a name, a persona, the framing of a colleague) does not meaningfully increase people’s willingness to adopt it, and it carries real costs.
The study identifies four. Humanizing shifts accountability away from individuals. It makes decision-making more deferential, so people escalate to the agent rather than direct it. It degrades the quality of review and oversight. And it erodes professional identity and trust. None of these is offset by the adoption benefit leaders usually assume humanization buys, because that benefit largely does not materialize.
The authors’ conclusion is not anti-AI. It is about framing. The challenge is not whether to adopt agents, but how to integrate them into workflows in ways that preserve accountability, maintain quality, and let people work effectively alongside them. The employee metaphor fails that test, because it imports a set of expectations — judgment, growth, accountability, consequence — that an agent does not satisfy.
This is a management finding. It is also, almost word for word, a description of how most engineering teams currently relate to their coding agents.
We already onboard agents like junior engineers
Look at the vocabulary that has grown up around AI coding tools. We “onboard” an agent with a CLAUDE.md, the way we’d hand a new hire a setup doc. We talk about “trusting” the agent with larger tasks as it “proves itself.” We “coach” it through prompts. We “review its PRs” the way we’d review a teammate’s. The mental model is unmistakably the junior employee: capable, improvable, and — this is the load-bearing assumption — accountable for its work.
That last assumption is where the metaphor breaks, and the HBR research names exactly the failure. An employee is accountable because they have things an agent does not: continuity, incentives, reputation, and consequences. A junior engineer who repeatedly violates the architecture learns, or is managed out. The accountability is real because the feedback loop is real.
A coding agent has none of that. It does not remember yesterday’s correction unless you re-inject it. It bears no consequence for shipping a violation. It has no reputation to protect and no incentive to protect it. When you “trust” it the way you trust an employee, you are extending a relationship that only one party is structurally capable of holding up.
The four harms, in an engineering org
Each consequence the study identifies has a precise, recognizable form inside a codebase.
Notice that every one of these is an accountability failure, and every one is made worse by the same thing: treating an unverified, non-accountable system as if it were a trusted, accountable one. The study’s remedy is to preserve accountability by design. In engineering, “by design” has a specific meaning.
The replacement for trust is enforcement
If you cannot trust an agent the way you trust an employee, what do you do instead? You do what you do with any powerful system that has no judgment of its own: you constrain it, and you instrument it.
That is the difference between managing staff and governing infrastructure. You do not “trust” a database to respect a foreign-key constraint; you define the constraint and the database enforces it regardless of what any query intends. You do not “coach” a CI pipeline into running the tests; the pipeline runs them deterministically. Accountability for those systems stays where it belongs — with the humans who set the policy — precisely because the policy is enforced rather than hoped for.
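The database half of that analogy can be made concrete. What follows is a minimal sketch using Python’s standard sqlite3 module; the customers and orders tables and the insert_order helper are hypothetical names chosen for illustration, not anything taken from the HBR study. The point it shows is that the constraint rejects the bad row regardless of what the caller intended.

```python
import sqlite3


def insert_order(conn, customer_id):
    """Try to insert an order; return True only if the database accepted it."""
    try:
        with conn:  # commits on success, rolls back if the statement fails
            conn.execute(
                "INSERT INTO orders (customer_id) VALUES (?)", (customer_id,)
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the row; the caller's intent is irrelevant.
        return False


conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY)")
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customers(id))"
)
conn.execute("INSERT INTO customers (id) VALUES (1)")
conn.commit()

accepted = insert_order(conn, 1)    # customer 1 exists
rejected = insert_order(conn, 999)  # no such customer
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(accepted, rejected, order_count)
```

Nobody reviews the second insert and nobody has to be alert for it. The policy was set once, by whoever wrote the schema, and the system applies it every time.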
This is what architectural governance does to the agent relationship. Constraints are enforced before generation, so a violation is blocked rather than caught. Enforcement is deterministic, so it does not depend on the agent’s good intentions or the reviewer’s alertness on a Friday afternoon. And every enforced verdict carries provenance — it traces back to a specific decision a human recorded — so accountability remains legible and lands on a person, not on a persona.
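One way to picture those three properties together is a small gate that runs on a proposed change before any code is written. The sketch below is only an illustration under stated assumptions: Decision, Verdict, check_import, the module names, and the record ADR-012 are all hypothetical, and a real governance tool would look different. It shows a deterministic check whose blocking verdict carries the human-recorded decision that produced it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Decision:
    """A constraint recorded by a human (all fields are illustrative)."""
    decision_id: str
    owner: str             # the person accountable for the rule
    rule: str              # human-readable statement of the rule
    source_prefix: str     # modules the rule applies to
    forbidden_prefix: str  # modules those sources may not import


@dataclass(frozen=True)
class Verdict:
    allowed: bool
    decision: Optional[Decision]  # provenance: the decision that blocked it


def check_import(decisions: List[Decision], source: str, target: str) -> Verdict:
    """Deterministically check a proposed import against recorded decisions."""
    for d in decisions:
        if source.startswith(d.source_prefix) and target.startswith(d.forbidden_prefix):
            return Verdict(allowed=False, decision=d)
    return Verdict(allowed=True, decision=None)


# Hypothetical recorded decision, for illustration only.
DECISIONS = [
    Decision(
        decision_id="ADR-012",
        owner="j.doe",
        rule="UI code must not import the persistence layer directly",
        source_prefix="app.ui",
        forbidden_prefix="app.db",
    )
]

# The agent proposes a change; the gate runs before anything is generated.
blocked = check_import(DECISIONS, "app.ui.checkout", "app.db.models")
allowed = check_import(DECISIONS, "app.services.billing", "app.db.models")

if not blocked.allowed:
    d = blocked.decision
    print(f"Blocked by {d.decision_id} (owner: {d.owner}): {d.rule}")
```

The same inputs always produce the same verdict, and a blocked verdict names a decision and an owner. That is the sense in which accountability stays with a person rather than with the agent.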
You can delegate the work to an agent. You cannot delegate the accountability, because the agent has nothing to be accountable with. Governance is what keeps the accountability attached to a human after the work is delegated.
Why this is not just a wording problem
It would be easy to read the HBR piece as a plea to stop giving agents cute names. It is more than that, and so is its engineering corollary. The humanized framing is not only cosmetic; it changes behavior, and the behavior it changes is oversight. People defer more, review less carefully, and let accountability slide — the study measured these as effects, not opinions.
In an engineering org running agents at volume, those behavioral effects are the difference between a codebase that stays coherent and one that drifts. You cannot fix a behavioral oversight gap with a reminder to be more careful, any more than you can fix the convergence of probabilistic output with a stronger prompt. You fix it by moving the binding check out of the place the research shows is compromised — deferential human review of a trusted-seeming colleague — and into a place that does not depend on human vigilance at all: deterministic enforcement before the code is ever written.
What engineering leaders should take from this
The HBR research is a warning against a metaphor, and the metaphor is already running in most AI-assisted engineering teams. The takeaways translate cleanly:
- Drop the employee mental model. The agent is not a junior engineer who will learn and be accountable. Treating it as one relocates accountability onto something that cannot carry it.
- Replace trust with enforcement. Trust is the right currency for people. For a system with no judgment of its own, the right currency is constraints that hold regardless of intent.
- Move the binding check before generation. The study shows humanized framing degrades review. Do not rely on the compromised step; enforce earlier.
- Keep accountability legible. Provenance — every verdict traceable to a recorded human decision — is how responsibility stays with a person after the work is handed to a machine.
Kropp, Bedard, Wiles, and Hsu gave leaders a precise reason to stop anthropomorphizing their agents: it does not help adoption and it quietly dismantles oversight. For engineering teams, the constructive half of that finding is the more important one. You do not need the agent to be trustworthy. You need your governance to make trustworthiness beside the point.