Snowflake’s AI Data Engineering Report Signals a Shift Toward Governance Infrastructure

A new MIT Technology Review Insights report — produced in partnership with Snowflake and based on a June 2025 survey of 400 senior data and technology executives at organizations earning $500M+ — contains a much bigger infrastructure signal than most commentary acknowledges. The headline is about how AI is reshaping the role of data engineers. The deeper story is structural: as AI systems move closer to production, data engineering starts evolving into governance engineering — and a new layer is being added to the enterprise AI stack.

By Theo Valmis·May 2026

The report, Redefining Data Engineering in the Age of AI, sponsored by Snowflake and editorially independent of it, surveyed 400 CIOs, CTOs, CDOs, CAIOs, and equivalent executives across 7 industries and 10 countries. The headline numbers are striking on their own:

Time spent on AI is on track to triple. Data engineers spent an average of 19% of their workday on AI projects in 2023 and 37% in 2025, and respondents expect 61% by 2027.
Agentic AI is about to become majority-deployed. 20% of organizations have already started; 54% expect to begin within 12 months.
The job description has changed radically. 81% of executives say AI has rewritten what data engineers do. 77% say workloads are growing.
Data governance is now the #2 challenge of new AI tools (40%), behind only integration complexity (45%). The biggest companies rate it their greatest challenge.
Data security and privacy is rated the single greatest challenge as AI capabilities advance (55%).
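The arithmetic behind those headline numbers can be checked directly. The following is a minimal sketch in Python that uses only the percentages quoted above. The variable and function names are illustrative, and adding the 20% and 54% figures assumes the two groups of respondents do not overlap.

```python
# Survey figures quoted above: average share of a data engineer's
# workday spent on AI projects. The 2027 value is a projection.
share_by_year = {2023: 0.19, 2025: 0.37, 2027: 0.61}


def growth_multiple(start_year: int, end_year: int) -> float:
    """Ratio of the later share to the earlier share."""
    return share_by_year[end_year] / share_by_year[start_year]


observed = growth_multiple(2023, 2025)   # already measured
projected = growth_multiple(2023, 2027)  # depends on the 2027 estimate

# Agentic AI: organizations that have started plus those expecting to
# begin within 12 months (assumes the two groups are disjoint).
agentic_within_year = 0.20 + 0.54

print(f"2023 -> 2025: {observed:.2f}x")
print(f"2023 -> 2027: {projected:.2f}x")
print(f"Agentic AI started or planned: {agentic_within_year:.0%}")
```

On these figures the share roughly doubles between 2023 and 2025, and the tripling only appears if the 2027 projection holds.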

Those findings are useful. They also point at something the report does not name directly.

As AI systems move closer to production infrastructure, data engineering starts evolving into governance engineering.

The clearest articulation of that shift in the report comes from Snowflake’s own VP of Product for Data Engineering, Chris Child:

“Over time, the data engineer role will shift from writing code for all pipelines toward managing the infrastructure that these are running in, orchestrating across a lot of these, and setting the rules and tests to make sure the right data is coming in.” — Chris Child, VP Product, Data Engineering, Snowflake

That is not pipeline work. That is governance work, named in pipeline-friendly language.

AI is turning infrastructure into execution systems

The reason this matters is that AI changes the assumptions enterprise infrastructure was designed around.

Traditional infrastructure	AI infrastructure
Deterministic	Probabilistic
Workflow-oriented	Autonomous
Human-triggered	Continuously operating
Moves data and code	Generates artifacts and decisions

The shift is not incremental. A pipeline that moves bytes is governed differently than a system that decides what bytes to move, what to change, and what to ship. The report’s own framing of agentic AI captures this directly:

“Agentic AI will give us systems that not only research, analyze, and plan, but that act on plans in a dynamic and agile way.” — Ritu Jyoti, formerly GM of AI, Automation, Data & Analytics, IDC

The moment systems gain that execution authority, governance becomes an infrastructure problem instead of a documentation problem.

Why agentic AI changes the role of data engineering

The population the survey describes is well positioned to see this first. Data engineering teams already own:

Orchestration
Retrieval and context systems
Workflow execution
Operational lineage
Policy surfaces
AI infrastructure reliability

Add agentic AI on top of that surface area and the responsibility expands beyond “the pipeline ran successfully.” The new questions are: did the agent stay inside its scope, did it respect the constraints, did it modify systems it was not authorized to touch, can we prove what it did and why.
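Those four questions map naturally onto a scope check plus an audit trail. The sketch below is hypothetical throughout: `AgentScope`, `AuditTrail`, `authorize`, and the system and action names are invented for illustration and do not come from the report or from any specific product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical declaration of what one agent may touch and do."""
    allowed_systems: frozenset[str]
    allowed_actions: frozenset[str]


@dataclass
class AuditTrail:
    """Append-only record of every decision, allowed or denied."""
    entries: list[dict] = field(default_factory=list)

    def record(self, agent: str, system: str, action: str,
               allowed: bool, reason: str) -> None:
        self.entries.append({"agent": agent, "system": system,
                             "action": action, "allowed": allowed,
                             "reason": reason})


def authorize(agent: str, scope: AgentScope, system: str, action: str,
              trail: AuditTrail) -> bool:
    """Check a requested action against the scope and log the outcome."""
    if system not in scope.allowed_systems:
        allowed, reason = False, f"system '{system}' is outside the scope"
    elif action not in scope.allowed_actions:
        allowed, reason = False, f"action '{action}' is not permitted"
    else:
        allowed, reason = True, "within scope"
    trail.record(agent, system, action, allowed, reason)
    return allowed


scope = AgentScope(frozenset({"staging_warehouse"}),
                   frozenset({"read", "write"}))
trail = AuditTrail()
ok_write = authorize("pipeline-agent", scope,
                     "staging_warehouse", "write", trail)
bad_system = authorize("pipeline-agent", scope,
                       "prod_warehouse", "write", trail)
bad_action = authorize("pipeline-agent", scope,
                       "staging_warehouse", "drop_table", trail)
```

Recording denied requests alongside allowed ones is what lets the trail answer "can we prove what it did and why" rather than only "did it succeed".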

That is governance — not as compliance paperwork, but as runtime operational control. And the survey shows data engineering teams already see this. When asked which benefits agentic AI brings their teams, respondents named pipeline debugging and optimization (42%), data integration (38%), orchestration across teams (34%), and data governance and compliance (33%). Governance is sitting in the top four expected benefits — right next to the pipeline-engineering tasks that have always defined the discipline.

AI systems need constraints, not just context

The instinctive response to autonomous behavior is to give the model more context. Bigger windows, richer retrieval, better embeddings. All of it helps. None of it is sufficient.

Context tells the model more. Constraints tell the system what is allowed. As autonomy scales, the gap between “the model knows” and “the rule is enforced” becomes the dominant failure mode.

The same distinction shows up at the runtime layer: retrieval surfaces information; it does not enforce constraints. The survey data is one more data point in the same picture.

The emerging governance infrastructure layer

Enterprise AI stacks are quietly adding a new layer between agents and production systems: governance infrastructure.

Concretely, it is the systems layer that enforces:

Architectural constraints — what the system is allowed to be
Operational boundaries — what the agent is allowed to do
Execution policies — how and where actions can run
Verification rules — what must be true before a change is accepted
Deterministic invariants — same input, same state, same verdict
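The last two items, verification rules and deterministic invariants, can be sketched together. This is an illustration under stated assumptions, not a description of any particular product: the rule names, the `verdict` function, and the shape of the `change` and `state` dictionaries are all hypothetical.

```python
import hashlib
import json

# Hypothetical verification rules. Each maps (change, state) to pass/fail.
RULES = {
    "schema_unchanged":
        lambda change, state: change.get("schema") == state.get("schema"),
    "row_delta_within_limit":
        lambda change, state: abs(change.get("row_delta", 0))
        <= state.get("max_row_delta", 0),
}


def verdict(change: dict, state: dict) -> dict:
    """Pure function: no clock, randomness, or I/O is consulted,
    so the same change and state always produce the same verdict."""
    failed = sorted(name for name, rule in RULES.items()
                    if not rule(change, state))
    # Canonical JSON (sorted keys) gives a stable digest of the inputs.
    canonical = json.dumps({"change": change, "state": state},
                           sort_keys=True)
    return {
        "accepted": not failed,
        "failed_rules": failed,
        "input_digest": hashlib.sha256(
            canonical.encode("utf-8")).hexdigest(),
    }


state = {"schema": ["id", "amount"], "max_row_delta": 1000}
good = verdict({"schema": ["id", "amount"], "row_delta": 250}, state)
bad = verdict({"schema": ["id"], "row_delta": 5000}, state)
repeat = verdict({"schema": ["id", "amount"], "row_delta": 250}, state)
```

Including a digest of the inputs in the verdict is one way to make "same input, same state, same verdict" checkable after the fact: two verdicts with the same digest should be identical.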

The primitives that compose it are already named in the Mneme ontology: governance propagation, verification contracts, runtime enforcement, architectural drift, operational consistency. The Snowflake/MIT survey is one of the clearer enterprise-facing signals that this is the layer being added next.

The numbers point in the same direction. Data governance is the second-highest-rated challenge of new AI tools at 40%, only behind integration complexity at 45%. Tool sprawl and fragmentation comes third at 38% — which is exactly the problem governance propagation solves. The biggest companies in the survey rate data governance their single greatest challenge. And the stakes are not theoretical:

“The best case scenario is that a breach results in some embarrassment. The worst case is that your business is forced to shut down.” — Dave Masino, Senior Director of Data and Intelligence, Slalom

Observability cannot solve execution drift alone

Most enterprise AI tooling today focuses on logs, traces, monitoring, evals, observability. All of it is necessary. None of it is the layer being discussed here.

Observability explains failures after execution. Governance infrastructure shapes execution before drift occurs.

The next enterprise AI challenge is not understanding autonomous behavior after the fact. It is constraining operational behavior before systems drift into invalid states.

That is the difference between forensics and infrastructure — and it is the same distinction that governance-before-generation names at the per-agent level.
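The contrast between explaining drift afterward and preventing it can be shown with a toy example. This is a sketch only: the invariant (a table must keep its `id` column) and the function names are assumptions made for illustration.

```python
import copy


def is_valid(state: dict) -> bool:
    """Hypothetical invariant: the table must always keep its 'id' column."""
    return "id" in state["columns"]


def drop_column(state: dict, column: str) -> dict:
    """Return a new state with the column removed; the input is untouched."""
    new_state = copy.deepcopy(state)
    new_state["columns"].remove(column)
    return new_state


def run_observed(state: dict, column: str, log: list) -> dict:
    """Observability only: apply the action, then report what happened."""
    new_state = drop_column(state, column)
    if not is_valid(new_state):
        log.append(f"violation detected after dropping '{column}'")
    return new_state  # the invalid state is already live


def run_governed(state: dict, column: str, log: list) -> dict:
    """Governance gate: validate the candidate state before committing it."""
    candidate = drop_column(state, column)
    if not is_valid(candidate):
        log.append(f"blocked drop of '{column}' before execution")
        return state  # the system never leaves a valid state
    return candidate


start = {"columns": ["id", "amount"]}
observed_log, governed_log = [], []
after_observed = run_observed(start, "id", observed_log)
after_governed = run_governed(start, "id", governed_log)
```

Both paths produce a log entry, but only the governed path ends in a valid state. The log in the first case is forensic evidence, while in the second it is a record of enforcement.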

The new enterprise AI infrastructure stack

The shape the market is converging toward is six layers, not four:

1. Foundation models: reasoning and generation capability
2. Retrieval & context: pull the right material into the prompt
3. Agent orchestration: workflows, tool calling, multi-agent coordination
4. Observability & eval: logs, traces, monitoring, quality measurement
5. Governance infrastructure: architectural invariants, policy propagation, operational constraints
6. Verification & enforcement: runtime validation, deterministic verdicts, enforcement traces

Most organizations currently stop at orchestration and observability. That is enough to ship a pilot. It is not enough to operate autonomous systems in production at scale.

The layers being added next are not optional. Layer 5 is what makes autonomy operationally sustainable. Layer 6 is what makes it auditable. Together they are what turn agentic AI from a demo into infrastructure.

Conclusion: from data engineering to governance engineering

Snowflake’s report frames this transition as an evolution in data engineering. The broader shift is infrastructural. As AI systems gain operational authority, enterprise engineering organizations are being pulled toward a new requirement: governance infrastructure for autonomous systems.

The future AI stack likely includes a dedicated governance layer sitting between agents and production execution. And the teams closest to infrastructure reliability — the ones the MIT/Snowflake research describes — may be the first operators of that layer. Chris Child puts the urgency for senior leaders bluntly:

“If your C-suite still considers data engineering as a support role, you’re already five years behind — and probably training your future competitors.” — Chris Child, Snowflake

The pressure runs the same direction at the technical layer. A discipline that spent 19% of its time on AI in 2023 and 37% in 2025, and that projects 61% by 2027, is not evolving incrementally; it is being asked to operate a new class of system. That system needs constraints, not just context. It needs verification contracts, not just observability. It needs governance infrastructure, not just better tools.

The next role expansion in data engineering is not more pipelines. It is the operating discipline that keeps autonomous systems inside their architectural intent.

Frequently asked questions

What does Snowflake’s AI data engineering report say?

Redefining Data Engineering in the Age of AI is an MIT Technology Review Insights report sponsored by Snowflake, based on a June 2025 survey of 400 senior data and technology executives at organizations earning $500M+. Headline findings: the time data engineers spend on AI nearly doubled from 19% (2023) to 37% (2025) and is projected to reach 61% by 2027, more than triple the 2023 level; 20% of organizations have already started deploying agentic AI and 54% expect to begin within 12 months; 81% say AI has rewritten the data engineer job description; data governance is the #2 challenge of new AI tools at 40%; data security and privacy is the #1 challenge as AI capabilities advance at 55%.

Why does AI change enterprise infrastructure assumptions?

Traditional infrastructure is deterministic, workflow-oriented, and human-triggered. AI infrastructure is probabilistic, autonomous, continuously operating, and capable of generating both artifacts and decisions. The moment systems gain execution authority, governance becomes an infrastructure problem instead of a documentation problem.

What is governance infrastructure for AI?

Governance infrastructure is the systems layer that enforces architectural constraints, operational boundaries, execution policies, verification rules, and deterministic invariants across autonomous AI systems. It is the runtime discipline that sits between agents and production execution, not compliance paperwork.

Why can’t observability solve execution drift on its own?

Observability explains failures after execution. Governance infrastructure attempts to shape execution before drift occurs. Logs, traces, monitoring, and evals are necessary but downstream — the next enterprise AI challenge is constraining operational behavior before systems drift into invalid states, not understanding autonomous behavior after the fact.

Where does governance infrastructure sit in the enterprise AI stack?

It is the fifth layer of the emerging six-layer stack: foundation models, retrieval and context systems, agent orchestration frameworks, observability and evaluation, governance infrastructure, and verification and enforcement. Most organizations currently stop at orchestration and observability. Autonomous execution introduces a need for deterministic enforcement and operational verification, and those are the layers being added next.