The report, Redefining Data Engineering in the Age of AI, is sponsored by Snowflake and editorially independent of it. It surveyed 400 CIOs, CTOs, CDOs, CAIOs, and equivalent executives across 7 industries and 10 countries. The headline numbers are striking on their own:
- Time spent on AI is tripling. Data engineers spent an average of 19% of their workday on AI projects in 2023, 37% in 2025, and respondents expect 61% by 2027.
- Agentic AI is about to become majority-deployed. 20% of organizations have already started; 54% expect to begin within 12 months.
- The job description has changed radically. 81% of executives say AI has rewritten what data engineers do. 77% say workloads are growing.
- Data governance is now the #2 challenge of new AI tools (40%), behind only integration complexity (45%). The biggest companies rate it their greatest challenge.
- Data security and privacy is rated the single greatest challenge as AI capabilities advance (55%).
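The "tripling" and "majority-deployed" claims follow from simple arithmetic on the survey figures quoted above. A quick check, using only those numbers (the variable names are illustrative):

```python
# Figures quoted from the survey bullets above.
ai_time_share = {2023: 19, 2025: 37, 2027: 61}  # % of workday spent on AI projects

# Ratio of the 2027 projection to the 2023 baseline.
growth = ai_time_share[2027] / ai_time_share[2023]
print(f"2023 -> 2027 growth: {growth:.2f}x")  # 3.21x

# Organizations already running agentic AI plus those expecting to start within 12 months.
agentic_started = 20
agentic_within_12_months = 54
agentic_total = agentic_started + agentic_within_12_months
print(f"Started or starting within 12 months: {agentic_total}%")  # 74%
```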
Those findings are useful. They also point at something the report does not name directly.
As AI systems move closer to production infrastructure, data engineering starts evolving into governance engineering.
The clearest articulation of that shift in the report comes from Snowflake’s own VP of Product for Data Engineering, Chris Child:
“Over time, the data engineer role will shift from writing code for all pipelines toward managing the infrastructure that these are running in, orchestrating across a lot of these, and setting the rules and tests to make sure the right data is coming in.” — Chris Child, VP Product, Data Engineering, Snowflake
That is not pipeline work. That is governance work, named in pipeline-friendly language.
AI is turning infrastructure into execution systems
The reason this matters is that AI changes the assumptions enterprise infrastructure was designed around.
| Traditional infrastructure | AI infrastructure |
|---|---|
| Deterministic | Probabilistic |
| Workflow-oriented | Autonomous |
| Human-triggered | Continuously operating |
| Moves data and code | Generates artifacts and decisions |
The shift is not incremental. A pipeline that moves bytes is governed differently than a system that decides what bytes to move, what to change, and what to ship. The report’s own framing of agentic AI captures this directly:
“Agentic AI will give us systems that not only research, analyze, and plan, but that act on plans in a dynamic and agile way.” — Ritu Jyoti, formerly GM of AI, Automation, Data & Analytics, IDC
The moment systems gain that execution authority, governance becomes an infrastructure problem instead of a documentation problem.
Why agentic AI changes the role of data engineering
The survey's population is well positioned to see this first. Data engineering teams already own:
- Orchestration
- Retrieval and context systems
- Workflow execution
- Operational lineage
- Policy surfaces
- AI infrastructure reliability
Add agentic AI on top of that surface area and the responsibility expands beyond “the pipeline ran successfully.” The new questions are: Did the agent stay inside its scope? Did it respect the constraints? Did it modify systems it was not authorized to touch? Can we prove what it did and why?
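Those questions can be made concrete. The following is a minimal sketch, not a reference to any real product API: `AgentScope`, `ActionRecord`, `AuditLog`, and `check_action` are hypothetical names, and the table and agent identifiers are invented for illustration. It checks each requested action against a declared scope and records the verdict and the reason, so "what did it do and why" has an answer.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentScope:
    """Hypothetical declaration of what one agent is allowed to touch."""
    agent_id: str
    writable_tables: frozenset[str]
    allowed_operations: frozenset[str]


@dataclass(frozen=True)
class ActionRecord:
    """One audited decision: what was requested, the verdict, and why."""
    agent_id: str
    operation: str
    target: str
    allowed: bool
    reason: str


@dataclass
class AuditLog:
    records: list[ActionRecord] = field(default_factory=list)


def check_action(scope: AgentScope, operation: str, target: str, log: AuditLog) -> bool:
    """Decide whether an action is in scope and record the decision either way."""
    if operation not in scope.allowed_operations:
        allowed, reason = False, f"operation '{operation}' is outside the declared scope"
    elif target not in scope.writable_tables:
        allowed, reason = False, f"target '{target}' is not writable by this agent"
    else:
        allowed, reason = True, "within declared scope"
    log.records.append(ActionRecord(scope.agent_id, operation, target, allowed, reason))
    return allowed


# Illustrative scope and requests; all names are made up.
scope = AgentScope(
    agent_id="pipeline-agent-1",
    writable_tables=frozenset({"staging.orders"}),
    allowed_operations=frozenset({"insert", "update"}),
)
log = AuditLog()
check_action(scope, "update", "staging.orders", log)   # in scope
check_action(scope, "update", "prod.customers", log)   # unauthorized target
check_action(scope, "delete", "staging.orders", log)   # unauthorized operation

for record in log.records:
    print(record.allowed, record.operation, record.target, "-", record.reason)
```

Denied requests are logged as well as allowed ones, since proving that an agent stayed in scope requires a record of what it attempted, not only what it completed.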
That is governance — not as compliance paperwork, but as runtime operational control. And the survey shows data engineering teams already see this. When asked which benefits agentic AI brings their teams, respondents named pipeline debugging and optimization (42%), data integration (38%), orchestration across teams (34%), and data governance and compliance (33%). Governance is sitting in the top four expected benefits — right next to the pipeline-engineering tasks that have always defined the discipline.
AI systems need constraints, not just context
The instinctive response to autonomous behavior is to give the model more context. Bigger windows, richer retrieval, better embeddings. All of it helps. None of it is sufficient.
Context tells the model more. Constraints tell the system what is allowed. As autonomy scales, the gap between “the model knows” and “the rule is enforced” becomes the dominant failure mode.
The same line shows up at the runtime layer: retrieval surfaces information; it does not enforce constraints. The survey data is one more data point in the same picture.
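The gap between "the model knows" and "the rule is enforced" is easy to show in a few lines. This is a rough sketch under stated assumptions: `SYSTEM_PROMPT`, `FORBIDDEN_STATEMENTS`, and `enforce` are hypothetical names, the proposed statement is a hard-coded stand-in for model output, and the prefix check is deliberately naive rather than a real SQL parser.

```python
import re

# Context: the rule is stated to the model, but nothing enforces it.
SYSTEM_PROMPT = "You are a pipeline agent. Never drop or truncate production tables."

# Constraint: the rule is checked by code outside the model, whatever the prompt says.
# Assumption: a simple prefix match stands in for a proper statement parser.
FORBIDDEN_STATEMENTS = re.compile(r"^\s*(DROP|TRUNCATE)\b", re.IGNORECASE)


def enforce(sql: str) -> str:
    """Return the statement if it is allowed; raise if it violates the constraint."""
    if FORBIDDEN_STATEMENTS.match(sql):
        raise PermissionError(f"blocked by constraint: {sql!r}")
    return sql


# Stand-in for model output. A model that was given SYSTEM_PROMPT can still propose this.
proposed = "drop table analytics.orders"

try:
    enforce(proposed)
    outcome = "executed"
except PermissionError:
    outcome = "blocked"

print(outcome)  # blocked
```

The prompt and the guard express the same rule. Only the guard holds when the model ignores or misreads the prompt.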
The emerging governance infrastructure layer
Enterprise AI stacks are quietly adding a new layer between agents and production systems: governance infrastructure.
Concretely, it is the systems layer that enforces:
- Architectural constraints — what the system is allowed to be
- Operational boundaries — what the agent is allowed to do
- Execution policies — how and where actions can run
- Verification rules — what must be true before a change is accepted
- Deterministic invariants — same input, same state, same verdict
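One way to read "same input, same state, same verdict" is that verification rules are pure functions of the proposed change and the current state, with no clock, randomness, or network calls involved. A minimal sketch under that assumption follows; the rule names, the 10 percent threshold, and the `verdict` function are all hypothetical illustrations, not something the report specifies.

```python
import hashlib
import json
from typing import Callable

Rule = Callable[[dict, dict], bool]

# Hypothetical verification rules: each depends only on (change, state).
RULES: dict[str, Rule] = {
    "schema_unchanged": lambda change, state: change["columns"] == state["columns"],
    "row_count_within_10_percent": lambda change, state: (
        abs(change["row_count"] - state["row_count"]) <= 0.10 * state["row_count"]
    ),
}


def verdict(change: dict, state: dict, rules: dict[str, Rule] = RULES) -> dict:
    """Evaluate every rule and return a verdict that depends only on the inputs."""
    failed = sorted(name for name, rule in rules.items() if not rule(change, state))
    # Digest of the exact inputs, so the verdict can be tied to what was evaluated.
    digest_input = json.dumps({"change": change, "state": state}, sort_keys=True)
    return {
        "accepted": not failed,
        "failed_rules": failed,
        "input_digest": hashlib.sha256(digest_input.encode("utf-8")).hexdigest(),
    }


state = {"columns": ["id", "amount"], "row_count": 1000}
change = {"columns": ["id", "amount"], "row_count": 1300}

first = verdict(change, state)
second = verdict(change, state)
print(first["accepted"], first["failed_rules"])  # False ['row_count_within_10_percent']
print(first == second)  # True
```

Because the verdict carries a digest of its inputs, a later audit can re-run the same rules on the same change and state and expect an identical result.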
The primitives that compose it are already named in the Mneme ontology: governance propagation, verification contracts, runtime enforcement, architectural drift, operational consistency. The Snowflake/MIT survey is one of the clearer enterprise-facing signals that this is the layer being added next.
The numbers point in the same direction. Data governance is the second-highest-rated challenge of new AI tools at 40%, behind only integration complexity at 45%. Tool sprawl and fragmentation comes third at 38%, which is exactly the problem governance propagation solves. The biggest companies in the survey rate data governance their single greatest challenge. And the stakes are not theoretical:
“The best case scenario is that a breach results in some embarrassment. The worst case is that your business is forced to shut down.” — Dave Masino, Senior Director of Data and Intelligence, Slalom
Observability cannot solve execution drift alone
Most enterprise AI tooling today focuses on logs, traces, monitoring, evals, and observability. All of it is necessary. None of it is the layer being discussed here.
Observability explains failures after execution. Governance infrastructure shapes execution before drift occurs.
The next enterprise AI challenge is not understanding autonomous behavior after the fact. It is constraining operational behavior before systems drift into invalid states.
That is the difference between forensics and infrastructure — and it is the same distinction that governance-before-generation names at the per-agent level.
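The forensics-versus-infrastructure distinction can be sketched as a dry run: apply the proposed action to a copy of the state, check the invariants on the result, and commit only if they all hold. This is an illustrative pattern under assumed names (`apply_if_valid`, the invariants, and the state fields are hypothetical), not a description of any particular product.

```python
from copy import deepcopy
from typing import Callable

Invariant = Callable[[dict], bool]


def apply_if_valid(
    state: dict,
    action: Callable[[dict], None],
    invariants: dict[str, Invariant],
) -> tuple[dict, list[str]]:
    """Run the action on a copy; return the new state only if every invariant holds."""
    candidate = deepcopy(state)
    action(candidate)  # mutates the copy, never the live state
    violated = [name for name, holds in invariants.items() if not holds(candidate)]
    if violated:
        return state, violated  # live state is returned unchanged
    return candidate, []


# Hypothetical invariants over a hypothetical configuration state.
invariants = {
    "pii_stays_masked": lambda s: s["pii_masked"],
    "min_two_replicas": lambda s: s["replicas"] >= 2,
}


def unmask_pii(s: dict) -> None:
    s["pii_masked"] = False


def scale_up(s: dict) -> None:
    s["replicas"] += 1


live = {"replicas": 3, "pii_masked": True}

live, violated = apply_if_valid(live, unmask_pii, invariants)
print(live, violated)  # {'replicas': 3, 'pii_masked': True} ['pii_stays_masked']

live, violated = apply_if_valid(live, scale_up, invariants)
print(live, violated)  # {'replicas': 4, 'pii_masked': True} []
```

An observability pipeline would have recorded the unmasking after it happened. Here the invalid state is never reached.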
The new enterprise AI infrastructure stack
The shape the market is converging toward is six layers, not four:
Most organizations currently stop at orchestration and observability. That is enough to ship a pilot. It is not enough to operate autonomous systems in production at scale.
The layers being added next are not optional. Layer 5 is what makes autonomy operationally sustainable. Layer 6 is what makes it auditable. Together they are what turn agentic AI from a demo into infrastructure.
Conclusion: from data engineering to governance engineering
The Snowflake-sponsored report frames this transition as an evolution in data engineering. The broader shift is infrastructural. As AI systems gain operational authority, enterprise engineering organizations are being pulled toward a new requirement: governance infrastructure for autonomous systems.
The future AI stack likely includes a dedicated governance layer sitting between agents and production execution. And the teams closest to infrastructure reliability — the ones the MIT/Snowflake research describes — may be the first operators of that layer. Chris Child puts the urgency for senior leaders bluntly:
“If your C-suite still considers data engineering as a support role, you’re already five years behind — and probably training your future competitors.” — Chris Child, Snowflake
The pressure runs the same direction at the technical layer. A discipline that spent 19% of its time on AI in 2023, spends 37% in 2025, and projects 61% by 2027 is not evolving incrementally. It is being asked to operate a new class of system. That system needs constraints, not just context. It needs verification contracts, not just observability. It needs governance infrastructure, not just better tools.
The next role expansion in data engineering is not more pipelines. It is the operating discipline that keeps autonomous systems inside their architectural intent.