Data engineers should be working faster than ever. AI-powered tools promise to automate pipeline optimization, accelerate data integration and handle the repetitive grunt work that has defined the profession for decades.
Yet, according to a new survey of 400 senior technology executives by MIT Technology Review Insights in partnership with Snowflake, 77% say their data engineering teams’ workloads are getting heavier, not lighter.
The culprit? The very AI tools meant to help are creating a new set of problems.
While 83% of organizations have already deployed AI-based data engineering tools, 45% cite integration complexity as a top challenge. Another 38% are struggling with tool sprawl and fragmentation.
“Many data engineers are using one tool to collect data, one tool to process data and another to run analytics on that data,” Chris Child, VP of product for data engineering at Snowflake, told VentureBeat. “Using several tools along this data lifecycle introduces complexity, risk and increased infrastructure management, which data engineers can’t afford to take on.”
The result is a productivity paradox. AI tools are making individual tasks faster, but the proliferation of disconnected tools is making the overall system more complex to manage. For enterprises racing to deploy AI at scale, this fragmentation represents a critical bottleneck.
From SQL queries to LLM pipelines: The daily workflow shift
The survey found that data engineers spent an average of 19% of their time on AI projects two years ago. Today, that figure has jumped to 37%. Respondents expect it to hit 61% within two years.
But what does that shift actually look like in practice?
Child offered a concrete example. Previously, if a company's CFO needed to build forecasts, they would tap the data engineering team to help build a system that correlated unstructured data, such as vendor contracts, with structured data, such as revenue figures, in a static dashboard. Connecting those two worlds of different data types was extremely time-consuming and expensive: lawyers had to read through each document manually, pull out the key contract terms and upload that information into a database.
Today, that same workflow looks radically different.
“Data engineers can use a tool like Snowflake Openflow to seamlessly bring the unstructured PDF contracts living in a source like Box, together with the structured financial figures into a single platform like Snowflake, making the data accessible to LLMs,” Child said. “What used to take hours of manual work is now near instantaneous.”
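The shape of that workflow can be sketched in a few lines. This is a hedged illustration, not Snowflake's API: the regex extraction below is a stand-in for the LLM step that would parse real contract PDFs, and the vendor names, fields and figures are all invented for the example.

```python
import re

# Hypothetical stand-in for the LLM extraction step: in practice an LLM
# would pull key terms out of each PDF contract; a regex sketch here
# just illustrates the shape of the extracted output.
def extract_contract_terms(contract_text: str) -> dict:
    """Pull a vendor name and annual value out of raw contract text."""
    vendor = re.search(r"Vendor:\s*(\w+)", contract_text)
    value = re.search(r"Annual value:\s*\$([\d,]+)", contract_text)
    return {
        "vendor": vendor.group(1) if vendor else None,
        "annual_value": int(value.group(1).replace(",", "")) if value else None,
    }

def join_with_revenue(contracts: list[str], revenue: dict[str, int]) -> list[dict]:
    """Correlate extracted contract terms with structured revenue figures."""
    rows = []
    for text in contracts:
        terms = extract_contract_terms(text)
        # Join the unstructured side (contract terms) to the structured
        # side (e.g. a revenue table in the warehouse) on vendor name.
        rows.append({**terms, "revenue": revenue.get(terms["vendor"])})
    return rows

contracts = ["Vendor: Acme\nAnnual value: $120,000\nTerm: 24 months"]
revenue = {"Acme": 95_000}  # illustrative structured figures
print(join_with_revenue(contracts, revenue))
# [{'vendor': 'Acme', 'annual_value': 120000, 'revenue': 95000}]
```

The point is the join itself: once both data types land in one platform, correlating them is a lookup rather than weeks of manual document review.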
The shift isn’t just about speed. It’s about the nature of the work itself.
Two years ago, a typical data engineer’s day consisted of tuning clusters, writing SQL transformations and ensuring data readiness for human analysts. Today, that same engineer is more likely to be debugging LLM-powered transformation pipelines and setting up governance rules for AI model workflows.
“Data engineers’ core skill isn’t just coding,” Child said. “It’s orchestrating the data foundation and ensuring trust, context and governance so AI outputs are reliable.”
The tool stack problem: When help becomes hindrance
Here’s where enterprises are getting stuck.
The promise of AI-powered data tools is compelling: automate pipeline optimization, accelerate debugging, streamline integration. But in practice, many organizations are discovering that each new AI tool they add creates its own integration headaches.
The survey data bears this out. While AI has led to improvements in output quantity (74% report increases) and quality (77% report improvements), those gains are being offset by the operational overhead of managing disconnected tools.
“The other problem we’re seeing is that AI tools often make it easy to build a prototype by stitching together several data sources with an out-of-the-box LLM,” Child said. “But then when you want to take that into production, you realize that you don’t have the data accessible and you don’t know what governance you need, so it becomes difficult to roll the tool out to your users.”
For technical decision-makers evaluating their data engineering stack right now, Child offered a clear framework.
“Teams should prioritize AI tools that accelerate productivity, while at the same time eliminate infrastructure and operational complexity,” he said. “This allows engineers to move their focus away from managing the ‘glue work’ of data engineering and closer to business outcomes.”
The agentic AI deployment window: 12 months to get it right
The survey revealed that 54% of organizations plan to deploy agentic AI within the next 12 months, and another 20% have already begun doing so. Agentic AI refers to autonomous agents that can make decisions and take actions without human intervention.
For data engineering teams, agentic AI represents both an enormous opportunity and a significant risk. Done right, autonomous agents can handle repetitive tasks like detecting schema drift or debugging transformation errors. Done wrong, they can corrupt datasets or expose sensitive information.
“Data engineers must prioritize pipeline optimization and monitoring in order to truly deploy agentic AI at scale,” Child said. “It’s a low-risk, high-return starting point that allows agentic AI to safely automate repetitive tasks like detecting schema drift or debugging transformation errors when done correctly.”
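Schema-drift detection, the "low-risk, high-return" task Child cites, is simple to sketch. The column names and types below are illustrative; a production agent would read the expected schema from a catalog and raise alerts rather than print.

```python
# A minimal sketch of schema-drift detection, the kind of repetitive
# check an agent could safely automate. The schema is illustrative.
EXPECTED_SCHEMA = {"order_id": "int", "amount": "float", "region": "str"}

def detect_schema_drift(expected: dict[str, str], observed: dict[str, str]) -> dict:
    """Compare an incoming batch's schema against the expected contract."""
    return {
        "missing": sorted(expected.keys() - observed.keys()),
        "unexpected": sorted(observed.keys() - expected.keys()),
        "type_changed": sorted(
            col for col in expected.keys() & observed.keys()
            if expected[col] != observed[col]
        ),
    }

# An upstream change renamed `region` to `geo` and widened `order_id`
observed = {"order_id": "str", "amount": "float", "geo": "str"}
print(detect_schema_drift(EXPECTED_SCHEMA, observed))
# {'missing': ['region'], 'unexpected': ['geo'], 'type_changed': ['order_id']}
```

An agent running this check on every incoming batch can flag drift before a broken pipeline silently feeds bad data downstream.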
But Child was emphatic about the guardrails that must be in place first.
“Before organizations let agents near production data, two safeguards must be in place: strong governance and lineage tracking, and active human oversight,” he said. “Agents must inherit fine-grained permissions and operate within an established governance framework.”
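One way to read "agents must inherit fine-grained permissions" is sketched below. The role name, grants table and audit log are hypothetical simplifications of what a real governance framework (warehouse role-based access control plus lineage tracking) would provide.

```python
# Hedged sketch: an agent scoped to the invoking user's grants, rather
# than running with broad service credentials. Roles and tables are
# invented for illustration.
ROLE_GRANTS = {
    "finance_analyst": {("revenue", "SELECT"), ("forecasts", "SELECT")},
}

class ScopedAgent:
    """Agent whose data access is capped by the invoking user's role."""

    def __init__(self, role: str):
        self.grants = ROLE_GRANTS.get(role, set())
        self.audit_log = []  # lineage: record every attempted action

    def run(self, table: str, action: str) -> bool:
        allowed = (table, action) in self.grants
        self.audit_log.append((table, action, "allowed" if allowed else "denied"))
        return allowed

agent = ScopedAgent("finance_analyst")
print(agent.run("revenue", "SELECT"))  # True
print(agent.run("revenue", "DELETE"))  # False: agent cannot exceed its grants
```

The audit log doubles as the lineage record a human reviewer would check, which is the "active human oversight" half of Child's safeguard.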
The risks of skipping those steps are real. “Without proper lineage or access governance, an agent could unintentionally corrupt datasets or expose sensitive information,” Child warned.
The perception gap that’s costing enterprises AI success
Perhaps the most striking finding in the survey is a disconnect at the C-suite level.
While 80% of chief data officers and 82% of chief AI officers consider data engineers integral to business success, only 55% of CIOs share that view.
“This shows that the data-forward leaders are seeing data engineering’s strategic value, but we need to do more work to help the rest of the C-suite recognize that investing in a unified, scalable data foundation and the people helping drive this is an investment in AI success, not just IT operations,” Child said.
That perception gap has real consequences.
Data engineers in the surveyed organizations are already influential in decisions about AI use-case feasibility (53% of respondents) and business units’ use of AI models (56%). But if CIOs don’t recognize data engineers as strategic partners, they’re unlikely to give those teams the resources, authority or seat at the table they need to prevent the kinds of tool sprawl and integration problems the survey identified.
The gap appears to correlate with visibility. Chief data officers and chief AI officers work directly with data engineering teams daily and understand the complexity of what they’re managing. CIOs, focused more broadly on infrastructure and operations, may not see the strategic architecture work that data engineers are increasingly doing.
This disconnect also shows up in how different executives rate the challenges facing data engineering teams. Chief AI officers are significantly more likely than CIOs to agree that data engineers’ workloads are becoming increasingly heavy (93% vs. 75%). They’re also more likely to recognize data engineers’ influence on overall AI strategy.
What data engineers need to learn now
The survey identified three critical skills data engineers need to develop: AI expertise, business acumen and communication abilities.
For an enterprise with a 20-person data engineering team, that presents a practical challenge. Do you hire for these skills, train existing engineers or restructure the team? Child’s answer suggested the priority should be business understanding.
“The most important skill right now is for data engineers to understand what is critical to their end business users and prioritize how they can make those questions easier and faster to answer,” he said.
The lesson for enterprises: Business context matters more than adding technical certifications. Child stressed that understanding why they are performing certain tasks, and the business impact of those tasks, allows data engineers to better anticipate customer needs and deliver value to the business more quickly.
“The organizations with data engineering teams that prioritize this business understanding will set themselves apart from the competition,” Child said.
For enterprises looking to lead in AI, the solution to the data engineering productivity crisis isn’t more AI tools. The organizations that will move fastest are consolidating their tool stacks now, deploying governance infrastructure before agents go into production and elevating data engineers from support staff to strategic architects.
The window is narrow. With 54% planning agentic AI deployment within 12 months and data engineers expected to spend 61% of their time on AI projects within two years, teams that haven’t addressed tool sprawl and governance gaps will find their AI initiatives stuck in permanent pilot mode.
