Enterprises are adopting agentic AI systems to transform fragmented data lakes into cohesive, real-time research tools.
Saurabh Shrivastava of AWS joined CIO News to discuss his modular, secure, and purpose-built approach to successful AI integration.
Shrivastava noted a shift from monolithic data lakes to decentralized data to reduce single points of failure.
For decades, enterprises have amassed vast digital reservoirs in data lakes, swamps, and siloed repositories that promised insight, but often delivered fragmentation. In the pursuit of real-time, cross-functional research, businesses are now turning to agentic AI systems as a lifeline. But the promise of intelligent, orchestrated agents risks becoming another layer of technical debt if not built on the right foundation.
To make the leap from a promising prototype to a production-ready system, leaders must fundamentally rethink their data architecture, security, and model strategy from the ground up. CIO News spoke with Saurabh Shrivastava, the Global Head of Solutions Architecture for Agentic AI & Legacy Transformation at AWS. Shrivastava is the author of several books, including Generative AI for Software Developers. He argued that success in the agentic era requires a disciplined, architectural approach built on three core pillars: modular intelligence, secure interoperability, and purpose-built design.
The first step in Shrivastava's blueprint involves reimagining the traditional data pipeline itself with agentic frameworks in mind.
From ETL to orchestration: "When we talk about agentic AI, we need to differentiate each part of the process when it comes to data. We have data ingestion agents, data cleaning agents, and what used to be the ETL pipeline, which has evolved into the orchestrator agent that dictates what the other agents do."
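To make that division of labor concrete, here is a minimal sketch of the pattern Shrivastava describes: specialized agents own one step each, and an orchestrator sequences them the way an ETL pipeline once did. The class names, methods, and data are illustrative placeholders, not a specific AWS framework.

```python
# Illustrative orchestration pattern: ingestion and cleaning agents each own
# one step, and an orchestrator agent decides what runs next.

class IngestionAgent:
    def run(self, source: str) -> list[dict]:
        # Pull raw records from a source system (stubbed here).
        return [{"customer": " Acme Corp ", "revenue": "1200"}]

class CleaningAgent:
    def run(self, records: list[dict]) -> list[dict]:
        # Normalize the fields the ingestion agent left raw.
        return [
            {"customer": r["customer"].strip(), "revenue": float(r["revenue"])}
            for r in records
        ]

class OrchestratorAgent:
    """Plays the role of the old ETL pipeline: it sequences the other agents."""

    def __init__(self) -> None:
        self.ingest = IngestionAgent()
        self.clean = CleaningAgent()

    def run(self, source: str) -> list[dict]:
        raw = self.ingest.run(source)
        return self.clean.run(raw)

if __name__ == "__main__":
    print(OrchestratorAgent().run("crm_export"))
```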
Shrivastava warned that enterprises doing too much too soon risk creating a new form of technical debt: agent sprawl. To prevent an orchestration nightmare of thousands of tiny, unmanageable agents, he advised applying a familiar engineering discipline, sizing each agent's scope to the task at hand.
Avoiding agent sprawl: "In the same way engineers design microservices, each agent should offer a logical service that is responsible for a full functional workflow," he said. "You don't want an agent to be too granular or too broad. You need to define the right agent."
A viable modular agentic approach begins with dismantling the concept of a monolithic data lake that ties systems to "legacy architecture" and exposes single points of failure. The future, he said, belongs to the data mesh, a decentralized system that solves classic organizational knowledge problems.
Knowledge accrual through contextual awareness: "I'm a data engineer, not an expert in every field. When it comes to working with complex industries like life sciences, I've historically just played with the numbers," he said. Agentic systems narrow that long-standing disconnect between data processing and domain expertise: an agent can reference a curated knowledge base and real-time web tools to enrich data with contextual awareness.
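A minimal sketch of that enrichment step, assuming two hypothetical tool functions (a knowledge-base lookup and a web search) rather than any particular vendor API:

```python
# Illustrative enrichment agent: fuses a domain knowledge base with a live
# web lookup to add context to a raw record. Both tool functions are
# hypothetical stand-ins.

def lookup_knowledge_base(term: str) -> str:
    # Placeholder for a curated, domain-specific knowledge base query.
    kb = {"HbA1c": "Blood-sugar marker commonly tracked in diabetes trials."}
    return kb.get(term, "no entry")

def search_web(term: str) -> str:
    # Placeholder for a real-time web search tool the agent could call.
    return f"latest public guidance on {term} (stubbed result)"

def enrich_record(record: dict) -> dict:
    term = record["metric"]
    record["domain_context"] = lookup_knowledge_base(term)
    record["live_context"] = search_web(term)
    return record

print(enrich_record({"metric": "HbA1c", "value": 6.2}))
```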
But Shrivastava acknowledged that achieving a right-sized, purpose-built system is a phased journey: it starts with simple prompt engineering, progresses to enriching models with retrieved data via Retrieval-Augmented Generation (RAG), and finally matures to model fine-tuning for hyper-specialization.
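The middle step, RAG, amounts to retrieving the most relevant internal snippets and prepending them to the prompt. The sketch below uses a toy keyword-overlap retriever and a stubbed llm_complete() call purely for illustration; a production system would use vector embeddings and a real model endpoint.

```python
# Minimal RAG sketch: retrieve relevant snippets, then ground the prompt in them.

DOCS = [
    "Q3 churn rose 4% in the EMEA enterprise segment.",
    "The life-sciences pipeline added two Phase II trials in Q3.",
    "Cloud migration spend was flat quarter over quarter.",
]

def retrieve(question: str, docs: list[str], k: int = 2) -> list[str]:
    # Toy retriever: rank documents by shared words with the question.
    q_words = set(question.lower().split())
    ranked = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return ranked[:k]

def llm_complete(prompt: str) -> str:
    # Placeholder for a call to whatever model endpoint the team uses.
    return f"[model answer grounded in a prompt of {len(prompt)} characters]"

def answer(question: str) -> str:
    context = "\n".join(retrieve(question, DOCS))
    prompt = f"Use only this context:\n{context}\n\nQuestion: {question}"
    return llm_complete(prompt)

print(answer("What happened to churn in EMEA last quarter?"))
```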
The catalyst for the final step is part financial and part outcomes-based. Many assume that fine-tuning a large language model is prohibitively expensive, an impression shaped by mainstream coverage of the foundation-model training costs incurred by giants like OpenAI, but Shrivastava clarified that modern techniques make it far more accessible.
Accessible fine-tuning: "With fine-tuning, we are not asking to fine-tune a multi-billion-parameter model," he said. "With the LoRA or QLoRA methods, you just take a layer of the parameters and fine-tune those, and suddenly you will see it is responding very well even with the minimalistic prompt you are giving it."
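To show how small that footprint can be, here is a minimal LoRA setup using Hugging Face's peft library; the base model ID, target modules, and hyperparameters are placeholders chosen for illustration, not values recommended in the interview.

```python
# Minimal LoRA fine-tuning configuration: only small low-rank adapter
# matrices are trained, while the billions of base parameters stay frozen.

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("your-org/your-base-model")  # placeholder ID

lora_config = LoraConfig(
    r=8,                                   # rank of the low-rank update matrices
    lora_alpha=16,                         # scaling factor for the adapter updates
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # typically well under 1% of total weights
# From here, the adapted model trains with a standard fine-tuning loop on domain data.
```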
The conversation concluded with Shrivastava looking beyond the immediate horizon of new applications, focusing on one of AI's most powerful, if less glamorous, applications: dismantling decades of technical debt from workloads stuck in legacy technologies like COBOL or Assembly. "GenAI is not just about building new applications. We are also using it to accelerate innovation in older processes," he said. "I see a lot of opportunity in unlocking old workloads to move them into the new modern architecture to renew their full benefits."