The budget is not the problem. Banks, insurers, and government agencies are willing to spend millions on AI because the productivity gains are easy to quantify. The selling is almost automatic. What is not automatic is selling the risk that comes with building data pipelines that never existed before, accessing data in ways the organization has never allowed, and relying on engineering teams that have certifications but not production experience deploying AI under regulatory constraints.

Daniel Murphy, Head of SRE at PwC UK, leads the site reliability engineering practice across enterprise clients, including major banks, building societies, insurance companies, and government agencies. His work spans SRE strategy, observability, automation, incident management, and system reliability for 24/7 operations. Before PwC, he held SRE and infrastructure leadership roles at Deltek, ESO, Cygilant, and Almac Group.

"Signing off, getting money upfront to actually go ahead and build these things, is generally fairly straightforward and fairly easy. But it puts massive pressure on existing teams within organizations to actually implement it," said Murphy.

The expertise gap is real

Murphy drew a sharp line between AI training and AI experience. "A lot of people are saying they are AI experts because they've done all the courses and got all the degrees. But experts in this day and age are hard to come by because not many people have actually built and implemented AI in a regulated environment."

The field changes weekly. New models, new capabilities, and new platform-specific tooling mean that expertise is context-specific and temporary. "Anybody who claims they're an AI expert, they're an expert today. They won't be tomorrow."

He described a pattern where governance boards sign off on AI initiatives, but the actual risk management falls to engineering teams that have never been trained to think about it. "Risk is not managed by the governance board on a day-to-day basis. It's essentially managed by the engineering team because they're the ones actually building these things and putting in the controls. And they have never been trained for that."

In traditional builds, architecture reviews created a blanket of approved patterns. AI projects break that model because teams face thousands of questions about what they can and cannot do with data, models, and access. That uncertainty slows development significantly.

Shift left or clean up later

Murphy advocated for embedding SRE into AI projects from the start, applying the same shift-left thinking that security teams adopted years ago. "If you are building a tool that you want to bring into production, bring in an SRE engineer early on. Even at the architectural phase or even just talking out loud about what you want."

When SRE input comes early, production-readiness requirements become core design constraints rather than afterthoughts. Monitoring, observability, and resilience get built in. By the time the system reaches production, it is in a fundamentally better state.

Without that early involvement, SRE teams end up as the cleanup crew. "SREs are the Spartans. There are not many of them, but they can be called in to do the most ridiculous technical things in your organization. If they end up having to do tasks that should never happen in the first place, it creates a lot of frustration."

The SRE culture is blameless by design. When something breaks, the failure sits with the environment that made the mistake possible, not the individual engineer, and accountability for that environment rests with leadership. "If you want to be an SRE manager or director, ultimately, you're responsible for your teams and what they do."

AI saves time but adds engineering load

The staffing math is consistently misunderstood. AI reduces manual work at the lower levels but adds significant engineering complexity at the infrastructure level. "You're adding in hilariously complicated systems, especially if you're building them from scratch. Engineering teams need extra provisions. You're going to reduce headcount in one area, but you actually need to increase your engineering staff, and that's generally not understood."

Upskilling compounds the problem. AI certifications across Azure, AWS, and GCP now change on roughly a six-month cycle as the underlying platforms evolve. Murphy recommended building internal communities of practice where curious engineers share new capabilities organically, supplemented by third-party training partnerships that commit to regular updates. "No longer are you buying software or committing to a SaaS product. I see it being a lot more of a partnership model going forward, where both teams are fully embedded with each other."

Murphy framed AI adoption as a transformation on the scale of a cloud migration, but compressed into months instead of years, with new operational demands arriving every six months. "The gravitational shift from non-AI to an AI-enabled workforce is far more beneficial than I think people realize. But experts are only experts today. There will be something new tomorrow. AI is the absolute wild west when it comes to innovation."