The healthcare industry at large is banking on intelligent automation to tame the spiraling costs and inefficiencies of administrative work. Successful implementation would free up providers and streamline the labyrinth of scheduling, billing, and compliance processing. But a troubling challenge has emerged: recent benchmarks suggest that large language models often perform worst on these very administrative and workflow tasks.

This disconnect raises a critical question: Is the technology subpar, or is the industry’s framework for implementation simply outdated?

For an answer, we spoke with Roy Rosin, former Chief Innovation Officer at Penn Medicine's Center for Health Care Transformation and Innovation and current Board Partner at First Round Capital. He has seen this pattern before and is confident the issue isn't a failure of AI, but a lapse in imagination rooted in long-standing industry habits and a lack of vendor alignment.

  • A history of suboptimal thinking: "Healthcare is littered with a history of suboptimal thinking about technology where people think they're done when they've implemented and deployed," said Rosin. True technological maturity, he argued, has evolved from being "done when implemented" to being "done when you achieve an outcome." But in the AI era, even that isn't enough. The new reality requires a state of continuous monitoring, learning, and refinement at enormous scale.

The key is proper orchestration within tightly tailored and governed systems. According to Rosin, the highest rates of accuracy are made possible by an "interplay of systems," including agentic state machines and retrieval-augmented generation (RAG) models designed to eliminate hallucinations and ensure outputs are correctly sourced. But trust is maintained through relentless governance. This means running hundreds of thousands of test cases, and even deploying agents to stress-test the system, to ensure the technology performs exactly as promised.

  • Infinite edge cases: That process of refinement becomes far harder when dealing with the sheer complexity of real-world healthcare. Rosin pointed to First Round portfolio company Assort Health, which had to solve for an astonishing 900,000 edge cases to achieve high accuracy in its patient scheduling and communication automations. That colossal number wasn't pulled from a public dataset; it was painstakingly compiled by going practice to practice and specialty by specialty, from dermatology to neurology to gastroenterology, each with its own maze of unique workflows.

  • Building moats through rigor: The key, Rosin said, is old-fashioned rigor. "You uncover the edge cases by sitting shoulder to shoulder with the experts. You have to work through not only the way things are supposed to go, but also what happened when things didn't work the way they were supposed to." The byproducts include massive proprietary data assets and a powerful first-mover advantage, turning technical challenges into competitive moats.
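
To make the idea concrete, here is a minimal Python sketch of what a governed state machine and an edge-case test suite might look like for patient scheduling. It is purely illustrative and does not represent Assort Health's system or anything Rosin described in detail: the states, the `Request` fields, the referral rules, and the four sample cases are all hypothetical, and a production suite would hold hundreds of thousands of cases gathered from real practices.

```python
from dataclasses import dataclass
from enum import Enum, auto


class State(Enum):
    COLLECT_REQUEST = auto()
    CHECK_RULES = auto()
    BOOKED = auto()
    HANDOFF_TO_STAFF = auto()


@dataclass(frozen=True)
class Request:
    specialty: str
    visit_type: str
    has_referral: bool


# Hypothetical practice rules, invented for illustration only.
KNOWN_SPECIALTIES = {"dermatology", "neurology", "gastroenterology"}
REFERRAL_REQUIRED = {("neurology", "new_patient"), ("gastroenterology", "procedure")}


def run(request: Request) -> State:
    """Walk a request through explicit states until it reaches a terminal one."""
    state = State.COLLECT_REQUEST
    while state not in (State.BOOKED, State.HANDOFF_TO_STAFF):
        if state is State.COLLECT_REQUEST:
            # Anything outside the known workflows goes to a human.
            known = request.specialty in KNOWN_SPECIALTIES
            state = State.CHECK_RULES if known else State.HANDOFF_TO_STAFF
        elif state is State.CHECK_RULES:
            needs_referral = (request.specialty, request.visit_type) in REFERRAL_REQUIRED
            blocked = needs_referral and not request.has_referral
            state = State.HANDOFF_TO_STAFF if blocked else State.BOOKED
    return state


# Each edge case pairs an input with the outcome an expert said should happen.
EDGE_CASES = [
    (Request("dermatology", "follow_up", False), State.BOOKED),
    (Request("neurology", "new_patient", False), State.HANDOFF_TO_STAFF),
    (Request("neurology", "new_patient", True), State.BOOKED),
    (Request("cardiology", "follow_up", True), State.HANDOFF_TO_STAFF),
]


def run_suite(cases):
    """Return every case whose actual outcome differs from the expected one."""
    failures = []
    for request, expected in cases:
        actual = run(request)
        if actual is not expected:
            failures.append((request, expected, actual))
    return failures


if __name__ == "__main__":
    failures = run_suite(EDGE_CASES)
    print(f"{len(EDGE_CASES) - len(failures)} of {len(EDGE_CASES)} edge cases passed")
```

The point of the sketch is the shape, not the rules: every path ends in an explicit state, anything unrecognized is handed to staff rather than guessed at, and each newly discovered edge case becomes one more line in the suite that is rerun on every change.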