The future of AI is undeniably agentic, but the bottlenecks to widespread adoption are still being sorted out at the infrastructure level. The physics of generative AI come down to "tokens and time," with the bulk of the technical lift falling on the foundation model providers and the inference infrastructure that delivers tokens to application-layer tools. As the underlying technology matures, larger questions arise about business models that hedge against pricing commoditization, a possibly open-source future, and, perhaps most importantly, the energy required to power growing demand for the best "logic."
To understand the path forward, we spoke with Richard Ling, a commercial go-to-market leader at Groq. With a background in the energy sector and experience as a multi-time founder, he brings a unique, systems-level perspective to the AI market.
Beyond benchmarks: "Agentic AI is clearly the future, and I'm excited how cheaper and faster compute will enable agents to solve novel problems beyond our current imagination," said Ling. Flashy product demos often show aesthetically enticing use cases for agents but lack real-world impact. Ling envisioned a future where agents innovate by debating solutions, where they "actually fight just like humans would do" to determine the best answer, leading to new discoveries beyond the limits of human thought.
The three stages: This capability will likely unfold in three stages: moving from today's automation of "boring schlep work," to augmenting "mid-level intelligence" for knowledge workers, and finally to performing the high-level intelligence of bleeding-edge research.
Before that future arrives, the market must navigate today’s realities. Ling offered a clear-eyed view from the front lines. "What I'm seeing on the ground is that quality is by far still the most important thing," he said. "Model intelligence is paramount, followed by speed of token output, followed by cost as a third consideration—in that order." This is the framework behind Groq’s strategy. On speed and cost, Groq's advantage is clear: its systems deliver "300 to 1300 tokens per second" (6-10x faster than OpenAI) and are "5 to 20x cheaper."
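For a rough sense of what those throughput figures mean in practice, consider a back-of-envelope sketch (the 1,000-token response size and the 100 tokens-per-second baseline are illustrative assumptions, not figures from the interview):

```python
# Back-of-envelope: how output throughput translates into wall-clock latency.
# The response length and the non-Groq baseline are illustrative assumptions.
RESPONSE_TOKENS = 1_000  # a moderately long model response

for label, tokens_per_second in [
    ("assumed GPU-serving baseline", 100),
    ("Groq, low end of quoted range", 300),
    ("Groq, high end of quoted range", 1_300),
]:
    seconds = RESPONSE_TOKENS / tokens_per_second
    print(f"{label:31s} {tokens_per_second:>5} tok/s -> {seconds:4.1f} s per response")
```

At agentic scale, where one user request can fan out into dozens of sequential model calls, those per-response seconds compound quickly.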
But the price of inference rises with the complexity of reasoning, and with the immense scope of hundreds, thousands, or even millions of agents working in parallel.
2G vs. 5G: "The token costs balloon, and in tandem, the time to compute also increases," he explained. The result is a bottleneck at the infrastructure level: the pipes required to both train models and deliver outputs are too narrow. "It's like back in the 2G era when things were very slow. In the future it might become like 5G where everything is a lot more instantaneous."
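The cost side of that ballooning is easy to make concrete with a hedged back-of-envelope calculation (the agent counts, tokens per task, and blended price are illustrative assumptions, not Groq pricing):

```python
# Back-of-envelope: total token spend for a fleet of agents working in parallel.
# Every number here is an illustrative assumption, not quoted pricing.
TOKENS_PER_TASK = 50_000    # reasoning-heavy agents consume long contexts
PRICE_PER_MILLION = 0.50    # assumed blended $ per 1M tokens

for agents in (100, 10_000, 1_000_000):
    total_tokens = agents * TOKENS_PER_TASK
    cost = total_tokens / 1_000_000 * PRICE_PER_MILLION
    print(f"{agents:>9,} agents -> {total_tokens:>14,} tokens -> ${cost:>12,.2f}")
```

Even at a fraction of a dollar per million tokens, a million parallel agents turns inference into a line item in the tens of thousands of dollars per task cycle.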
Solving bottlenecks isn't just about faster chatbots; supply must meet demand where it lives. Groq serves governments, large enterprises, startups, solo devs, and everything in between. Each customer persona has different needs, but all are served by relatively similar model infrastructure.
A wide market: "We're seeing really good uptick in the emerging tech space (e.g. YC, Sequoia-backed companies). On the other side, we're seeing adoption at the enterprise level with Fortune 500s, as well as literal nation-states building their own inference ecosystem," said Ling.
Availability, speed, and pricing create a natural "tug of war" between open- and closed-source models, and Groq is poised to benefit from both. He pointed to the recent release of OpenAI's gpt-oss open-source models as a tangible "watershed moment" proving this bet is paying off, delivering what he described as "essentially o4-mini level intelligence" at a fraction of the cost and multiples of the speed. All of this is enabled by Groq's proprietary LPU chip, purpose-built for inference. "We're trying to do the same thing Nvidia did for the training market for the inference market, which we believe is a much larger pie," Ling said.
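In practice, running an open model like gpt-oss on Groq looks much like calling any OpenAI-compatible endpoint. A minimal sketch (the model identifier is an assumption; check Groq's model catalog for current names):

```python
# Minimal sketch: querying an open-source model hosted on Groq through its
# OpenAI-compatible endpoint. The model identifier is an assumption; consult
# Groq's model catalog for the current name.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",  # Groq's OpenAI-compatible API
    api_key="YOUR_GROQ_API_KEY",                # placeholder; use your own key
)

response = client.chat.completions.create(
    model="openai/gpt-oss-120b",  # assumed identifier for the gpt-oss release
    messages=[{"role": "user", "content": "In one sentence, what is an LPU?"}],
)
print(response.choices[0].message.content)
```

That compatibility is itself a hedge against lock-in: swapping models, or providers, is largely a matter of changing a string.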
This industrial-scale ambition is rooted in Ling’s time in the energy sector. In a recent LinkedIn post, he declared that "tokens are the new electricity." His broader vision involves AI companies building data centers to produce tokens the way utilities build power plants to produce electrons. It reframes the AI race around a fourth, often-overlooked lever, reliability, and a massive "industrial logistical issue."
Tokens as electrons: But unlike the energy market where "an electron is an electron," today’s AI market is "still a race of intelligence" because not all tokens are created equal. This is why Groq is betting on hosting all the leading open-source models. "We're not riding any one wave; we're riding the whole ocean," he said, offering developers a safe harbor from model lock-in.
Electricity, pricing, and raw technology aside, the final, unresolved frontier is not technical but legal: intellectual property. Ling proposed a "middle ground" that moves beyond the polarized debate between pure technologists and creator-rights advocates. The key, he argued, is to judge infringement not on the training input but on the final output. He gave a clear example: if an AI trains on a copyrighted novel but writes a completely different novel, is that infringement? His answer depends on whether the output is "substantively different" from the source material. Put differently: if a human reads a copyrighted novel and then publishes a completely different novel, is that copyright infringement? The unequivocal answer is no.
It’s a complex question, but Ling left the conversation with a "silver lining" that could sidestep the debate entirely: the rise of synthetic data. As frontier models learn to train themselves on their own net-new data, they may no longer need the internet's copyrighted corpus. This, he said, is a "whole new paradigm that could nullify this issue altogether." Generating vast troves of synthetic data requires immense compute capacity, of course, bringing the conversation full circle to inference, where Groq best shines.
*All opinions expressed in this piece are those of Richard Ling, and do not necessarily reflect the views of his employer.