TADSummit Online Conference, 9 April, Voice AI Update, Rob Pickering & Lyle Pratt

A discussion on the state of the art in voice AI with my two favorite voice AI expert practitioners.

After all the marcom from a bunch of conferences over the past few months, this is the dose of salts you need to clear away all the BS congestion blues.

Voice AI continues to move at a rapid pace, enabling small technology-led organizations to move fast, break things, and deliver innovations first. I’m sure you’ve seen how “Agentic AI” means less and less; it’s just another fashionable marcom term, often lumped together with gen AI.

Rob and Lyle both possess extensive voice / voice AI experience with deployed services. And as engineers they play with the new tools, which yields valuable insights.

My first question was on what’s getting them excited in Voice AI. Rob opened with 2 things:

  • Speech-to-speech models: Ultravox, OpenAI, and Google Gemini. Rob’s favorite is Ultravox. They enable the intonation of speech to be interpreted.
  • Next is Model Context Protocol (MCP), which has vastly expanded the tools available. There is a broader convergence in AI, which will in time expand its impact; a minimal tool sketch follows this list.
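
To make the MCP point concrete, here is a minimal sketch of exposing one tool using the official MCP Python SDK’s FastMCP helper (pip install mcp). The server name and the check_booking tool are hypothetical, purely for illustration:

```python
# Minimal MCP server sketch: one hypothetical tool an AI voice agent
# could discover and call over MCP. check_booking is an invented example.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("voice-agent-tools")

@mcp.tool()
def check_booking(date: str, party_size: int) -> str:
    """Check table availability for a given date and party size."""
    # A real implementation would query the booking system here.
    return f"Tables available on {date} for {party_size} guests"

if __name__ == "__main__":
    mcp.run()   # serves over stdio by default
```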

Lyle backed up Rob’s view. MCP caught many by surprise, and still needs development, particularly with OAuth. We’ll hear later about some of the work-arounds, which enable smaller companies to close gaps faster. He finds the broader shift in awareness across the industry exciting, from watching to genuine interest in doing something with Voice AI. That is, $$$ are now being spent. Though most of the large vendors have not yet launched product.

This is going to be an interesting dynamic on the ‘make versus buy’ decisions through 2025 and 2026, with the added complexity of continued rapid change in the technologies.

Lyle has been extensively testing the different models and currently finds the speech-to-speech models dumber than the ‘old school’ speech-to-text models. However, it’s clear Rob is a fan of Ultravox. And the number of tools exposed has a significant impact on performance (latency), with 10 or 12 tools clearly exposing the challenges.

On pricing, speech to speech is still expensive with OpenAI. Lyle brought up an important point: OpenAI is a true speech-to-speech model, while some of the others, such as ElevenLabs and Deepgram, are more traditional; it’s just a bundling of functions, labelling, and pipeline management. But a common refrain is: it’s just a matter of time.
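
A sketch of the distinction Lyle is drawing, with stubbed stand-in functions (none of these names are a vendor API): the traditional bundle glues three separate stages together, so intonation is lost at the speech-to-text step, while a true speech-to-speech model operates on audio directly.

```python
# Stand-in stubs; names are hypothetical, not any vendor's API.
def stt(audio: bytes) -> str:          # speech to text
    return "transcribed caller text"

def llm(text: str) -> str:             # text-only reasoning
    return "agent reply text"

def tts(text: str) -> bytes:           # text to speech
    return b"synthesized reply audio"

def s2s_model(audio: bytes) -> bytes:  # single audio-native model
    return b"reply audio with prosody preserved"

def pipeline_turn(audio_in: bytes) -> bytes:
    """'Old school' bundled pipeline: intonation is discarded at the STT step."""
    return tts(llm(stt(audio_in)))

def speech_to_speech_turn(audio_in: bytes) -> bytes:
    """True speech-to-speech: one model, audio in, audio out."""
    return s2s_model(audio_in)
```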

On DeepSeek, a reasoning model: competition is great, but deep research on a live call is not what customers are asking for. Though it is yet another example of the trend: the cost of intelligence will tend to zero.

“the cost of intelligence will tend to zero.” Lyle Pratt

Many of the 350 million IVRs and the billions of voicemail boxes will tend towards AI agents. Those AI agents can simply register as a VoIP device. Within existing deployments the addition of AI agents can be rapid, almost overnight, for the customers and use cases where it matters, alongside all the existing customers and traditional use cases. No forklift upgrades to access new capabilities.
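
To illustrate how little plumbing “register as a VoIP device” implies, here’s a bare-bones sketch of a SIP REGISTER sent over UDP. The PBX address and extension name are hypothetical, and a real agent would use a proper SIP stack with digest authentication and retransmission rather than raw sockets:

```python
# Bare-bones SIP REGISTER sketch; hypothetical PBX address and extension.
import socket
import uuid

PBX_HOST, PBX_PORT = "pbx.example.com", 5060   # hypothetical PBX
AGENT_USER = "ai-agent-01"                     # hypothetical extension
LOCAL_IP, LOCAL_PORT = "192.0.2.10", 5062      # example agent address

register = (
    f"REGISTER sip:{PBX_HOST} SIP/2.0\r\n"
    f"Via: SIP/2.0/UDP {LOCAL_IP}:{LOCAL_PORT};branch=z9hG4bK{uuid.uuid4().hex[:8]}\r\n"
    "Max-Forwards: 70\r\n"
    f"From: <sip:{AGENT_USER}@{PBX_HOST}>;tag={uuid.uuid4().hex[:8]}\r\n"
    f"To: <sip:{AGENT_USER}@{PBX_HOST}>\r\n"
    f"Call-ID: {uuid.uuid4()}@{LOCAL_IP}\r\n"
    "CSeq: 1 REGISTER\r\n"
    f"Contact: <sip:{AGENT_USER}@{LOCAL_IP}:{LOCAL_PORT}>\r\n"
    "Expires: 3600\r\n"
    "Content-Length: 0\r\n\r\n"
)

with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
    sock.sendto(register.encode(), (PBX_HOST, PBX_PORT))
    reply, _ = sock.recvfrom(4096)   # expect 401 (auth challenge) or 200 OK
    print(reply.decode(errors="replace").splitlines()[0])
```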

MSPs (Managed Service Providers) are actively looking for AI agents as they are seeing customer demand. Without customer demand they focus on what traditionally sells. Lyle points out, this is not a wholesale change, simply an addition for addressable customers / use cases.

Rob points out it’s really LLMs that matter, as the focus is using them between “internal” systems and customers to give customers the answers they seek. This is often what offshore call centers are used for, and a hot-spot use case for initial voice AI applications. But such support is also expanding to small and medium enterprises.

Importantly, the agency exposed to the LLM is often no more than that already given to customers. Simply external function calling, but that is now being expanded, enabling the LLM to transact business. And where resolution is not possible, handing over to a person with the agency to solve the problem.
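
A sketch of that pattern, assuming the common JSON-Schema function-calling convention used by OpenAI-style APIs; the tool names and stubbed backends are hypothetical:

```python
# The model gets only customer-level agency via explicit function
# definitions, plus an escalation path to a person with real agency.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "lookup_order_status",
            "description": "Return the status of a customer's order.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "escalate_to_human",
            "description": "Transfer the caller to a human agent.",
            "parameters": {
                "type": "object",
                "properties": {"reason": {"type": "string"}},
                "required": ["reason"],
            },
        },
    },
]

def dispatch(tool_name: str, args: dict) -> str:
    """Route a model tool call to a (stubbed) backend."""
    if tool_name == "lookup_order_status":
        return f"Order {args['order_id']}: shipped"      # stubbed lookup
    if tool_name == "escalate_to_human":
        return f"Transferring caller: {args['reason']}"  # stubbed transfer
    raise ValueError(f"Tool not exposed to the model: {tool_name}")
```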

Automation to enable skilled employees to solve harder problems, while helping most customers get to an answer faster. That’s worth perhaps a few weeks of integration time. Rob gives the example of a restaurant chain with automation for everything up to large group bookings, where accommodations can be complex and the potential revenue loss of such a booking deserves human intervention.

Here Rob does see increasing sophistication, and perhaps the start of vertical integrations. With the OpenAI model, developers are often directed to companies like Twilio for the telephony bit. But as Lyle pointed out on accessibility, this could be something as simple as registering an agent as a VoIP device on a CPaaS / UCaaS platform. There are many ways the telephony part can be solved.

On Llama, Lyle mentioned it is a big model, hence kinda slow; a little slower than Gemini. But having such a large open source model is a boon for the whole industry. It is Meta’s gift to humanity. Rob likes Llama; Ultravox is based on it, essentially a voice tokenizer in front of Llama. That enables amazing cost efficiency, and it can run on your own hardware. Rob is definitely a fan.

Rob highlights that if you’re dependent on prompting tuned to a particular model, more work is required on the implementation. Running on open source, on your own hardware, delivers significant cost benefits. If you’re building a service and all the value sits in OpenAI or Google, what’s the value you’re building?

I then asked about Model Context Protocol (MCP) and security. It’s really a choice of what data the model is allowed to access. OAuth is not baked into MCP, so it’s implementation-dependent, part of the middleware. Rob has highlighted this since the beginning of his AI presentations; check out his “LLMs on the telephone: useful tool, or hallucinating danger to humanity?” We’ve come so far so fast.
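
Since authorization is not part of the MCP spec itself, the gate lives in your middleware. A minimal sketch of a per-caller allow-list, with invented role and tool names:

```python
# Hypothetical middleware gate: because OAuth is not baked into MCP,
# the implementation decides which tools each caller may reach.
ALLOWED = {
    "anonymous_caller":  {"check_opening_hours"},
    "verified_customer": {"check_opening_hours", "lookup_order_status"},
}

def backend_dispatch(tool: str, args: dict) -> str:
    return f"{tool} called with {args}"   # stand-in for real systems

def guarded_call(caller_role: str, tool: str, args: dict) -> str:
    """Refuse any tool call the caller's role does not permit."""
    if tool not in ALLOWED.get(caller_role, set()):
        raise PermissionError(f"{caller_role!r} may not call {tool!r}")
    return backend_dispatch(tool, args)

print(guarded_call("verified_customer", "lookup_order_status", {"order_id": "42"}))
```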

Lyle is integrating MCP as well as their own integration platform for customers wanting to implement their own integrations with agents. He also brought up the importance of the middleware in simplifying tool-calling operations. Many AI agent demos do not include tool calls, and use an optimized agent for the best possible experience.

Today there is a trade-off between latency and intelligence. Many human agents have access to 10 or 12 tools, and MCP makes it easier to add tools. However, the more tools, the more latency. Lyle needs faster and smarter LLMs, not more reasoning models.
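
One common mitigation, sketched below with an invented tool registry: expose only the tools relevant to the caller’s intent, rather than all 10 or 12, so the per-turn prompt stays small.

```python
# Hypothetical intent labels and tool registry, for illustration only.
TOOL_REGISTRY = {
    "billing": ["get_invoice", "update_payment_method"],
    "support": ["lookup_ticket", "create_ticket", "escalate_to_human"],
    "sales":   ["check_availability", "book_appointment"],
}

def tools_for_call(intent: str, max_tools: int = 4) -> list[str]:
    """Fewer tool schemas in the prompt means fewer tokens per turn,
    and so lower latency."""
    return TOOL_REGISTRY.get(intent, ["escalate_to_human"])[:max_tools]

print(tools_for_call("billing"))  # ['get_invoice', 'update_payment_method']
```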

We then moved on to the business case. This has been lacking in many of the presentations. However, together Rob and Lyle helped me understand the issues, as well as some of the basic numbers.

Rob starts with the mass of transactions coming at a website or call center: filter the easy stuff away to AI agents, so people can focus on the hard questions. The trend is using AI agents for as close to 100% of incoming questions as possible, then filtering the high-value leads to a human. For the mass of incoming contact requests, it’s generally cheaper to use an agent than a person. Contacts from the website can be handled via WebRTC, which will impact the revenues of telcos without a WebRTC product.

I tried to push Rob to a number on the savings, but he’s not ready to go there. The technology is developing so fast that we’re not at the point of clear metrics. Lyle backs up the difficulties Rob raises, and uses his CRM customer for a quantified example: an SME using an offshore call center perhaps spends $2,500 per month, while an AI agent solution could cost $500-700. In my experience, that roughly 5X figure is what’s necessary to move the needle for action.

Lyle raises an important point on the incentives of the solution provider, e.g. the MSP. Their experience has been constant margin and revenue compression. With an AI agent they could make 15c per minute. This could give MSPs revenues and margins they’ve not seen for a decade or two. Yes, that money was previously spent on the people in the call center. Incentives are the key, and we’re still at the very early stages of this market.
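
Putting Lyle’s numbers together as a back-of-envelope calculation; the monthly minutes figure is my assumption, purely illustrative:

```python
# Back-of-envelope from the numbers in the discussion.
offshore_cost = 2500                         # $/month, SME offshore call center
agent_cost_low, agent_cost_high = 500, 700   # $/month, AI agent solution
print(offshore_cost / agent_cost_high, offshore_cost / agent_cost_low)
# ~3.6x to 5x cheaper

msp_margin = 0.15                    # $/minute on AI agent traffic
assumed_minutes = 4000               # hypothetical monthly minutes per customer
print(msp_margin * assumed_minutes)  # $600/month of new MSP margin
```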

Johnny made a request: he wants your merch: hats, hoodies, t-shirts, etc. Send them to Johnny, so he can stop making his own merch and wear yours on the podcast.
