What’s the best way to build a real-time voice agent using Azure?

0
2
Asked By CuriousCoder77 On

I'm currently developing a voice agent and I want to ensure that I take the right approach before potentially over-engineering the solution. My objective is to create an agent that can handle inbound and outbound phone calls, engaging in natural conversations in English, Arabic, and Spanish. I aim to utilize Azure Neural TTS to provide realistic voice output. During conversations, the agent needs to gather essential details like the patient's name, appointment date, and reason for the visit, confirm the booking, and then save all this information in Cosmos DB.

At this point, I'm considering using Azure Communication Services or Twilio for handling telephony, Azure Speech Services for converting speech to text and vice versa, and Azure OpenAI (GPT-4/4o-mini) for conversational intelligence and extracting key information. I'll also use Cosmos DB for session management and Azure Functions for backend orchestration. Any tips, experiences, or references to similar projects would be greatly appreciated! Thanks!

1 Answer

Answered By TechExplorer99 On

I think you should consider looking into real-time voice APIs like gpt-realtime. They are built for lower latency and might offer what you need without the extra complexity. It runs on a 4o model but is optimized for real-time responses, which could enhance user experience significantly.

VoiceWizard42 -

I'm interested in that too! But how would it fit into the Azure setup I'm planning?

CuriousCoder77 -

Definitely looking into it, thanks!

Related Questions

LEAVE A REPLY

Please enter your comment!
Please enter your name here

This site uses Akismet to reduce spam. Learn how your comment data is processed.