Why are we doing this?
Voice is a natural way to interact with AI. By adding real-time voice to gpt-rag, we make retrieval-augmented assistants more engaging, accessible, and useful in scenarios like meetings, customer support, and live collaboration where hands-free or multilingual interaction is essential.
What does it do?
- Voice-enabled RAG – Adds “speech in, speech out” to gpt-rag, letting users query enterprise knowledge by voice and receive spoken, retrieval-grounded responses.
- Phone Integration – Lets users call a phone number to interact with the assistant, and lets the assistant place outbound calls.
- Realtime reasoning – Uses the Azure OpenAI GPT Realtime API for low-latency transcription, retrieval, and response synthesis over enterprise data sources.
- Use cases – Meeting assistants, customer service bots, live Q&A in Teams, and multilingual knowledge agents.
- Nice to have: Teams integration – Lets VoiceRAG join Microsoft Teams calls, capture live audio queries, and provide contextual answers in real time.
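One possible way to ground spoken answers in retrieved content is to expose retrieval to the realtime model as a function tool. The sketch below is illustrative only and is not the gpt-rag implementation: `DOCS`, `search`, `session_update`, and `handle_tool_call` are hypothetical names, the in-memory keyword search stands in for a real enterprise index, and the message shapes (`session.update`, `conversation.item.create`, `response.create`) are assumptions modeled on the Realtime API's function-calling events that should be checked against the current API reference. No network or audio handling is included.

```python
import json

# Hypothetical in-memory stand-in for the enterprise search index.
DOCS = {
    "vacation-policy": "Employees accrue 1.5 vacation days per month.",
    "expense-policy": "Expenses over 500 USD need manager approval.",
}


def search(query: str, top: int = 2) -> list[dict]:
    """Naive keyword retrieval; a real deployment would query a search index."""
    words = set(query.lower().split())
    scored = []
    for doc_id, text in DOCS.items():
        tokens = set(text.lower().replace(".", "").split())
        overlap = len(words & tokens)
        if overlap:
            scored.append((overlap, doc_id, text))
    scored.sort(reverse=True)  # highest keyword overlap first
    return [{"id": doc_id, "text": text} for _, doc_id, text in scored[:top]]


def session_update() -> dict:
    """Session config exposing `search` as a function tool (shape is an assumption)."""
    return {
        "type": "session.update",
        "session": {
            "instructions": "Answer only from search results; say so if nothing is found.",
            "tools": [{
                "type": "function",
                "name": "search",
                "description": "Search the enterprise knowledge base.",
                "parameters": {
                    "type": "object",
                    "properties": {"query": {"type": "string"}},
                    "required": ["query"],
                },
            }],
            "tool_choice": "auto",
        },
    }


def handle_tool_call(event: dict) -> list[dict]:
    """Turn a completed function-call event into the messages to send back."""
    args = json.loads(event["arguments"])
    results = search(args["query"])
    return [
        # Return the retrieved passages to the model as the tool output.
        {"type": "conversation.item.create",
         "item": {"type": "function_call_output",
                  "call_id": event["call_id"],
                  "output": json.dumps(results)}},
        # Ask the model to produce the spoken, grounded response.
        {"type": "response.create"},
    ]
```

In this design the model decides when to call `search`, so retrieval latency is only paid on turns that need enterprise data; the instructions string is what keeps answers tied to the returned passages.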
Technical Guidelines
High Level Solution Architecture
References
Other