Voice Module
gitpavleenbali edited this page Feb 17, 2026 · 2 revisions
The Voice module enables real-time voice interactions with AI agents, including speech-to-text, text-to-speech, and streaming audio processing.
```python
from pyai.voice import VoiceSession, Transcriber, Synthesizer
from pyai.voice.stream import AudioStream, AudioChunk
```

| Component | Description |
|---|---|
| VoiceSession | Manages voice conversation sessions |
| Transcription | Speech-to-text conversion |
| Synthesis | Text-to-speech generation |
| AudioStream | Real-time audio streaming |
```python
from pyai.voice import VoiceSession

# Create voice session
session = VoiceSession(
    model="gpt-4o-realtime",
    voice="alloy"
)


# `async with` must run inside a coroutine; start it with asyncio.run(converse(...)).
# audio_data and play_audio are placeholders for your own audio input and playback.
async def converse(audio_data):
    # Start conversation
    async with session.connect() as voice:
        # Send audio
        await voice.send_audio(audio_data)
        # Receive response
        async for chunk in voice.receive():
            play_audio(chunk)
```

```python
from pyai.voice import Transcriber

transcriber = Transcriber(model="whisper-1")

# Transcribe audio file
result = transcriber.transcribe("recording.wav")
print(result.text)

# Transcribe with timestamps
result = transcriber.transcribe(
    "meeting.mp3",
    timestamps=True,
    language="en"
)
```

```python
from pyai.voice import Synthesizer

synth = Synthesizer(voice="nova")

# Generate speech
audio = synth.speak("Hello, how can I help you today?")
audio.save("greeting.mp3")

# Stream speech (play_audio is a placeholder for your playback function)
for chunk in synth.stream("This is a longer message..."):
    play_audio(chunk)
```

Supported formats:
- PCM16: Raw PCM audio (16-bit)
- WAV: Waveform Audio
- MP3: MPEG Audio Layer III
- OGG: Ogg Vorbis
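
To make the relationship between raw PCM16 and WAV concrete, here is a minimal sketch that uses only Python's standard `wave` module, not the pyai API. The helper names `pcm16_to_wav` and `pcm16_duration` are hypothetical, and the defaults (24000 Hz, mono) are assumptions chosen to match the sample rate used elsewhere on this page.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> bytes:
    """Wrap raw 16-bit PCM samples in a WAV container (hypothetical helper)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # 16 bits = 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()


def pcm16_duration(pcm: bytes, sample_rate: int = 24000, channels: int = 1) -> float:
    """Return the playback length in seconds of a raw PCM16 buffer."""
    bytes_per_frame = 2 * channels
    return len(pcm) / (bytes_per_frame * sample_rate)
```

Raw PCM16 carries no header, so the sample rate and channel count have to travel alongside the bytes. WAV stores them in the file itself, which is why the conversion needs those two values as inputs.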
```python
from pyai.voice.stream import AudioChunk, AudioFormat

# audio_bytes is a placeholder for raw PCM16 audio data
chunk = AudioChunk(
    data=audio_bytes,
    format=AudioFormat.PCM16,
    sample_rate=24000
)
```

Available voices:
- `alloy` - Neutral, balanced
- `echo` - Warm, conversational
- `fable` - Expressive, narrative
- `onyx` - Deep, authoritative
- `nova` - Friendly, upbeat
- `shimmer` - Clear, professional
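
If voice names come from user input or configuration, it may help to check them before creating a session. The following is a small sketch, not part of pyai. The `AVAILABLE_VOICES` mapping simply mirrors the list above, and `validate_voice` is a hypothetical helper.

```python
# Mirrors the voice list on this page; not read from the pyai library.
AVAILABLE_VOICES = {
    "alloy": "Neutral, balanced",
    "echo": "Warm, conversational",
    "fable": "Expressive, narrative",
    "onyx": "Deep, authoritative",
    "nova": "Friendly, upbeat",
    "shimmer": "Clear, professional",
}


def validate_voice(name: str) -> str:
    """Return the normalized voice name, or raise ValueError if it is unknown."""
    candidate = name.strip().lower()
    if candidate not in AVAILABLE_VOICES:
        options = ", ".join(sorted(AVAILABLE_VOICES))
        raise ValueError(f"Unknown voice {name!r}; expected one of: {options}")
    return candidate
```

The returned name could then be passed as the `voice` argument shown in the earlier examples.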
```python
from pyai.voice import VoiceSession


async def voice_assistant():
    session = VoiceSession()
    async with session.connect() as voice:
        # Enable turn detection
        voice.enable_turn_detection(
            threshold=0.5,
            silence_duration=0.8
        )
        # Continuous conversation
        while True:
            # User speaks
            user_audio = await voice.listen()
            # Agent responds
            await voice.respond(user_audio)
```

```python
from pyai import Agent
from pyai.voice import VoiceSession

agent = Agent(
    name="VoiceAssistant",
    instructions="You are a helpful voice assistant."
)

# Attach voice capabilities
session = VoiceSession(agent=agent)
```

- VoiceSession - Session management
- Transcription - Speech-to-text
- Synthesis - Text-to-speech
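
The `enable_turn_detection` call shown earlier takes a `threshold` and a `silence_duration`, but this page does not say how pyai interprets them. As an illustration only, here is a generic energy-based sketch of what such parameters commonly mean: a turn ends once the signal level stays below the threshold for the given duration after speech has been heard. The functions `rms_level` and `find_turn_end` are hypothetical, and treating the threshold as a 0.0 to 1.0 level is an assumption.

```python
import math
import struct


def rms_level(frame: bytes) -> float:
    """Return the RMS level of a little-endian PCM16 frame, scaled to 0.0-1.0."""
    count = len(frame) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", frame[: count * 2])
    mean_square = sum(s * s for s in samples) / count
    return math.sqrt(mean_square) / 32768.0


def find_turn_end(frames, frame_seconds, threshold=0.5, silence_duration=0.8):
    """Return the index of the frame where the speaker's turn ends, or None.

    The turn ends after speech has been heard and the level then stays below
    `threshold` for at least `silence_duration` seconds (an assumed reading
    of the parameters, not pyai's documented behavior).
    """
    # Count whole frames instead of summing floats to avoid rounding drift.
    frames_needed = math.ceil(silence_duration / frame_seconds)
    silent_frames = 0
    heard_speech = False
    for index, frame in enumerate(frames):
        if rms_level(frame) >= threshold:
            heard_speech = True
            silent_frames = 0
        elif heard_speech:
            silent_frames += 1
            if silent_frames >= frames_needed:
                return index
    return None
```

Under this reading, a higher threshold makes the detector less sensitive to background noise, and a longer silence duration tolerates longer pauses within a single turn.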