A powerful AI voice agent platform that combines LLM-powered AI agents with real-time voice communication. Build sophisticated voice-enabled applications for customer service, sales automation, and interactive assistants.
- Advanced AI Voice Agent: Powered by Google's Gemini Live model
- Real-Time Voice Communication: Seamless audio processing with VideoSDK
- Customizable Personality: Configure voice, tone, and behavior
- AI Outbound Calling: Perfect for automated sales and support calls
- AI Cold Calling: Intelligent conversation starters and follow-ups
- AI Voice Creator: Generate voices from samples with various personalities
- Free AI Voicemail Generator: Automated voicemail creation and responses
- AI Executive Assistant: Handle scheduling, reminders, and administrative tasks
- Goal Based Agent: Configure agents with specific objectives and KPIs
- News Reporter Text to Speech: Professional broadcasting voice capabilities
- Text to Speech Old Man: Various voice profiles including elderly personas
- Frontend: React + TypeScript + Vite
- Backend: Python FastAPI
- AI Engine: Google Gemini Live
- Real-Time Communication: VideoSDK
- Voice Processing: Gemini Realtime API
- Python 3.12 or higher
- Node.js 18+ and npm/yarn
- Google API Key (Gemini)
- VideoSDK Token
```bash
git clone https://github.com/videosdk-community/ai-agent-demo
cd ai-agent-demo
```
- AI Sales Team: Automate lead qualification and follow-up calls
- Customer Support: 24/7 intelligent voice assistance
- Appointment Scheduling: Voice-enabled booking systems
- Market Research: Automated survey and feedback collection
- Popular AI Voices: Multiple voice options for different scenarios
- Interactive Voice Response: Smart call routing and handling
- Voice-Enabled Chatbots: Seamless text-to-voice conversation
This platform bridges the gap between traditional AI assistants and autonomous AI agents, offering:
- Proactive conversation initiation
- Context-aware responses
- Goal-oriented interactions
- Adaptive personality traits
```bash
# Create virtual environment (requires Python 3.12+)
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install dependencies
pip install -r requirements.txt
# Copy environment template and configure
cp .env.example .env
# Edit .env file and add your credentials:
# GOOGLE_API_KEY="your-google-api-key-here"
# PORT="8000"
# Start the server
python server.py
```
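How `server.py` actually loads `.env` is not shown here (it may use a library such as python-dotenv). As a minimal stdlib sketch, the helpers below read the two variables from the template, `GOOGLE_API_KEY` and `PORT`, and fail early when the key is missing. `load_env_file`, `ServerSettings`, and `read_settings` are hypothetical names, and the parser only handles simple `KEY="value"` lines (no inline comments or escapes).

```python
import os
from dataclasses import dataclass
from pathlib import Path


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY="value" lines; blank lines and # comments are skipped.

    Hypothetical helper: the real server may load .env differently.
    """
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes, as in GOOGLE_API_KEY="..."
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


@dataclass
class ServerSettings:
    google_api_key: str
    port: int


def read_settings(path: str = ".env") -> ServerSettings:
    """Merge .env values with the process environment and validate them."""
    # Real environment variables take precedence over the .env file.
    merged = {**load_env_file(path), **os.environ}
    api_key = merged.get("GOOGLE_API_KEY", "")
    if not api_key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return ServerSettings(google_api_key=api_key, port=int(merged.get("PORT", "8000")))
```

Letting the process environment override the file is a common convention: it lets a deployment platform inject secrets without editing `.env`.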
```bash
# Navigate to client directory
cd client
# Install dependencies
npm install
# or
yarn install
# Copy environment template and configure
cp .env.example .env
# Edit .env file and add:
# VITE_VIDEOSDK_TOKEN="your-videosdk-token-from-app.videosdk.live"
# VITE_API_URL="http://localhost:8000" # or your server URL
# If testing with local server, use ngrok:
# ngrok http 8000
# Then use the ngrok URL in VITE_API_URL
# Start the client
npm run dev
# or
yarn dev
```
- Voice Selection: Choose from various AI voice profiles
- Temperature: Control response creativity (0.0 - 1.0)
- Top-P & Top-K: Fine-tune response generation
- System Prompt: Define agent behavior and personality
- Response Modalities: Configure audio-only or multi-modal responses
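To make the Temperature, Top-P, and Top-K knobs concrete, here is a conceptual sketch of how samplers commonly apply them to a toy next-token distribution: temperature rescales the logits, top-K keeps the K most likely tokens, and top-P keeps the smallest prefix whose cumulative probability reaches P. This illustrates the general technique only; it is not Gemini's internal implementation, and the order in which real samplers apply these filters can differ. `filter_distribution` is a hypothetical name.

```python
import math


def filter_distribution(logits: dict[str, float], temperature: float,
                        top_k: int, top_p: float) -> dict[str, float]:
    """Return the renormalized probabilities a sampler would draw from."""
    if temperature <= 0:
        # Temperature 0 is conventionally treated as greedy decoding.
        best = max(logits, key=logits.get)
        return {best: 1.0}
    # 1. Temperature: divide logits before softmax (lower = sharper).
    scaled = {tok: v / temperature for tok, v in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    total = sum(exps.values())
    probs = sorted(((tok, e / total) for tok, e in exps.items()),
                   key=lambda item: item[1], reverse=True)
    # 2. Top-K: keep only the K most likely tokens.
    probs = probs[:top_k]
    # 3. Top-P: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, cumulative = [], 0.0
    for tok, p in probs:
        kept.append((tok, p))
        cumulative += p
        if cumulative >= top_p:
            break
    mass = sum(p for _, p in kept)
    return {tok: p / mass for tok, p in kept}
```

With the example request's `topP` of 0.9 and `topK` of 40, top-K rarely binds on small vocabularies while top-P trims the low-probability tail; lowering `temperature` concentrates mass on the leading tokens before either filter runs.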
```typescript
interface MeetingConfig {
  meeting_id: string;
  token: string;
  model: string;
  voice: string;
  personality: string;
  temperature: number;
  system_prompt: string;
  topP: number;
  topK: number;
}
```
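For scripts or server-side checks, the same shape can be mirrored in Python. This is a sketch, not the server's actual model: the field names come from `MeetingConfig`, but the validation rules are assumptions. The temperature range follows the 0.0 to 1.0 range listed for this app, `topP` is treated as a probability, and `topK` as a positive count.

```python
from dataclasses import asdict, dataclass


@dataclass
class MeetingConfig:
    meeting_id: str
    token: str
    model: str
    voice: str
    personality: str
    temperature: float
    system_prompt: str
    topP: float  # camelCase kept so field names match the JSON payload
    topK: int

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the config looks valid."""
        errors = []
        for name in ("meeting_id", "token", "model", "voice"):
            if not getattr(self, name).strip():
                errors.append(f"{name} must not be empty")
        if not 0.0 <= self.temperature <= 1.0:
            errors.append("temperature must be between 0.0 and 1.0")
        if not 0.0 < self.topP <= 1.0:
            errors.append("topP must be in (0.0, 1.0]")
        if self.topK < 1:
            errors.append("topK must be a positive integer")
        return errors

    def to_payload(self) -> dict:
        """Serialize to the dict sent as the /join-agent JSON body."""
        return asdict(self)
```

Collecting all errors instead of raising on the first one lets a caller report every problem with a request in a single response.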
```http
POST /join-agent
Content-Type: application/json

{
  "meeting_id": "your-meeting-id",
  "token": "your-videosdk-token",
  "model": "gemini-2.0-flash-exp",
  "voice": "Puck",
  "personality": "friendly",
  "temperature": 0.7,
  "system_prompt": "You are a helpful AI assistant...",
  "topP": 0.9,
  "topK": 40
}
```
```http
POST /leave-agent
Content-Type: application/json

{
  "meeting_id": "your-meeting-id"
}
```
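Both endpoints take a JSON body over `POST`, so a small stdlib client can call them without the React frontend, e.g. from a test script. This is a sketch: `build_request` and `post_json` are hypothetical helpers, `API_URL` is assumed to match `VITE_API_URL`, and the response format of these endpoints is not documented here, so `post_json` simply assumes a JSON (or empty) body.

```python
import json
import urllib.request

API_URL = "http://localhost:8000"  # assumed to match VITE_API_URL


def build_request(path: str, payload: dict, base_url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request for /join-agent or /leave-agent."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def post_json(path: str, payload: dict, base_url: str = API_URL, timeout: float = 10.0) -> dict:
    """Send the request and decode the reply (assumes JSON or an empty body)."""
    request = build_request(path, payload, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = response.read().decode("utf-8")
    return json.loads(body) if body else {}


# Example usage against a running server:
# post_json("/join-agent", {"meeting_id": "your-meeting-id", "token": "...", ...})
# post_json("/leave-agent", {"meeting_id": "your-meeting-id"})
```

`urlopen` raises `urllib.error.HTTPError` for non-2xx replies, so a failed join surfaces as an exception rather than a silently ignored response.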
- MyVoiceAgent: Main agent class handling conversation flow
- AgentSession: Manages agent lifecycle and state
- RealTimePipeline: Processes real-time audio streams
- GeminiRealtime: Integration with Google's Gemini Live model
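The lifecycle role described for `AgentSession` (one agent per meeting, started by `/join-agent` and stopped by `/leave-agent`) can be pictured with a small asyncio registry. This is a hypothetical `AgentRegistry`, not the VideoSDK agents API: the `runner` coroutine stands in for whatever actually creates and runs the session.

```python
import asyncio
from typing import Any, Callable, Coroutine

AgentRunner = Callable[[str], Coroutine[Any, Any, None]]


class AgentRegistry:
    """Track one running agent task per meeting_id (hypothetical helper)."""

    def __init__(self, runner: AgentRunner) -> None:
        self._runner = runner
        self._tasks: dict[str, asyncio.Task] = {}

    def join(self, meeting_id: str) -> bool:
        """Start an agent in the background; return False if one is already running."""
        task = self._tasks.get(meeting_id)
        if task is not None and not task.done():
            return False
        self._tasks[meeting_id] = asyncio.create_task(self._runner(meeting_id))
        return True

    async def leave(self, meeting_id: str) -> bool:
        """Cancel the meeting's agent and wait for it to finish cleaning up."""
        task = self._tasks.pop(meeting_id, None)
        if task is None:
            return False
        task.cancel()
        # return_exceptions=True absorbs the task's CancelledError (or any
        # error it raised) instead of propagating it to the caller.
        await asyncio.gather(task, return_exceptions=True)
        return True

    def active(self) -> list[str]:
        """Meeting IDs whose agent task is still running."""
        return [mid for mid, task in self._tasks.items() if not task.done()]
```

Because `join` only schedules the task, an HTTP handler calling it can return immediately while the agent keeps running in the background.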
- AgentMeeting: Main meeting interface component
- Toast System: User feedback and notifications
- Mobile Responsive: Optimized for all device types
- Environment variables for sensitive API keys
- CORS configuration for secure cross-origin requests
- Token-based authentication for VideoSDK integration
- Input validation and error handling
- Python Version: Ensure you're using Python 3.12 or higher
- API Keys: Verify your Google API key and VideoSDK token are correct
- Network Issues: If the agent cannot reach your local server, expose it with ngrok (`ngrok http 8000`) and set VITE_API_URL to the ngrok URL
- Dependencies: Make sure all pip and npm dependencies are installed
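Several of these checks can be automated with a quick preflight script run before `python server.py`. This is a sketch with a hypothetical `preflight` function; it only covers what the checklist names locally (Python version, the Google API key in the environment, Node.js on `PATH`) and cannot verify that the key or the VideoSDK token is actually valid. It avoids newer typing syntax so it still runs, and reports the problem, on an older interpreter.

```python
import os
import shutil
import sys
from typing import List, Mapping, Optional


def preflight(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return a list of problems found; an empty list means the basics look fine."""
    env = os.environ if env is None else env
    problems = []
    if sys.version_info < (3, 12):
        problems.append(f"Python 3.12+ required, found {sys.version.split()[0]}")
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    if shutil.which("node") is None:
        problems.append("Node.js not found on PATH (needed for the client)")
    return problems


# Example: print each problem, e.g. from a setup script.
# for problem in preflight():
#     print("preflight:", problem)
```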
Enable debug logging by setting environment variables:
```bash
export DEBUG=true
export LOG_LEVEL=debug
```
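How the server interprets `DEBUG` and `LOG_LEVEL` is not shown here. One plausible mapping onto Python's standard `logging` module is sketched below: an explicit `LOG_LEVEL` wins, otherwise `DEBUG=true` selects debug output, otherwise the level is `INFO`. `configure_logging` is a hypothetical helper, and the precedence rule is an assumption.

```python
import logging
import os
from typing import Mapping, Optional


def configure_logging(env: Optional[Mapping[str, str]] = None) -> int:
    """Pick a log level from LOG_LEVEL, falling back to DEBUG=true, else INFO."""
    env = os.environ if env is None else env
    level_name = env.get("LOG_LEVEL", "").upper()
    # getLevelName maps a known name such as "DEBUG" to its int value and
    # returns a string like "Level BOGUS" for unknown names.
    level = logging.getLevelName(level_name) if level_name else None
    if not isinstance(level, int):
        debug = env.get("DEBUG", "").lower() in {"1", "true", "yes"}
        level = logging.DEBUG if debug else logging.INFO
    # basicConfig is a no-op if the root logger already has handlers.
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return level
```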
- Connection Pooling: Efficient WebRTC connection management
- Background Tasks: Non-blocking agent operations
- Session Management: Optimized memory usage for multiple concurrent sessions
- Error Recovery: Automatic reconnection and graceful degradation
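The reconnection mechanism itself is not described here. A common pattern for this kind of error recovery is retrying with exponential backoff and jitter, sketched below with a hypothetical `with_reconnect` helper; `connect` stands in for whatever coroutine opens the real connection, and only `ConnectionError` is treated as retryable.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def with_reconnect(connect: Callable[[], Awaitable[T]], *, attempts: int = 5,
                         base_delay: float = 0.5, max_delay: float = 8.0) -> T:
    """Retry `connect` with exponential backoff and full jitter."""
    for attempt in range(attempts):
        try:
            return await connect()
        except ConnectionError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            # Delay doubles each attempt (capped); jitter spreads out retries
            # so many sessions do not reconnect at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            await asyncio.sleep(random.uniform(0, delay))
    raise RuntimeError("attempts must be at least 1")
```

Capping the delay bounds the worst-case wait, and re-raising after the last attempt lets the caller degrade gracefully, for example by ending the session and notifying the user.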
- Fork the repository
- Create a feature branch (`git checkout -b feature/amazing-feature`)
- Commit your changes (`git commit -m 'Add amazing feature'`)
- Push to the branch (`git push origin feature/amazing-feature`)
- Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
For support and questions:
- Check the VideoSDK Documentation
- Create an issue in this repository
- Multi-language support
- Advanced analytics and reporting
- Custom voice training capabilities
- Integration with popular CRM systems
- Enhanced AI personality customization
Ready to revolutionize your communication with AI voice agents? Get started now and build the future of intelligent voice interactions!