# AI Voice Chat

A real-time voice interaction system that converts speech to text, generates AI responses, and plays back text-to-speech output using OpenAI's APIs.
## Features

- 🎤 Real-time voice recording
- 🔄 Speech-to-text conversion
- 🤖 AI-powered responses
- 🔊 Text-to-speech playback
- 📊 Status monitoring and logging
- 🌐 Web-based interface
## Prerequisites

- Python 3.10 or higher
- Virtual environment (recommended)
- OpenAI API key
- Google API key (for speech recognition fallback)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/alakob/ai_voice_chat.git
  cd ai_voice_chat
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```
- Install dependencies:
pip install -r requirements.txt- Create a
.envfile in the project root:
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
HF_TOKEN=your_huggingface_token
GEMINI_APIKEY=your_gemini_api_key
DEEPSEEK_API_KEY=your_deepseek_api_keysrc/
├── voice_assistant/
│ ├── init.py
│ ├── config.py # Configuration and environment settings
│ ├── models.py # Data models and schemas
│ ├── exceptions.py # Custom exception definitions
│ ├── state.py # Global state management
│ ├── services/
│ │ ├── init.py
│ │ ├── audio_service.py # Audio processing functionality
│ │ └── openai_service.py # OpenAI API integration
│ └── ui/
│ ├── init.py
│ └── gradio_interface.py # Web interface components
├── main.py # Application entry point
├── requirements.txt # Project dependencies
└── .env # Environment variables
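The application reads these keys at startup via `config.py`. A minimal, dependency-free sketch of that loading step — a stand-in for a library like python-dotenv, which the project may actually use (the helper name is illustrative):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Parse simple KEY=value lines from a .env file into os.environ.

    Hypothetical helper: skips blank lines and comments, and does not
    overwrite variables already set in the environment.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_env()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
```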
## Usage

- Start the application:

  ```bash
  python src/main.py
  ```

- Open your web browser and navigate to the provided URL (typically `http://localhost:7860`)
- Use the interface:
  - Click "Start Recording" to begin voice capture
  - Speak clearly into your microphone
  - Click "Stop Recording" when finished
  - Wait for the AI response
  - Listen to the spoken response
  - Use "Stop Audio" to interrupt playback
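The "Stop Audio" control implies that playback runs off the main thread and can be interrupted mid-stream. One common pattern for this, sketched here with a `threading.Event` (the function and chunking scheme are illustrative, not the project's actual implementation):

```python
import threading

# Shared flag the UI's "Stop Audio" handler would set.
stop_playback = threading.Event()


def play_audio(chunks, play_chunk):
    """Play audio chunk by chunk, checking the stop flag between chunks
    so "Stop Audio" can interrupt playback promptly.

    `play_chunk` stands in for the real audio-output call (e.g. writing
    a buffer to a sounddevice output stream). Returns the number of
    chunks actually played.
    """
    stop_playback.clear()
    played = 0
    for chunk in chunks:
        if stop_playback.is_set():
            break
        play_chunk(chunk)
        played += 1
    return played
```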
## Testing

Run the test suite:

```bash
pytest tests/
```

## Code Style

The project follows PEP 8 guidelines. Format code with:

```bash
black src/
```

Type-check with:

```bash
mypy src/
```

## Key Functions

- `start_recording()`: Initiates audio capture
- `play_audio()`: Handles audio playback
- `process_audio()`: Processes recorded audio
- `generate_ai_response()`: Creates AI responses
- `text_to_speech()`: Converts text to speech
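These functions form a simple pipeline per conversation turn. A sketch of the orchestration (the signatures are assumptions; in the real code these calls are wired through the Gradio event handlers):

```python
def run_turn(audio, transcribe, respond, speak):
    """One conversation turn: speech -> text -> AI reply -> spoken reply.

    The three callables stand in for process_audio, generate_ai_response,
    and text_to_speech respectively; returning both the text reply and
    the synthesized audio lets the UI display one and play the other.
    """
    text = transcribe(audio)
    reply = respond(text)
    return reply, speak(reply)
```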
## Configuration

Key settings in `config.py`:

- Audio sample rate: 16000 Hz
- Audio channels: 1 (mono)
- Audio format: float32
- Model settings: GPT-4 for responses, TTS-1 for speech
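These audio settings determine how raw microphone buffers are packaged before transcription. A sketch, using numpy and the stdlib `wave` module, of converting a mono float32 buffer at 16 kHz into WAV bytes (the helper name is hypothetical):

```python
import io
import wave

import numpy as np

SAMPLE_RATE = 16000  # Hz, per config.py
CHANNELS = 1         # mono, per config.py


def float32_to_wav_bytes(audio: np.ndarray) -> bytes:
    """Convert a float32 buffer in [-1.0, 1.0] to 16-bit PCM WAV bytes,
    a format speech-to-text APIs commonly accept."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```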
## Error Handling

The application includes custom exceptions:

- `AudioProcessingError`
- `TranscriptionError`
- `TTSError`
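A minimal sketch of how `exceptions.py` might define these. The shared base class is an assumption (only the three names above appear in the source), but a common base makes it easy for the UI layer to catch any assistant error in one place:

```python
class VoiceAssistantError(Exception):
    """Hypothetical common base for the assistant's custom errors."""


class AudioProcessingError(VoiceAssistantError):
    """Raised when recording or audio conversion fails."""


class TranscriptionError(VoiceAssistantError):
    """Raised when speech-to-text fails."""


class TTSError(VoiceAssistantError):
    """Raised when text-to-speech synthesis or playback fails."""
```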
## Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- OpenAI for GPT and TTS APIs
- Gradio for the web interface
- SoundDevice for audio processing
## Contact

Your Name - blaisealako@gmail.com

Project Link: https://github.com/alakob/ai_voice_chat