# AI Voice Chat

A real-time voice interaction system that converts speech to text, generates AI responses, and plays back text-to-speech output using OpenAI's APIs.
## Features

- 🎤 Real-time voice recording
- 🔄 Speech-to-text conversion
- 🤖 AI-powered responses
- 🔊 Text-to-speech playback
- 📊 Status monitoring and logging
- 🌐 Web-based interface
## Prerequisites

- Python 3.10 or higher
- Virtual environment (recommended)
- OpenAI API key
- Google API key (for speech recognition fallback)
## Installation

- Clone the repository:

  ```bash
  git clone https://github.com/alakob/ai_voice_chat.git
  cd ai_voice_chat
  ```

- Create and activate a virtual environment:

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # On Windows: .venv\Scripts\activate
  ```
- Install dependencies:
pip install -r requirements.txt- Create a
.envfile in the project root:
OPENAI_API_KEY=your_openai_api_key
GOOGLE_API_KEY=your_google_api_key
ANTHROPIC_API_KEY=your_anthropic_api_key
HF_TOKEN=your_huggingface_token
GEMINI_APIKEY=your_gemini_api_key
DEEPSEEK_API_KEY=your_deepseek_api_keysrc/
├── voice_assistant/
│ ├── init.py
│ ├── config.py # Configuration and environment settings
│ ├── models.py # Data models and schemas
│ ├── exceptions.py # Custom exception definitions
│ ├── state.py # Global state management
│ ├── services/
│ │ ├── init.py
│ │ ├── audio_service.py # Audio processing functionality
│ │ └── openai_service.py # OpenAI API integration
│ └── ui/
│ ├── init.py
│ └── gradio_interface.py # Web interface components
├── main.py # Application entry point
├── requirements.txt # Project dependencies
└── .env # Environment variables
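The application reads these keys at startup via `config.py`. A minimal, dependency-free sketch of that loading step — a stand-in for a library like python-dotenv, which the project may actually use (the helper name is illustrative):

```python
import os
from pathlib import Path


def load_env(path: str = ".env") -> None:
    """Parse simple KEY=value lines from a .env file into os.environ.

    Hypothetical helper: skips blank lines and comments, and does not
    overwrite variables already set in the environment.
    """
    env_file = Path(path)
    if not env_file.exists():
        return
    for line in env_file.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        os.environ.setdefault(key.strip(), value.strip())


load_env()
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
```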
## Usage

- Start the application:

  ```bash
  python src/main.py
  ```

- Open your web browser and navigate to the provided URL (typically `http://localhost:7860`)
- Use the interface:
  - Click "Start Recording" to begin voice capture
  - Speak clearly into your microphone
  - Click "Stop Recording" when finished
  - Wait for the AI response
  - Listen to the spoken response
  - Use "Stop Audio" to interrupt playback
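The "Stop Audio" control implies that playback runs off the main thread and can be interrupted mid-stream. One common pattern for this, sketched here with a `threading.Event` (the function and chunking scheme are illustrative, not the project's actual implementation):

```python
import threading

# Shared flag the UI's "Stop Audio" handler would set.
stop_playback = threading.Event()


def play_audio(chunks, play_chunk):
    """Play audio chunk by chunk, checking the stop flag between chunks
    so "Stop Audio" can interrupt playback promptly.

    `play_chunk` stands in for the real audio-output call (e.g. writing
    a buffer to a sounddevice output stream). Returns the number of
    chunks actually played.
    """
    stop_playback.clear()
    played = 0
    for chunk in chunks:
        if stop_playback.is_set():
            break
        play_chunk(chunk)
        played += 1
    return played
```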
## Testing

Run the test suite:

```bash
pytest tests/
```

## Code Style

The project follows PEP 8 guidelines. Format code with:

```bash
black src/
```

Type-check with:

```bash
mypy src/
```

## Key Functions

- `start_recording()`: Initiates audio capture
- `play_audio()`: Handles audio playback
- `process_audio()`: Processes recorded audio
- `generate_ai_response()`: Creates AI responses
- `text_to_speech()`: Converts text to speech
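These functions form a simple pipeline per conversation turn. A sketch of the orchestration (the signatures are assumptions; in the real code these calls are wired through the Gradio event handlers):

```python
def run_turn(audio, transcribe, respond, speak):
    """One conversation turn: speech -> text -> AI reply -> spoken reply.

    The three callables stand in for process_audio, generate_ai_response,
    and text_to_speech respectively; returning both the text reply and
    the synthesized audio lets the UI display one and play the other.
    """
    text = transcribe(audio)
    reply = respond(text)
    return reply, speak(reply)
```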
## Configuration

Key settings in `config.py`:

- Audio sample rate: 16000 Hz
- Audio channels: 1 (mono)
- Audio format: float32
- Model settings: GPT-4 for responses, TTS-1 for speech
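These audio settings determine how raw microphone buffers are packaged before transcription. A sketch, using numpy and the stdlib `wave` module, of converting a mono float32 buffer at 16 kHz into WAV bytes (the helper name is hypothetical):

```python
import io
import wave

import numpy as np

SAMPLE_RATE = 16000  # Hz, per config.py
CHANNELS = 1         # mono, per config.py


def float32_to_wav_bytes(audio: np.ndarray) -> bytes:
    """Convert a float32 buffer in [-1.0, 1.0] to 16-bit PCM WAV bytes,
    a format speech-to-text APIs commonly accept."""
    pcm = (np.clip(audio, -1.0, 1.0) * 32767).astype(np.int16)
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(2)          # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm.tobytes())
    return buf.getvalue()
```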
## Error Handling

The application includes custom exceptions:

- `AudioProcessingError`
- `TranscriptionError`
- `TTSError`
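A minimal sketch of how `exceptions.py` might define these. The shared base class is an assumption (only the three names above appear in the source), but a common base makes it easy for the UI layer to catch any assistant error in one place:

```python
class VoiceAssistantError(Exception):
    """Hypothetical common base for the assistant's custom errors."""


class AudioProcessingError(VoiceAssistantError):
    """Raised when recording or audio conversion fails."""


class TranscriptionError(VoiceAssistantError):
    """Raised when speech-to-text fails."""


class TTSError(VoiceAssistantError):
    """Raised when text-to-speech synthesis or playback fails."""
```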
## Contributing

- Fork the repository
- Create a feature branch
- Commit your changes
- Push to the branch
- Create a Pull Request
## License

This project is licensed under the MIT License - see the LICENSE file for details.
## Acknowledgments

- OpenAI for GPT and TTS APIs
- Gradio for the web interface
- SoundDevice for audio processing
## Contact

Your Name - blaisealako@gmail.com

Project Link: https://github.com/alakob/ai_voice_chat