🤖 AI Voice Assistant with Push-to-Talk

� Advanced Voice Conversation System

A high-performance, real-time voice AI assistant featuring push-to-talk functionality, optimized response times, and intelligent conversation flow. Built with Google Gemini AI for natural, human-like interactions.

🌟 Key Features

🎤 Push-to-Talk System

Hold & Release: Hold button/spacebar to record, release to send automatically
Smart Auto-Response: AI responds immediately when you stop talking or release button
Interrupt Capability: Press button during AI speech to stop and start new conversation
Voice Activity Detection: Automatic silence detection for seamless conversations

⚡ Performance Optimized

3x Faster Response: Reduced response time from 10-15s to 3-6s
Smart Timeout Handling: 8-second API timeout with graceful fallback
Optimized Audio Processing: Enhanced voice detection and processing
Minimal Latency: Real-time WebSocket communication

🎨 Intelligent UI/UX

Smart Animations: Wave visualizer only animates when AI is speaking
Clean Interface: Minimal, distraction-free design
Visual Feedback: Clear status indicators for each interaction state
Responsive Design: Works perfectly on desktop and mobile

🚀 Quick Start

1. Clone & Setup

git clone https://github.com/mynamevansh/Gemini-API.git
cd Gemini-API
npm install

2. Configure API Key

Create a .env file with your Gemini API key:

GEMINI_API_KEY=your_gemini_api_key_here

3. Start the Server

npm start

4. Open & Use

Navigate to: http://localhost:3000
Allow microphone permission
Hold "Hold to Talk" button → Speak → Release → AI responds automatically!

� How to Use (Push-to-Talk)

🖱️ Mouse/Touch

Press and Hold the "Hold to Talk" button
Speak your message clearly
Release the button when finished
AI responds automatically - no extra button presses needed!

⌨️ Keyboard (Hands-free)

Press and Hold Spacebar
Speak your message
Release Spacebar
AI responds immediately

🛑 During AI Response

Press button once to stop AI and start new conversation
ESC key to stop AI response

💬 Conversation Examples

Perfect for natural conversations:

Questions: "What's the weather like today?"
Explanations: "How does quantum computing work?"
Creative Tasks: "Write me a short poem about coffee"
Problem Solving: "Help me plan a weekend trip"
Casual Chat: "Tell me something interesting"
Technical Help: "Explain JavaScript promises"

🔧 Technical Architecture

Frontend Stack

HTML5 + CSS3: Modern responsive interface
JavaScript ES6+: Advanced audio processing and WebSocket handling
Web Speech API: Real-time speech recognition
WebRTC: High-quality audio capture and playback

Backend Stack

Node.js + Express: High-performance server
WebSocket (ws): Real-time bidirectional communication
Google Gemini AI: Advanced language model integration
Environment Variables: Secure API key management

Performance Features

Optimized API Calls: Reduced token usage for faster responses
Smart Timeout Management: 8s timeout with automatic retry
Efficient Audio Processing: Optimized voice activity detection
Memory Management: Clean state management and garbage collection

📊 Status Indicators & Visual Feedback

State	Visual	Description
🟢 Ready	Static waves, button ready	Ready for input
🔵 Listening	Soft green glow, "Recording..."	Voice recording active
🟡 Processing	No animation, "Processing..."	AI thinking
🟠 AI Speaking	Animated orange waves	AI responding
🔴 Error	Error message display	Connection/API issues

⚙️ Configuration & Requirements

Prerequisites

Node.js (v14 or higher)
Modern Browser with Web Speech API support
Microphone access and permissions
Google Gemini API Key (Get one here)

Environment Setup

# Install dependencies
npm install

# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env

Supported Browsers

✅ Chrome (Recommended) - Full feature support
✅ Edge - Full feature support
✅ Firefox - Good support
✅ Safari - Basic support

🐛 Troubleshooting Guide

🎤 Microphone Issues

Problem: "Microphone access denied"
Solutions:
• Click 🔒 in address bar → Allow microphone
• Check Windows Privacy Settings → Microphone access
• Ensure no other apps are using microphone
• Try refreshing the page (Ctrl+F5)

🔗 Connection Problems

Problem: "Connection error" or timeouts
Solutions:
• Verify server is running: npm start
• Check API key in .env file
• Test internet connection
• Try: taskkill /F /IM node.exe && npm start

⚡ Performance Issues

Problem: Slow responses or no responses
Solutions:
• Check console for errors (F12)
• Verify API key has sufficient quota
• Try "Test AI" button first
• Close other browser tabs to free resources

🔊 Audio Problems

Problem: No AI voice output
Solutions:
• Check system volume and browser audio
• Verify speakers/headphones are working
• Try different browser or incognito mode
• Check browser's autoplay policy settings

🎨 Customization Options

Voice & Audio Settings

// In script.js - Customize speech synthesis
currentUtterance.rate = 1.2;     // Speech speed (0.5-2.0)
currentUtterance.pitch = 1;      // Voice pitch (0.5-2.0)
currentUtterance.volume = 1;     // Volume (0.0-1.0)

Voice Detection Sensitivity

// In script.js - Adjust detection thresholds
const SILENCE_THRESHOLD = 1200;  // Silence detection (ms)
const VOLUME_THRESHOLD = 0.003;  // Voice sensitivity
const MIN_SPEECH_DURATION = 200; // Minimum speech length (ms)

UI Theme Customization

/* In style.css - Change color scheme */
--primary-gradient: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
--success-gradient: linear-gradient(135deg, #28a745, #20c997);
--speaking-gradient: linear-gradient(135deg, #ff6b6b, #feca57);

AI Response Customization

// In server/index.js - Modify AI behavior
generationConfig: {
  temperature: 0.3,      // Creativity (0.0-1.0)
  maxOutputTokens: 500,  // Response length
  candidateCount: 1      // Number of responses
}

🚀 Advanced Features & Tips

🔥 Power User Features

Multiple Conversations: Each button press starts fresh context
Interrupt & Resume: Stop AI anytime to ask follow-up questions
Smart Timeout Handling: Automatic retry on network issues
Voice Activity Detection: Hands-free operation with spacebar
Mobile Optimized: Touch-friendly interface for smartphones

� Performance Tips

Use Chrome for best performance and compatibility
Close unused tabs to free up system resources
Stable internet ensures faster AI response times
Clear browser cache if experiencing issues
Update browser for latest Web Speech API features

🔒 Privacy & Security

Local Processing: Voice processing happens in your browser
Secure Connection: HTTPS/WSS encryption for data transmission
No Storage: Conversations are not stored or logged
API Key Security: Keep your Gemini API key private

🌟 What Makes This Special?

✨ Instant Interaction: No complex button sequences - just hold, talk, and release
🚀 Lightning Fast: Optimized for 3-6 second total response times
🎯 Zero Friction: Push-to-talk eliminates UI complexity
🔄 Smart Flow: AI automatically responds when you finish speaking
🎨 Polished UX: Clean animations that enhance rather than distract
🛡️ Bulletproof: Comprehensive error handling and recovery systems

📝 Recent Updates (v2.0)

🎤 Push-to-Talk System: Complete interaction model overhaul
⚡ 3x Performance Boost: Dramatically improved response times
🎨 Smart Animations: Context-aware visual feedback
🔧 Enhanced Error Handling: Robust timeout and retry mechanisms
📱 Mobile Optimization: Touch-friendly interface improvements
🧹 Clean Console Output: Minimal, developer-friendly logging

Enjoy seamless voice conversations with AI! 🎉

Built with ❤️ using Node.js, Google Gemini AI, and modern web technologies

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
.vscode		.vscode
node_modules		node_modules
public		public
server		server
.env		.env
README.md		README.md
package-lock.json		package-lock.json
package.json		package.json

Folders and files

Latest commit

History

Repository files navigation

🤖 AI Voice Assistant with Push-to-Talk

� Advanced Voice Conversation System

🌟 Key Features

🎤 Push-to-Talk System

⚡ Performance Optimized

🎨 Intelligent UI/UX

🚀 Quick Start

1. Clone & Setup

2. Configure API Key

3. Start the Server

4. Open & Use

� How to Use (Push-to-Talk)

🖱️ Mouse/Touch

⌨️ Keyboard (Hands-free)

🛑 During AI Response

💬 Conversation Examples

🔧 Technical Architecture

Frontend Stack

Backend Stack

Performance Features

📊 Status Indicators & Visual Feedback

⚙️ Configuration & Requirements

Prerequisites

Environment Setup

Supported Browsers

🐛 Troubleshooting Guide

🎤 Microphone Issues

🔗 Connection Problems

⚡ Performance Issues

🔊 Audio Problems

🎨 Customization Options

Voice & Audio Settings

Voice Detection Sensitivity

UI Theme Customization

AI Response Customization

🚀 Advanced Features & Tips

🔥 Power User Features

� Performance Tips

🔒 Privacy & Security

🌟 What Makes This Special?

📝 Recent Updates (v2.0)

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages