A high-performance, real-time voice AI assistant featuring push-to-talk functionality, optimized response times, and intelligent conversation flow. Built with Google Gemini AI for natural, human-like interactions.
- Hold & Release: Hold button/spacebar to record, release to send automatically
- Smart Auto-Response: AI responds immediately when you stop talking or release button
- Interrupt Capability: Press button during AI speech to stop and start new conversation
- Voice Activity Detection: Automatic silence detection for seamless conversations
- 3x Faster Response: Reduced response time from 10-15s to 3-6s
- Smart Timeout Handling: 8-second API timeout with graceful fallback
- Optimized Audio Processing: Enhanced voice detection and processing
- Minimal Latency: Real-time WebSocket communication
- Smart Animations: Wave visualizer only animates when AI is speaking
- Clean Interface: Minimal, distraction-free design
- Visual Feedback: Clear status indicators for each interaction state
- Responsive Design: Works perfectly on desktop and mobile
git clone https://github.com/mynamevansh/Gemini-API.git
cd Gemini-API
npm installCreate a .env file with your Gemini API key:
GEMINI_API_KEY=your_gemini_api_key_herenpm start- Navigate to:
http://localhost:3000 - Allow microphone permission
- Hold "Hold to Talk" button → Speak → Release → AI responds automatically!
- Press and Hold the "Hold to Talk" button
- Speak your message clearly
- Release the button when finished
- AI responds automatically - no extra button presses needed!
- Press and Hold Spacebar
- Speak your message
- Release Spacebar
- AI responds immediately
- Press button once to stop AI and start new conversation
- ESC key to stop AI response
Perfect for natural conversations:
- Questions: "What's the weather like today?"
- Explanations: "How does quantum computing work?"
- Creative Tasks: "Write me a short poem about coffee"
- Problem Solving: "Help me plan a weekend trip"
- Casual Chat: "Tell me something interesting"
- Technical Help: "Explain JavaScript promises"
- HTML5 + CSS3: Modern responsive interface
- JavaScript ES6+: Advanced audio processing and WebSocket handling
- Web Speech API: Real-time speech recognition
- WebRTC: High-quality audio capture and playback
- Node.js + Express: High-performance server
- WebSocket (ws): Real-time bidirectional communication
- Google Gemini AI: Advanced language model integration
- Environment Variables: Secure API key management
- Optimized API Calls: Reduced token usage for faster responses
- Smart Timeout Management: 8s timeout with automatic retry
- Efficient Audio Processing: Optimized voice activity detection
- Memory Management: Clean state management and garbage collection
| State | Visual | Description |
|---|---|---|
| 🟢 Ready | Static waves, button ready | Ready for input |
| 🔵 Listening | Soft green glow, "Recording..." | Voice recording active |
| 🟡 Processing | No animation, "Processing..." | AI thinking |
| 🟠 AI Speaking | Animated orange waves | AI responding |
| 🔴 Error | Error message display | Connection/API issues |
- Node.js (v14 or higher)
- Modern Browser with Web Speech API support
- Microphone access and permissions
- Google Gemini API Key (Get one here)
# Install dependencies
npm install
# Create .env file
echo "GEMINI_API_KEY=your_api_key_here" > .env✅ Chrome (Recommended) - Full feature support
✅ Edge - Full feature support
✅ Firefox - Good support
✅ Safari - Basic support
Problem: "Microphone access denied"
Solutions:
• Click 🔒 in address bar → Allow microphone
• Check Windows Privacy Settings → Microphone access
• Ensure no other apps are using microphone
• Try refreshing the page (Ctrl+F5)
Problem: "Connection error" or timeouts
Solutions:
• Verify server is running: npm start
• Check API key in .env file
• Test internet connection
• Try: taskkill /F /IM node.exe && npm start
Problem: Slow responses or no responses
Solutions:
• Check console for errors (F12)
• Verify API key has sufficient quota
• Try "Test AI" button first
• Close other browser tabs to free resources
Problem: No AI voice output
Solutions:
• Check system volume and browser audio
• Verify speakers/headphones are working
• Try different browser or incognito mode
• Check browser's autoplay policy settings
// In script.js - Customize speech synthesis
currentUtterance.rate = 1.2; // Speech speed (0.5-2.0)
currentUtterance.pitch = 1; // Voice pitch (0.5-2.0)
currentUtterance.volume = 1; // Volume (0.0-1.0)// In script.js - Adjust detection thresholds
const SILENCE_THRESHOLD = 1200; // Silence detection (ms)
const VOLUME_THRESHOLD = 0.003; // Voice sensitivity
const MIN_SPEECH_DURATION = 200; // Minimum speech length (ms)/* In style.css - Change color scheme */
--primary-gradient: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
--success-gradient: linear-gradient(135deg, #28a745, #20c997);
--speaking-gradient: linear-gradient(135deg, #ff6b6b, #feca57);// In server/index.js - Modify AI behavior
generationConfig: {
temperature: 0.3, // Creativity (0.0-1.0)
maxOutputTokens: 500, // Response length
candidateCount: 1 // Number of responses
}- Multiple Conversations: Each button press starts fresh context
- Interrupt & Resume: Stop AI anytime to ask follow-up questions
- Smart Timeout Handling: Automatic retry on network issues
- Voice Activity Detection: Hands-free operation with spacebar
- Mobile Optimized: Touch-friendly interface for smartphones
- Use Chrome for best performance and compatibility
- Close unused tabs to free up system resources
- Stable internet ensures faster AI response times
- Clear browser cache if experiencing issues
- Update browser for latest Web Speech API features
- Local Processing: Voice processing happens in your browser
- Secure Connection: HTTPS/WSS encryption for data transmission
- No Storage: Conversations are not stored or logged
- API Key Security: Keep your Gemini API key private
✨ Instant Interaction: No complex button sequences - just hold, talk, and release
🚀 Lightning Fast: Optimized for 3-6 second total response times
🎯 Zero Friction: Push-to-talk eliminates UI complexity
🔄 Smart Flow: AI automatically responds when you finish speaking
🎨 Polished UX: Clean animations that enhance rather than distract
🛡️ Bulletproof: Comprehensive error handling and recovery systems
- 🎤 Push-to-Talk System: Complete interaction model overhaul
- ⚡ 3x Performance Boost: Dramatically improved response times
- 🎨 Smart Animations: Context-aware visual feedback
- 🔧 Enhanced Error Handling: Robust timeout and retry mechanisms
- 📱 Mobile Optimization: Touch-friendly interface improvements
- 🧹 Clean Console Output: Minimal, developer-friendly logging
Enjoy seamless voice conversations with AI! 🎉
Built with ❤️ using Node.js, Google Gemini AI, and modern web technologies