Feature/vad integration #122

Open
Aaryan2304 wants to merge 4 commits into sohzm:master from Aaryan2304:feature/vad-integration

@Aaryan2304

🎤 Voice Activity Detection (VAD) Integration

This PR introduces intelligent Voice Activity Detection to enhance the real-time conversation experience by automatically detecting when the user starts and stops speaking.

✨ Features Added

  • Smart Audio Segmentation: Automatically detects speech boundaries using @ricky0123/vad-node
  • Configurable VAD Toggle: Users can enable/disable VAD through the Customize settings
  • Seamless Integration: Works with existing Windows/Linux audio capture systems
  • State Management: Robust finite state machine (IDLE → LISTENING → RECORDING → COMMITTING)
  • Persistent Settings: VAD preferences saved in localStorage
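The state machine listed above could be sketched roughly as follows. The four state names come from this description; the event names (`start`, `speechStart`, `speechEnd`, `committed`, `stop`) and the `VadStateMachine` class are illustrative assumptions, not the code in `src/utils/vad.js`.

```javascript
// Hypothetical sketch of the IDLE → LISTENING → RECORDING → COMMITTING cycle.
// State names are from the PR description; event names are assumptions.
const TRANSITIONS = {
  IDLE: { start: 'LISTENING' },
  LISTENING: { speechStart: 'RECORDING', stop: 'IDLE' },
  RECORDING: { speechEnd: 'COMMITTING', stop: 'IDLE' },
  COMMITTING: { committed: 'LISTENING', stop: 'IDLE' },
};

class VadStateMachine {
  constructor() {
    this.state = 'IDLE';
  }

  // Apply an event; throws if the transition table does not allow it.
  dispatch(event) {
    const next = TRANSITIONS[this.state][event];
    if (!next) {
      throw new Error(`Invalid event "${event}" in state ${this.state}`);
    }
    this.state = next;
    return next;
  }
}
```

Keeping the transitions in a single table means an out-of-order event (for example `speechEnd` before any speech has started) fails loudly instead of silently corrupting the recording state.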

🔧 Technical Implementation

  • Core Module: src/utils/vad.js - VAD processor with audio format conversion
  • UI Integration: Enhanced CustomizeView.js with VAD toggle checkbox
  • Audio Pipeline: Enhanced renderer.js with conditional VAD processing
  • IPC Communication: Added VAD-specific handlers in main process
  • Comprehensive Testing: Full test suite with 100% passing tests
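The description does not say which formats the VAD processor converts between. A common case, assumed here, is converting 16-bit PCM from the capture pipeline to `Float32Array` samples for the detector and back again. The helper names below are hypothetical.

```javascript
// Assumed conversion: 16-bit signed PCM <-> Float32 samples.
// Function names are illustrative, not taken from src/utils/vad.js.

// Convert 16-bit signed PCM samples to floats in [-1, 1).
function int16ToFloat32(int16) {
  const out = new Float32Array(int16.length);
  for (let i = 0; i < int16.length; i++) {
    out[i] = int16[i] / 32768;
  }
  return out;
}

// Convert floats back to 16-bit PCM, clamping out-of-range samples.
function float32ToInt16(float32) {
  const out = new Int16Array(float32.length);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    out[i] = s < 0 ? s * 32768 : s * 32767;
  }
  return out;
}
```

Clamping before scaling matters because typed-array assignment wraps out-of-range integers rather than saturating them, which would turn clipped peaks into loud artifacts.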

🚀 Benefits

  • Improved UX: No more manual start/stop - conversations flow naturally
  • Better Audio Quality: Intelligent segmentation reduces noise and dead air
  • Resource Efficient: Audio is forwarded for processing only when speech is detected
  • Backward Compatible: Existing functionality unchanged when VAD is disabled
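One way to get the backward compatibility described above is to branch once per audio chunk, as in this minimal sketch. `createAudioRouter`, `sendChunk`, and the `vad.process` method are hypothetical names used for illustration; the actual wiring in `renderer.js` may differ.

```javascript
// Hypothetical router: with VAD off, chunks take the original path unchanged.
function createAudioRouter({ vadEnabled, vad, sendChunk }) {
  return function onAudioChunk(chunk) {
    if (vadEnabled && vad) {
      // VAD path: the detector decides when a segment is complete and sent.
      vad.process(chunk);
    } else {
      // Original path: forward every chunk as before.
      sendChunk(chunk);
    }
  };
}
```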

🧪 Testing

  • ✅ All existing tests continue to pass (10/10)
  • ✅ New VAD-specific tests added and passing
  • ✅ Manual testing on Windows audio capture
  • ✅ UI toggle functionality verified
  • ✅ Settings persistence confirmed

📦 Dependencies

  • Added: @ricky0123/vad-node@0.0.3 for voice activity detection
  • No breaking changes to existing dependencies

🔄 Usage

  1. Enable VAD in Customize → "Enable Voice Activity Detection"
  2. Start conversation - VAD automatically detects speech
  3. Audio is intelligently segmented and sent to AI assistant
  4. Seamless conversation flow without manual intervention
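The toggle in step 1 is persisted in localStorage. A minimal sketch of that persistence, assuming a hypothetical key name `vadEnabled` and a default of off when nothing is stored, might look like this. The storage object is passed in so the helpers can be exercised outside the browser.

```javascript
// Hypothetical key name; the PR only states that the preference lives in localStorage.
const VAD_STORAGE_KEY = 'vadEnabled';

// Read the saved preference; a missing key (null) is treated as disabled.
function loadVadEnabled(storage) {
  return storage.getItem(VAD_STORAGE_KEY) === 'true';
}

// Save the preference as the string 'true' or 'false'.
function saveVadEnabled(storage, enabled) {
  storage.setItem(VAD_STORAGE_KEY, String(Boolean(enabled)));
}
```

In the renderer this would be called as `loadVadEnabled(window.localStorage)`.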

Ready for review and testing! 🚀

- Added @ricky0123/vad-node dependency to original project
- Integrated VAD toggle in CustomizeView with persistent settings
- Enhanced renderer.js with VAD audio processing for both Linux and Windows
- Added VAD IPC handlers in main process (send-vad-audio-segment, update-vad-setting)
- VAD processor conditionally initialized based on user settings
- Maintains backward compatibility - works with or without VAD enabled
- All tests passing (10/10) including new VAD test suite
- Reduced CustomizeView padding from 12px to 8px
- Decreased settings container gaps from 12px to 8px
- Optimized form section margins and padding
- Reduced checkbox group margin-bottom from 10px to 6px
- Decreased window height for customize view (720px normal, 620px compact)
- Maintained all functionality while making UI more compact
- All tests passing (10/10)
- Main window: 650x450 → 800x450 (wider for better usability)
- Compact layout: 500x350 → 650x350 (proportionally wider)
- Customize view max-width: 600px → 750px (matches new width)
- Final dimensions provide perfect balance:
  * Height: Optimal for screen real estate (much shorter than original)
  * Width: Comfortable for content readability and interaction
- Maintains responsive design and all functionality
- All tests passing (10/10)
- Removed CLEANUP_SUMMARY.md, CONTRIBUTING.md, FEATURES.md
- Removed FRESH_FORK_SETUP.md, GITHUB_CONTRIBUTION_GUIDE.md
- Removed temporary UI comparison images (New UI.jpg, Old UI.jpg, Width Issue.jpg)
- Local repository now matches intended final state
- Ready for clean pull request
@Kanishk1420

@Aaryan2304 @sohzm could you fix the bug in the feature for copying the generated text response? For some reason it is not working properly. Could you also add a keyboard shortcut for it?
