A Python application that provides system-wide speech recognition with global hotkey control. The transcribed text is typed automatically at the current cursor position, so it works in any application. It uses the server from the WhisperLiveKit project: https://github.com/QuentinFuxa/WhisperLiveKit
- Global hotkey: Toggle recording with Ctrl+Alt+R (customizable)
- Universal text insertion: Works in any application where you can type
- Real-time transcription: Uses WhisperLiveKit for accurate speech recognition
- Modular design: Clean, maintainable code structure
- Configurable: Easy to customize hotkeys and settings
- Python 3.7+
- WhisperLiveKit server
- Microphone access
- X11 session (Linux)
- Clone the repository:
    git clone <repository-url>
    cd whispaholics
- Install system dependencies (Ubuntu/Debian):
    sudo apt-get install portaudio19-dev python3-dev
- Install Python dependencies:
    pip install -r requirements.txt
- Set up configuration:
    # Copy the example configuration file
    cp config.example.py config.py
    # Edit config.py to match your setup
    nano config.py  # or use your preferred editor
  Important: The config.py file is git-ignored for security and personalization. You must create it from the example file and configure it for your environment.
- Start WhisperLiveKit server (configure the server URL in config.py):
    # Example - adjust based on your WhisperLiveKit setup
    python -m whisperlivekit.basic_server
- Run the application:
    python main.py
- Use the hotkey:
  - Position your cursor where you want text to appear
  - Press Ctrl+Alt+R to start recording
  - Speak clearly into your microphone
  - Press Ctrl+Alt+R again to stop recording
  - Transcribed text will be automatically typed
Setup: Copy config.example.py to config.py and customize your settings:
cp config.example.py config.py
Then edit config.py to customize settings:
websocket_url: str = "ws://your-server:port/asr"

# Default: Ctrl+Alt+R
hotkey: frozenset = frozenset({Key.ctrl_l, Key.alt_l, keyboard.KeyCode.from_char('r')})
# Alternative examples:
# F12 key only
hotkey: frozenset = frozenset({Key.f12})
# Ctrl+Shift+S
hotkey: frozenset = frozenset({Key.ctrl_l, Key.shift, keyboard.KeyCode.from_char('s')})

rate: int = 16000 # Sample rate
channels: int = 1 # Mono audio
chunk_duration_ms: float = 256.0 # Audio chunk duration in milliseconds
# chunk_size is calculated automatically: int(rate * chunk_duration_ms / 1000)

hotkey_cooldown: float = 0.5 # Seconds between hotkey activations
max_wait_time: float = 10.0 # Seconds to wait for server processing
typing_delay: float = 0.015 # Seconds between characters when typing

whispaholics/
├── main.py # Entry point
├── config.example.py # Example configuration (copy to config.py)
├── config.py # Your configuration settings (git-ignored)
├── speech_recognition_app.py # Main application logic
├── audio_recorder.py # Audio recording functionality
├── websocket_client.py # WebSocket communication
├── text_inserter.py # Text insertion logic
├── setup.sh # Installation script
├── requirements.txt # Python dependencies
└── README.md # This file
Note: config.py is git-ignored to keep your personal settings private. Always use config.example.py as your starting point.
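
The audio settings shown earlier state that chunk_size is derived from rate and chunk_duration_ms. As a quick check of that arithmetic, here is a minimal sketch using the default values from the example configuration. The helper name `chunk_size_frames` and the 16-bit sample width are assumptions made for illustration; they are not taken from the project code.

```python
def chunk_size_frames(rate: int, chunk_duration_ms: float) -> int:
    """Frames per chunk, using the formula from the config comment."""
    return int(rate * chunk_duration_ms / 1000)


# Default values from the example configuration
rate = 16000
channels = 1
chunk_duration_ms = 256.0

frames = chunk_size_frames(rate, chunk_duration_ms)

# Assumption: 16-bit (2-byte) samples; the actual sample format is not stated.
bytes_per_chunk = frames * channels * 2

print(frames)           # 4096
print(bytes_per_chunk)  # 8192
```

With these defaults each chunk covers roughly a quarter of a second of audio. Lowering chunk_duration_ms would send smaller chunks more often.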
- "No module named 'config'": You need to create config.py from the example: cp config.example.py config.py
- Connection issues: Check the websocket_url in your config.py
- Import errors: Make sure you've installed all dependencies with pip install -r requirements.txt
- Ensure WhisperLiveKit server is running and accessible
- Check the websocket_url in config.py
- Verify no firewall is blocking the connection
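
One way to work through the connection checks above is to test whether anything is listening at the host and port in websocket_url. The sketch below uses only the Python standard library. The function names `parse_ws_url` and `can_connect` are hypothetical, and the example URL in the usage section is a placeholder, not a known default. A successful TCP connection only shows that the port is reachable; it does not confirm that the WebSocket handshake with WhisperLiveKit will succeed.

```python
import socket
from typing import Tuple
from urllib.parse import urlparse


def parse_ws_url(url: str) -> Tuple[str, int]:
    """Extract host and port from a ws:// or wss:// URL."""
    parts = urlparse(url)
    if parts.scheme not in ("ws", "wss"):
        raise ValueError(f"not a WebSocket URL: {url!r}")
    if parts.hostname is None:
        raise ValueError(f"no host in URL: {url!r}")
    # Fall back to the scheme's default port when none is given
    port = parts.port or (443 if parts.scheme == "wss" else 80)
    return parts.hostname, port


def can_connect(url: str, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    host, port = parse_ws_url(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures
        return False


if __name__ == "__main__":
    # Placeholder URL; substitute the websocket_url from your config.py
    url = "ws://localhost:8000/asr"
    print("reachable" if can_connect(url) else "not reachable")
```

If this reports the port as unreachable while the server is running, a firewall rule or a mismatched host or port in config.py is the likely cause.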
- Grant microphone permissions to your terminal/Python
- Check if another application is using the microphone
- Verify your microphone is working with other applications
Ubuntu/Debian:
sudo apt-get install portaudio19-dev python3-dev
pip install pyaudio
macOS:
brew install portaudio
pip install pyaudio
Audio Device Permissions (Linux):
sudo usermod -a -G audio $USER
# Log out and log back in
- Ensure you're running on an X11 session (not Wayland)
- Check if other applications are capturing the same hotkey
- Try running with elevated permissions if necessary
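
To check the first point above, the session type can usually be read from the environment. The sketch below looks at the XDG_SESSION_TYPE variable, which most Linux login sessions set, and falls back to WAYLAND_DISPLAY and DISPLAY as a heuristic. The function name `session_type` is hypothetical, and the fallback is a best guess rather than a guaranteed detection method.

```python
import os
from typing import Mapping, Optional


def session_type(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the display session type ('x11', 'wayland', ...) or 'unknown'."""
    env = os.environ if env is None else env
    value = env.get("XDG_SESSION_TYPE", "").strip().lower()
    if value:
        return value
    # Heuristic fallback: check WAYLAND_DISPLAY first, because DISPLAY is
    # also set under Wayland when XWayland is running.
    if env.get("WAYLAND_DISPLAY"):
        return "wayland"
    if env.get("DISPLAY"):
        return "x11"
    return "unknown"


if __name__ == "__main__":
    kind = session_type()
    print(f"Session type: {kind}")
    if kind != "x11":
        print("Global hotkeys may not work outside an X11 session.")
```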
- Linux: Tested on Ubuntu 20.04+ with X11
- macOS: Should work with Homebrew-installed dependencies
- Windows: Not currently supported
- Hotkey Detection: Uses pynput to detect global hotkey presses
- Audio Recording: Uses pyaudio to capture microphone input in real-time
- WebSocket Communication: Sends audio data to WhisperLiveKit server via WebSocket
- Speech Recognition: WhisperLiveKit processes audio and returns transcription
- Text Insertion: Uses pynput to simulate keyboard typing at cursor position
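
The hotkey step in the list above can be sketched as a small state machine: track which keys are held, and toggle recording when the whole hotkey set is down and the cooldown has elapsed. This is an illustration only, not the project's actual implementation. The class name `HotkeyToggle` is hypothetical, and plain strings stand in for pynput key objects so the example runs without pynput installed.

```python
import time
from typing import Callable, FrozenSet, Set


class HotkeyToggle:
    """Toggle a recording flag when every key in the hotkey set is held."""

    def __init__(self, hotkey: FrozenSet[str], cooldown: float = 0.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.hotkey = hotkey
        self.cooldown = cooldown
        self.clock = clock
        self.pressed: Set[str] = set()
        self.recording = False
        self._armed = True  # prevents key auto-repeat from re-triggering
        self._last_toggle = float("-inf")

    def on_press(self, key: str) -> None:
        self.pressed.add(key)
        now = self.clock()
        combo_down = self.hotkey <= self.pressed
        cooled = now - self._last_toggle >= self.cooldown
        if combo_down and self._armed and cooled:
            self.recording = not self.recording
            self._armed = False
            self._last_toggle = now

    def on_release(self, key: str) -> None:
        self.pressed.discard(key)
        if key in self.hotkey:
            # Re-arm once any part of the combination has been released
            self._armed = True


toggle = HotkeyToggle(frozenset({"ctrl", "alt", "r"}), cooldown=0.5)
for key in ("ctrl", "alt", "r"):
    toggle.on_press(key)
print(toggle.recording)  # True
```

In a real setup the two methods would be passed as the on_press and on_release callbacks of a pynput keyboard.Listener, with the hotkey set built from pynput keys as in config.py.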
- Audio is only recorded when you actively press the hotkey
- Audio data is sent to your configured WhisperLiveKit server
- No data is sent to external services by default
- You have full control over where your audio is processed
This project is provided as-is for use with WhisperLiveKit.