A voice-to-text typing assistant designed for VR gaming and hands-free text input. This application uses OpenAI's Whisper API to transcribe speech and automatically types the transcribed text into the active application.
- Hotkey-activated recording: Press Ctrl+Alt+0 (configurable) to start/stop voice recording
- Automatic silence detection: Stops recording after a period of silence (default: 1 second)
- Real-time transcription: Uses OpenAI Whisper API for accurate speech-to-text conversion
- Automatic typing: Transcribed text is automatically pasted into the active application
- Configurable settings: Customize audio device, silence timeout, sample rate, and hotkey via environment variables
- Optional context vocabulary: Use a context.csv file to bias transcription toward specific words (e.g. game locations and names), so Whisper prefers "Lorville" instead of "Lawville"
- Python 3.8 or higher
- OpenAI API key
- Microphone access
- Windows, Linux, or macOS
- Clone this repository:
  git clone https://github.com/FixerSchis/sc-voice.git
  cd sc-voice
- Install dependencies:
  pip install -r requirements.txt
- Create a .env file from the example:
  cp .env.example .env
- Edit .env and add your OpenAI API key:
  OPENAI_API_KEY=your_openai_api_key_here
Edit the .env file to customize settings:
- OPENAI_API_KEY: Your OpenAI API key (required)
- AUDIO_DEVICE_INDEX: Audio input device index (default: 0)
- SILENCE_TIMEOUT: Seconds of silence before stopping recording (default: 1.0)
- SAMPLE_RATE: Audio sample rate in Hz (default: 16000)
- HOTKEY: Hotkey combination to toggle recording (default: ctrl+alt+0)
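As an illustration of how these settings and their defaults fit together, here is a minimal sketch that reads them from environment variables. The variable names and defaults come from the list above; the helper `load_settings`, the returned dictionary keys, and the parsing details are assumptions made for this example and may not match what voice_typing.py actually does.

```python
import os

# Defaults as documented above; the parsing logic itself is an assumption.
DEFAULTS = {
    "AUDIO_DEVICE_INDEX": "0",
    "SILENCE_TIMEOUT": "1.0",
    "SAMPLE_RATE": "16000",
    "HOTKEY": "ctrl+alt+0",
}


def load_settings(env=None):
    """Hypothetical helper: build a settings dict from environment variables."""
    env = os.environ if env is None else env

    # The API key is the only required value.
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    def get(name):
        # Fall back to the documented default when the variable is unset.
        return env.get(name, DEFAULTS[name])

    return {
        "api_key": api_key,
        "audio_device_index": int(get("AUDIO_DEVICE_INDEX")),
        "silence_timeout": float(get("SILENCE_TIMEOUT")),
        "sample_rate": int(get("SAMPLE_RATE")),
        "hotkey": get("HOTKEY").lower(),
    }
```

Passing a plain dictionary instead of `os.environ` makes the helper easy to try out, for example `load_settings({"OPENAI_API_KEY": "sk-test", "SILENCE_TIMEOUT": "2.5"})`.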
The default download includes context.csv with Star Citizen locations, companies, points of interest, resources, and in-universe terms. Context means “look out for these words”: Whisper still transcribes normal speech, and the file only biases it toward these spellings when it hears them (e.g. "Lorville" instead of "Lawville", "Xi'an" when you say "Zee-an").
- If context.csv exists (default): Whisper uses it as a vocabulary hint for transcription.
- If you delete context.csv: No context is used; transcription uses Whisper’s default vocabulary.
- To use your own vocabulary: Edit context.csv or replace it with your own terms.
CSV format: One term per row; first column is the word. Optional second column is a pronunciation hint (e.g. Xi'an,Zee-an or Vanduul,Van-dool) so Whisper knows how to match what you say to the correct spelling. An optional header row with "term" / "word" / "name" is ignored.
Whisper uses only the first ~224 tokens of the context, so very long lists are truncated.
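To make the CSV format concrete, the sketch below parses context data in the shape described above and joins the terms into a single hint string. The helper name `build_prompt` and the way pronunciation hints are rendered (in parentheses after the term) are assumptions for illustration; the application may format its hint differently.

```python
import csv
import io

# Header labels that the README says are ignored when they appear in the first row.
HEADER_NAMES = {"term", "word", "name"}


def build_prompt(csv_text):
    """Hypothetical helper: turn context.csv content into a comma-separated hint."""
    terms = []
    for i, row in enumerate(csv.reader(io.StringIO(csv_text))):
        if not row or not row[0].strip():
            continue  # skip blank rows
        term = row[0].strip()
        if i == 0 and term.lower() in HEADER_NAMES:
            continue  # skip the optional header row
        # The second column, when present, is a pronunciation hint.
        hint = row[1].strip() if len(row) > 1 else ""
        terms.append(f"{term} ({hint})" if hint else term)
    return ", ".join(terms)


sample = "term,pronunciation\nLorville\nXi'an,Zee-an\nVanduul,Van-dool\n"
print(build_prompt(sample))
```

For the sample above this prints `Lorville, Xi'an (Zee-an), Vanduul (Van-dool)`. Because the context is limited in length, placing the most important terms first is a reasonable precaution.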
- Ensure your .env file is configured with your OpenAI API key
- Run the application:
  python voice_typing.py
- The application will start and wait for the hotkey
- Press Ctrl+Alt+0 (or your configured hotkey) to start recording
- Speak your text
- Recording stops automatically after silence is detected
- The transcribed text will be typed into the active application
Note: On Windows, you may need to run as administrator if you encounter permission issues. On Linux/WSL, you may need to use sudo.
Pre-built executables are automatically created for Windows, Linux, and macOS. Download the latest release from the Releases page and extract the archive for your platform.
- Extract the archive for your operating system
- Create a .env file in the same directory as the executable (you can copy .env.example as a template)
- Add your OpenAI API key to the .env file
- Run the executable:
  - Windows: voice_typing.exe
  - Linux: ./voice_typing
  - macOS: ./voice_typing
The application behavior is identical to running from source.
To build a standalone executable:
- Install PyInstaller:
  pip install pyinstaller
- Build the executable:
  pyinstaller --onefile --name voice_typing voice_typing.py
- The executable will be in the dist/ directory
For Windows, you can also use:
pyinstaller --onefile --name voice_typing --icon=NONE voice_typing.py

- The application monitors keyboard input for the configured hotkey combination
- When the hotkey is pressed, audio recording begins from the configured microphone
- The application continuously monitors audio levels to detect speech and silence
- After a period of silence (configurable), recording stops automatically
- The recorded audio is sent to OpenAI's Whisper API for transcription
- The transcribed text is copied to the clipboard and pasted into the active application using keyboard simulation
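The silence-detection step in the list above can be sketched as follows. This is a simplified illustration, not the application's actual code: the helper names, the RMS-based level check, the threshold of 0.01, and the choice to start counting silence only after speech has been heard are all assumptions. Only the 16000 Hz sample rate and the 1.0 second timeout come from the documented defaults.

```python
import math


def rms(samples):
    """Root-mean-square level of a chunk of samples in the range [-1.0, 1.0]."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def chunks_until_silence(chunks, sample_rate=16000, silence_timeout=1.0, threshold=0.01):
    """Return how many chunks are consumed before recording would stop.

    Recording stops once quiet chunks have accumulated for at least
    silence_timeout seconds after speech was heard (assumed behavior).
    """
    silent_seconds = 0.0
    heard_speech = False
    for count, chunk in enumerate(chunks, start=1):
        if rms(chunk) >= threshold:
            # Speech resets the silence timer.
            heard_speech = True
            silent_seconds = 0.0
        elif heard_speech:
            # Quiet chunk after speech: accumulate its duration.
            silent_seconds += len(chunk) / sample_rate
            if silent_seconds >= silence_timeout:
                return count
    # Silence timeout never reached; all chunks are kept.
    return len(chunks)


# Example: half-second chunks at 16 kHz.
loud = [0.5] * 8000
quiet = [0.0] * 8000
print(chunks_until_silence([quiet, loud, loud, quiet, quiet, loud]))
```

In a real recorder the chunks would come from the audio input stream as they arrive, and the threshold would likely need tuning for the microphone and room noise.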
Permission errors: On Windows, try running as administrator. On Linux, you may need to run with sudo.
Audio device not found: Check your audio device index using Python:
import sounddevice as sd
print(sd.query_devices())

Update AUDIO_DEVICE_INDEX in your .env file accordingly.
Hotkey not working: Ensure no other application is using the same hotkey combination. You can change the hotkey in your .env file.
Text not typing: Make sure the target application has keyboard focus. The application uses clipboard paste (Ctrl+V) which should work in most applications.
[Add your license here]
[Add contribution guidelines if desired]