Voice Typing Assistant

A voice-to-text typing assistant designed for VR gaming and hands-free text input. This application uses OpenAI's Whisper API to transcribe speech and automatically types the transcribed text into the active application.

Features

Hotkey-activated recording: Press Ctrl+Alt+0 (configurable) to start/stop voice recording
Automatic silence detection: Stops recording after a period of silence (default: 1 second)
Real-time transcription: Uses OpenAI Whisper API for accurate speech-to-text conversion
Automatic typing: Transcribed text is automatically pasted into the active application
Configurable settings: Customize audio device, silence timeout, sample rate, and hotkey via environment variables
Optional context vocabulary: Use a context.csv file to bias transcription toward specific words (e.g. game locations and names), so Whisper prefers "Lorville" instead of "Lawville"

Requirements

Python 3.8 or higher
OpenAI API key
Microphone access
Windows, Linux, or macOS

Installation

Clone this repository:

```
git clone https://github.com/FixerSchis/sc-voice.git
cd sc-voice
```

Install dependencies:
```
pip install -r requirements.txt
```
Create a .env file from the example:
```
cp .env.example .env
```

Edit .env and add your OpenAI API key:

```
OPENAI_API_KEY=your_openai_api_key_here
```

Configuration

Edit the .env file to customize settings:

OPENAI_API_KEY: Your OpenAI API key (required)
AUDIO_DEVICE_INDEX: Audio input device index (default: 0)
SILENCE_TIMEOUT: Seconds of silence before stopping recording (default: 1.0)
SAMPLE_RATE: Audio sample rate in Hz (default: 16000)
HOTKEY: Hotkey combination to toggle recording (default: ctrl+alt+0)
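
As an illustration of how these settings and their defaults fit together, here is a minimal sketch that reads them from a mapping such as `os.environ`. The function name `load_settings` and the returned dictionary layout are assumptions made for this example and are not taken from voice_typing.py; loading the .env file into the environment is not shown.

```python
import os

# Defaults mirror the values documented in the list above.
DEFAULTS = {
    "AUDIO_DEVICE_INDEX": "0",
    "SILENCE_TIMEOUT": "1.0",
    "SAMPLE_RATE": "16000",
    "HOTKEY": "ctrl+alt+0",
}


def load_settings(env=None):
    """Return typed settings from an environment-like mapping (hypothetical helper)."""
    env = os.environ if env is None else env

    # The API key is the only required value.
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise ValueError("OPENAI_API_KEY is required")

    def get(name):
        # Fall back to the documented default when the variable is unset.
        return env.get(name, DEFAULTS[name])

    return {
        "api_key": api_key,
        "audio_device_index": int(get("AUDIO_DEVICE_INDEX")),
        "silence_timeout": float(get("SILENCE_TIMEOUT")),
        "sample_rate": int(get("SAMPLE_RATE")),
        "hotkey": get("HOTKEY").lower(),
    }
```

Converting the values up front means a malformed entry, such as a non-numeric SAMPLE_RATE, fails at startup rather than in the middle of a recording.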

Context vocabulary (optional)

The default download includes context.csv with Star Citizen locations, companies, points of interest, resources, and in-universe terms. The context acts as a “look out for these words” hint: Whisper still transcribes normal speech, and the file only biases it toward these spellings when it hears them (e.g. "Lorville" instead of "Lawville", "Xi'an" when you say "Zee-an").

If context.csv exists (default): Whisper uses it as a vocabulary hint for transcription.
If you delete context.csv: No context is used; transcription uses Whisper’s default vocabulary.
To use your own vocabulary: Edit context.csv or replace it with your own terms.

CSV format: One term per row; first column is the word. Optional second column is a pronunciation hint (e.g. Xi'an,Zee-an or Vanduul,Van-dool) so Whisper knows how to match what you say to the correct spelling. An optional header row with "term" / "word" / "name" is ignored.

Whisper uses only the first ~224 tokens of the context, so very long lists are truncated.
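
To make the CSV format concrete, the following sketch parses rows in the shape described above and joins them into a single hint string. The helper names `read_terms` and `build_prompt`, the "term (hint)" formatting, and the `max_chars` budget are all assumptions for illustration; the character budget is only a rough stand-in for the token limit, and the application may format and truncate its context differently.

```python
import csv
import io

# Header labels that the format description says are ignored.
HEADER_NAMES = {"term", "word", "name"}


def read_terms(csv_text):
    """Return (term, hint) pairs; hint is '' when no second column is given."""
    terms = []
    first_row = True
    for row in csv.reader(io.StringIO(csv_text)):
        if not row or not row[0].strip():
            continue  # skip blank lines
        term = row[0].strip()
        if first_row:
            first_row = False
            if term.lower() in HEADER_NAMES:
                continue  # optional header row
        hint = row[1].strip() if len(row) > 1 else ""
        terms.append((term, hint))
    return terms


def build_prompt(terms, max_chars=800):
    """Join terms into one string, stopping before the budget is exceeded."""
    parts = []
    length = 0
    for term, hint in terms:
        piece = f"{term} ({hint})" if hint else term
        added = len(piece) + (2 if parts else 0)  # account for the ", " separator
        if length + added > max_chars:
            break
        parts.append(piece)
        length += added
    return ", ".join(parts)
```

Because terms are dropped from the end once the budget is reached, placing the most important terms at the top of the file keeps them in the hint.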

Usage

Running from Source

Ensure your .env file is configured with your OpenAI API key
Run the application:
```
python voice_typing.py
```
The application will start and wait for the hotkey
Press Ctrl+Alt+0 (or your configured hotkey) to start recording
Speak your text
Recording stops automatically after silence is detected
The transcribed text will be typed into the active application

Note: On Windows, you may need to run as administrator if you encounter permission issues. On Linux/WSL, you may need to use sudo.

Running Compiled Release

Pre-built executables are created automatically for Windows, Linux, and macOS. Download the latest release for your platform from the Releases page, then:

Extract the archive for your operating system
Create a .env file in the same directory as the executable (you can copy .env.example as a template)
Add your OpenAI API key to the .env file
Run the executable:
- Windows: voice_typing.exe
- Linux: ./voice_typing
- macOS: ./voice_typing

The application behavior is identical to running from source.

Building from Source

To build a standalone executable:

Install PyInstaller:
```
pip install pyinstaller
```

Build the executable:

```
pyinstaller --onefile --name voice_typing voice_typing.py
```

The executable will be in the dist/ directory

For Windows, you can also use:

```
pyinstaller --onefile --name voice_typing --icon=NONE voice_typing.py
```

How It Works

The application monitors keyboard input for the configured hotkey combination
When the hotkey is pressed, audio recording begins from the configured microphone
The application continuously monitors audio levels to detect speech and silence
After a period of silence (configurable), recording stops automatically
The recorded audio is sent to OpenAI's Whisper API for transcription
The transcribed text is copied to the clipboard and pasted into the active application using keyboard simulation
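
The silence-detection step can be sketched as follows. This is a simplified illustration rather than the application's actual code: it assumes audio arrives as chunks of float samples in the range -1.0 to 1.0, measures loudness by RMS, and starts counting silence only after some speech has been heard. The function names and the `threshold` value are hypothetical.

```python
import math


def rms(samples):
    """Root-mean-square level of one chunk of samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def find_stop_chunk(chunks, sample_rate=16000, silence_timeout=1.0, threshold=0.01):
    """Return the index of the chunk at which recording would stop, or None."""
    silent_seconds = 0.0
    heard_speech = False
    for i, chunk in enumerate(chunks):
        if rms(chunk) >= threshold:
            # Speech resets the silence timer.
            heard_speech = True
            silent_seconds = 0.0
        elif heard_speech:
            # Accumulate the duration of consecutive quiet chunks.
            silent_seconds += len(chunk) / sample_rate
            if silent_seconds >= silence_timeout:
                return i
    return None
```

In this sketch, a higher SILENCE_TIMEOUT tolerates longer pauses between phrases at the cost of a longer wait before transcription starts.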

Troubleshooting

Permission errors: On Windows, try running as administrator. On Linux, you may need to run with sudo.

Audio device not found: Check your audio device index using Python:

```
import sounddevice as sd
print(sd.query_devices())
```

Update AUDIO_DEVICE_INDEX in your .env file accordingly.

Hotkey not working: Ensure no other application is using the same hotkey combination. You can change the hotkey in your .env file.

Text not typing: Make sure the target application has keyboard focus. The application pastes from the clipboard (Ctrl+V), which should work in most applications.

License

[Add your license here]

Contributing

[Add contribution guidelines if desired]
