Real-time speech-to-text transcription powered by OpenAI's Whisper AI
Features • Installation • Usage • Configuration • Documentation
A powerful, free, and fully local audio transcription application that leverages OpenAI's state-of-the-art Whisper AI model. This tool provides real-time speech-to-text conversion with 95%+ accuracy across 99 languages, running entirely on your machine without requiring API keys or internet connectivity.
## Features

- 💰 **100% Free** - No API costs, no subscriptions, completely free forever
- 🔒 **Privacy First** - Runs entirely locally; your audio never leaves your computer
- 🎯 **High Accuracy** - 95%+ transcription accuracy using Whisper AI
- 🌍 **Multilingual** - Supports 99 languages with automatic detection
- ⚡ **GPU Accelerated** - Optimized for NVIDIA GPUs (CPU fallback available)
- 🎨 **User-Friendly** - Clean, intuitive GUI with real-time feedback
- **Real-time Transcription** - Continuous audio capture and transcription
- **Multiple Model Sizes** - Choose from tiny to large models based on your needs
- **Language Detection** - Automatic language identification or manual selection
- **Device Selection** - Choose between CPU and GPU processing
- **Adjustable Chunk Duration** - Balance between response time and accuracy
- **Microphone Selection** - Support for multiple audio input devices
- **GPU Acceleration** - CUDA support for NVIDIA GPUs (10-20x faster)
- **Efficient Processing** - Optimized audio handling with minimal latency
- **Error Recovery** - Robust error handling and automatic recovery
- **Clean UI** - Modern Tkinter interface with status indicators
- **Timestamped Output** - Each transcription includes a timestamp
- **Export Ready** - Easy copy/paste of transcribed text
- Operating System: Windows 10/11
- Python: 3.10 or higher
- GPU (Optional): NVIDIA GPU with CUDA support for faster processing
- Microphone: Any working audio input device
## Installation

```bash
git clone https://github.com/fizzexual/audiotranscriber.git
cd audio-transcriber
pip install -r requirements.txt
```

Required packages:

- `openai-whisper` - Whisper AI model
- `torch` - PyTorch framework
- `pyaudio` - Audio capture
- `soundfile` - Audio file handling
- `scipy` - Signal processing
- `numpy` - Numerical operations
For NVIDIA GPU users (10-20x faster):

```bash
# Run the automated installer
.\Helpers\install_pytorch_gpu_auto.bat
```

Or manually:

```bash
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

For CPU-only (slower but works on any system):

```bash
pip install torch torchvision torchaudio
```

Verify the setup:

```bash
python test_microphone.py
```

This will test your microphone and verify that all dependencies are correctly installed.
## Usage

1. Launch the application:

   ```bash
   python audio_transcriber_whisper_local.py
   ```

2. Select your model size:
   - `tiny` - 40MB, fastest, good for quick notes
   - `base` - 140MB, balanced speed and accuracy (recommended)
   - `small` - 460MB, better accuracy
   - `medium` - 1.5GB, high accuracy
   - `large` - 3GB, best accuracy

3. Choose your device:
   - `auto` - automatically uses the GPU if available
   - `cpu` - force CPU processing
   - `cuda` - force GPU processing

4. Click "Load Model" - the first load will download the model
5. Select your microphone from the dropdown
6. Click "Start Listening" and begin speaking
7. Watch the real-time transcription appear in the text area
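The `auto`/`cpu`/`cuda` choice in step 3 amounts to a simple fallback rule. A minimal sketch (the function name `pick_device` is illustrative, not taken from the app's code):

```python
def pick_device(requested: str, cuda_available: bool) -> str:
    """Resolve the GUI device option into a concrete backend name.

    'auto' falls back to CPU when no CUDA GPU is present;
    'cpu' and 'cuda' force the respective backend.
    """
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    if requested in ("cpu", "cuda"):
        return requested
    raise ValueError(f"unknown device option: {requested!r}")

# In the real app, availability would come from torch.cuda.is_available()
print(pick_device("auto", cuda_available=True))   # cuda
print(pick_device("auto", cuda_available=False))  # cpu
print(pick_device("cpu", cuda_available=True))    # cpu
```

Note that forcing `cuda` on a machine without an NVIDIA GPU will fail later when PyTorch tries to use the device; `auto` is the safe default.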
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 40MB | Very Fast | Good | Quick notes, testing |
| base | 140MB | Fast | Good | General use (recommended) |
| small | 460MB | Medium | Better | Professional transcription |
| medium | 1.5GB | Slow | High | High-accuracy needs |
| large | 3GB | Very Slow | Best | Maximum accuracy required |
With NVIDIA RTX 4060 GPU:
- tiny: ~0.5s per 5s audio chunk
- base: ~1s per 5s audio chunk
- small: ~2s per 5s audio chunk
With CPU (Intel i7):
- tiny: ~3s per 5s audio chunk
- base: ~8s per 5s audio chunk
- small: ~20s per 5s audio chunk
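These timings can be read as a real-time factor (processing time divided by audio length): below 1.0, transcription keeps pace with live audio. A quick calculation from the figures above:

```python
def real_time_factor(processing_s: float, chunk_s: float = 5.0) -> float:
    """Processing time per second of audio; values < 1.0 keep up with live speech."""
    return processing_s / chunk_s

# Approximate figures from the benchmarks above (base model)
gpu_base = real_time_factor(1.0)  # RTX 4060: 0.2, i.e. 5x faster than real time
cpu_base = real_time_factor(8.0)  # Intel i7: 1.6, i.e. falls behind live audio
print(gpu_base, cpu_base)
```

This is why the CPU path feels laggy with `base` and larger models: each chunk takes longer to process than it did to record.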
## Configuration

### Chunk Duration

Adjust the slider to control recording intervals:
- 3-5 seconds: Faster response, may cut off words
- 5-7 seconds: Balanced (recommended)
- 8-10 seconds: Better accuracy, slower response
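Since capture runs at 16 kHz mono (see How It Works below), each chunk duration maps directly to a buffer size. A rough sketch (the helper `chunk_sizes` is illustrative):

```python
SAMPLE_RATE = 16_000  # Hz, the capture rate used by the app

def chunk_sizes(duration_s: float) -> tuple[int, int]:
    """Return (samples, raw int16 bytes) for one recording chunk."""
    samples = int(SAMPLE_RATE * duration_s)
    return samples, samples * 2  # 2 bytes per 16-bit sample

print(chunk_sizes(5.0))  # (80000, 160000)
```

Longer chunks give Whisper more context per pass (better accuracy) at the cost of a longer wait before each result appears.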
### Language Selection

- **Auto**: automatic language detection (recommended)
- **Manual**: select a specific language for better accuracy
Supported languages include: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Chinese, Arabic, Turkish, Polish, Ukrainian, and 86 more.
### GPU Acceleration

The application automatically detects and uses your GPU. To verify:

```bash
python -c "import torch; print('CUDA Available:', torch.cuda.is_available())"
```

If this prints `False`, reinstall PyTorch with CUDA support using the helper script.
## Documentation

Detailed documentation is available in the `Documentation` folder:

- `WHISPER_LOCAL_SETUP.md` - Complete setup guide
- `WHISPER_SETUP.md` - Whisper configuration details
## Common Issues

### Microphone not working

- Check microphone permissions in Windows Settings
- Ensure microphone is set as default recording device
- Try running `test_microphone.py` to diagnose
### GPU not detected

- Verify you have an NVIDIA GPU
- Install/update NVIDIA drivers
- Reinstall PyTorch with CUDA: `.\Helpers\install_pytorch_gpu_auto.bat`
### Model download fails

- Check your internet connection (needed for the first download only)
- Ensure sufficient disk space (models are 40MB-3GB)
- Try a smaller model size
- This has been fixed in the latest version
- Update to the latest code if you see this error
### Slow performance on CPU

- This is normal - CPU processing is 10-20x slower than GPU
- Consider using a smaller model (tiny or base)
- Upgrade to GPU-enabled PyTorch for better performance
## Getting Help

- Check the Documentation folder
- Review Common Issues above
- Open an issue on GitHub with:
  - Your system specs (CPU/GPU)
  - Python version
  - Error messages
  - Steps to reproduce
## Project Structure

```
audio-transcriber/
├── audio_transcriber_whisper_local.py   # Main application
├── requirements.txt                     # Python dependencies
├── test_microphone.py                   # Microphone test utility
├── README.md                            # This file
├── Documentation/                       # Detailed guides
│   ├── TRANSCRIBER_README.md
│   ├── WHISPER_LOCAL_SETUP.md
│   └── WHISPER_SETUP.md
└── Helpers/                             # Installation scripts
    ├── install_pytorch_gpu.bat
    ├── install_pytorch_gpu_auto.bat
    └── install_whisper_local.bat
```
## How It Works

- **Frontend**: Tkinter GUI with threaded audio processing
- **Audio Capture**: PyAudio with 16kHz sampling rate
- **Processing**: OpenAI Whisper model with soundfile/scipy
- **GPU Acceleration**: PyTorch CUDA backend
1. **Capture** - PyAudio records audio in configurable chunks
2. **Format** - Convert to WAV format (16kHz, mono, float32)
3. **Process** - The Whisper model transcribes the audio
4. **Display** - Results are shown with a timestamp in the GUI
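The Format step converts raw PyAudio samples to floats; the app uses soundfile/scipy for this, but the core int16-to-float normalization can be sketched with the standard library alone (the function name `pcm16_to_float` is illustrative):

```python
import array

def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert raw native-endian int16 PCM (as delivered by PyAudio)
    into floats in [-1.0, 1.0), the range Whisper's pipeline expects."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]

# Two samples: full-scale negative and half-scale positive
raw = array.array("h", [-32768, 16384]).tobytes()
print(pcm16_to_float(raw))  # [-1.0, 0.5]
```

In the real app this buffer would be a numpy float32 array rather than a Python list, but the scaling by 1/32768 is the same idea.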
## System Requirements

**Minimum:**

- CPU: Intel i5 or equivalent
- RAM: 4GB
- Storage: 2GB free space
- OS: Windows 10

**Recommended:**
- CPU: Intel i7 or equivalent
- RAM: 8GB
- GPU: NVIDIA GPU with 4GB+ VRAM
- Storage: 5GB free space
- OS: Windows 11
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the `LICENSE` file for details.
## Acknowledgments

- OpenAI - For the incredible Whisper AI model
- PyTorch Team - For the deep learning framework
- Python Community - For the excellent libraries
## Support

If you find this project helpful, please consider:

- ⭐ Starring the repository
- 🐛 Reporting bugs
- 💡 Suggesting new features
- 📝 Improving documentation

Made with ❤️ for the open-source community