Real-time speech-to-text transcription powered by OpenAI's Whisper AI
Features • Installation • Usage • Configuration • Documentation
A powerful, free, and fully local audio transcription application that leverages OpenAI's state-of-the-art Whisper AI model. This tool provides real-time speech-to-text conversion with 95%+ accuracy across 99 languages, running entirely on your machine without requiring API keys or internet connectivity.
## Features

- 💰 **100% Free** - No API costs, no subscriptions, completely free forever
- 🔒 **Privacy First** - Runs entirely locally; your audio never leaves your computer
- 🎯 **High Accuracy** - 95%+ transcription accuracy using Whisper AI
- 🌍 **Multilingual** - Supports 99 languages with automatic detection
- ⚡ **GPU Accelerated** - Optimized for NVIDIA GPUs (CPU fallback available)
- 🎨 **User-Friendly** - Clean, intuitive GUI with real-time feedback
- **Real-time Transcription** - Continuous audio capture and transcription
- **Multiple Model Sizes** - Choose from tiny to large models based on your needs
- **Language Detection** - Automatic language identification or manual selection
- **Device Selection** - Choose between CPU and GPU processing
- **Adjustable Chunk Duration** - Balance between response time and accuracy
- **Microphone Selection** - Support for multiple audio input devices
- **GPU Acceleration** - CUDA support for NVIDIA GPUs (10-20x faster)
- **Efficient Processing** - Optimized audio handling with minimal latency
- **Error Recovery** - Robust error handling and automatic recovery
- **Clean UI** - Modern Tkinter interface with status indicators
- **Timestamped Output** - Each transcription includes a timestamp
- **Export Ready** - Easy copy/paste of transcribed text
- Operating System: Windows 10/11
- Python: 3.10 or higher
- GPU (Optional): NVIDIA GPU with CUDA support for faster processing
- Microphone: Any working audio input device
## Installation

```bash
git clone https://github.com/fizzexual/audiotranscriber.git
cd audio-transcriber
pip install -r requirements.txt
```

Required packages:

- `openai-whisper` - Whisper AI model
- `torch` - PyTorch framework
- `pyaudio` - Audio capture
- `soundfile` - Audio file handling
- `scipy` - Signal processing
- `numpy` - Numerical operations
For NVIDIA GPU users (10-20x faster):

```bash
# Run the automated installer
.\Helpers\install_pytorch_gpu_auto.bat
```

Or manually:

```bash
pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
```

For CPU-only (slower but works on any system):

```bash
pip install torch torchvision torchaudio
```

Verify the setup:

```bash
python test_microphone.py
```

This will test your microphone and verify that all dependencies are correctly installed.
## Usage

1. Launch the application:

   ```bash
   python audio_transcriber_whisper_local.py
   ```

2. Select your model size:
   - `tiny` - 40MB, fastest, good for quick notes
   - `base` - 140MB, balanced speed and accuracy (recommended)
   - `small` - 460MB, better accuracy
   - `medium` - 1.5GB, high accuracy
   - `large` - 3GB, best accuracy

3. Choose your device:
   - `auto` - automatically uses the GPU if available
   - `cpu` - force CPU processing
   - `cuda` - force GPU processing

4. Click "Load Model" - the first load will download the model
5. Select your microphone from the dropdown
6. Click "Start Listening" and begin speaking
7. Watch the real-time transcription appear in the text area
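The `auto`/`cpu`/`cuda` choice in step 3 amounts to a simple fallback rule. A minimal sketch (the function name `pick_device` is illustrative, not taken from the app's code):

```python
def pick_device(requested: str, cuda_available: bool) -> str:
    """Resolve the GUI device option into a concrete backend name.

    'auto' falls back to CPU when no CUDA GPU is present;
    'cpu' and 'cuda' force the respective backend.
    """
    if requested == "auto":
        return "cuda" if cuda_available else "cpu"
    if requested in ("cpu", "cuda"):
        return requested
    raise ValueError(f"unknown device option: {requested!r}")

# In the real app, availability would come from torch.cuda.is_available()
print(pick_device("auto", cuda_available=True))   # cuda
print(pick_device("auto", cuda_available=False))  # cpu
print(pick_device("cpu", cuda_available=True))    # cpu
```

Note that forcing `cuda` on a machine without an NVIDIA GPU will fail later when PyTorch tries to use the device; `auto` is the safe default.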
| Model | Size | Speed | Accuracy | Best For |
|---|---|---|---|---|
| tiny | 40MB | Very Fast | Good | Quick notes, testing |
| base | 140MB | Fast | Good | General use (recommended) |
| small | 460MB | Medium | Better | Professional transcription |
| medium | 1.5GB | Slow | High | High-accuracy needs |
| large | 3GB | Very Slow | Best | Maximum accuracy required |
With NVIDIA RTX 4060 GPU:
- tiny: ~0.5s per 5s audio chunk
- base: ~1s per 5s audio chunk
- small: ~2s per 5s audio chunk
With CPU (Intel i7):
- tiny: ~3s per 5s audio chunk
- base: ~8s per 5s audio chunk
- small: ~20s per 5s audio chunk
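These timings can be read as a real-time factor (processing time divided by audio length): below 1.0, transcription keeps pace with live audio. A quick calculation from the figures above:

```python
def real_time_factor(processing_s: float, chunk_s: float = 5.0) -> float:
    """Processing time per second of audio; values < 1.0 keep up with live speech."""
    return processing_s / chunk_s

# Approximate figures from the benchmarks above (base model)
gpu_base = real_time_factor(1.0)  # RTX 4060: 0.2, i.e. 5x faster than real time
cpu_base = real_time_factor(8.0)  # Intel i7: 1.6, i.e. falls behind live audio
print(gpu_base, cpu_base)
```

This is why the CPU path feels laggy with `base` and larger models: each chunk takes longer to process than it did to record.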
## Configuration

### Chunk Duration

Adjust the slider to control recording intervals:
- 3-5 seconds: Faster response, may cut off words
- 5-7 seconds: Balanced (recommended)
- 8-10 seconds: Better accuracy, slower response
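Since capture runs at 16 kHz mono (see How It Works below), each chunk duration maps directly to a buffer size. A rough sketch (the helper `chunk_sizes` is illustrative):

```python
SAMPLE_RATE = 16_000  # Hz, the capture rate used by the app

def chunk_sizes(duration_s: float) -> tuple[int, int]:
    """Return (samples, raw int16 bytes) for one recording chunk."""
    samples = int(SAMPLE_RATE * duration_s)
    return samples, samples * 2  # 2 bytes per 16-bit sample

print(chunk_sizes(5.0))  # (80000, 160000)
```

Longer chunks give Whisper more context per pass (better accuracy) at the cost of a longer wait before each result appears.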
### Language Selection

- **Auto**: automatic language detection (recommended)
- **Manual**: select a specific language for better accuracy
Supported languages include: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Chinese, Arabic, Turkish, Polish, Ukrainian, and 86 more.
### GPU Acceleration

The application automatically detects and uses your GPU. To verify:

```bash
python -c "import torch; print('CUDA Available:', torch.cuda.is_available())"
```

If this prints `False`, reinstall PyTorch with CUDA support using the helper script.
## Documentation

Detailed documentation is available in the `Documentation` folder:

- `WHISPER_LOCAL_SETUP.md` - Complete setup guide
- `WHISPER_SETUP.md` - Whisper configuration details
## Common Issues

### Microphone not working

- Check microphone permissions in Windows Settings
- Ensure microphone is set as default recording device
- Try running `test_microphone.py` to diagnose
### GPU not detected

- Verify you have an NVIDIA GPU
- Install/update NVIDIA drivers
- Reinstall PyTorch with CUDA: `.\Helpers\install_pytorch_gpu_auto.bat`
### Model download fails

- Check your internet connection (needed for the first download only)
- Ensure sufficient disk space (models are 40MB-3GB)
- Try a smaller model size
- This has been fixed in the latest version
- Update to the latest code if you see this error
### Slow performance on CPU

- This is normal - CPU processing is 10-20x slower than GPU
- Consider using a smaller model (tiny or base)
- Upgrade to GPU-enabled PyTorch for better performance
## Getting Help

- Check the Documentation folder
- Review Common Issues above
- Open an issue on GitHub with:
  - Your system specs (CPU/GPU)
  - Python version
  - Error messages
  - Steps to reproduce
## Project Structure

```
audio-transcriber/
├── audio_transcriber_whisper_local.py   # Main application
├── requirements.txt                     # Python dependencies
├── test_microphone.py                   # Microphone test utility
├── README.md                            # This file
├── Documentation/                       # Detailed guides
│   ├── TRANSCRIBER_README.md
│   ├── WHISPER_LOCAL_SETUP.md
│   └── WHISPER_SETUP.md
└── Helpers/                             # Installation scripts
    ├── install_pytorch_gpu.bat
    ├── install_pytorch_gpu_auto.bat
    └── install_whisper_local.bat
```
## How It Works

- **Frontend**: Tkinter GUI with threaded audio processing
- **Audio Capture**: PyAudio with 16kHz sampling rate
- **Processing**: OpenAI Whisper model with soundfile/scipy
- **GPU Acceleration**: PyTorch CUDA backend
1. **Capture** - PyAudio records audio in configurable chunks
2. **Format** - Convert to WAV format (16kHz, mono, float32)
3. **Process** - The Whisper model transcribes the audio
4. **Display** - Results are shown with a timestamp in the GUI
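The Format step converts raw PyAudio samples to floats; the app uses soundfile/scipy for this, but the core int16-to-float normalization can be sketched with the standard library alone (the function name `pcm16_to_float` is illustrative):

```python
import array

def pcm16_to_float(raw: bytes) -> list[float]:
    """Convert raw native-endian int16 PCM (as delivered by PyAudio)
    into floats in [-1.0, 1.0), the range Whisper's pipeline expects."""
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    return [s / 32768.0 for s in samples]

# Two samples: full-scale negative and half-scale positive
raw = array.array("h", [-32768, 16384]).tobytes()
print(pcm16_to_float(raw))  # [-1.0, 0.5]
```

In the real app this buffer would be a numpy float32 array rather than a Python list, but the scaling by 1/32768 is the same idea.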
## System Requirements

**Minimum:**

- CPU: Intel i5 or equivalent
- RAM: 4GB
- Storage: 2GB free space
- OS: Windows 10

**Recommended:**
- CPU: Intel i7 or equivalent
- RAM: 8GB
- GPU: NVIDIA GPU with 4GB+ VRAM
- Storage: 5GB free space
- OS: Windows 11
## Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/AmazingFeature`)
3. Commit your changes (`git commit -m 'Add some AmazingFeature'`)
4. Push to the branch (`git push origin feature/AmazingFeature`)
5. Open a Pull Request
## License

This project is licensed under the MIT License - see the `LICENSE` file for details.
## Acknowledgments

- OpenAI - For the incredible Whisper AI model
- PyTorch Team - For the deep learning framework
- Python Community - For the excellent libraries
## Support

If you find this project helpful, please consider:

- ⭐ Starring the repository
- 🐛 Reporting bugs
- 💡 Suggesting new features
- 📝 Improving documentation

Made with ❤️ for the open-source community