Skip to content

fizzexual/AudioTranscriber

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

🎀 Whisper Local Audio Transcriber

Python License Platform Whisper

Real-time speech-to-text transcription powered by OpenAI's Whisper AI

Features β€’ Installation β€’ Usage β€’ Configuration β€’ Documentation


πŸ“‹ Overview

A powerful, free, and fully local audio transcription application that leverages OpenAI's state-of-the-art Whisper AI model. This tool provides real-time speech-to-text conversion with 95%+ accuracy across 99 languages, running entirely on your machine without requiring API keys or internet connectivity.

Why Choose This Transcriber?

  • πŸ†“ 100% Free - No API costs, no subscriptions, completely free forever
  • πŸ”’ Privacy First - Runs entirely locally, your audio never leaves your computer
  • 🎯 High Accuracy - 95%+ transcription accuracy using Whisper AI
  • 🌍 Multilingual - Supports 99 languages with automatic detection
  • ⚑ GPU Accelerated - Optimized for NVIDIA GPUs (CPU fallback available)
  • 🎨 User-Friendly - Clean, intuitive GUI with real-time feedback

✨ Features

Core Capabilities

  • Real-time Transcription - Continuous audio capture and transcription
  • Multiple Model Sizes - Choose from tiny to large models based on your needs
  • Language Detection - Automatic language identification or manual selection
  • Device Selection - Choose between CPU and GPU processing
  • Adjustable Chunk Duration - Balance between response time and accuracy
  • Microphone Selection - Support for multiple audio input devices

Technical Features

  • GPU Acceleration - CUDA support for NVIDIA GPUs (10-20x faster)
  • Efficient Processing - Optimized audio handling with minimal latency
  • Error Recovery - Robust error handling and automatic recovery
  • Clean UI - Modern Tkinter interface with status indicators
  • Timestamped Output - Each transcription includes timestamp
  • Export Ready - Easy copy/paste of transcribed text

πŸš€ Installation

Prerequisites

  • Operating System: Windows 10/11
  • Python: 3.10 or higher
  • GPU (Optional): NVIDIA GPU with CUDA support for faster processing
  • Microphone: Any working audio input device

Step 1: Clone the Repository

git clone https://github.com/fizzexual/audiotranscriber.git
cd audio-transcriber

Step 2: Install Python Dependencies

pip install -r requirements.txt

Required packages:

  • openai-whisper - Whisper AI model
  • torch - PyTorch framework
  • pyaudio - Audio capture
  • soundfile - Audio file handling
  • scipy - Signal processing
  • numpy - Numerical operations

Step 3: Install PyTorch with GPU Support (Recommended)

For NVIDIA GPU users (10-20x faster):

# Run the automated installer
.\Helpers\install_pytorch_gpu_auto.bat

Or manually:

pip uninstall -y torch torchvision torchaudio
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121

For CPU-only (slower but works on any system):

pip install torch torchvision torchaudio

Step 4: Verify Installation

python test_microphone.py

This will test your microphone and verify all dependencies are correctly installed.


πŸ’» Usage

Quick Start

  1. Launch the application:

    python audio_transcriber_whisper_local.py
  2. Select your model size:

    • tiny - 40MB, fastest, good for quick notes
    • base - 140MB, balanced speed and accuracy (recommended)
    • small - 460MB, better accuracy
    • medium - 1.5GB, high accuracy
    • large - 3GB, best accuracy
  3. Choose your device:

    • auto - Automatically uses GPU if available
    • cpu - Force CPU processing
    • cuda - Force GPU processing
  4. Click "Load Model" - First time will download the model

  5. Select your microphone from the dropdown

  6. Click "Start Listening" and begin speaking

  7. Watch real-time transcription appear in the text area

Model Selection Guide

Model Size Speed Accuracy Best For
tiny 40MB Very Fast Good Quick notes, testing
base 140MB Fast Good General use (recommended)
small 460MB Medium Better Professional transcription
medium 1.5GB Slow High High-accuracy needs
large 3GB Very Slow Best Maximum accuracy required

Performance Benchmarks

With NVIDIA RTX 4060 GPU:

  • tiny: ~0.5s per 5s audio chunk
  • base: ~1s per 5s audio chunk
  • small: ~2s per 5s audio chunk

With CPU (Intel i7):

  • tiny: ~3s per 5s audio chunk
  • base: ~8s per 5s audio chunk
  • small: ~20s per 5s audio chunk

βš™οΈ Configuration

Chunk Duration

Adjust the slider to control recording intervals:

  • 3-5 seconds: Faster response, may cut off words
  • 5-7 seconds: Balanced (recommended)
  • 8-10 seconds: Better accuracy, slower response

Language Settings

  • Auto: Automatic language detection (recommended)
  • Manual: Select specific language for better accuracy

Supported languages include: English, Spanish, French, German, Italian, Portuguese, Russian, Japanese, Chinese, Arabic, Turkish, Polish, Ukrainian, and 86 more.

GPU Configuration

The application automatically detects and uses your GPU. To verify:

python -c "import torch; print('CUDA Available:', torch.cuda.is_available())"

If False, reinstall PyTorch with CUDA support using the helper script.


πŸ“š Documentation

Detailed documentation is available in the Documentation folder:


πŸ› οΈ Troubleshooting

Common Issues

"No microphone detected"

  • Check microphone permissions in Windows Settings
  • Ensure microphone is set as default recording device
  • Try running test_microphone.py to diagnose

"CUDA not available" (GPU not detected)

  • Verify you have an NVIDIA GPU
  • Install/update NVIDIA drivers
  • Reinstall PyTorch with CUDA: .\Helpers\install_pytorch_gpu_auto.bat

"Model loading failed"

  • Check internet connection (first download only)
  • Ensure sufficient disk space (models are 40MB-3GB)
  • Try a smaller model size

"Transcription error: [WinError 2]"

  • This has been fixed in the latest version
  • Update to the latest code if you see this error

Slow transcription on CPU

  • This is normal - CPU processing is 10-20x slower than GPU
  • Consider using a smaller model (tiny or base)
  • Upgrade to GPU-enabled PyTorch for better performance

Getting Help

  1. Check the Documentation folder
  2. Review Common Issues above
  3. Open an issue on GitHub with:
    • Your system specs (CPU/GPU)
    • Python version
    • Error messages
    • Steps to reproduce

πŸ—οΈ Project Structure

audio-transcriber/
β”œβ”€β”€ audio_transcriber_whisper_local.py  # Main application
β”œβ”€β”€ requirements.txt                     # Python dependencies
β”œβ”€β”€ test_microphone.py                   # Microphone test utility
β”œβ”€β”€ README.md                            # This file
β”œβ”€β”€ Documentation/                       # Detailed guides
β”‚   β”œβ”€β”€ TRANSCRIBER_README.md
β”‚   β”œβ”€β”€ WHISPER_LOCAL_SETUP.md
β”‚   └── WHISPER_SETUP.md
└── Helpers/                             # Installation scripts
    β”œβ”€β”€ install_pytorch_gpu.bat
    β”œβ”€β”€ install_pytorch_gpu_auto.bat
    └── install_whisper_local.bat

πŸ”§ Technical Details

Architecture

  • Frontend: Tkinter GUI with threaded audio processing
  • Audio Capture: PyAudio with 16kHz sampling rate
  • Processing: OpenAI Whisper model with soundfile/scipy
  • GPU Acceleration: PyTorch CUDA backend

Audio Pipeline

  1. Capture - PyAudio records audio in configurable chunks
  2. Format - Convert to WAV format (16kHz, mono, float32)
  3. Process - Whisper model transcribes audio
  4. Display - Results shown with timestamp in GUI

System Requirements

Minimum:

  • CPU: Intel i5 or equivalent
  • RAM: 4GB
  • Storage: 2GB free space
  • OS: Windows 10

Recommended:

  • CPU: Intel i7 or equivalent
  • RAM: 8GB
  • GPU: NVIDIA GPU with 4GB+ VRAM
  • Storage: 5GB free space
  • OS: Windows 11

🀝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.

Development Setup

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

πŸ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • OpenAI - For the incredible Whisper AI model
  • PyTorch Team - For the deep learning framework
  • Python Community - For the excellent libraries

πŸ“ž Support

If you find this project helpful, please consider:

  • ⭐ Starring the repository
  • πŸ› Reporting bugs
  • πŸ’‘ Suggesting new features
  • πŸ“– Improving documentation

Made with ❀️ for the open-source community

Report Bug β€’ Request Feature

About

A powerful, free, and fully local audio transcription application that leverages OpenAI's state-of-the-art Whisper AI model. This tool provides real-time speech-to-text conversion with 95%+ accuracy across 99 languages, running entirely on your machine without requiring API keys or internet connectivity.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors