Audio Transcriber

Extract slide text from PowerPoint presentations and transcribe audio from PowerPoint, MP4, and MP3 files using Whisper.

Features

Text Extraction: Extracts all text content from PowerPoint slides
Audio Transcription: Uses Whisper (faster-whisper by default) to transcribe audio from multiple sources:
- PowerPoint files (.pptx) - embedded audio recordings
- Video files (.mp4) - audio track extraction
- Audio files (.mp3) - direct transcription
Checkpoint/Resume Support: Automatically saves progress during long transcriptions
- Resume from where you left off if interrupted
- Checkpoints saved every 10 segments
- Safe to stop with Ctrl+C anytime
Live Progress Tracking: Real-time progress bar and live checkpoint file
- View transcription progress in output/[filename]_checkpoint.json
- See completed segments as they're processed
- Track timestamp and text in real-time
GPU and CPU Support: Automatic device detection with intelligent fallback
Multiple Models: Supports various Whisper model sizes (tiny, base, small, medium, large)
Configurable: Easy-to-modify settings for performance and quality tuning
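
To make the checkpoint/resume behavior concrete, here is a minimal sketch of how saving every 10 segments and resuming could work. It is not the code in main.py: the function names and the JSON layout (a "segments" list with "start", "end", and "text" keys) are assumptions made for illustration.

```python
import json
from pathlib import Path

CHECKPOINT_EVERY = 10  # mirrors "checkpoints saved every 10 segments"


def load_checkpoint(path):
    """Return previously saved segments, or an empty list if no checkpoint exists."""
    path = Path(path)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))["segments"]
    return []


def save_checkpoint(path, segments):
    """Write to a temp file, then swap it in, so an interrupt cannot leave a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(json.dumps({"segments": segments}, indent=2), encoding="utf-8")
    tmp.replace(path)


def transcribe_with_resume(segment_iter, checkpoint_path):
    """Consume (start, end, text) tuples, skipping segments saved by an earlier run."""
    done = load_checkpoint(checkpoint_path)
    resume_from = done[-1]["end"] if done else 0.0
    try:
        for start, end, text in segment_iter:
            if end <= resume_from:
                continue  # already covered by the checkpoint
            done.append({"start": start, "end": end, "text": text})
            if len(done) % CHECKPOINT_EVERY == 0:
                save_checkpoint(checkpoint_path, done)
    finally:
        # Runs on normal completion and on Ctrl+C, so progress is never lost.
        save_checkpoint(checkpoint_path, done)
    return done
```

Because the final save sits in a `finally` block, stopping with Ctrl+C still writes the segments collected so far before the interrupt propagates.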

Requirements

Python 3.8 or higher
ffmpeg (for MP4 video processing)
CUDA-compatible GPU (optional, for faster transcription)

Installation

1. Clone or Download the Project

2. Install Python Dependencies

pip install -r requirements.txt

3. Install ffmpeg

Windows:

# Using Chocolatey:
choco install ffmpeg

# Or download from: https://ffmpeg.org/download.html

macOS:

brew install ffmpeg

Linux (Debian/Ubuntu):

sudo apt install ffmpeg
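
Once ffmpeg is installed, the audio track of an MP4 can be pulled out with a single command. The sketch below shows one way to check that ffmpeg is on PATH and to build that command from Python. The helper names are hypothetical, and the 16 kHz mono WAV output is an assumed choice (a common input format for Whisper), not necessarily what main.py does.

```python
import shutil
import subprocess


def build_extract_command(video_path, wav_path):
    """Build an ffmpeg command that writes the audio track as 16 kHz mono WAV."""
    return [
        "ffmpeg",
        "-y",            # overwrite the output file if it already exists
        "-i", str(video_path),
        "-vn",           # drop the video stream
        "-ac", "1",      # downmix to mono
        "-ar", "16000",  # resample to 16 kHz
        str(wav_path),
    ]


def extract_audio(video_path, wav_path):
    """Run ffmpeg, failing early with a clear message if it is not installed."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg not found on PATH; install it first")
    subprocess.run(build_extract_command(video_path, wav_path), check=True)
    return wav_path
```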

Usage

1. Prepare Your Files

Place your files in the presentations folder:
- PowerPoint presentations (.pptx)
- Video files (.mp4)
- Audio files (.mp3)
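
As an illustration of how the input folder might be scanned, the following sketch collects supported files by extension. The function name and the "kind" labels are assumptions for this example; main.py may organize this differently.

```python
from pathlib import Path

# Extensions listed above, mapped to a label describing how each is handled.
SUPPORTED = {".pptx": "presentation", ".mp4": "video", ".mp3": "audio"}


def find_inputs(folder="presentations"):
    """Return (path, kind) pairs for supported files in the folder, sorted by name."""
    root = Path(folder)
    if not root.is_dir():
        return []
    found = []
    for path in sorted(root.iterdir()):
        kind = SUPPORTED.get(path.suffix.lower())  # case-insensitive match
        if path.is_file() and kind:
            found.append((path, kind))
    return found
```

Files with other extensions are ignored, so notes or images left in the folder do no harm.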

2. Run the Transcriber

python main.py

Configuration

Edit the configuration settings at the top of main.py:

Transcription Engine

TRANSCRIPTION_ENGINE = "faster-whisper"  # Options: "standard", "faster-whisper"

Folder Settings

PPTX_FOLDER = "presentations"   # Input folder
OUTPUT_FOLDER = "output"        # Output folder

Whisper Model Settings

WHISPER_MODEL = "small"       # Options: "tiny", "base", "small", "medium", "large"
FORCE_LANGUAGE = "en"         # Force language ("en", "es", "fr", etc.) or None
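
For reference, these two settings correspond roughly to the model size and `language` arguments of faster-whisper. The sketch below is a minimal standalone example, not the code in main.py; the function names and the timestamp format are assumptions, and the device and compute type are fixed to CPU values for simplicity.

```python
def format_timestamp(seconds):
    """Render a time in seconds as HH:MM:SS."""
    total = int(seconds)
    return f"{total // 3600:02d}:{(total % 3600) // 60:02d}:{total % 60:02d}"


def transcribe(audio_path, model_size="small", language="en"):
    """Transcribe one file and return a list of '[start - end] text' lines."""
    # Imported here so format_timestamp works even without faster-whisper installed.
    from faster_whisper import WhisperModel

    model = WhisperModel(model_size, device="cpu", compute_type="int8")
    # Passing language=None lets the model detect the language itself.
    segments, info = model.transcribe(audio_path, language=language)
    lines = []
    for seg in segments:  # a generator: decoding happens as it is consumed
        start = format_timestamp(seg.start)
        end = format_timestamp(seg.end)
        lines.append(f"[{start} - {end}] {seg.text.strip()}")
    return lines
```

Forcing the language skips detection, which helps when a recording starts with silence or music that could mislead it.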

Performance Settings

FORCE_DEVICE = "cpu"          # Options: None (auto), "cpu", "cuda"
USE_HALF_PRECISION = False    # Enable fp16 for speed boost (GPU only)
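
The interaction between these two settings can be sketched as a small resolver. This is an illustrative assumption about the logic, not the actual implementation: the function names are hypothetical, CUDA detection here relies on PyTorch being installed, and the mapping to "float16"/"float32" compute types is one plausible choice.

```python
def cuda_available():
    """Best-effort CUDA check; returns False if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def resolve_device(force_device=None, use_half_precision=False, detect=cuda_available):
    """Return (device, compute_type) for the given settings."""
    if force_device == "cpu":
        device = "cpu"
    else:
        # Covers auto-detection (None) and falls back to CPU
        # when "cuda" was requested but no GPU is usable.
        device = "cuda" if detect() else "cpu"
    # Half precision only applies on the GPU.
    if device == "cuda" and use_half_precision:
        compute_type = "float16"
    else:
        compute_type = "float32"
    return device, compute_type
```

With the defaults shown above (`FORCE_DEVICE = "cpu"`, `USE_HALF_PRECISION = False`), this resolver would return `("cpu", "float32")` regardless of the hardware present.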

Model Size Guide

Model	Speed	Quality	Memory	Best For
tiny	Fastest	Basic	~1GB	Quick drafts, testing
base	Fast	Fair	~1GB	General use
small	Medium	Good	~2GB	Recommended - best balance
medium	Slow	Very Good	~5GB	High accuracy needs
large	Slowest	Best	~10GB	Maximum quality
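
If you are unsure which size to pick, one rough approach is to choose the largest model whose memory estimate fits your hardware. The helper below is a hypothetical sketch using the approximate figures from the table; actual memory use varies with the engine and precision settings.

```python
# Approximate memory needs in GB, taken from the table above (smallest to largest).
MODEL_MEMORY_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_that_fits(available_gb):
    """Return the largest model whose estimate fits, or None if even tiny does not."""
    choice = None
    for name, needed in MODEL_MEMORY_GB:
        if needed <= available_gb:
            choice = name  # later entries are larger, so keep the last match
    return choice
```

For example, `largest_model_that_fits(8)` selects "medium", since "large" needs about 10 GB.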
