A Python script that downloads and transcribes videos from YouTube channels using OpenAI's Whisper. The script first attempts to use YouTube's built-in transcription API, and falls back to Whisper if no transcript is available.
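The fallback order described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `transcribe_with_fallback`, `fake_youtube`, and `fake_whisper` are hypothetical names, and the two callables stand in for whatever the script uses to query YouTube transcripts and to run Whisper (for example a wrapper around `whisper.load_model(...).transcribe(...)`).

```python
from typing import Callable, Optional, Tuple


def transcribe_with_fallback(
    video_id: str,
    fetch_youtube_transcript: Callable[[str], Optional[str]],
    transcribe_with_whisper: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (text, source), preferring YouTube's own transcript."""
    try:
        text = fetch_youtube_transcript(video_id)
    except Exception:
        # Assumption: any lookup failure (captions disabled, network error)
        # is treated the same as "no transcript available".
        text = None
    if text:
        return text, "youtube"
    # Only reached when YouTube had nothing usable.
    return transcribe_with_whisper(video_id), "whisper"


# Stand-in callables so the sketch runs without network access or a model download.
def fake_youtube(video_id: str) -> Optional[str]:
    return {"abc123": "hello from captions"}.get(video_id)


def fake_whisper(video_id: str) -> str:
    return f"whisper transcript of {video_id}"


print(transcribe_with_fallback("abc123", fake_youtube, fake_whisper))
print(transcribe_with_fallback("zzz999", fake_youtube, fake_whisper))
```

Passing the two backends in as callables keeps the decision logic testable on its own; returning the source alongside the text makes it possible to record which path produced each transcript.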
- Fetches all playlists from a YouTube channel matching specified keywords
- Downloads and processes videos in batches
- Uses YouTube's transcript API when available
- Falls back to OpenAI's Whisper for videos without transcripts
- Stores transcripts in SQLite database
- Shows progress with tqdm progress bars
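As an illustration of the batching and storage features listed above, here is a small sketch using only the standard library. The table name, column names, and helper functions (`batched`, `init_db`, `save_batch`) are assumptions made for this example; the script's real schema may differ.

```python
import sqlite3
from typing import Iterable, Iterator, List, Tuple


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema: one row per video, keyed by video ID.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS transcripts (
            video_id TEXT PRIMARY KEY,
            source   TEXT NOT NULL,
            text     TEXT NOT NULL
        )
        """
    )


def save_batch(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str]]) -> None:
    # INSERT OR REPLACE keeps re-runs idempotent; `with conn` commits once per batch.
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO transcripts (video_id, source, text) VALUES (?, ?, ?)",
            rows,
        )


conn = sqlite3.connect(":memory:")  # in-memory database for the example only
init_db(conn)
video_ids = ["v1", "v2", "v3", "v4", "v5"]
for batch in batched(video_ids, size=2):
    save_batch(conn, [(vid, "youtube", f"transcript for {vid}") for vid in batch])

print(conn.execute("SELECT COUNT(*) FROM transcripts").fetchone()[0])
```

In a real run, the outer loop could be wrapped in `tqdm(...)` to get the progress bar, and the connection would point at a file on disk instead of `:memory:`.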
- Python 3.7+
- ffmpeg (required for Whisper)
- Chrome/Chromium browser (for Selenium)
- YouTube Data API key
- Clone this repository
- Install required packages: `pip install -r requirements.txt`
- Install ffmpeg (if not already installed):
  - Ubuntu: `sudo apt install ffmpeg`
  - macOS: `brew install ffmpeg`
  - Windows: Download from ffmpeg website
- Copy `config.example.json` to `config.json` and update with your settings:
  - Get a YouTube API key from Google Cloud Console
  - Set your target channel URL
  - Define keywords to match playlists
- Configure your settings in `config.json`
- Run the script: `python main.py`

Edit `config.json` with your settings:
- `youtube_api_key`: Your YouTube Data API key
- `channel_url`: URL of the YouTube channel to process
- `playlist_keywords`: List of keywords to match playlists
- `whisper_model`: Whisper model to use (tiny, base, small, medium, large)
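To make the settings above concrete, the sketch below parses a configuration with those four keys, checks the Whisper model name against the listed options, and filters playlist titles by keyword. The example values, the helper names (`parse_config`, `matching_playlists`), and the matching rule (case-insensitive substring) are assumptions for illustration; the script itself may validate and match differently.

```python
import json
from typing import Any, Dict, List

REQUIRED_KEYS = ("youtube_api_key", "channel_url", "playlist_keywords", "whisper_model")
WHISPER_MODELS = ("tiny", "base", "small", "medium", "large")


def parse_config(raw: str) -> Dict[str, Any]:
    """Parse config JSON and fail early on missing keys or an unknown model."""
    config = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in config]
    if missing:
        raise ValueError(f"config.json is missing keys: {', '.join(missing)}")
    if config["whisper_model"] not in WHISPER_MODELS:
        raise ValueError(f"unknown whisper_model: {config['whisper_model']!r}")
    return config


def matching_playlists(titles: List[str], keywords: List[str]) -> List[str]:
    # Assumption: a playlist matches if its title contains any keyword, ignoring case.
    lowered = [kw.lower() for kw in keywords]
    return [title for title in titles if any(kw in title.lower() for kw in lowered)]


# Placeholder values only; substitute your own key and channel URL.
example = """
{
  "youtube_api_key": "YOUR_API_KEY",
  "channel_url": "https://www.youtube.com/@example",
  "playlist_keywords": ["lecture", "tutorial"],
  "whisper_model": "base"
}
"""

config = parse_config(example)
selected = matching_playlists(
    ["Python Tutorial Series", "Vlogs", "Guest Lectures"],
    config["playlist_keywords"],
)
print(selected)
```

Validating the configuration before any download starts means a typo in `whisper_model` surfaces immediately instead of after the first video has been fetched.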
MIT License