Automatically transcribe Zoom mp4 videos to Google Docs using local Whisper AI with intelligent calendar integration and speaker identification. Zero ongoing costs: transcription runs entirely on your machine.
- Batch Processing - Transcribe multiple videos automatically
- Calendar Integration - Automatically fetches meeting titles and attendees from Google Calendar
- Speaker Identification - Uses calendar attendees for accurate speaker names
- Zero Cost - Uses local Whisper AI (no API charges)
- Google Meet Format - Creates properly formatted transcripts matching Google Meet style
- Progress Tracking - Rich terminal UI with progress bars
- Model Selection - Choose speed vs accuracy trade-off
- AI Speaker Diarization - Optional advanced speaker detection with pyannote.audio
- Python 3.9+
- ffmpeg - Required for audio processing

      # macOS
      brew install ffmpeg
      # Ubuntu/Debian
      sudo apt install ffmpeg

- Google Cloud Project - For API access
  - Create project at https://console.cloud.google.com
  - Enable Google Docs API, Drive API, and Calendar API
  - Download OAuth 2.0 credentials as credentials.json
# Clone the repository
git clone https://github.com/stephenhsklarew/Zoom2GoogleTranscript.git
cd Zoom2GoogleTranscript
# Install dependencies
pip install -r requirements.txt
# Authenticate with Google
python authenticate.py
# Transcribe all videos in a folder
python video_transcriber.py /path/to/zoom/recordings
# Use a specific model
python video_transcriber.py /path/to/zoom/recordings --model medium
# Specify credentials
python video_transcriber.py /path/to/zoom/recordings --credentials token_video.pickle
The tool creates Google Docs transcripts in Google Meet format:
Dec 9, 2024
Steve/Karan/Stephen - Transcript
Attendees: karan.apatel, stephen.sklarew, steve.burden
00:00:00
karan.apatel: Hey everyone, thanks for joining...
stephen.sklarew: Great to be here. Let's discuss...
steve.burden: I'll start with the quarterly results...
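A minimal sketch of how a transcript in this layout could be assembled. The function names, the `(start_seconds, speaker, text)` turn tuples, and the five-minute timestamp interval are assumptions for illustration, not the tool's actual code:

```python
from datetime import date
from typing import Iterable, List, Tuple


def format_timestamp(seconds: float) -> str:
    """Render a second offset as HH:MM:SS."""
    total = int(seconds)
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_transcript(
    meeting_date: date,
    title: str,
    attendees: Iterable[str],
    turns: Iterable[Tuple[float, str, str]],
    block_seconds: int = 300,  # assumed interval between timestamp lines
) -> str:
    """Build the plain-text body: date, title, attendees, then speaker lines."""
    lines: List[str] = [
        # Built by hand because the "%-d" strftime flag is not portable.
        f"{meeting_date.strftime('%b')} {meeting_date.day}, {meeting_date.year}",
        f"{title} - Transcript",
        "Attendees: " + ", ".join(sorted(attendees)),
    ]
    current_block = None
    for start_seconds, speaker, text in turns:
        block = int(start_seconds // block_seconds)
        if block != current_block:
            # Emit one timestamp line at the start of each time block.
            lines.append(format_timestamp(block * block_seconds))
            current_block = block
        lines.append(f"{speaker}: {text}")
    return "\n".join(lines)
```

Sorting the attendee list mirrors the sample above, where names appear in alphabetical order.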
- Extracts date/time from Zoom folder names (format: YYYY-MM-DD HH.MM.SS Meeting Name)
- Queries Google Calendar for matching events (±30 minute window)
- Extracts meeting details - title and attendee list
- Transcribes audio using Whisper AI
- Maps speakers to calendar attendees
- Creates formatted Google Doc with proper attribution
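The calendar-matching step can be sketched as picking the event whose start time is closest to the recording time, within the ±30 minute window. This is an illustration only: `closest_event` is a hypothetical helper, events are plain `(start, title)` tuples rather than Google Calendar API responses, and all datetimes are assumed to be naive local times:

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

Event = Tuple[datetime, str]  # (start time, title); hypothetical shape


def closest_event(
    recorded_at: datetime,
    events: Iterable[Event],
    window: timedelta = timedelta(minutes=30),
) -> Optional[Event]:
    """Return the event starting nearest to recorded_at, or None if none is in range."""
    best: Optional[Tuple[timedelta, Event]] = None
    for event in events:
        gap = abs(event[0] - recorded_at)
        if gap <= window and (best is None or gap < best[0]):
            best = (gap, event)
    return None if best is None else best[1]
```

Choosing the nearest start time rather than the first hit avoids picking the wrong meeting when two events fall inside the same window.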
By default, speaker changes are inferred from pauses (>2 seconds) and labeled with calendar attendee names:
- Zero setup required
- Works immediately with calendar integration
- Good for 2-3 person conversations
- Limitation: less accurate for complex multi-speaker scenarios
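To illustrate the idea, here is a rough sketch of pause-based speaker assignment. It assumes Whisper-style segments (dicts with `start`, `end`, and `text` keys) and simply rotates to the next attendee after each long pause. The rotation rule and the `assign_speakers` name are assumptions made for this example; the tool's real mapping may differ:

```python
from typing import Dict, List, Sequence, Tuple


def assign_speakers(
    segments: Sequence[Dict],
    attendees: Sequence[str],
    pause_threshold: float = 2.0,
) -> List[Tuple[str, str]]:
    """Group segments into (speaker, text) turns, switching speaker after long pauses."""
    if not attendees:
        raise ValueError("at least one attendee name is required")
    turns: List[Tuple[str, str]] = []
    speaker_idx = 0
    prev_end = None
    for seg in segments:
        # A silence longer than the threshold is treated as a change of speaker.
        if prev_end is not None and seg["start"] - prev_end > pause_threshold:
            speaker_idx = (speaker_idx + 1) % len(attendees)
        speaker = attendees[speaker_idx]
        text = seg["text"].strip()
        if turns and turns[-1][0] == speaker:
            # Same speaker continues: merge into the current turn.
            turns[-1] = (speaker, turns[-1][1] + " " + text)
        else:
            turns.append((speaker, text))
        prev_end = seg["end"]
    return turns
```

A heuristic like this cannot tell who is actually talking, which is why it degrades as the number of speakers grows.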
For advanced speaker detection with pyannote.audio:
- Get Hugging Face Token:
  - Create account at https://huggingface.co
  - Accept agreement at https://huggingface.co/pyannote/speaker-diarization-3.1
  - Get token from https://huggingface.co/settings/tokens
- Use with token:

      # Via environment variable (recommended)
      export HF_TOKEN=hf_your_token_here
      python video_transcriber.py /path/to/videos
      # Or via command line
      python video_transcriber.py /path/to/videos --hf-token hf_your_token_here

Benefits of AI Diarization:
- More accurate speaker detection
- Better for 3+ person meetings
- Analyzes voice characteristics, not just pauses
- Still free (runs locally)
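Once a diarization model has produced speaker turns, a common way to combine them with the transcript is to give each transcribed segment the speaker label it overlaps most. The sketch below shows that step only. It represents diarization output as plain `(start, end, label)` tuples instead of pyannote.audio objects, and `label_segments` is a hypothetical helper, not part of this tool or of pyannote:

```python
from typing import Dict, List, Optional, Sequence, Tuple

SpeakerTurn = Tuple[float, float, str]  # (start, end, label); assumed shape


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(
    segments: Sequence[Dict],
    speaker_turns: Sequence[SpeakerTurn],
) -> List[Tuple[Optional[str], str]]:
    """Attach to each segment the speaker label with the largest time overlap."""
    labeled: List[Tuple[Optional[str], str]] = []
    for seg in segments:
        best_label: Optional[str] = None
        best_overlap = 0.0
        for turn_start, turn_end, label in speaker_turns:
            shared = overlap(seg["start"], seg["end"], turn_start, turn_end)
            if shared > best_overlap:
                best_label, best_overlap = label, shared
        # best_label stays None when no speaker turn overlaps the segment.
        labeled.append((best_label, seg["text"].strip()))
    return labeled
```

Generic labels such as `SPEAKER_00` would still need to be mapped to attendee names afterwards.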
python video_transcriber.py <video_folder> [OPTIONS]
Required:
video_folder Path to folder containing Zoom recordings
Optional:
--model MODEL Whisper model: tiny, base, small, medium, large
(default: base)
--no-recursive Don't search subdirectories
--folder-id ID Google Drive folder ID to save documents
--credentials PATH Path to Google credentials file
(default: token_video.pickle)
--hf-token TOKEN Hugging Face token for speaker diarization
(can also use HF_TOKEN environment variable)
--since DATE Only process videos modified after this date
Format: YYYY-MM-DD or YYYY-MM-DD HH:MM:SS
Examples: 2024-12-01 or "2024-12-01 14:30:00"

| Model | Speed | Accuracy | RAM | Download Size |
|---|---|---|---|---|
| tiny | ~32x realtime | Lowest | 1GB | ~75MB |
| base | ~16x realtime | Good (recommended) | 1GB | ~140MB |
| small | ~6x realtime | Better | 2GB | ~460MB |
| medium | ~2x realtime | High | 5GB | ~1.5GB |
| large | ~1x realtime | Best | 10GB | ~3GB |
Recommendation: Start with the base model for a good balance of speed and quality.
MacBook Pro M1 (CPU only):
- 30 min video with base model: ~2 minutes
- 30 min video with medium model: ~4 minutes
- Go to https://console.cloud.google.com
- Create a new project
- Enable these APIs:
  - Google Docs API
  - Google Drive API
  - Google Calendar API (v3)
- Create OAuth 2.0 credentials:
  - Application type: "Desktop app"
  - Download as credentials.json
  - Place in project directory
Run the authentication script once:
python authenticate.py
This will:
- Open your browser for Google OAuth
- Request permissions for Docs, Drive, and Calendar
- Save credentials to token_video.pickle
The token is reused for all future transcriptions.
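For readers curious how a cached token like this is typically reused, here is a hedged sketch of the usual google-auth pattern: load the pickle, return it if still valid, refresh it if expired, and fall back to the browser flow otherwise. The `load_credentials` name and the exact scope list are assumptions; the real `authenticate.py` may request different scopes or store the token differently:

```python
import pickle
from pathlib import Path

# Assumed scopes covering Docs, Drive, and Calendar; the tool's actual list may differ.
SCOPES = [
    "https://www.googleapis.com/auth/documents",
    "https://www.googleapis.com/auth/drive.file",
    "https://www.googleapis.com/auth/calendar.readonly",
]


def load_credentials(token_path="token_video.pickle", secrets_path="credentials.json"):
    """Return cached Google credentials, refreshing or re-authorizing when needed."""
    token_file = Path(token_path)
    creds = None
    if token_file.exists():
        with token_file.open("rb") as fh:
            creds = pickle.load(fh)
    if creds is not None and creds.valid:
        return creds  # cached token is still usable

    # Imported lazily so the cached-token path needs no Google libraries.
    from google.auth.transport.requests import Request
    from google_auth_oauthlib.flow import InstalledAppFlow

    if creds is not None and creds.expired and creds.refresh_token:
        creds.refresh(Request())  # silent refresh, no browser needed
    else:
        flow = InstalledAppFlow.from_client_secrets_file(secrets_path, SCOPES)
        creds = flow.run_local_server(port=0)  # opens the browser for consent
    with token_file.open("wb") as fh:
        pickle.dump(creds, fh)
    return creds
```

Only unpickle token files you created yourself, since loading a pickle from an untrusted source can execute arbitrary code.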
For calendar integration to work, organize recordings in Zoom's default format:
Zoom/
├── 2024-12-09 10.31.25 Steve_Karan_Stephen/
│   └── video1487928882.mp4
├── 2024-12-02 15.00.18 Diane_Stephen Weekly 1_1/
│   └── video1683623283.mp4
└── ...
The folder name format YYYY-MM-DD HH.MM.SS Meeting Name is used to match calendar events.
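As an illustration of how such a folder name can be split into a timestamp and a meeting name, here is a small sketch. The regular expression and the `parse_zoom_folder` helper are assumptions written for this example, not the tool's own parser:

```python
import re
from datetime import datetime
from typing import Optional, Tuple

# YYYY-MM-DD HH.MM.SS followed by the meeting name
FOLDER_RE = re.compile(r"^(\d{4}-\d{2}-\d{2}) (\d{2}\.\d{2}\.\d{2}) (.+)$")


def parse_zoom_folder(name: str) -> Optional[Tuple[datetime, str]]:
    """Return (recording start, meeting name), or None if the name does not match."""
    match = FOLDER_RE.match(name.strip().rstrip("/"))
    if match is None:
        return None
    date_part, time_part, meeting_name = match.groups()
    try:
        started = datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H.%M.%S")
    except ValueError:
        # Digits matched the pattern but are not a real date or time.
        return None
    return started, meeting_name
```

Folders that do not match (for example, ones renamed by hand) return `None`, so a caller could still transcribe them without calendar data.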
# Transcribe all recordings in the Zoom folder
python video_transcriber.py ~/Documents/Zoom --model base
# Review transcripts in Google Docs
# Speaker names automatically pulled from calendar
# Process all client recordings with better accuracy
python video_transcriber.py ~/Videos/ClientCalls \
--model medium \
--folder-id abc123xyz
# All transcripts organized in specific Drive folder
# Use AI speaker diarization for multi-speaker panel
export HF_TOKEN=hf_your_token
python video_transcriber.py ~/Conferences/2024 \
--model medium
# Subdirectories are searched by default; pass --no-recursive to disable
# Only process videos from this week
python video_transcriber.py ~/Documents/Zoom --since 2024-12-01
# Process videos from a specific date and time
python video_transcriber.py ~/Documents/Zoom --since "2024-12-01 14:30:00"
# Useful for daily/weekly automation - only transcribe new recordings
python video_transcriber.py ~/Documents/Zoom --since $(date -v-7d +%Y-%m-%d)
# The date -v-7d syntax above is macOS/BSD; on Linux (GNU date) use: date -d '7 days ago' +%Y-%m-%d
- All AI processing is local - Videos never sent to external servers
- No OpenAI API calls - Audio is never sent to a cloud transcription service
- Google OAuth - Secure authentication flow
- Minimal permissions - Only Docs/Drive/Calendar access
- Token stored locally - credentials.json and token_video.pickle stay on your machine
brew install ffmpeg # macOS
sudo apt install ffmpeg # Ubuntu/Debian
Enable Calendar API in Google Cloud Console: https://console.cloud.google.com/apis/library/calendar-json.googleapis.com
If no calendar event is matched, check that:
- Video folder follows Zoom naming: YYYY-MM-DD HH.MM.SS Meeting Name
- Calendar event exists within ±30 minutes of recording time
- Calendar API is enabled and authenticated
If transcription is slow:
- Use smaller model (--model base or --model tiny)
- Run on a machine with a GPU (used automatically when available)
- Process overnight for large batches
If speaker names are wrong or missing:
- Verify calendar event has attendees listed
- Try AI diarization with --hf-token for better accuracy
- Check that Zoom folder timestamp matches meeting time
| Solution | Cost | Processing Speed | Accuracy |
|---|---|---|---|
| This Tool (Zoom2GoogleTranscript) | $0 | Local (2-10x realtime) | High |
| Whisper API | $0.006/min | Very Fast | High |
| Google Speech-to-Text | $0.016/min | Very Fast | Medium |
| Rev.ai | $1.50/min | Fast | Very High |
100 hours of video:
- Zoom2GoogleTranscript: $0
- Whisper API: $36
- Google Speech-to-Text: $96
- Rev.ai: $9,000
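The totals above follow directly from the per-minute rates in the table (100 hours is 6,000 minutes). A short sketch of the arithmetic, using only the rates listed in this README; `cost_for_hours` is a hypothetical helper name:

```python
# Per-minute prices copied from the comparison table above.
RATES_PER_MINUTE = {
    "Zoom2GoogleTranscript": 0.0,
    "Whisper API": 0.006,
    "Google Speech-to-Text": 0.016,
    "Rev.ai": 1.50,
}


def cost_for_hours(hours: float) -> dict:
    """Return the total cost in dollars per service for the given hours of video."""
    minutes = hours * 60
    # Round to cents to hide floating-point noise.
    return {name: round(rate * minutes, 2) for name, rate in RATES_PER_MINUTE.items()}


if __name__ == "__main__":
    for name, total in cost_for_hours(100).items():
        print(f"{name}: ${total:,.2f}")
```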
Contributions welcome! Areas for improvement:
- Support for additional video formats (mov, avi, webm)
- Parallel processing for faster batch jobs
- Custom speaker name mapping
- Integration with other calendar systems
- Improved speaker diarization algorithms
MIT License - Free for personal and commercial use.
Stephen Sklarew (@stephenhsklarew)
- OpenAI Whisper - Speech recognition model
- pyannote.audio - Speaker diarization
- Google APIs - Docs, Drive, Calendar integration
For issues or questions:
- Open an issue on GitHub
- Email: stephen@synaptiq.ai