πŸŽ™οΈ Video Whisper

Local video/audio transcription on Apple Silicon using MLX Whisper.

No API keys. No cloud. No cost. Runs entirely on your Mac.

Supports YouTube, Bilibili (B站), Xiaohongshu (小红书), Douyin (抖音), podcasts, and local files.

## Quick Start

```bash
# Install dependencies
brew install yt-dlp ffmpeg
python3 -m venv ~/.openclaw/venvs/whisper
~/.openclaw/venvs/whisper/bin/pip install mlx-whisper

# Transcribe
bash scripts/transcribe.sh "https://www.youtube.com/watch?v=..."
```

Output: `/tmp/whisper_output.txt` (plain text) and `/tmp/whisper_output.json` (with per-segment timestamps)
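If you want to post-process the JSON yourself, a minimal sketch, assuming the file follows Whisper's usual result schema (a top-level `segments` list whose entries carry `start`, `end`, and `text`):

```python
import json


def format_timestamp(seconds: float) -> str:
    """Render a second count as MM:SS, e.g. 95 -> "01:35"."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def print_segments(path: str = "/tmp/whisper_output.json") -> None:
    """Print each transcript segment prefixed with its start time."""
    with open(path, encoding="utf-8") as f:
        result = json.load(f)
    for seg in result.get("segments", []):
        print(f"[{format_timestamp(seg['start'])}] {seg['text'].strip()}")
```

Run it after `transcribe.sh` finishes to get a `[MM:SS] text` listing of the transcript.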

## Features

- 🍎 **Apple Silicon optimized** – MLX framework, fast inference on M1/M2/M3/M4
- 🌍 **Multi-language** – auto-detects the spoken language, with strong Chinese/English/Japanese support
- 📺 **Multi-platform** – YouTube, Bilibili, Xiaohongshu, Douyin, and the 1000+ other sites yt-dlp supports
- 📁 **Local files** – MP4, MP3, WAV, M4A, etc.
- ⏱️ **Timestamps** – JSON output includes per-segment timing
- 🤖 **OpenClaw ready** – drop into `skills/` and let your AI agent transcribe & summarize

## Performance

On a Mac mini M4 (16 GB):

| Video length | Time (medium model) |
| --- | --- |
| 5 min | ~30–40 s |
| 10 min | ~60–90 s |
| 30 min | ~3–4 min |
| 60 min | ~6–8 min |
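The timings above work out to roughly 8x real time. For a ballpark on other lengths, a rough estimator under that assumption (the 8x ratio is inferred from the table, not guaranteed; actual speed varies with model and hardware):

```python
def estimate_transcription_seconds(video_minutes: float, speedup: float = 8.0) -> float:
    """Rough wall-clock estimate, assuming the medium model runs at
    ~8x real time (consistent with the M4 timings above)."""
    return video_minutes * 60 / speedup
```

For a 20-minute video this predicts about 150 s (~2.5 min), in line with the table's trend.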

See `SKILL.md` for full docs, model options, and the integration guide.

## License

MIT
