Local video/audio transcription on Apple Silicon using MLX Whisper.
No API keys. No cloud. No cost. Runs entirely on your Mac.
Supports YouTube, Bilibili (Bη«™), Xiaohongshu (ε°ηΊ’δΉ¦), Douyin (ζŠ–ιŸ³), podcasts, and local files.
```bash
# Install dependencies
brew install yt-dlp ffmpeg
python3 -m venv ~/.openclaw/venvs/whisper
~/.openclaw/venvs/whisper/bin/pip install mlx-whisper

# Transcribe
bash scripts/transcribe.sh "https://www.youtube.com/watch?v=..."
```

Output: `/tmp/whisper_output.txt` (plain text) + `/tmp/whisper_output.json` (with timestamps)
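Under the hood, the quick-start boils down to two steps: extract the audio with yt-dlp, then feed it to MLX Whisper. Here is a minimal Python sketch of that pipeline; the audio path, output format, and lazy import are illustrative assumptions, not what `scripts/transcribe.sh` literally does:

```python
import subprocess

# Assumed temp path for the extracted audio (not the script's actual value).
AUDIO_PATH = "/tmp/whisper_input.m4a"

def build_download_cmd(url: str, out_path: str = AUDIO_PATH) -> list[str]:
    """Build a yt-dlp invocation that extracts audio only (-x)."""
    return ["yt-dlp", "-x", "--audio-format", "m4a", "-o", out_path, url]

def transcribe(url: str) -> dict:
    """Download the audio, then transcribe it locally with MLX Whisper."""
    subprocess.run(build_download_cmd(url), check=True)
    import mlx_whisper  # imported lazily; requires Apple Silicon + mlx-whisper
    # transcribe() returns a dict with "text" plus per-segment "segments"
    return mlx_whisper.transcribe(AUDIO_PATH)
```

For local files, the yt-dlp step can be skipped entirely and the file path passed straight to `mlx_whisper.transcribe`.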
- **Apple Silicon optimized** – MLX framework, fast inference on M1/M2/M3/M4
- **Multi-language** – auto-detects the language; strong Chinese/English/Japanese support
- **Multi-platform** – YouTube, Bilibili, Xiaohongshu, Douyin, and 1000+ other sites
- **Local files** – MP4, MP3, WAV, M4A, etc.
- **Timestamps** – JSON output includes per-segment timing
- **OpenClaw ready** – drop into `skills/` and let your AI agent transcribe & summarize
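The per-segment timing in the JSON output can be rendered as SRT-style subtitles. A minimal sketch, assuming the file follows Whisper's usual schema (a top-level `segments` list whose entries carry `start`, `end`, and `text`):

```python
import json

def fmt_ts(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm (the SRT convention)."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def to_srt(segments: list[dict]) -> str:
    """Render Whisper-style segments as an SRT document."""
    blocks = []
    for i, seg in enumerate(segments, 1):
        blocks.append(
            f"{i}\n{fmt_ts(seg['start'])} --> {fmt_ts(seg['end'])}\n{seg['text'].strip()}"
        )
    return "\n\n".join(blocks)

# In practice, load the transcriber's output:
# segments = json.load(open("/tmp/whisper_output.json"))["segments"]
segments = [{"start": 0.0, "end": 4.5, "text": " Hello."},
            {"start": 4.5, "end": 9.0, "text": " World."}]
print(to_srt(segments))
```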
On a Mac mini M4 (16GB):
| Video Length | Time (medium model) |
|---|---|
| 5 min | ~30-40s |
| 10 min | ~60-90s |
| 30 min | ~3-4 min |
| 60 min | ~6-8 min |
See SKILL.md for full docs, model options, and the integration guide.
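Model choice trades speed for accuracy (the table above uses `medium`). A hypothetical helper for switching models; the `mlx-community` repo names below are illustrative assumptions, so check SKILL.md for the options the script actually supports:

```python
# Map a model size to an MLX-converted Whisper repo on Hugging Face.
# Repo names are assumptions for illustration; verify before use.
MODELS = {
    "tiny": "mlx-community/whisper-tiny",
    "medium": "mlx-community/whisper-medium-mlx",
    "large": "mlx-community/whisper-large-v3-mlx",
}

def model_repo(size: str = "medium") -> str:
    """Return the Hugging Face repo for a given Whisper model size."""
    if size not in MODELS:
        raise ValueError(f"unknown model size: {size!r}")
    return MODELS[size]

# Usage (requires Apple Silicon):
# import mlx_whisper
# result = mlx_whisper.transcribe("audio.m4a", path_or_hf_repo=model_repo("large"))
```

Smaller models roughly halve the times in the table at the cost of accuracy, especially on non-English audio; `large` does the reverse.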
MIT