A command-line tool for speech-to-text transcription using the Qwen3-ASR model.
- 🎤 High-quality speech recognition powered by Qwen3-ASR
- 🚀 Supports CUDA, MPS (Apple Silicon), and CPU
- 📝 Clean text output, perfect for piping and scripting
- 🔌 Works as a CLI provider in OpenClaw
- Python 3.12+
- uv package manager
Install directly from GitHub:

```bash
uv tool install git+https://github.com/cnjack/qwen3-asr-cli.git
```

Or install from source:

- Clone the repository:

  ```bash
  git clone https://github.com/cnjack/qwen3-asr-cli.git
  cd qwen3-asr-cli
  ```

- Install globally:

  ```bash
  uv tool install -e .
  ```

By default, the tool uses Qwen/Qwen3-ASR-1.7B and will download it automatically from Hugging Face on first use. If you prefer to download models manually or need offline access, use one of the following methods:
Download through ModelScope (recommended for users in Mainland China):

```bash
pip install -U modelscope
modelscope download --model Qwen/Qwen3-ASR-1.7B --local_dir ./Qwen3-ASR-1.7B
modelscope download --model Qwen/Qwen3-ASR-0.6B --local_dir ./Qwen3-ASR-0.6B
modelscope download --model Qwen/Qwen3-ForcedAligner-0.6B --local_dir ./Qwen3-ForcedAligner-0.6B
```

Download through Hugging Face:

```bash
pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-ASR-1.7B --local-dir ./Qwen3-ASR-1.7B
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir ./Qwen3-ASR-0.6B
huggingface-cli download Qwen/Qwen3-ForcedAligner-0.6B --local-dir ./Qwen3-ForcedAligner-0.6B
```

Note: If you download models locally, you'll need to specify the model path using the `--model` parameter when running the CLI.
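For fully offline use with a locally downloaded model, you can also tell huggingface_hub to stay off the network. This is a sketch using huggingface_hub's standard environment variables; it assumes the tool loads models through the usual Hugging Face code path, which this README does not explicitly confirm:

```shell
# Optional: use a custom cache location for Hugging Face downloads
export HF_HOME="$PWD/hf-cache"
# Forbid huggingface_hub from making network requests; only local files are used
export HF_HUB_OFFLINE=1
# Then point the CLI at the locally downloaded weights, e.g.:
# qwen3-asr-cli recording.mp3 --model ./Qwen3-ASR-1.7B
```

If the model is not fully present locally, loading will fail in offline mode rather than silently re-downloading.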
```bash
qwen3-asr-cli <audio_file> [--model MODEL_NAME_OR_PATH]
```

- `audio_file`: Path to the audio file to transcribe (required)
- `--model`: Model name or local path (default: `Qwen/Qwen3-ASR-1.7B`)
  - Use an official model name: `Qwen/Qwen3-ASR-1.7B`, `Qwen/Qwen3-ASR-0.6B`
  - Use a local path: `./Qwen3-ASR-1.7B`, `./Qwen3-ASR-0.6B`
```bash
# Transcribe using the default model (Qwen/Qwen3-ASR-1.7B)
qwen3-asr-cli recording.mp3

# Transcribe using a different official model
qwen3-asr-cli recording.mp3 --model Qwen/Qwen3-ASR-0.6B

# Transcribe using a locally downloaded model
qwen3-asr-cli meeting.wav --model ./Qwen3-ASR-1.7B

# Transcribe and save to file
qwen3-asr-cli meeting.wav > transcript.txt

# Use with other commands
qwen3-asr-cli audio.mp3 | wc -w  # Count words
```

You can use qwen3-asr-cli as a CLI audio transcription provider in OpenClaw.
Add the following to your OpenClaw configuration:

```json
{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "qwen3-asr-cli",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 120
          }
        ]
      }
    }
  }
}
```

To use a different model or a locally downloaded model, add the `--model` parameter:
```json
{
  "type": "cli",
  "command": "qwen3-asr-cli",
  "args": ["{{MediaPath}}", "--model", "Qwen/Qwen3-ASR-0.6B"],
  "timeoutSeconds": 120
}
```

This allows OpenClaw to automatically transcribe voice messages and audio attachments using Qwen3-ASR locally, without relying on external API providers.
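The snippet above switches to the smaller official model; a locally downloaded model uses the same shape, with `--model` pointing at the model directory (the path below is illustrative):

```json
{
  "type": "cli",
  "command": "qwen3-asr-cli",
  "args": ["{{MediaPath}}", "--model", "./Qwen3-ASR-1.7B"],
  "timeoutSeconds": 120
}
```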
All audio formats supported by librosa are accepted:
- MP3, WAV, FLAC, OGG, M4A, and more
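If you don't have a recording handy, Python's standard library can synthesize a small WAV file to sanity-check that the pipeline runs end to end. This is a hypothetical helper (the filename and parameters are illustrative); a pure tone contains no speech, so expect an empty or meaningless transcript:

```python
# Synthesize a one-second 440 Hz mono test tone as 16-bit PCM WAV,
# using only the standard library.
import math
import struct
import wave

def write_test_tone(path: str, freq: float = 440.0, seconds: float = 1.0,
                    rate: int = 16000) -> None:
    """Write a mono 16-bit PCM sine wave to `path`."""
    n_frames = int(rate * seconds)
    frames = bytearray()
    for i in range(n_frames):
        # Scale the sine wave to the signed 16-bit range
        sample = int(32767 * math.sin(2 * math.pi * freq * i / rate))
        frames += struct.pack("<h", sample)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(bytes(frames))

write_test_tone("test_tone.wav")
```

Then run `qwen3-asr-cli test_tone.wav` to confirm the model loads and produces output.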
MIT