Qwen3-ASR-CLI

A command-line tool for speech-to-text transcription using the Qwen3-ASR model.

Features

  • 🎤 High-quality speech recognition powered by Qwen3-ASR
  • 🚀 Supports CUDA, MPS (Apple Silicon), and CPU
  • 📝 Clean text output, perfect for piping and scripting
  • 🔌 Works as a CLI provider in OpenClaw

Installation

Prerequisites

  • Python 3.12+
  • uv package manager

Quick Install (Recommended)

Install directly from GitHub:

uv tool install git+https://github.com/cnjack/qwen3-asr-cli.git

Install from Source (for development)

  1. Clone the repository:
git clone https://github.com/cnjack/qwen3-asr-cli.git
cd qwen3-asr-cli
  2. Install globally:
uv tool install -e .
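Either install method places a qwen3-asr-cli executable on your PATH (uv tool install puts entry points under ~/.local/bin by default). A quick sanity check, assuming the tool exposes the usual argparse-style --help flag:

```shell
# Confirm the entry point is installed and reachable. The --help flag
# (standard for argparse-based CLIs; assumed here) prints usage text.
if command -v qwen3-asr-cli >/dev/null 2>&1; then
  qwen3-asr-cli --help
else
  echo "qwen3-asr-cli not found on PATH" >&2
fi
```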

(Optional) Download Models Locally

By default, the tool uses Qwen/Qwen3-ASR-1.7B and will download it automatically from Hugging Face on first use. If you prefer to download models manually or need offline access, use one of the following methods:

Download through ModelScope (recommended for users in Mainland China):

pip install -U modelscope
modelscope download --model Qwen/Qwen3-ASR-1.7B --local_dir ./Qwen3-ASR-1.7B
modelscope download --model Qwen/Qwen3-ASR-0.6B --local_dir ./Qwen3-ASR-0.6B
modelscope download --model Qwen/Qwen3-ForcedAligner-0.6B --local_dir ./Qwen3-ForcedAligner-0.6B

Download through Hugging Face:

pip install -U "huggingface_hub[cli]"
huggingface-cli download Qwen/Qwen3-ASR-1.7B --local-dir ./Qwen3-ASR-1.7B
huggingface-cli download Qwen/Qwen3-ASR-0.6B --local-dir ./Qwen3-ASR-0.6B
huggingface-cli download Qwen/Qwen3-ForcedAligner-0.6B --local-dir ./Qwen3-ForcedAligner-0.6B

Note: If you download models locally, you'll need to specify the model path using the --model parameter when running the CLI.

Usage

qwen3-asr-cli <audio_file> [--model MODEL_NAME_OR_PATH]

Options

  • audio_file: Path to the audio file to transcribe (required)
  • --model: Model name or local path (default: Qwen/Qwen3-ASR-1.7B)
    • Use official model name: Qwen/Qwen3-ASR-1.7B, Qwen/Qwen3-ASR-0.6B
    • Use local path: ./Qwen3-ASR-1.7B, ./Qwen3-ASR-0.6B

Examples

# Transcribe using default model (Qwen/Qwen3-ASR-1.7B)
qwen3-asr-cli recording.mp3

# Transcribe using a different official model
qwen3-asr-cli recording.mp3 --model Qwen/Qwen3-ASR-0.6B

# Transcribe using a locally downloaded model
qwen3-asr-cli meeting.wav --model ./Qwen3-ASR-1.7B

# Transcribe and save to file
qwen3-asr-cli meeting.wav > transcript.txt

# Use with other commands
qwen3-asr-cli audio.mp3 | wc -w  # Count words
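Because the tool writes plain text to stdout, batch transcription is a short shell loop. A sketch, where ./recordings is a hypothetical directory of inputs:

```shell
# Transcribe every .wav under ./recordings, writing one .txt per input.
# The [ -e ] guard skips the literal pattern when the glob matches nothing.
for f in ./recordings/*.wav; do
  [ -e "$f" ] || continue
  qwen3-asr-cli "$f" > "${f%.wav}.txt"
done
```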

OpenClaw Integration

You can use qwen3-asr-cli as a CLI audio transcription provider in OpenClaw.

Add the following to your OpenClaw configuration:

{
  "tools": {
    "media": {
      "audio": {
        "enabled": true,
        "models": [
          {
            "type": "cli",
            "command": "qwen3-asr-cli",
            "args": ["{{MediaPath}}"],
            "timeoutSeconds": 120
          }
        ]
      }
    }
  }
}

To use a different model or a locally downloaded model, add the --model parameter:

{
  "type": "cli",
  "command": "qwen3-asr-cli",
  "args": ["{{MediaPath}}", "--model", "Qwen/Qwen3-ASR-0.6B"],
  "timeoutSeconds": 120
}

This allows OpenClaw to automatically transcribe voice messages and audio attachments using Qwen3-ASR locally, without relying on external API providers.
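Since OpenClaw substitutes the attachment's path for {{MediaPath}} and presumably reads the command's stdout as the transcript, the integration can be simulated from a shell. A sketch, where voice-note.ogg is a hypothetical attachment:

```shell
# Simulate the provider call: pass a media path, capture stdout as
# the transcript (mirrors the "args": ["{{MediaPath}}"] config above).
MEDIA_PATH=./voice-note.ogg
if [ -e "$MEDIA_PATH" ]; then
  transcript=$(qwen3-asr-cli "$MEDIA_PATH")
  printf '%s\n' "$transcript"
fi
```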

Supported Audio Formats

All formats supported by librosa:

  • MP3, WAV, FLAC, OGG, M4A, and more

License

MIT
