ASR-Kit

Modular Python library for Automatic Speech Recognition. Pass WAV(s) to a unified Transcriber; get TranscriptionResult(s) back. Backed by pluggable model drivers.

Supported: Qwen3-ASR | Whisper (planned) | Parakeet (planned)

Setup

# 1. Create and activate a dedicated conda environment (Python 3.12 recommended)
conda create -n asr-kit python=3.12
conda activate asr-kit

# 2. Install directly from GitHub
pip install "asr-kit[qwen] @ git+https://github.com/dqvid3/asr-kit.git"

# 3. (Optional - Qwen only) Install Flash Attention 2 for Ampere+ GPUs (RTX 3090/4090/ADA/A100)
# Significantly reduces VRAM and speeds up long audio processing when using the Qwen model.
# If your machine has less than 96GB of RAM and lots of CPU cores, run:
# MAX_JOBS=4 pip install -U flash-attn --no-build-isolation
pip install -U flash-attn --no-build-isolation

# To remove the environment later
conda env remove -n asr-kit

Note: Python 3.13 may have compatibility issues with torch or qwen-asr. Use 3.12 to be safe.

Usage

from asr_kit import Transcriber

t = Transcriber(model="qwen", device="cuda")

# Single file → TranscriptionResult
result = t.transcribe("audio.wav")
print(result.text)

# Multiple files → list[TranscriptionResult]
results = t.transcribe(["a.wav", "b.wav"], language="English")

# With word-level timestamps (Qwen: requires use_forced_aligner=True)
t = Transcriber(model="qwen", device="cuda", use_forced_aligner=True)
result = t.transcribe("audio.wav", return_timestamps=True)
for word in result.timestamps:
    print(f"{word.start:.2f}s  {word.text}")

Output: TranscriptionResult

field	type	description
`text`	`str`	Full transcript
`language`	`str \| None`	Detected language code
`timestamps`	`list[WordTimestamp] \| None`	Word-level timing (if requested)
`audio_path`	`str`	Absolute source WAV path
`model`	`str`	Model identifier

WordTimestamp: text: str, start: float, end: float (seconds)

Adding a Driver

Create src/asr_kit/drivers/my_driver.py implementing BaseDriver
Implement load_model(**kwargs) -> None and transcribe(audio_paths, **kwargs) -> list[TranscriptionResult]
Register the key in _DRIVER_REGISTRY dict in transcriber.py
Add optional deps under [project.optional-dependencies] in pyproject.toml

Try It

After cloning the repo:

python examples/qwen_basic.py audio.wav
python examples/qwen_basic.py audio.wav --timestamps
python examples/qwen_basic.py a.wav b.wav --language English

Issues and PRs welcome.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
examples		examples
src/asr_kit		src/asr_kit
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

ASR-Kit

Setup

Usage

Output: TranscriptionResult

Adding a Driver

Try It

License

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

ASR-Kit

Setup

Usage

Output: TranscriptionResult

Adding a Driver

Try It

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages