A moderately fast and optimized desktop dictation tool for Windows, powered by faster-whisper.


📝 Whisper-Smart — Faster-Whisper CLI

Hold your mouse button, speak, release, and watch text materialise wherever your cursor is.

A lightning-fast desktop dictation utility for Windows 10/11 (Linux & macOS untested) powered by faster-whisper and accelerated by Flash-Attention 2.

Whisper-Smart hold-to-talk demo


✨ Features

- **Instant hold-to-record**: press the chosen mouse button (default = right) for ≥ 0.2 s, speak, release to paste.
- **Ultra-low latency**: CUDA 12 + Flash-Attention 2 kernels and a zero-copy audio pipeline.
- **Smart VAD**: real-time segmentation with a tolerant fallback, so no syllables are lost.
- **Context memory**: remembers recent sentences for better proper-noun accuracy.
- **Clipboard modes**: auto-paste, copy-only, or clipboard-off.
- **CLI everything**: 25+ flags covering mic gain, device, model size, beam width, VAD aggressiveness, and more.
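The hold-to-record behaviour boils down to a timing check on button events. A minimal sketch of that decision (illustrative only; the function name and structure are assumptions, not the tool's actual code):

```python
HOLD_THRESHOLD_S = 0.2  # minimum hold duration; matches the default described above

def should_transcribe(pressed_at: float, released_at: float) -> bool:
    """True if the button was held long enough to count as dictation.

    Shorter presses are treated as ordinary clicks and ignored by the recorder.
    """
    return (released_at - pressed_at) >= HOLD_THRESHOLD_S
```

Anything under the threshold passes through as a normal click, so the trigger button stays usable for everyday mouse work.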

📥 Quick install

Prerequisites
• Python 3.10 (64-bit); other versions untested
• NVIDIA GPU + driver ≥ 545 (CUDA 12 runtime)
• ≈ 2 GB free disk space (first model download)

# 1) Clone & enter the repo
git clone https://github.com/future3OOO/Whisper-Smart.git
cd Whisper-Smart

# 2) Create / activate a virtual env
python -m venv .venv
.venv\Scripts\activate           # Windows
# source .venv/bin/activate      # macOS / Linux

# 3) Install runtime deps (PyTorch wheel index embedded)
pip install -r requirements.txt

Developer tooling (pytest, ruff, etc.) lives in requirements-dev.txt.

🚀 Quick start & mic tuning

1. Tune microphone gain

python -m dictation_tool -v --auto-paste --mic-gain 2.5

In the verbose logs, aim for an RMS level between -25 dBFS and -15 dBFS:
• Raise --mic-gain if RMS ≈ -35 dBFS
• Lower --mic-gain if RMS ≈ -5 dBFS
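The dBFS numbers above are RMS levels of the captured audio. As a rough illustration of the arithmetic involved (not the tool's actual code), here is how an RMS reading in dBFS and a linear mic gain relate:

```python
import math

def rms_dbfs(samples):
    """RMS level of float samples in [-1.0, 1.0], expressed in dBFS (0 = full scale)."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")

def apply_gain(samples, gain):
    """Scale samples by a linear gain factor (roughly what --mic-gain does), with clipping."""
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

A full-scale sine wave reads about -3 dBFS, and doubling the gain of a quiet signal raises its reading by about 6 dB, which is why small changes to --mic-gain move the logged level noticeably.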

2. Choose a performance mode

🎯 Option A — Maximum accuracy (large-v3)

python -m dictation_tool `
       --model large-v3 `
       --auto-paste `
       --mic-gain <your_gain> `
       --vad-aggr 2

Latency ≈ 200-500 ms on an RTX 3080. Designed for 20-30 s dictation bursts.

💨 Option B — Maximum speed (medium.en + prompt tricks)

medium.en delivers ≈ 5-20 ms inference latency while staying surprisingly accurate when paired with a good prompt and a larger beam.

📧 Fast e-mail workflow — preset email

python -m dictation_tool `
       --model medium.en `
       --preset email `
       --beam-size 5 `
       --auto-paste
Example conversation:
🎙️  SPOKEN
-------------------------------------------
hi Steve new paragraph
how's your day going i hope everything is well new paragraph
please email john@gmail.com new paragraph
kind regards new line
john
bullet point first item
bullet point second item

📋  CLIPBOARD
-------------------------------------------
Hi Steve,

How's your day going? I hope everything is well.

Please email john@gmail.com.

Kind regards,
John
• first item
• second item

What happened?

✔ Converts "at sign / dot com" → @gmail.com

✔ Inserts a comma after greeting, a full-stop before blank lines

✔ Normalises sign-offs & capitalises every new line

✔ Recognises bullet point cue (unicode bullet)
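The cue handling above can be pictured as a small substitution pass over the raw transcript. The sketch below is purely illustrative: the cue table and capitalisation rule are assumptions about what the email preset does, not its actual implementation:

```python
import re

# Hypothetical cue table: spoken phrases mapped to layout, as in the example above.
CUES = [
    (r"\s*\bnew paragraph\b\s*", "\n\n"),
    (r"\s*\bnew line\b\s*", "\n"),
    (r"\s*\bbullet point\b\s*", "\n\u2022 "),  # unicode bullet, as the preset emits
]

def expand_cues(spoken: str) -> str:
    """Expand spoken layout cues, then capitalise the first letter of each non-bullet line."""
    text = spoken
    for pattern, replacement in CUES:
        text = re.sub(pattern, replacement, text, flags=re.IGNORECASE)
    lines = [
        line if not line or line.startswith("\u2022 ") else line[0].upper() + line[1:]
        for line in text.split("\n")
    ]
    return "\n".join(lines)
```

The real preset layers more on top (punctuation insertion, e-mail address normalisation, sign-off handling), but the cue-to-layout substitution is the core of the effect shown in the clipboard output.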

Other presets

# Business reports
python -m dictation_tool --model medium.en --preset business --auto-paste

# Programming / code review
python -m dictation_tool --model medium.en --preset programming --auto-paste

Fully custom prompt

python -m dictation_tool `
       --model medium.en `
       --initial-prompt "Support ticket, error code, patch, deployment" `
       --beam-size 5 `
       --auto-paste

Larger beam sizes (e.g. 6-8) are supported but add latency.

🛠 Common customisations

| Goal | Flag | Example (PowerShell) |
| --- | --- | --- |
| Disable VAD for long monologues | --no-vad | ... --no-vad --max-buffer-seconds 30 |
| Tolerate longer pauses | --vad-aggr | --vad-aggr 1 (0 = tolerant … 3 = strict) |
| Copy without pasting | --manual-paste | ... --manual-paste |
| Change mouse trigger | --mouse-btn | --mouse-btn middle |
| Use hotkey only (no mouse) | --no-mouse | ... --no-mouse |

Full flag list:

python -m dictation_tool --help

🌳 Project layout

dictation_tool/
├─ __main__.py     # CLI entry-point
├─ engine.py       # DictationEngine core
├─ io.py           # Audio + VAD helpers
└─ …               # More modules
tests/
docs/

⚙️ System requirements

| Component | Requirement |
| --- | --- |
| Python | 3.10 x64 |
| CUDA runtime | 12.1+ (driver ≥ 545) |
| GPU | NVIDIA RTX (≥ 8 GB VRAM recommended) |
| OS | Windows 10/11 (Linux/macOS untested) |

CPU-only runs are possible but ≈ 5× slower.

📄 License & contributing

MIT — see LICENSE.
PRs & issues welcome; please run ruff and pytest -n auto before pushing.
