Local speech loop for microphone or file input:
- record audio
- transcribe with `faster-whisper`
- synthesize with `piper`
- play the result locally
- append per-turn benchmark rows to JSONL

Requirements:
- Python 3.10+
- a working `piper` executable
- a local Piper voice model (`.onnx`)

faster-whisper does not require a system ffmpeg install.
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Set these in `.env`:
- `PIPER_BIN=.venv/bin/piper` or another valid path to the Piper binary
- `PIPER_MODEL_PATH=model/voice.onnx` or `/absolute/path/to/voice.onnx`

Optional settings:
- `PIPER_CONFIG_PATH`
- `SAMPLE_RATE`
- `FRAME_DURATION_MS`
- `MAX_SECONDS`
- `SILENCE_THRESHOLD`
- `SILENCE_DURATION_SECONDS`
- `AUDIO_INPUT_DEVICE`
- `BENCHMARK_JSONL`
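
As a rough illustration of how these variables might be consumed, the sketch below reads them from a mapping such as `os.environ`. It assumes the values from `.env` have already been exported or loaded into the environment. The `LoopSettings` class, the `load_settings` function, and every default except the benchmark path are hypothetical and not taken from this project.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class LoopSettings:
    """Hypothetical container for the settings listed above."""
    piper_bin: str
    piper_model_path: str
    piper_config_path: Optional[str]
    sample_rate: int
    frame_duration_ms: int
    max_seconds: float
    silence_threshold: float
    silence_duration_seconds: float
    audio_input_device: Optional[str]
    benchmark_jsonl: str


def load_settings(env: Mapping[str, str] = os.environ) -> LoopSettings:
    """Build settings from environment variables.

    The fallback values are illustrative assumptions, except the benchmark
    path, which matches the documented default location.
    """
    missing = [key for key in ("PIPER_BIN", "PIPER_MODEL_PATH") if not env.get(key)]
    if missing:
        raise ValueError("missing required settings: " + ", ".join(missing))
    return LoopSettings(
        piper_bin=env["PIPER_BIN"],
        piper_model_path=env["PIPER_MODEL_PATH"],
        piper_config_path=env.get("PIPER_CONFIG_PATH") or None,
        sample_rate=int(env.get("SAMPLE_RATE", "16000")),               # assumed default
        frame_duration_ms=int(env.get("FRAME_DURATION_MS", "100")),     # assumed default
        max_seconds=float(env.get("MAX_SECONDS", "10")),                # assumed default
        silence_threshold=float(env.get("SILENCE_THRESHOLD", "0.01")),  # assumed default
        silence_duration_seconds=float(env.get("SILENCE_DURATION_SECONDS", "1.0")),  # assumed default
        audio_input_device=env.get("AUDIO_INPUT_DEVICE") or None,
        benchmark_jsonl=env.get("BENCHMARK_JSONL", "benchmarks/local_runs.jsonl"),
    )
```

Failing early on the two required variables gives a clearer error than a later "file not found" from the Piper call.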
Start the interactive loop:
python -m stt_tts_loop

Useful commands:
python -m stt_tts_loop --list-devices
python -m stt_tts_loop --input-audio recordings/sample.wav
python -m stt_tts_loop --compare-stt-models base.en small.en
python -m stt_tts_loop --list-voices
python -m stt_tts_loop --download-voice en_US-libritts-high --voice-download-dir model

Example with explicit overrides:
python -m stt_tts_loop \
--voice-model model/voice.onnx \
--input-device 0 \
--frame-duration-ms 100 \
--max-seconds 10 \
--silence-threshold 0.01 \
--silence-duration 1.0 \
--save-recording recordings

Notes:
- Press Enter to start a turn.
- Recording stops on trailing silence or max duration.
- `--compare-stt-models` is STT-only and skips TTS playback.
- `MAX_SECONDS` is a per-turn capture limit.
- Benchmark rows are appended to `benchmarks/local_runs.jsonl` unless overridden.
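
For readers unfamiliar with JSONL, appending a per-turn row amounts to writing one JSON object per line. The sketch below is only an illustration: the `append_benchmark_row` helper and the field names in the usage example are hypothetical, and the actual row schema written by the tool is not documented here.

```python
import json
from pathlib import Path


def append_benchmark_row(row: dict, path: str = "benchmarks/local_runs.jsonl") -> None:
    """Append one JSON object as a single line, creating the folder if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" keeps earlier rows intact, so each turn adds exactly one line.
    with target.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(row, ensure_ascii=False) + "\n")


if __name__ == "__main__":
    # Field names below are placeholders, not the tool's real schema.
    append_benchmark_row({"turn": 1, "stt_seconds": 0.42, "tts_seconds": 0.31})
```

Because every line is an independent JSON document, the file can be read back line by line with `json.loads` without loading the whole history at once.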

Troubleshooting:
- No microphone input:
  - grant microphone permission to the terminal app
  - run `python -m stt_tts_loop --list-devices`
  - retry with `--input-device <index>`
- Piper executable not found:
  - install `piper-tts` in the active environment or point `PIPER_BIN` to the binary
- Piper model not found:
  - set `PIPER_MODEL_PATH` to a valid local `.onnx` file
- Empty or near-silent capture:
  - increase mic input level
  - lower `SILENCE_THRESHOLD`
  - increase `MAX_SECONDS`
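
To make the interaction between the silence threshold, the silence duration, and the per-turn limit more concrete, here is a minimal sketch of one way a capture loop could decide when to stop. It is not the tool's actual implementation. It assumes float samples in the range -1.0 to 1.0, uses RMS level per frame as the loudness measure, and only counts trailing silence after some speech has been heard. The function names are hypothetical.

```python
import math
from typing import Iterable, Sequence


def frame_rms(frame: Sequence[float]) -> float:
    """Root-mean-square level of one frame of float samples."""
    if not frame:
        return 0.0
    return math.sqrt(sum(sample * sample for sample in frame) / len(frame))


def frames_until_stop(
    frames: Iterable[Sequence[float]],
    frame_duration_ms: int = 100,
    max_seconds: float = 10.0,
    silence_threshold: float = 0.01,
    silence_duration: float = 1.0,
) -> tuple[int, bool]:
    """Return (frames kept, whether any frame reached the threshold)."""
    max_frames = int(max_seconds * 1000 / frame_duration_ms)
    silent_needed = max(1, math.ceil(silence_duration * 1000 / frame_duration_ms))
    kept = 0
    silent_run = 0
    heard_speech = False
    for frame in frames:
        if kept >= max_frames:
            break  # per-turn capture limit reached
        kept += 1
        if frame_rms(frame) >= silence_threshold:
            heard_speech = True
            silent_run = 0
        else:
            silent_run += 1
            # Assumption: silence before any speech does not end the turn.
            if heard_speech and silent_run >= silent_needed:
                break
    return kept, heard_speech
```

Under these assumptions, a quiet microphone whose frames never reach the threshold is reported as having no speech at all, which is why lowering `SILENCE_THRESHOLD` or raising the input level can help with near-silent captures.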