# VocalTrace

Forensic-grade conversation analyzer & speaker identity workbench.

VocalTrace is a GUI-driven tool designed for investigators and researchers to transform raw audio into verified, searchable intelligence. It combines state-of-the-art diarization (Pyannote) and transcription (Whisper) with a robust "Human-in-the-Loop" verification workflow.
## Intended audience
- Investigators / analysts working with recorded conversations
- Researchers validating diarization & transcription results
- Developers exploring human-verified AI pipelines
## Highlights

- **Decoupled Cleaning Pipeline:** Clean your audio once, as an explicit first step, to ensure consistent results across transcription, diarization, and the Snipper.
- **Truth Persistence:** Manually corrected segments (Ground Truth) are locked; AI re-runs will never overwrite your manual work.
- **Enhanced Windows Stability:** Conda FFmpeg dependencies were removed to eliminate DLL conflicts with PySide6/Qt.
- **Audio Snipper Workbench:** A precision tool for splitting, merging, and "voice printing" speakers with surgical accuracy.
## Main tabs

- Segmented, speaker-labeled transcript with waveform-linked playback ("sync-on-click")
- LLM summary, psychological profile, and themes
- Management of known speakers and biometric signatures
- Chat with the transcript using RAG + an LLM
- Precision audio labeling tool (Snipper) with the ability to commit corrected segments to the Voice Bank
## Features

- Speaker diarization (Pyannote) with progress feedback
- Transcription (Whisper / Transformers pipeline)
- Audio cleaning via `AudioDenoiser` (noise reduction + filters + normalization)
- Voice bank / biometrics utilities (experimental)
- LLM analysis (OpenAI / Gemini) and optional RAG tooling
- Ground truth persistence – manually verified segments are locked and never overwritten by re-runs
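The last feature boils down to a merge that always prefers human-verified segments over fresh AI output. As a rough illustration only (the `Segment` fields and the `merge_rerun` helper are hypothetical, not VocalTrace's internals):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    start: float   # seconds
    end: float     # seconds
    speaker: str
    text: str
    verified: bool = False  # set True when a human commits the segment

def merge_rerun(existing: list[Segment], rerun: list[Segment]) -> list[Segment]:
    """Keep every human-verified segment; accept AI output only where it
    does not overlap a locked (verified) segment."""
    locked = [s for s in existing if s.verified]

    def overlaps_locked(seg: Segment) -> bool:
        return any(seg.start < l.end and l.start < seg.end for l in locked)

    merged = locked + [s for s in rerun if not overlaps_locked(s)]
    return sorted(merged, key=lambda s: s.start)
```

Under a scheme like this, a re-run can be applied at any time: verified segments survive untouched.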
## Installation

```
git clone https://github.com/Rakile/VocalTrace.git
cd VocalTrace
```

Use either venv or Conda.
### venv (recommended for GUI stability on Windows)
```
python3.12 -m venv .venv

# Windows
.venv\Scripts\activate

# Linux/macOS
source .venv/bin/activate
```

### Conda (OK, but do NOT install conda ffmpeg)
```
conda create -n vocaltrace python=3.12
conda activate vocaltrace
```

### PyTorch

Pick the command that matches your platform/CUDA. See the official selector at pytorch.org.
Example (CUDA 12.8 wheels):
```
pip install torch==2.8.0 torchaudio==2.8.0 --index-url https://download.pytorch.org/whl/cu128
```

### Python dependencies

```
pip install -r requirements.txt
```

### FFmpeg

VocalTrace uses an ffmpeg executable to decode audio for diarization/transcription when needed.
- Windows: install ffmpeg and ensure `ffmpeg.exe` is on PATH (or set `VOCALTRACE_FFMPEG`)
- Linux: `sudo apt-get install ffmpeg`
- macOS: `brew install ffmpeg`
If ffmpeg is not on PATH, set:

```
# Windows (PowerShell)
$env:VOCALTRACE_FFMPEG="C:\path\to\ffmpeg.exe"

# Linux/macOS
export VOCALTRACE_FFMPEG=/usr/bin/ffmpeg
```
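For reference, the lookup order described above (explicit `VOCALTRACE_FFMPEG` first, then PATH) can be sketched like this; it mirrors the documented behavior but is not the project's actual code:

```python
import os
import shutil

def resolve_ffmpeg() -> str:
    """Return the ffmpeg executable: VOCALTRACE_FFMPEG override first, then PATH."""
    override = os.environ.get("VOCALTRACE_FFMPEG")
    if override and os.path.isfile(override):
        return override
    found = shutil.which("ffmpeg")
    if found:
        return found
    raise FileNotFoundError(
        "ffmpeg executable not found; install ffmpeg or set VOCALTRACE_FFMPEG"
    )
```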
> ⚠️ **Important (Windows):** Do NOT install ffmpeg via Conda in this environment. Conda ffmpeg causes DLL conflicts with PySide6/Qt.
### ClearVoice (optional)

VocalTrace supports ClearVoice for AI-based speech enhancement. Because it has strict version requirements, it is not included in the default `requirements.txt`.

To enable ClearVoice:

1. Run `pip install clearvoice`.
2. **Crucial:** After installing ClearVoice, you must re-run the main requirements to fix the version conflicts it creates:

   ```
   pip install -r requirements.txt
   ```
### API keys

Create a `.env` file in the root directory with:

- `HF_TOKEN_TRANSCRIBE` – for Pyannote models
- `GEMINI_TRANSCRIBE_ANALYSIS_API_KEY` – for Gemini 2.5 analysis
- `OPENAI_API_KEY` – for GPT-4o/5.1 analysis
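These keys become ordinary environment variables once the `.env` file is loaded. A minimal sketch of reading them at startup, assuming the `python-dotenv` package (an assumption, not a confirmed VocalTrace dependency):

```python
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current working directory

hf_token = os.getenv("HF_TOKEN_TRANSCRIBE")                  # Pyannote models
gemini_key = os.getenv("GEMINI_TRANSCRIBE_ANALYSIS_API_KEY") # Gemini analysis
openai_key = os.getenv("OPENAI_API_KEY")                     # OpenAI analysis

if not hf_token:
    raise RuntimeError("HF_TOKEN_TRANSCRIBE is required for Pyannote models")
```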
## FlashAttention2 (optional)

VocalTrace automatically uses FlashAttention2 when available to reduce GPU memory usage and improve inference speed for Whisper-based models.
FlashAttention2 is optional. If it is not installed or not supported on your system, VocalTrace automatically falls back to standard (“eager”) attention with no loss of correctness.
Internally, VocalTrace attempts to load models with `attn_implementation="flash_attention_2"` and transparently falls back to `attn_implementation="eager"` if FlashAttention2 is unavailable.
### Requirements

- Requires a compatible NVIDIA GPU
- Supported for fp16 / bf16 models
- Works best with recent PyTorch + CUDA builds
- Not required for CPU inference or smaller models
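The try-first/fall-back pattern described above looks roughly like the following with Hugging Face Transformers. This is a sketch of the pattern only; the model ID is illustrative, and this is not VocalTrace's actual loader:

```python
import torch
from transformers import AutoModelForSpeechSeq2Seq

def load_whisper(model_id: str = "openai/whisper-large-v3"):
    """Try FlashAttention2 first; fall back to eager attention."""
    for attn in ("flash_attention_2", "eager"):
        try:
            return AutoModelForSpeechSeq2Seq.from_pretrained(
                model_id,
                torch_dtype=torch.float16,
                attn_implementation=attn,
            )
        except (ImportError, ValueError) as exc:
            # flash-attn missing, or unsupported on this hardware/dtype
            print(f"{attn} failed ({exc}); trying next implementation")
    raise RuntimeError("no usable attention implementation")
```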
### Installing FlashAttention on Linux

On many Linux systems, FlashAttention can be installed directly via pip:

```
pip install flash-attn --no-build-isolation
```

If this fails, it usually means your CUDA / PyTorch toolchain is not compatible with building the extension locally.
### Installing FlashAttention on Windows

Building FlashAttention from source on Windows is often difficult. The recommended approach is to install a prebuilt wheel that matches your:

- Python version (e.g. `cp312`)
- PyTorch version (e.g. `torch2.8`)
- CUDA version (e.g. `cu128`)
- Architecture (`win_amd64`)
Example source of Windows prebuilt wheels:
Example filename:

```
flash_attn-2.8.2+cu128torch2.8-cp312-cp312-win_amd64.whl
```

Install with:

```
pip install flash_attn-2.8.2+cu128torch2.8-cp312-cp312-win_amd64.whl
```

Verify:

```
python -c "import flash_attn; print('flash-attn OK')"
```

Notes:

- If FlashAttention fails to load, VocalTrace will log a warning and continue using eager attention.
- No configuration change is needed to disable FlashAttention manually; the fallback is automatic.
- If you encounter crashes during model loading, uninstall `flash-attn` and retry; VocalTrace will still function normally.
## Running

```
python launch.py
```

VocalTrace bypasses torchcodec, so it is safe to ignore the warning: "UserWarning: torchcodec is not installed correctly so built-in audio decoding will fail."
## Troubleshooting

If you previously installed ffmpeg via Conda in the same environment and PySide6 fails to import, remove the conda ffmpeg package:

```
conda remove ffmpeg
```

Then use a system ffmpeg executable (see above).
If you see an error like "ffmpeg executable not found", install ffmpeg or set `VOCALTRACE_FFMPEG`.
Set the log level with:

```
# Windows (PowerShell)
$env:VOCALTRACE_LOGLEVEL="DEBUG"

# Linux/macOS
export VOCALTRACE_LOGLEVEL=DEBUG
```
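An env-driven level like this typically maps onto Python's standard logging as follows (illustrative; not necessarily VocalTrace's exact setup):

```python
import logging
import os

level_name = os.getenv("VOCALTRACE_LOGLEVEL", "INFO").upper()
logging.basicConfig(level=getattr(logging, level_name, logging.INFO))
logging.getLogger(__name__).debug("only visible when VOCALTRACE_LOGLEVEL=DEBUG")
```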
## Workflow

VocalTrace follows a linear forensic process to ensure the highest data integrity.

### 1. Clean

Load your source file and use the "Clean Audio Now" feature. This uses the `AudioDenoiser` engine to create a high-quality working copy (`_cleaned.wav`) while preserving your original evidence. Once cleaned, the entire app (including the Snipper) automatically switches to this improved source.
### 2. Diarize

The engine detects "who spoke when." Use the Voice Bank to match detected clusters against known biometric signatures. v0.3 supports in-memory processing to avoid disk thrashing.
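Conceptually, matching a detected cluster against the Voice Bank reduces to comparing speaker embeddings. A minimal cosine-similarity sketch with NumPy, where the embedding format, bank layout, and threshold are all assumptions for illustration, not VocalTrace internals:

```python
import numpy as np

def best_match(cluster_emb: np.ndarray,
               voice_bank: dict[str, np.ndarray],
               threshold: float = 0.75) -> str | None:
    """Return the known speaker most similar to the cluster embedding,
    or None if no voiceprint clears the threshold."""
    if not voice_bank:
        return None

    def cosine(a: np.ndarray, b: np.ndarray) -> float:
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    name, score = max(
        ((n, cosine(cluster_emb, emb)) for n, emb in voice_bank.items()),
        key=lambda pair: pair[1],
    )
    return name if score >= threshold else None
```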
### 3. Transcribe

Run the Whisper-based transcription on your cleaned audio.
### 4. Verify

- **Refinement:** Right-click any segment in the transcript to open the Snipper.
- **Commit:** Adjust boundaries, correct the text, and click **Commit**. Committed segments turn green and are stored as verified Ground Truth.
Use the "Chat with Evidence" tab to query your transcript. The system uses Retrieval-Augmented Generation to answer questions based only on the provided transcript, complete with a dynamic persona (e.g., "Forensic Accountant") generated during initial analysis.
## Project layout

- `launch.py` – entrypoint (adds `src/` to the path and starts the Qt app)
- `src/main.py` – main Qt window + tabs
- `src/transcription_engine.py` – diarization + transcription orchestration
- `src/AudioDenoiser.py` – audio cleaning pipeline (authoritative audio processing)
- `src/ui/*` – GUI tabs and helpers
- `voices/` – voice samples / bank
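For orientation, an entrypoint matching the `launch.py` description would look roughly like this (illustrative only; `MainWindow` is a hypothetical class name, not confirmed from the source):

```python
import sys
from pathlib import Path

# Make src/ importable before loading the app's own modules.
sys.path.insert(0, str(Path(__file__).parent / "src"))

from PySide6.QtWidgets import QApplication

import main  # src/main.py: main Qt window + tabs

app = QApplication(sys.argv)
window = main.MainWindow()  # hypothetical class name, for illustration
window.show()
sys.exit(app.exec())
```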
## License

VocalTrace is licensed under the MIT License (see LICENSE.txt for details).