Releases: FoxNoseTech/diarize
v0.1.1
This patch release fixes dependency compatibility for audio loading.
Fixed
- Pinned `torch` and `torchaudio` to a compatible range: `torch>=1.13,<2.9`, `torchaudio>=0.13,<2.9`
- Prevents failures where newer `torchaudio` requires `torchcodec`
Docs
- Clarified that diarize now installs a compatible torch/torchaudio range automatically.
No API changes.
v0.1.0 — Initial Release
diarize v0.1.0
Speaker diarization for Python — answers "who spoke when?" in any audio file. CPU-only, no GPU, no API keys, no account signup.
Highlights
- ~10.8% DER on VoxConverse dev set — lower than pyannote's free models (community-1 and 3.1 legacy, both ~11.2%)
- ~8x faster than real-time on CPU (RTF 0.12 vs pyannote community-1's 0.86)
- Automatic speaker count detection via GMM BIC with silhouette refinement (1–7 speakers)
- Zero setup friction — `pip install diarize` and you're done, no HuggingFace token or account needed
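To put the speed figures in concrete terms: a real-time factor (RTF) is processing time divided by audio duration, so an RTF of 0.12 means one minute of audio takes about 7.2 seconds to diarize. The helper below is purely illustrative, not part of the library:

```python
# Real-time factor (RTF): processing time divided by audio duration.
# Values below 1.0 are faster than real time.
def rtf(processing_seconds: float, audio_seconds: float) -> float:
    return processing_seconds / audio_seconds

# At RTF 0.12, one minute of audio takes ~7.2 s of compute.
print(60 * 0.12)  # → 7.2

# The speedup over real time is the reciprocal of the RTF.
print(round(1 / 0.12, 1))  # → 8.3
```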
Pipeline
Silero VAD → WeSpeaker ResNet34-LM (ONNX) → GMM BIC → Spectral Clustering
All four stages run on CPU. All components are open-source with permissive licenses.
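The speaker-count stage can be sketched as generic BIC-based model selection over candidate counts. This is an illustration under stated assumptions, not the library's actual implementation: the embedding shape, the exact silhouette refinement rule, and the helper name `estimate_num_speakers` are all assumptions based on the notes above.

```python
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score

def estimate_num_speakers(embeddings: np.ndarray, max_speakers: int = 7) -> int:
    """Pick the speaker count whose GMM minimizes BIC, then refine by silhouette.

    Hypothetical sketch: fits one Gaussian mixture per candidate count and
    keeps the count with the lowest Bayesian Information Criterion.
    """
    bics = {}
    for k in range(1, max_speakers + 1):
        gm = GaussianMixture(n_components=k, covariance_type="diag", random_state=0)
        gm.fit(embeddings)
        bics[k] = gm.bic(embeddings)
    best_k = min(bics, key=bics.get)

    if best_k > 1:
        # Silhouette refinement (assumed rule): compare best_k with its
        # neighbors and keep whichever clustering separates embeddings best.
        candidates = [k for k in (best_k - 1, best_k, best_k + 1)
                      if 2 <= k <= max_speakers]

        def sil(k: int) -> float:
            labels = GaussianMixture(
                n_components=k, covariance_type="diag", random_state=0
            ).fit_predict(embeddings)
            if len(set(labels)) < 2:
                return -1.0
            return silhouette_score(embeddings, labels)

        best_k = max(candidates, key=sil)
    return best_k

# Synthetic embeddings for two well-separated "speakers".
rng = np.random.default_rng(0)
emb = np.vstack([rng.normal(0, 0.5, (100, 8)), rng.normal(5, 0.5, (100, 8))])
print(estimate_num_speakers(emb))  # → 2
```

In the real pipeline the inputs would be WeSpeaker embeddings, and the selected count is handed to spectral clustering for the final segment assignment.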
Usage
```python
from diarize import diarize

result = diarize("meeting.wav")
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")
```
Known Limitations
- Benchmarked on a single dataset (VoxConverse). Cross-dataset validation is planned.
- Speaker count estimation degrades for 8+ speakers — pass `num_speakers` explicitly when known.
- Overlapping speech is not modeled — each segment is assigned to one speaker.