Releases: FoxNoseTech/diarize

v0.1.1

06 Mar 17:20

This patch release fixes dependency compatibility for audio loading.

Fixed

  • Pinned torch and torchaudio to a compatible range:
    • torch>=1.13,<2.9
    • torchaudio>=0.13,<2.9
  • Prevents failures where newer torchaudio requires torchcodec.
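The effect of the pin can be illustrated with a small stdlib-only sketch; `in_pinned_range` is a hypothetical helper, not part of diarize, and it compares only major.minor for brevity (real installers resolve full PEP 440 versions):

```python
# Hypothetical helper (not part of diarize): check whether a version
# string falls inside the pinned torch range from this release.
# Compares only major.minor for brevity; real tools use full PEP 440.
def in_pinned_range(version, low=(1, 13), high=(2, 9)):
    major_minor = tuple(int(p) for p in version.split(".")[:2])
    return low <= major_minor < high

print(in_pinned_range("2.2.1"))  # → True: inside >=1.13,<2.9
print(in_pinned_range("2.9.0"))  # → False: excluded, since newer torchaudio requires torchcodec
```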

Docs

  • Clarified that diarize now installs a compatible torch/torchaudio range automatically.

No API changes.

v0.1.0 — Initial Release

01 Mar 11:30

diarize v0.1.0

Speaker diarization for Python — answers "who spoke when?" in any audio file. CPU-only, no GPU, no API keys, no account signup.

Highlights

  • ~10.8% DER on VoxConverse dev set — lower than pyannote's free models (community-1 and 3.1 legacy, both ~11.2%)
  • ~8x faster than real-time on CPU (RTF 0.12 vs pyannote community-1's 0.86)
  • Automatic speaker count detection via GMM BIC with silhouette refinement (1–7 speakers)
  • Zero setup friction: pip install diarize and you're done, no HuggingFace token or account needed

Pipeline

Silero VAD → WeSpeaker ResNet34-LM (ONNX) → GMM BIC → Spectral Clustering

All four stages run on CPU. All components are open-source with permissive licenses.
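As a rough illustration of the speaker-count stage, here is a NumPy-only toy version of BIC-based model selection. The real pipeline fits GMMs and refines with silhouette scores; this sketch substitutes hard-assignment k-means with one diagonal Gaussian per cluster, and all function names are illustrative:

```python
import numpy as np

def farthest_point_init(X, k):
    # Deterministic seeding: start from the first point, then repeatedly
    # take the point farthest from all centers chosen so far.
    centers = [X[0]]
    for _ in range(k - 1):
        d = ((X[:, None, :] - np.array(centers)[None, :, :]) ** 2).sum(-1).min(1)
        centers.append(X[int(np.argmax(d))])
    return np.array(centers)

def kmeans_labels(X, k, iters=25):
    centers = farthest_point_init(X, k)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if (labels == j).any():
                centers[j] = X[labels == j].mean(0)
    return labels

def bic(X, labels, k):
    # BIC = -2 * log-likelihood + (number of parameters) * log(n),
    # with one diagonal Gaussian per cluster plus mixture weights.
    n, d = X.shape
    loglik = 0.0
    for j in range(k):
        C = X[labels == j]
        if len(C) == 0:
            continue
        mu, var = C.mean(0), C.var(0) + 1e-6
        loglik += (-0.5 * (np.log(2 * np.pi * var) + (C - mu) ** 2 / var)).sum()
        loglik += len(C) * np.log(len(C) / n)  # log mixture weight
    n_params = k * 2 * d + (k - 1)  # means + variances + free weights
    return -2.0 * loglik + n_params * np.log(n)

def estimate_speakers(X, max_k=7):
    # Pick the cluster count with the lowest BIC, mirroring the
    # 1-7 speaker search described above.
    return min(range(1, max_k + 1), key=lambda k: bic(X, kmeans_labels(X, k), k))

# Two well-separated clouds of fake "embeddings" stand in for two speakers.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (100, 2)), rng.normal(8, 0.3, (100, 2))])
print(estimate_speakers(X))  # → 2
```

The BIC penalty term is what keeps the search from always preferring more clusters: splitting a genuine single speaker raises the likelihood only slightly, while the parameter count (and hence the penalty) grows with every added cluster.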

Usage

from diarize import diarize

result = diarize("meeting.wav")
for seg in result.segments:
    print(f"  [{seg.start:.1f}s - {seg.end:.1f}s] {seg.speaker}")
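Segments in the shape shown above (start, end, speaker) can also be serialized to RTTM, the format most diarization scoring tools consume. A sketch using plain tuples; `to_rttm` and the `file_id` parameter are hypothetical, not part of the diarize API:

```python
# Sketch: serialize (start, end, speaker) segments to RTTM lines,
# the format expected by common diarization scorers.
# RTTM fields: type, file id, channel, onset, duration, then speaker label.
def to_rttm(segments, file_id="meeting"):
    lines = []
    for start, end, speaker in segments:
        lines.append(
            f"SPEAKER {file_id} 1 {start:.3f} {end - start:.3f} "
            f"<NA> <NA> {speaker} <NA> <NA>"
        )
    return "\n".join(lines)

print(to_rttm([(0.0, 2.5, "SPEAKER_00"), (2.5, 4.0, "SPEAKER_01")]))
```

Note that RTTM stores duration rather than end time, hence the `end - start` above.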

Known Limitations

  • Benchmarked on a single dataset (VoxConverse). Cross-dataset validation is planned.
  • Speaker count estimation degrades for 8+ speakers — pass num_speakers explicitly when known.
  • Overlapping speech is not modeled — each segment is assigned to one speaker.