pyannote

💚 Simply detect, segment, label, and separate speakers in any language

🎤 What is speaker diarization?

Speaker diarization is the process of automatically partitioning an audio recording of a conversation into segments and labeling each segment by speaker, answering the question "who spoke when?". As the foundational layer of conversational AI, speaker diarization provides high-level insights into human-human and human-machine conversations, and unlocks a wide range of downstream applications such as meeting transcription, call center analytics, voice agents, and video dubbing.
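To make "who spoke when?" concrete, a diarization result can be thought of as a list of speaker turns, each with a start time, an end time, and a speaker label. The minimal sketch below assumes plain `(start, end, speaker)` tuples with made-up timestamps; it does not use pyannote's own output classes. It computes total speaking time per speaker, the kind of statistic call center analytics builds on.

```python
from collections import defaultdict

# Each turn answers "who spoke when": (start in seconds, end in seconds, speaker label).
# Labels and timestamps are made-up illustration data, not real pipeline output.
turns = [
    (0.0, 4.2, "SPEAKER_00"),
    (4.2, 9.8, "SPEAKER_01"),
    (9.8, 12.5, "SPEAKER_00"),
]


def speaking_time(turns):
    """Sum the duration of every turn, per speaker."""
    totals = defaultdict(float)
    for start, end, speaker in turns:
        totals[speaker] += end - start
    return dict(totals)


for speaker, seconds in speaking_time(turns).items():
    print(f"{speaker} speaks for {seconds:.1f}s")
```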

▶️ Getting started

Install the latest release of pyannote.audio with either uv (recommended) or pip:

$ uv add pyannote.audio
$ pip install pyannote.audio

Enjoy state-of-the-art speaker diarization:

# download the pretrained pipeline from Hugging Face
# (replace HUGGINGFACE_TOKEN with your own Hugging Face access token)
from pyannote.audio import Pipeline
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-community-1', token="HUGGINGFACE_TOKEN")

# perform speaker diarization locally
output = pipeline('/path/to/audio.wav')

# print who speaks when
for turn, speaker in output.speaker_diarization:
    print(f"{speaker} speaks between t={turn.start:.1f}s and t={turn.end:.1f}s")
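Diarization results are commonly saved in the NIST RTTM format, one `SPEAKER` line per turn. pyannote.core's `Annotation` class provides its own `write_rttm` method, which is what you would use in practice; the sketch below only illustrates the line layout, assuming turns given as plain `(start, end, speaker)` tuples. The `write_rttm` function here is a hypothetical stand-alone helper, not pyannote's API.

```python
import io


def write_rttm(turns, uri, file):
    """Write (start, end, speaker) turns as RTTM lines (hypothetical helper, not pyannote's API)."""
    for start, end, speaker in sorted(turns):
        # RTTM fields: type, file id, channel, onset, duration,
        # orthography, speaker type, speaker name, confidence, lookahead
        file.write(
            f"SPEAKER {uri} 1 {start:.3f} {end - start:.3f} <NA> <NA> {speaker} <NA> <NA>\n"
        )


# usage: write two made-up turns for a file called "audio" into an in-memory buffer
buffer = io.StringIO()
write_rttm([(0.0, 1.5, "SPEAKER_00"), (1.5, 3.25, "SPEAKER_01")], "audio", buffer)
print(buffer.getvalue(), end="")
```

Note that RTTM stores a duration rather than an end time, hence `end - start` in the fifth field.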

Read the community-1 model card to make the most of it.

🏆 State-of-the-art models

The pyannoteAI research team trains cutting-edge speaker diarization models, thanks to the Jean Zay 🇫🇷 supercomputer managed by GENCI 💚. They come in two flavors:

pyannote.audio open models, available on Hugging Face and used by 140k+ developers around the world;
premium models, available on pyannoteAI cloud (and on-premise for enterprise customers), that provide state-of-the-art speaker diarization as well as additional enterprise features.

Benchmark (last updated in 2025-09)	`legacy` (3.1)	`community-1`	`precision-2`
AISHELL-4	12.2	11.7	11.4 🏆
AliMeeting (channel 1)	24.5	20.3	15.2 🏆
AMI (IHM)	18.8	17.0	12.9 🏆
AMI (SDM)	22.7	19.9	15.6 🏆
AVA-AVD	49.7	44.6	37.1 🏆
CALLHOME (part 2)	28.5	26.7	16.6 🏆
DIHARD 3 (full)	21.4	20.2	14.7 🏆
Ego4D (dev.)	51.2	46.8	39.0 🏆
MSDWild	25.4	22.8	17.3 🏆
RAMC	22.2	20.8	10.5 🏆
REPERE (phase2)	7.9	8.9	7.4 🏆
VoxConverse (v0.3)	11.2	11.2	8.5 🏆

Diarization error rate (DER, in %; lower is better)
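The diarization error rate is the sum of missed speech, false alarm speech, and speaker confusion, divided by the total duration of reference speech. Because hypothesis labels are arbitrary, speaker confusion is measured after the one-to-one mapping between hypothesis and reference speakers that maximizes their overlap. The sketch below is a simplified frame-based version, assuming `(start, end, speaker)` tuples, 10 ms frames, no forgiveness collar, and a brute-force mapping that is only practical for a handful of speakers. It is not the implementation behind the numbers above; for reference-grade scores, use `DiarizationErrorRate` from pyannote.metrics.

```python
from collections import Counter
from itertools import permutations

Turn = tuple[float, float, str]  # (start in seconds, end in seconds, speaker label)


def to_frames(turns: list[Turn], n_frames: int, step: float) -> list[set[str]]:
    """Set of active speakers in each frame of `step` seconds."""
    frames: list[set[str]] = [set() for _ in range(n_frames)]
    for start, end, speaker in turns:
        for i in range(round(start / step), round(end / step)):
            frames[i].add(speaker)
    return frames


def best_mapping(ref_spk: list[str], hyp_spk: list[str], overlap: Counter) -> dict:
    """One-to-one hypothesis -> reference mapping maximizing overlap (brute force)."""
    if len(hyp_spk) >= len(ref_spk):
        candidates = (dict(zip(p, ref_spk)) for p in permutations(hyp_spk, len(ref_spk)))
    else:
        candidates = (dict(zip(hyp_spk, p)) for p in permutations(ref_spk, len(hyp_spk)))
    return max(candidates, key=lambda m: sum(overlap[h, r] for h, r in m.items()))


def diarization_error_rate(reference: list[Turn], hypothesis: list[Turn], step: float = 0.01) -> float:
    """(missed + false alarm + confusion) / total reference speech; reference must be non-empty."""
    n_frames = max(round(end / step) for _, end, _ in reference + hypothesis)
    ref = to_frames(reference, n_frames, step)
    hyp = to_frames(hypothesis, n_frames, step)
    # number of frames where hypothesis speaker h and reference speaker r are both active
    overlap = Counter((h, r) for rf, hf in zip(ref, hyp) for r in rf for h in hf)
    mapping = best_mapping(
        sorted({s for *_, s in reference}), sorted({s for *_, s in hypothesis}), overlap
    )
    missed = false_alarm = confusion = total = 0
    for rf, hf in zip(ref, hyp):
        correct = sum(1 for h in hf if mapping.get(h) in rf)
        total += len(rf)
        missed += max(0, len(rf) - len(hf))
        false_alarm += max(0, len(hf) - len(rf))
        confusion += min(len(rf), len(hf)) - correct
    return (missed + false_alarm + confusion) / total


# usage: the last 5 s of speaker B are attributed to spk1, which is mapped to A
reference = [(0.0, 10.0, "A"), (10.0, 20.0, "B")]
hypothesis = [(0.0, 10.0, "spk1"), (10.0, 15.0, "spk2"), (15.0, 20.0, "spk1")]
print(f"DER = {100 * diarization_error_rate(reference, hypothesis):.1f}%")  # → DER = 25.0%
```

In this example there is no missed or false alarm speech: the whole error comes from 5 s of speaker confusion out of 20 s of reference speech.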

⏩️ Going further, better, and faster

The precision-2 premium model further improves accuracy and processing speed, and brings additional features.

Features	`community-1`	`precision-2`
Set exact/min/max number of speakers	✅	✅
Exclusive speaker diarization (for transcription)	✅	✅
Segmentation confidence scores	❌	✅
Speaker confidence scores	❌	✅
Voiceprinting	❌	✅
Speaker identification	❌	✅
Time to process 1h of audio (on H100)	37s	14s
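Exclusive speaker diarization keeps at most one active speaker at any given time, which makes it straightforward to reconcile with word-level timestamps from a speech-to-text system. The sketch below shows one common way to do that, assuming both inputs are plain tuples in seconds: each word is given the speaker whose turn overlaps it the most. The `assign_speakers` function and the sample data are hypothetical illustrations, not part of pyannote's API.

```python
# Hypothetical inputs: exclusive turns (at most one speaker at a time) and
# word timestamps from some speech-to-text system, both in seconds.
turns = [(0.0, 2.0, "SPEAKER_00"), (2.0, 5.0, "SPEAKER_01")]
words = [(0.1, 0.5, "hello"), (0.6, 1.0, "there"), (1.9, 2.4, "hi"), (2.5, 3.0, "everyone")]


def assign_speakers(words, turns):
    """Label each word with the speaker whose turn overlaps it the most (None if no overlap)."""
    labeled = []
    for w_start, w_end, text in words:
        best_speaker, best_overlap = None, 0.0
        for t_start, t_end, speaker in turns:
            overlap = min(w_end, t_end) - max(w_start, t_start)
            if overlap > best_overlap:
                best_speaker, best_overlap = speaker, overlap
        labeled.append((best_speaker, text))
    return labeled


for speaker, text in assign_speakers(words, turns):
    print(f"{speaker}: {text}")
```

The word "hi" straddles the turn boundary at t=2.0s; it goes to SPEAKER_01 because 0.4s of it falls inside that turn versus 0.1s inside SPEAKER_00's. With overlapping (non-exclusive) turns, a word could match two speakers equally well, which is the ambiguity an exclusive diarization avoids.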

Create a pyannoteAI account, change one line of code, and enjoy free cloud credits to try precision-2 premium diarization:

# perform premium speaker diarization on pyannoteAI cloud
pipeline = Pipeline.from_pretrained('pyannote/speaker-diarization-precision-2', token="PYANNOTEAI_API_KEY")
better_output = pipeline('/path/to/audio.wav')
