feat: expose speaker embeddings and subsegments in DiarizeResult #4
smm-h wants to merge 1 commit into FoxNoseTech:main
Hey @smm-h, the feature itself is useful, especially for advanced/debugging workflows. I’m supportive of exposing subsegments / embeddings, but not as raw ndarray fields on the default public result object. That couples internal pipeline artifacts to the main API and introduces serialization/equality regressions. I’d suggest making this opt-in (for example Thank you.
Thanks for the feedback @loookashow — good points on all counts. I've revised the PR to address your concerns:
Let me know if this is more in line with what you had in mind, or if you'd prefer a different API shape.
Thanks @smm-h, this API shape is much better. I checked the updated branch locally:
A few small things before merge:
After those are addressed, this looks good to me.
Summary
Expose speaker embeddings and subsegments from the diarization pipeline via an opt-in `return_artifacts` parameter, stored in a separate serializable `DiarizeArtifacts` model.
Changes
- `utils.py`: Added `DiarizeArtifacts` model with `embeddings: list[list[float]]` and `subsegments: list[SubSegment]` (plain Python types, fully serializable, no numpy on the public API). Added `artifacts: DiarizeArtifacts | None = None` field on `DiarizeResult`.
- `__init__.py`: Added `return_artifacts: bool = False` keyword argument to `diarize()`. When `True`, converts `embeddings` to nested lists via `.tolist()` and populates `result.artifacts`. When `False` (default), `artifacts` is `None`.
Motivation
Use case: cross-recording speaker clustering and identification. When processing multiple audio files, having access to the raw speaker embeddings allows users to cluster or match speakers across recordings — something that is not possible with just the segment labels.
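To make the use case concrete, here is a minimal sketch of matching speakers across two recordings by cosine similarity. It is not part of this PR. It assumes you have already reduced the exposed embeddings to one representative vector per speaker label (for example by averaging the rows that belong to each speaker); how rows map to speakers is not stated here, so that step is left to the caller. The helper names, the labels, and the `0.7` threshold are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def mean_vector(rows):
    """Average a non-empty list of equal-length vectors column by column."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def match_speakers(speakers_a, speakers_b, threshold=0.7):
    """For each speaker in recording A, pick the most similar speaker in B.

    Returns {label_a: label_b or None}. A label maps to None when no
    candidate reaches the (assumed) similarity threshold. This is a simple
    best-match lookup, not a one-to-one assignment.
    """
    matches = {}
    for label_a, vec_a in speakers_a.items():
        best_label, best_score = None, threshold
        for label_b, vec_b in speakers_b.items():
            score = cosine_similarity(vec_a, vec_b)
            if score >= best_score:
                best_label, best_score = label_b, score
        matches[label_a] = best_label
    return matches


# Toy 3-dimensional vectors standing in for real speaker embeddings.
rec1 = {
    "SPEAKER_00": mean_vector([[1.0, 0.0, 0.1], [0.9, 0.1, 0.0]]),
    "SPEAKER_01": mean_vector([[0.0, 1.0, 0.0]]),
}
rec2 = {
    "SPEAKER_00": [0.1, 0.95, 0.0],
    "SPEAKER_01": [0.0, 0.0, 1.0],
}

matches = match_speakers(rec1, rec2)
print(matches)
```

Because the artifacts are plain nested lists, a sketch like this needs no numpy. For real workloads a vectorized similarity matrix and a proper clustering step would likely be a better fit.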
Design decisions (addressing review feedback)
- `return_artifacts=True`: zero cost at default.
- `DiarizeArtifacts` keeps internal pipeline data off the main `DiarizeResult` surface.
- `list[list[float]]` (not numpy arrays), so `model_dump()`, JSON serialization, and equality all work cleanly. No `arbitrary_types_allowed` needed.
- main.
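The shape these decisions describe can be sketched roughly as follows. This is an illustration only, not the code in this PR: it uses standard-library dataclasses as stand-ins for the real models (the mention of `model_dump()` suggests Pydantic), the `SubSegment` fields `start` and `end` are hypothetical, `build_result` is a made-up helper rather than the real `diarize()`, and a list comprehension stands in for numpy's `.tolist()` so the sketch has no third-party dependencies.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class SubSegment:
    # Hypothetical fields; the real SubSegment may differ.
    start: float
    end: float


@dataclass
class DiarizeArtifacts:
    # Plain nested lists instead of numpy arrays.
    embeddings: list[list[float]]
    subsegments: list[SubSegment]


@dataclass
class DiarizeResult:
    # The real result has other fields; they are omitted in this sketch.
    artifacts: Optional[DiarizeArtifacts] = None


def build_result(raw_embeddings, subsegments, return_artifacts=False):
    """Hypothetical stand-in for the tail end of diarize()."""
    if not return_artifacts:
        # Default path: no conversion work, artifacts stays None.
        return DiarizeResult()
    # Stand-in for ndarray.tolist(): convert rows to plain float lists.
    embeddings = [[float(x) for x in row] for row in raw_embeddings]
    return DiarizeResult(
        artifacts=DiarizeArtifacts(embeddings=embeddings, subsegments=list(subsegments))
    )


raw = ((0.1, 0.2), (0.3, 0.4))
subs = [SubSegment(start=0.0, end=1.5), SubSegment(start=1.5, end=3.0)]

default_result = build_result(raw, subs)
result = build_result(raw, subs, return_artifacts=True)
same = build_result(raw, subs, return_artifacts=True)

# Plain types serialize to JSON and compare by value without extra handling.
payload = json.dumps(asdict(result))
restored = json.loads(payload)
print(result == same, restored["artifacts"]["embeddings"])
```

With numpy arrays as fields, `==` compares element-wise and yields an array rather than a single boolean, which is one way the equality regression raised in review can show up. Plain lists avoid that.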