Description
The current ForcedAlignment/evaluate.py script is incompatible with the newly proposed gold format (see clamsproject/aapb-annotations#121). It relies on brittle text-matching logic that is rendered obsolete by the new process.py in the linked PR.
I propose refactoring evaluate.py to use character offsets for alignment instead of text matching. The updated script will no longer perform any text normalization or string comparison; instead, it will use the alignment-start and alignment-end character offsets (or possibly a different set of column names once the PR is merged) from the gold .tsv files as the ground truth.
The core change will be a complete rewrite of the _read_pred method. To create the sparse hypothesis segments, the script will:
- Read the character range (alignment-start, alignment-end) for a segment from the gold .tsv file.
- Scan the Token annotations in the prediction MMIF file.
- Identify the first and last Token annotations whose own character offsets fall within the gold segment's character range.
- Use the timestamps associated with these first and last tokens to define the boundaries of the new hypothesis segment (a sketch of this logic follows below).
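For illustration, here is a minimal sketch of that logic. It assumes the gold columns end up being named alignment-start and alignment-end, and that the Token annotations have already been pulled out of the prediction MMIF as (char_start, char_end, time_start, time_end) tuples sorted by character offset; these names and helpers are mine, not from the existing code.

```python
import csv


def read_gold_segments(gold_tsv_path):
    """Read (alignment-start, alignment-end) character offsets from a gold TSV.

    The column names are an assumption; they may change once the
    aapb-annotations PR is merged.
    """
    with open(gold_tsv_path, newline='') as f:
        return [(int(row['alignment-start']), int(row['alignment-end']))
                for row in csv.DictReader(f, delimiter='\t')]


def build_hypothesis_segments(gold_segments, tokens):
    """Turn token-level predictions into sparse hypothesis segments.

    `tokens` is assumed to be a list of (char_start, char_end, time_start,
    time_end) tuples extracted from the Token annotations in the prediction
    MMIF, sorted by character offset.
    """
    hypotheses = []
    for seg_start, seg_end in gold_segments:
        # Tokens whose own character offsets fall within the gold segment's range.
        inside = [t for t in tokens if t[0] >= seg_start and t[1] <= seg_end]
        if not inside:
            continue  # no aligned tokens for this gold segment
        first, last = inside[0], inside[-1]
        # The timestamps of the first and last token bound the hypothesis segment.
        hypotheses.append((first[2], last[3]))
    return hypotheses
```

Keeping the MMIF parsing outside these helpers keeps the alignment step free of any text comparison.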
This approach is significantly more robust and elegant. It completely removes the dependency on the proprietary gold transcript during evaluation and eliminates all fragile text-matching code.
One more thing to consider during the re-implementation is how we use the pyannote.metrics library.
The current implementation "downsamples" the dense (token-level) annotation in the MMIF to match the sparse annotation (every 10 tokens) in the gold data, then uses SegmentationCoverage and SegmentationPurity for the metrics, which seemed correct when I implemented it. But now I'm asking @shel-ho to confirm this usage or to suggest alternative metrics from their recent research on the subject.
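For reference, the pyannote.metrics usage under discussion looks roughly like the sketch below, which builds pyannote Annotation objects from (start, end) pairs in seconds and computes SegmentationCoverage and SegmentationPurity; the helper name and dummy labels are mine, not from the current script.

```python
from pyannote.core import Annotation, Segment
from pyannote.metrics.segmentation import SegmentationCoverage, SegmentationPurity


def score_segmentation(gold_segments, hyp_segments):
    """Compute coverage and purity of hypothesis segments against gold segments.

    Both arguments are lists of (start_seconds, end_seconds) pairs; the labels
    are irrelevant for these metrics, so dummy labels are used.
    """
    reference, hypothesis = Annotation(), Annotation()
    for i, (start, end) in enumerate(gold_segments):
        reference[Segment(start, end)] = f'gold{i}'
    for i, (start, end) in enumerate(hyp_segments):
        hypothesis[Segment(start, end)] = f'hyp{i}'
    coverage = SegmentationCoverage()(reference, hypothesis)
    purity = SegmentationPurity()(reference, hypothesis)
    return coverage, purity
```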