PTDT-3807: Add temporal audio annotation support #2013
base: develop
frame_dict[annotation.frame].append(annotation)
return frame_dict
elif isinstance(annotation, AudioClassificationAnnotation):
    frame_dict[annotation.start_frame].append(annotation)
Bug: Audio Annotations Indexed Inconsistently
The frame_annotations method indexes AudioClassificationAnnotation instances using only their start_frame. As audio annotations represent a time range, this prevents querying for annotations active at intermediate frames within their duration and is inconsistent with how single-frame video annotations are indexed.
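The difference between start-only indexing and a range-aware lookup can be shown with a minimal sketch. AudioSpan, index_by_start, and active_at are hypothetical names introduced for illustration, not SDK classes or functions, and frames are assumed to be millisecond offsets as the PR description suggests.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class AudioSpan:
    """Hypothetical stand-in for AudioClassificationAnnotation (not the SDK class)."""
    name: str
    start_frame: int                 # assumed: start time in milliseconds
    end_frame: Optional[int] = None  # assumed: end time in milliseconds, None for a point


def index_by_start(spans: List[AudioSpan]) -> Dict[int, List[AudioSpan]]:
    """Mirror of the behavior under review: key each span by start_frame only."""
    index: Dict[int, List[AudioSpan]] = defaultdict(list)
    for span in spans:
        index[span.start_frame].append(span)
    return index


def active_at(spans: List[AudioSpan], frame: int) -> List[AudioSpan]:
    """Range-aware lookup: spans whose [start, end] interval covers the frame."""
    result = []
    for span in spans:
        end = span.end_frame if span.end_frame is not None else span.start_frame
        if span.start_frame <= frame <= end:
            result.append(span)
    return result


spans = [AudioSpan("speaker", 1000, 3000), AudioSpan("music", 2500, 4000)]
by_start = index_by_start(spans)
print(sorted(by_start))                          # [1000, 2500]
print([s.name for s in active_at(spans, 2750)])  # ['speaker', 'music']
```

With start-only keys, a lookup at 2750 ms finds nothing even though both spans are active at that time. The linear scan is only for clarity; an interval tree or sorted boundaries would be a more scalable choice.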
List of TemporalNDJSON objects
"""
def audio_frame_extractor(ann: AudioClassificationAnnotation) -> Tuple[int, int]:
    return (ann.start_frame, ann.end_frame or ann.start_frame)
Bug: Audio Frame Extraction Fails with Zero-Length Frames
The audio_frame_extractor function's end_frame logic (ann.end_frame or ann.start_frame) incorrectly treats 0 as None, causing it to fall back to start_frame. This can create zero-length frames (start == end), which may lead to incorrect containment checks in TemporalFrame and misrepresent nested annotation relationships in the HierarchyBuilder.
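The falsy-zero pitfall can be reproduced in isolation. The two functions below are hypothetical helpers that take plain integers instead of an annotation object; the second uses an explicit None check, which is one possible fix and not necessarily the one adopted in the PR.

```python
from typing import Optional, Tuple


def extract_with_or(start_frame: int, end_frame: Optional[int]) -> Tuple[int, int]:
    """Same pattern as the reviewed code: `or` falls back on any falsy value."""
    return (start_frame, end_frame or start_frame)


def extract_explicit(start_frame: int, end_frame: Optional[int]) -> Tuple[int, int]:
    """Fall back to start_frame only when end_frame is actually missing."""
    end = end_frame if end_frame is not None else start_frame
    return (start_frame, end)


print(extract_with_or(500, 0))      # (500, 500): the explicit 0 is silently replaced
print(extract_explicit(500, 0))     # (500, 0): the explicit 0 survives
print(extract_explicit(500, None))  # (500, 500): fallback only for a missing end
```

Keeping the explicit 0 means an invalid range such as (500, 0) stays visible to later validation instead of being masked as a zero-length frame.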
Description
This PR introduces Audio Temporal Annotations - a new feature that enables precise time-based annotations for audio files in the Labelbox SDK. This includes support for temporal classification annotations with millisecond-level timing precision.
Motivation: Audio annotation workflows require precise timing control for applications like:
Context: This feature extends the existing audio annotation infrastructure to support temporal annotations, using a millisecond-based timing system that provides the precision needed for audio applications while maintaining compatibility with the existing NDJSON serialization format.
Type of change
All Submissions
New Feature Submissions
Changes to Core Features
Summary of Changes
New Audio Temporal Annotation Types
AudioClassificationAnnotation: Time-based classifications (radio, checklist, text) for audio segments
Core Infrastructure Updates
TemporalFrame, AnnotationGroupManager, ValueGrouper, and HierarchyBuilder components
temporal.py module with generic components that can be reused for video, audio, and other temporal annotation types
Code Architecture Improvements
Generic[TemporalAnnotation] for compile-time type checking
frame_extractor callable allows different annotation types to use the same processing logic
overlaps() method and improved temporal containment logic
create_audio_ndjson_annotations() convenience function
Testing
test_v3_serialization.py (attached at the bottom) that validates both structure and values
Documentation & Examples
audio.ipynb with temporal annotation examples
demo_audio_token_temporal.py showing per-token temporal annotations
Serialization & Import Support
Key Features
Precise Timing Control
Per-Token Temporal Annotations
Ontology Setup for Temporal Annotations
Label Integration
Technical Architecture
Generic Temporal Components
The refactored architecture provides reusable components for any temporal annotation type:
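As a rough illustration of this design, the sketch below pairs an interval type with a generic container that takes a frame_extractor callable. Frame and FrameIndex are hypothetical names and do not reproduce the PR's TemporalFrame or grouping classes; the dictionary keys start_ms and end_ms are likewise assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Generic, List, Tuple, TypeVar

T = TypeVar("T")  # any annotation type that can be mapped to a (start, end) pair


@dataclass(frozen=True)
class Frame:
    """Hypothetical closed interval [start, end], assumed to be in milliseconds."""
    start: int
    end: int

    def overlaps(self, other: "Frame") -> bool:
        # Two closed intervals overlap when each starts before the other ends.
        return self.start <= other.end and other.start <= self.end

    def contains(self, other: "Frame") -> bool:
        # True when `other` lies entirely inside this interval.
        return self.start <= other.start and other.end <= self.end


class FrameIndex(Generic[T]):
    """Stores annotations of one type; the extractor supplies their time range."""

    def __init__(self, frame_extractor: Callable[[T], Tuple[int, int]]) -> None:
        self._extract = frame_extractor
        self._items: List[Tuple[Frame, T]] = []

    def add(self, annotation: T) -> None:
        start, end = self._extract(annotation)
        self._items.append((Frame(start, end), annotation))

    def overlapping(self, start: int, end: int) -> List[T]:
        query = Frame(start, end)
        return [ann for frame, ann in self._items if frame.overlaps(query)]


# Usage with plain dicts standing in for audio annotations.
audio_index: FrameIndex[dict] = FrameIndex(lambda a: (a["start_ms"], a["end_ms"]))
audio_index.add({"label": "intro", "start_ms": 0, "end_ms": 1500})
audio_index.add({"label": "verse", "start_ms": 1500, "end_ms": 9000})
hits = audio_index.overlapping(1000, 1200)
print([a["label"] for a in hits])  # ['intro']
```

Because the container only sees the extractor's output, the same logic could serve video or other temporal annotation types by passing a different callable.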
This feature enables the Labelbox SDK to support precise temporal audio annotation workflows while providing a robust, reusable architecture for future temporal annotation types. The modular design ensures maintainability and extensibility while preserving full backward compatibility.