PTDT-3807: Add temporal audio annotation support #2013
base: develop
Changes from all commits
e4fd630
dbcc7bf
dbb592f
ff298d4
16896fd
7a666cc
ac58ad0
67dd14a
a1600e5
b4d2f42
fadb14e
1e12596
c2a7b4c
26a35fd
b16f2ea
943cb73
a838513
0ca9cd6
7861537
6c3c50a
68773cf
58b30f7
0a63def
538ba66
9675c73
327800b
1174ad8
2361ca3
59f0cd8
b186359
e63b306
a74c6c4
0683dfd
File: examples/annotation_import/audio.ipynb
@@ -27,6 +27,30 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"<td>\n",
" <a target=\"_blank\" href=\"https://labelbox.com\" ><img src=\"https://labelbox.com/blog/content/images/2021/02/logo-v4.svg\" width=256/></a>\n",
"</td>\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"<td>\n",
"<a href=\"https://colab.research.google.com/github/Labelbox/labelbox-python/blob/develop/examples/annotation_import/audio.ipynb\" target=\"_blank\"><img\n",
"src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"></a>\n",
"</td>\n",
"\n",
"<td>\n",
"<a href=\"https://github.com/Labelbox/labelbox-python/tree/develop/examples/annotation_import/audio.ipynb\" target=\"_blank\"><img\n",
"src=\"https://img.shields.io/badge/GitHub-100000?logo=github&logoColor=white\" alt=\"GitHub\"></a>\n",
"</td>"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
@@ -170,7 +194,7 @@
},
{
"metadata": {},
- "source": "ontology_builder = lb.OntologyBuilder(classifications=[\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"text_audio\"),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_audio\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_audio\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n])\n\nontology = client.create_ontology(\n \"Ontology Audio Annotations\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Audio,\n)",
+ "source": "ontology_builder = lb.OntologyBuilder(classifications=[\n lb.Classification(class_type=lb.Classification.Type.TEXT,\n name=\"text_audio\"),\n lb.Classification(\n class_type=lb.Classification.Type.CHECKLIST,\n name=\"checklist_audio\",\n options=[\n lb.Option(value=\"first_checklist_answer\"),\n lb.Option(value=\"second_checklist_answer\"),\n ],\n ),\n lb.Classification(\n class_type=lb.Classification.Type.RADIO,\n name=\"radio_audio\",\n options=[\n lb.Option(value=\"first_radio_answer\"),\n lb.Option(value=\"second_radio_answer\"),\n ],\n ),\n # Temporal classification for token-level annotations\n lb.Classification(\n class_type=lb.Classification.Type.TEXT,\n name=\"User Speaker\",\n scope=lb.Classification.Scope.INDEX, # INDEX scope for temporal\n ),\n])\n\nontology = client.create_ontology(\n \"Ontology Audio Annotations\",\n ontology_builder.asdict(),\n media_type=lb.MediaType.Audio,\n)",
"cell_type": "code",
"outputs": [],
"execution_count": null
@@ -223,6 +247,27 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "label = []\nlabel.append(\n lb_types.Label(\n data={\"global_key\": global_key},\n annotations=[text_annotation, checklist_annotation, radio_annotation],\n ))",
@@ -252,6 +297,29 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": [
"## Temporal Audio Annotations\n",
"\n",
"You can create temporal annotations for individual tokens (words) with precise timing:\n"
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Define tokens with precise timing (from demo script)\ntokens_data = [\n (\"Hello\", 586, 770), # Hello: frames 586-770\n (\"AI\", 771, 955), # AI: frames 771-955\n (\"how\", 956, 1140), # how: frames 956-1140\n (\"are\", 1141, 1325), # are: frames 1141-1325\n (\"you\", 1326, 1510), # you: frames 1326-1510\n (\"doing\", 1511, 1695), # doing: frames 1511-1695\n (\"today\", 1696, 1880), # today: frames 1696-1880\n]\n\n# Create temporal annotations for each token\ntemporal_annotations = []\nfor token, start_frame, end_frame in tokens_data:\n token_annotation = lb_types.AudioClassificationAnnotation(\n frame=start_frame,\n end_frame=end_frame,\n name=\"User Speaker\",\n value=lb_types.Text(answer=token),\n )\n temporal_annotations.append(token_annotation)\n\nprint(f\"Created {len(temporal_annotations)} temporal token annotations\")",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "# Create label with both regular and temporal annotations\nlabel_with_temporal = []\nlabel_with_temporal.append(\n lb_types.Label(\n data={\"global_key\": global_key},\n annotations=[text_annotation, checklist_annotation, radio_annotation] +\n temporal_annotations,\n ))\n\nprint(\n f\"Created label with {len(label_with_temporal[0].annotations)} total annotations\"\n)\nprint(f\" - Regular annotations: 3\")\nprint(f\" - Temporal annotations: {len(temporal_annotations)}\")",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": [
@@ -260,6 +328,13 @@
],
"cell_type": "markdown"
},
{
"metadata": {},
"source": "# Upload temporal annotations via MAL\ntemporal_upload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=f\"temporal_mal_job-{str(uuid.uuid4())}\",\n predictions=label_with_temporal,\n)\n\ntemporal_upload_job.wait_until_done()\nprint(\"Temporal upload completed!\")\nprint(\"Errors:\", temporal_upload_job.errors)\nprint(\"Status:\", temporal_upload_job.statuses)",
"cell_type": "code",
"outputs": [],
"execution_count": null
},
{
"metadata": {},
"source": "# Upload our label using Model-Assisted Labeling\nupload_job = lb.MALPredictionImport.create_from_objects(\n client=client,\n project_id=project.uid,\n name=f\"mal_job-{str(uuid.uuid4())}\",\n predictions=label,\n)\n\nupload_job.wait_until_done()\nprint(\"Errors:\", upload_job.errors)\nprint(\"Status of uploads: \", upload_job.statuses)",
File: labelbox/data/annotation_types/audio.py (new)
@@ -0,0 +1,37 @@
from typing import Optional
from pydantic import Field, AliasChoices

from labelbox.data.annotation_types.annotation import (
    ClassificationAnnotation,
)


class AudioClassificationAnnotation(ClassificationAnnotation):
    """Audio classification for specific time range

    Examples:
        - Speaker identification from 2500ms to 4100ms
        - Audio quality assessment for a segment
        - Language detection for audio segments

    Args:
        name (Optional[str]): Name of the classification
        feature_schema_id (Optional[Cuid]): Feature schema identifier
        value (Union[Text, Checklist, Radio]): Classification value
        start_frame (int): The frame index in milliseconds (e.g., 2500 = 2.5 seconds)
        end_frame (Optional[int]): End frame in milliseconds (for time ranges)
        segment_index (Optional[int]): Index of audio segment this annotation belongs to
        extra (Dict[str, Any]): Additional metadata
    """

    start_frame: int = Field(
        validation_alias=AliasChoices("start_frame", "frame"),
        serialization_alias="start_frame",
    )
    end_frame: Optional[int] = Field(
        default=None,
        validation_alias=AliasChoices("end_frame", "endFrame"),
        serialization_alias="end_frame",
    )
    segment_index: Optional[int] = None

Review comment on the start_frame field: "Bug: Audio Annotation API Inconsistency. The ..." (comment body truncated in this view).
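For orientation, here is a minimal usage sketch of the new class (not part of the PR). It assumes a labelbox build containing this change and that the class is exported through labelbox.types like the video annotation types; the values are hypothetical. It also illustrates the frame/start_frame alias behavior the reviewer flags above:

import labelbox.types as lb_types

# Canonical field name, as defined on the class:
ann = lb_types.AudioClassificationAnnotation(
    start_frame=2500,  # 2.5 seconds, per the docstring's millisecond convention
    end_frame=4100,    # optional end of the time range
    name="User Speaker",
    value=lb_types.Text(answer="Hello"),
)

# AliasChoices("start_frame", "frame") also accepts the notebook's spelling:
ann_alias = lb_types.AudioClassificationAnnotation(
    frame=2500,
    end_frame=4100,
    name="User Speaker",
    value=lb_types.Text(answer="Hello"),
)

assert ann.start_frame == ann_alias.start_frame == 2500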
File: labelbox/data/annotation_types/label.py
@@ -13,6 +13,7 @@
from .metrics import ScalarMetric, ConfusionMatrixMetric
from .video import VideoClassificationAnnotation
from .video import VideoObjectAnnotation, VideoMaskAnnotation
from .audio import AudioClassificationAnnotation
from .mmc import MessageEvaluationTaskAnnotation
from pydantic import BaseModel, field_validator

@@ -44,6 +45,7 @@ class Label(BaseModel):
            ClassificationAnnotation,
            ObjectAnnotation,
            VideoMaskAnnotation,
            AudioClassificationAnnotation,
            ScalarMetric,
            ConfusionMatrixMetric,
            RelationshipAnnotation,
@@ -75,15 +77,23 @@ def _get_annotations_by_type(self, annotation_type):

    def frame_annotations(
        self,
-   ) -> Dict[str, Union[VideoObjectAnnotation, VideoClassificationAnnotation]]:
+   ) -> Dict[int, Union[VideoObjectAnnotation, VideoClassificationAnnotation, AudioClassificationAnnotation]]:
+       """Get temporal annotations organized by frame
+
+       Returns:
+           Dict[int, List]: Dictionary mapping frame (milliseconds) to list of temporal annotations
+
+       Example:
+           >>> label.frame_annotations()
+           {2500: [VideoClassificationAnnotation(...), AudioClassificationAnnotation(...)]}
+       """
        frame_dict = defaultdict(list)
        for annotation in self.annotations:
-           if isinstance(
-               annotation,
-               (VideoObjectAnnotation, VideoClassificationAnnotation),
-           ):
+           if isinstance(annotation, (VideoObjectAnnotation, VideoClassificationAnnotation)):
                frame_dict[annotation.frame].append(annotation)
-       return frame_dict
+           elif isinstance(annotation, AudioClassificationAnnotation):
+               frame_dict[annotation.start_frame].append(annotation)
+       return dict(frame_dict)

Review comment on the audio branch: "Bug: Audio Annotations Indexed Inconsistently. The ..." (comment body truncated in this view).
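A minimal sketch of the updated method's behavior (not part of the PR; assumes a labelbox build containing this change, with a hypothetical global key). As the diff shows, audio annotations are grouped under their start_frame key, alongside any video annotations keyed by frame:

import labelbox.types as lb_types

label = lb_types.Label(
    data={"global_key": "sample-audio"},  # hypothetical global key
    annotations=[
        lb_types.AudioClassificationAnnotation(
            start_frame=586,
            end_frame=770,
            name="User Speaker",
            value=lb_types.Text(answer="Hello"),
        ),
    ],
)

# Audio annotations are keyed by start_frame (milliseconds):
by_frame = label.frame_annotations()
assert 586 in by_frame and len(by_frame[586]) == 1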
    def add_url_to_masks(self, signer) -> "Label":
        """
Review comment: "Bug: Duplicate Header Cells in Audio Notebook. The audio.ipynb notebook now includes duplicate header cells at the start. The commit adds new markdown cells (lines 30-53 in the diff) that are identical to the existing Labelbox logo and badge links, resulting in redundant content."