YOLOv8 for music score segmentation.
This series of models predicts bounding boxes for musical systems and staves.
- All dependencies are managed by pipenv. Please check Pipfile and Pipfile.lock.
- To segment score images and resize the segmented images, run the scripts in the following order:
  1. ls-yolo-system_inference.py
  2. ls-yolo-staff-height_inference.py
  3. resize_imgs.py
  4. ls-yolo-staff-bbox_inference.py (optional)
- Before running the scripts, verify the dataset_dir variable in each script file.
- Before running resize_imgs.py, also verify TGT_HEIGHT and mtdt_file_path.
- Before running the scripts, double-check every detail!
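
The run order above can be sketched as a small Python driver. This is a minimal sketch, not part of the repository: the helper names `build_commands` and `run_pipeline` are hypothetical, and it assumes each script takes no command-line arguments and reads its settings (dataset_dir, TGT_HEIGHT, mtdt_file_path) from variables inside the file.

```python
import subprocess
from typing import List

# Script order as listed in this README; the last script is optional.
REQUIRED_SCRIPTS = [
    "ls-yolo-system_inference.py",
    "ls-yolo-staff-height_inference.py",
    "resize_imgs.py",
]
OPTIONAL_SCRIPTS = ["ls-yolo-staff-bbox_inference.py"]


def build_commands(include_bbox: bool = False) -> List[List[str]]:
    """Return one `pipenv run python <script>` command per step, in order."""
    scripts = REQUIRED_SCRIPTS + (OPTIONAL_SCRIPTS if include_bbox else [])
    # Assumption: no CLI arguments; settings live inside each script file.
    return [["pipenv", "run", "python", script] for script in scripts]


def run_pipeline(include_bbox: bool = False) -> None:
    """Run each step in sequence and stop at the first failure."""
    for cmd in build_commands(include_bbox):
        print("Running:", " ".join(cmd))
        # check=True raises CalledProcessError if a script exits non-zero,
        # so later steps never run on incomplete output.
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline(include_bbox=True)` from the repository root would also run the optional staff-bbox step. Stopping at the first failure is a deliberate choice here, since each step presumably consumes the output of the previous one.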
The LS-YOLO-System model predicts bounding boxes of musical systems within an input image.
The input image should contain one or more musical systems.
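
Because the later models expect a single musical system per image, the predicted system boxes typically need to be turned into crop regions. The following is a minimal sketch under stated assumptions, not the code used by the inference script: it assumes boxes arrive as (x1, y1, x2, y2) pixel coordinates, and the function name `systems_top_to_bottom` is hypothetical.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels
Crop = Tuple[int, int, int, int]         # (left, top, right, bottom)


def systems_top_to_bottom(boxes: List[Box], img_w: int, img_h: int) -> List[Crop]:
    """Clamp boxes to the image and return crop regions sorted by top edge."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        # Round outward so the crop never cuts into the predicted box,
        # then clamp to the image bounds.
        left = max(0, math.floor(x1))
        top = max(0, math.floor(y1))
        right = min(img_w, math.ceil(x2))
        bottom = min(img_h, math.ceil(y2))
        if right > left and bottom > top:  # drop boxes with no area left
            crops.append((left, top, right, bottom))
    # Reading order for a single-column page: topmost system first.
    return sorted(crops, key=lambda c: c[1])
```

Each returned tuple uses the (left, upper, right, lower) convention, so it could be passed to an image library's crop call. Sorting by the top edge assumes a single-column layout; multi-column pages would need a different ordering.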
⚠️ Caveat 1
The model was trained with a few negative samples that contain no musical systems (e.g., a scorebook cover, an image of a pianist playing the piano, or a picture of a person).
However, its ability to filter out images without musical systems is not guaranteed.
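
Since filtering is not guaranteed, one option is to add a guard after inference that sets aside images with no confident detection for manual review. This is a minimal sketch of that idea, not a feature of the model or the scripts: the function name `split_by_detection`, the input layout (image name mapped to a list of detection confidences), and the 0.5 threshold are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def split_by_detection(
    confidences: Dict[str, List[float]], min_conf: float = 0.5
) -> Tuple[List[str], List[str]]:
    """Separate images with at least one confident detection from the rest."""
    with_systems: List[str] = []
    needs_review: List[str] = []
    for name, scores in confidences.items():
        # An image counts as containing a system only if some detection
        # reaches the (hypothetical) confidence threshold.
        if any(score >= min_conf for score in scores):
            with_systems.append(name)
        else:
            needs_review.append(name)
    return with_systems, needs_review
```

Images in the second list are not necessarily negatives; they are only candidates to check by hand before passing anything on to the next step.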
⚠️ Caveat 2
The data splits for v1.0.0 and v2.0.0 are missing, and v3.0.0 used a different split for training, validation, and testing.
As a result, you cannot reproduce v1 and v2 with the datasets on the release page.
v1.0.0 was trained on ls-yolo-system.
v2.0.0 was trained on ls-yolo-system, ls-system-HIL_001, and ls-system-HIL_002.
v2.0.1 is v2.0.0 fine-tuned with ls-yolo-system-ossq-250605.
v3.0.0 was trained on ls-yolo-system, ls-system-HIL_001, ls-system-HIL_002, ls-yolo-system-ossq-250605, and ls-yolo-system-HIL_003.
Formerly LS-YOLO-Staff.
The LS-YOLO-Staff-Height model predicts the height values of staves within an input image by predicting bounding boxes around the five lines in each staff.
The input image should contain a single musical system.
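
The height of a staff follows directly from its bounding box as the difference between the bottom and top edges. The sketch below shows that arithmetic and one plausible way to derive a resize factor from it. It is an assumption that resize_imgs.py rescales images so that staff height matches TGT_HEIGHT; the function names, the (x1, y1, x2, y2) box layout, and the use of the median are illustrative choices, not the repository's implementation.

```python
from statistics import median
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # assumed (x1, y1, x2, y2) in pixels


def staff_heights(boxes: List[Box]) -> List[float]:
    """Height of each staff box: bottom edge minus top edge."""
    return [y2 - y1 for _, y1, _, y2 in boxes]


def resize_scale(boxes: List[Box], tgt_height: float) -> float:
    """Scale factor that maps the median staff height to tgt_height."""
    heights = staff_heights(boxes)
    if not heights:
        raise ValueError("no staff boxes were detected")
    # The median is used here so that one badly sized box does not
    # dominate the scale; this is an illustrative choice.
    return tgt_height / median(heights)
```

For example, two staves that are each 40 pixels tall with a target height of 20 give a scale factor of 0.5, so the image would be shrunk to half its size.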
⚠️ Caveat
The data splits for v1.0.0 and v2.0.0 are missing, and v2.1.0 used a different split for training, validation, and testing.
As a result, you cannot reproduce v1 and v2 with the datasets on the release page.
v1.0.0 was trained on ls-yolo-staff-height.
v2.0.0 was trained on ls-yolo-staff-height, ls-yolo-staff-height-HIL_001, and ls-yolo-staff-height-HIL_002.
The LS-YOLO-Staff-Bbox model predicts bounding boxes for each part within the given image.
The input image should contain a single musical system.
v1.0.0 was trained on ls-yolo-staff-bbox.
v1.0.1 is v1.0.0 fine-tuned with ls-yolo-staff-bbox-HIL_001.
v1.1.0 was trained on ls-yolo-staff-bbox and ls-yolo-staff-bbox-HIL_001.