MALerLab/ls-yolo

LS-YOLO (Latent-Score-YOLO)

YOLOv8 for music score segmentation.
This series of models predicts bounding boxes for musical systems and staves.

TL;DR

  • All dependencies are managed by pipenv. Please check Pipfile and Pipfile.lock.
  • To segment score images and resize the segmented images, run the scripts in the following order:
    1. ls-yolo-system_inference.py
    2. ls-yolo-staff-height_inference.py
    3. resize_imgs.py
    4. ls-yolo-staff-bbox_inference.py (optional)
  • Before running the scripts, verify the dataset_dir variable in each script file.
  • Before running resize_imgs.py, also verify TGT_HEIGHT and mtdt_file_path.
  • Double-check all of these values before every run.
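
The run order above can be sketched as a small driver. This is only an illustration: it assumes the scripts take no command-line arguments (their settings are edited in the files themselves) and that they are launched through `pipenv run python` from the repository root. The names `build_commands` and `run_pipeline` are hypothetical and not part of this repository.

```python
import subprocess

# Script names and their order are taken from the list above.
REQUIRED_SCRIPTS = [
    "ls-yolo-system_inference.py",
    "ls-yolo-staff-height_inference.py",
    "resize_imgs.py",
]
OPTIONAL_SCRIPTS = ["ls-yolo-staff-bbox_inference.py"]


def build_commands(include_staff_bbox=False):
    """Return the commands to run, in order (hypothetical helper)."""
    scripts = list(REQUIRED_SCRIPTS)
    if include_staff_bbox:
        scripts += OPTIONAL_SCRIPTS
    # Assumption: each script is started inside the pipenv environment.
    return [["pipenv", "run", "python", script] for script in scripts]


def run_pipeline(include_staff_bbox=False):
    """Run each step in order and stop at the first failure."""
    for command in build_commands(include_staff_bbox):
        subprocess.run(command, check=True)
```

Calling `run_pipeline()` runs the three required steps; `run_pipeline(include_staff_bbox=True)` also runs the optional fourth step. Because `check=True` raises on a non-zero exit code, a failed step prevents later steps from running on incomplete output.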

About Versions

LS-YOLO-System

The LS-YOLO-System model predicts bounding boxes of musical systems within an input image.
The input image should contain one or more musical systems.
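
As a rough illustration of working with such predictions, the sketch below loads a checkpoint with the Ultralytics package, drops low-confidence boxes, and orders the remaining systems from top to bottom. It assumes the weights are a standard Ultralytics YOLOv8 checkpoint. The function names, the paths passed in, and the 0.5 threshold are hypothetical and are not taken from the inference scripts in this repository.

```python
def predict_systems(weights_path, image_path):
    """Run one image through a YOLOv8 checkpoint (assumes Ultralytics is installed)."""
    from ultralytics import YOLO  # imported lazily so the helper below works without it

    model = YOLO(weights_path)
    result = model(image_path)[0]  # one Results object per input image
    # Boxes are (x1, y1, x2, y2) in pixels; conf holds one score per box.
    return result.boxes.xyxy.tolist(), result.boxes.conf.tolist()


def sort_systems(boxes, confs, min_conf=0.5):
    """Keep boxes with confidence >= min_conf and sort them top to bottom."""
    kept = [box for box, conf in zip(boxes, confs) if conf >= min_conf]
    return sorted(kept, key=lambda box: box[1])  # box[1] is the top edge (y1)
```

A typical call would be `boxes, confs = predict_systems(weights, image)` followed by `sort_systems(boxes, confs)`. A confidence threshold can reduce spurious detections, but it does not guarantee that pages without systems are rejected.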

⚠️ Caveat 1
The model was trained with a few negative samples, i.e., images that contain no musical systems (e.g., a scorebook cover, an image of a pianist playing piano, a picture of a person, etc.).
However, it is not guaranteed to filter out images without musical systems.

⚠️ Caveat 2
The data splits for v1.0.0 and v2.0.0 are missing.
Note that v3.0.0 used a different data split for training, validation, and testing compared to v1.0.0 and v2.0.0.
This means you cannot reproduce v1 and v2 with the datasets on the release page.

v1

v1.0.0 was trained on ls-yolo-system.

v2

v2.0.0 was trained on ls-yolo-system, ls-system-HIL_001, and ls-system-HIL_002.
v2.0.1 is v2.0.0 fine-tuned with ls-yolo-system-ossq-250605.

v3

v3.0.0 was trained on ls-yolo-system, ls-system-HIL_001, ls-system-HIL_002, ls-yolo-system-ossq-250605, and ls-yolo-system-HIL_003.

LS-YOLO-Staff-Height

Formerly LS-YOLO-Staff.

The LS-YOLO-Staff-Height model predicts the height values of staves within an input image by predicting bounding boxes around the five lines in each staff.
The input image should contain a single musical system.
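
The height of a staff follows directly from its predicted box: it is the bottom edge minus the top edge. The sketch below shows that arithmetic for boxes in (x1, y1, x2, y2) pixel format. The `scale_factor` helper rests on an assumption: it reads `TGT_HEIGHT` as a target staff height and uses the median detected height. The README does not confirm how resize_imgs.py uses `TGT_HEIGHT`, and both function names are hypothetical.

```python
from statistics import median


def staff_heights(boxes):
    """Return the pixel height (y2 - y1) of each staff box."""
    return [y2 - y1 for (_x1, y1, _x2, y2) in boxes]


def scale_factor(boxes, target_height):
    """Factor that would scale the median staff height to target_height.

    Assumption: target_height plays the role of TGT_HEIGHT as a staff height.
    """
    heights = staff_heights(boxes)
    if not heights:
        raise ValueError("no staff boxes detected")
    return target_height / median(heights)
```

For example, two staves with heights of 40 and 44 pixels have a median height of 42 pixels, so a target of 84 pixels corresponds to scaling the image by a factor of 2.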

⚠️ Caveat
The data splits for v1.0.0 and v2.0.0 are missing.
Note that v2.1.0 used a different data split for training, validation, and testing compared to v1.0.0 and v2.0.0.
This means you cannot reproduce v1 and v2 with the datasets on the release page.

v1

v1.0.0 was trained on ls-yolo-staff-height.

v2

v2.0.0 was trained on ls-yolo-staff-height, ls-yolo-staff-height-HIL_001, and ls-yolo-staff-height-HIL_002.

LS-YOLO-Staff-Bbox

The LS-YOLO-Staff-Bbox model predicts bounding boxes for each part within the given image.
The input image should contain a single musical system.

v1

v1.0.0 was trained on ls-yolo-staff-bbox.
v1.0.1 is v1.0.0 fine-tuned with ls-yolo-staff-bbox-HIL_001.
v1.1.0 was trained on ls-yolo-staff-bbox and ls-yolo-staff-bbox-HIL_001.
