PDF2Muse is a command-line tool that converts PDF files of sheet music into MusicXML 🎼 and MuseScore (.mscx) files. It leverages the power of the oemer optical music recognition library to transcribe the music from the PDF.

thedivergentai/PDF2Muse

PDF2Muse 🎢

PDF2Muse is a Python tool that converts PDF files of sheet music into MusicXML 🎼 and MuseScore (.mscx) files using optical music recognition (OMR). It relies on the oemer library to transcribe the music on each page.

✨ Features

  • Easy to use: Simple command-line interface and web UI
  • High quality: Uses the oemer deep-learning OMR engine for recognition
  • Flexible output: Generates both MusicXML and MuseScore formats
  • Modern architecture: Packaged with pyproject.toml in a src/ layout, with a Typer CLI and a Gradio web UI
  • Beautiful output: Rich terminal output with progress indicators

πŸ™ Acknowledgements

This project would not have been possible without the excellent work done by the oemer project. We extend our sincere gratitude to the oemer team for creating such a powerful and versatile optical music recognition library.

βš™οΈ Requirements

  • Python 3.9 or higher 🐍
  • Poppler (for PDF to image conversion)
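
Both requirements can be checked up front with a short script like the one below. This is a sketch and not part of PDF2Muse: the function name `check_requirements` is made up for illustration, and it assumes Poppler can be detected by finding its `pdftoppm` executable on PATH, which is the program pdf2image calls to render pages.

```python
import shutil
import sys


def check_requirements(min_version=(3, 9)):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    if sys.version_info < min_version:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_version)
        problems.append(f"Python {wanted} or higher required, found {found}")
    # Assumption: Poppler is usable when its pdftoppm binary is on PATH.
    if shutil.which("pdftoppm") is None:
        problems.append("Poppler not found: 'pdftoppm' is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print("missing:", problem)
```

An empty result means the interpreter is new enough and Poppler was found; otherwise each entry names what to fix.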

⬇️ Installation

Quick Install

Run this from the root of a cloned copy of the repository:

pip install -e .

Development Install

pip install -e ".[dev]"

pip installs the required Python dependencies automatically, including:

  • oemer (optical music recognition)
  • pdf2image (PDF conversion)
  • typer (CLI framework)
  • gradio (web interface)
  • rich (beautiful terminal output)

Installing Poppler

Poppler is required for PDF to image conversion:

Windows:

# Using Chocolatey
choco install poppler

# Or download from: https://github.com/oschwartz10612/poppler-windows/releases/

macOS:

brew install poppler

Linux (Ubuntu/Debian):

sudo apt-get install poppler-utils

🚀 Usage

Command Line Interface

Convert a PDF file:

pdf2muse convert sheet_music.pdf

Specify output directory:

pdf2muse convert sheet_music.pdf --output ./my_output

Disable deskewing:

pdf2muse convert sheet_music.pdf --no-deskew

Use TensorFlow instead of ONNX:

pdf2muse convert sheet_music.pdf --use-tf

Enable verbose logging:

pdf2muse convert sheet_music.pdf --verbose

Show help:

pdf2muse --help
pdf2muse convert --help
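
To convert a whole folder of PDFs, the flags above can be assembled from Python and passed to the command through `subprocess`. The sketch below uses only the options documented in this section. The helper names `build_convert_command` and `convert_folder` are hypothetical, and it assumes that `pdf2muse` exits with a non-zero status when a conversion fails and that several PDFs can share one output directory.

```python
import subprocess
from pathlib import Path


def build_convert_command(pdf, output=None, deskew=True, use_tf=False, verbose=False):
    """Assemble a `pdf2muse convert` argument list from the documented flags."""
    command = ["pdf2muse", "convert", str(pdf)]
    if output is not None:
        command += ["--output", str(output)]
    if not deskew:
        command.append("--no-deskew")
    if use_tf:
        command.append("--use-tf")
    if verbose:
        command.append("--verbose")
    return command


def convert_folder(folder, output):
    """Convert every PDF in `folder`; return the PDFs whose conversion failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = subprocess.run(build_convert_command(pdf, output=output))
        # Assumption: a non-zero exit status signals a failed conversion.
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with file names that contain spaces.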

Web Interface

Launch the Gradio web UI:

pdf2muse ui

With custom port:

pdf2muse ui --port 8080

Create a public shareable link:

pdf2muse ui --share

Download Models

Pre-download the oemer model checkpoints:

pdf2muse download-models

Force re-download:

pdf2muse download-models --force

📦 Package Structure

PDF2Muse/
├── src/
│   └── pdf2muse/
│       ├── __init__.py       # Package initialization
│       ├── cli.py            # Typer CLI entry point
│       ├── core.py           # Main processing pipeline
│       ├── oemer_utils.py    # Oemer wrapper utilities
│       ├── musicxml.py       # MusicXML manipulation
│       └── ui.py             # Gradio web interface
├── pyproject.toml            # Project metadata & dependencies
├── README.md                 # This file
└── .gitignore

🔧 Development

Running Tests

pytest

Code Formatting

black src/

Linting

ruff check src/

πŸ“ How It Works

  1. PDF to Images: Converts each page of the PDF to a high-resolution PNG image
  2. OMR Processing: Runs oemer on each image to extract musical notation
  3. MusicXML Generation: Combines the recognized music into MusicXML format
  4. MuseScore Conversion: Converts the MusicXML to MuseScore's .mscx format
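
Steps 1 and 2 can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the code in `core.py`: the function names and the `page_001.png` naming scheme are hypothetical, and it drives oemer through its command line (`-o`, `--without-deskew`, `--use-tf`, as I understand that interface), whereas PDF2Muse may call oemer differently. Steps 3 and 4 are left out because this README does not describe how they are implemented.

```python
import subprocess
from pathlib import Path


def pdf_to_pngs(pdf_path, work_dir, dpi=300):
    """Step 1: render each PDF page to a PNG with pdf2image (needs Poppler)."""
    from pdf2image import convert_from_path  # third-party, imported lazily

    work_dir = Path(work_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    pages = convert_from_path(str(pdf_path), dpi=dpi)
    paths = []
    for number, page in enumerate(pages, start=1):
        path = work_dir / f"page_{number:03d}.png"  # hypothetical naming scheme
        page.save(path, "PNG")
        paths.append(path)
    return paths


def oemer_command(image_path, output_dir, deskew=True, use_tf=False):
    """Step 2: build the oemer command-line call for one page image."""
    command = ["oemer", str(image_path), "-o", str(output_dir)]
    if not deskew:
        command.append("--without-deskew")
    if use_tf:
        command.append("--use-tf")
    return command


def recognize_pages(image_paths, output_dir):
    """Run oemer once per page image, stopping at the first failure."""
    for image_path in image_paths:
        subprocess.run(oemer_command(image_path, output_dir), check=True)
```

Rendering at 300 DPI matches the scan resolution recommended for good recognition; lower values run faster but give oemer less detail to work with.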

🎯 Best Results

For optimal recognition quality:

  • Use high-resolution scans (300 DPI or higher)
  • Ensure clear, uncluttered sheet music
  • Use standard Western music notation
  • Avoid handwritten scores (printed music works best)
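
Whether a scan meets the 300 DPI guideline can be estimated from its pixel dimensions and the physical page size. The helper names below are hypothetical, and the default page size of 8.5 × 11 inches (US Letter) is an assumption; pass the real dimensions for other paper formats.

```python
def effective_dpi(width_px, height_px, page_width_in, page_height_in):
    """Return the lower of the horizontal and vertical pixel densities."""
    return min(width_px / page_width_in, height_px / page_height_in)


def is_high_resolution(width_px, height_px, page_width_in=8.5,
                       page_height_in=11.0, minimum_dpi=300):
    """True when the scan reaches `minimum_dpi` along both axes."""
    dpi = effective_dpi(width_px, height_px, page_width_in, page_height_in)
    return dpi >= minimum_dpi


# A Letter page scanned at 2550 x 3300 pixels works out to 300 DPI.
print(effective_dpi(2550, 3300, 8.5, 11.0))
```

Taking the minimum of the two axes is a deliberately conservative choice: a scan that is dense in one direction but stretched in the other is still reported as low resolution.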

πŸ› Troubleshooting

Import Error: No module named 'pdf2muse'

  • Make sure you installed the package: pip install -e .

Command not found: pdf2muse

  • Ensure your Python scripts directory is in your PATH
  • Try running: python -m pdf2muse.cli instead

Poppler error during conversion

  • Install Poppler (see Installation section above)
  • On Windows, add Poppler's bin directory to your PATH

No MusicXML files generated

  • Check that your PDF contains clear sheet music
  • Try enabling verbose mode: pdf2muse convert file.pdf --verbose
  • Ensure oemer checkpoints are downloaded: pdf2muse download-models
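
The first two problems above can be told apart with a small diagnostic. This is an illustrative sketch: `diagnose_install` is a hypothetical helper, not something PDF2Muse ships.

```python
import importlib.util
import shutil


def diagnose_install(package="pdf2muse", command="pdf2muse"):
    """Report whether the package imports and whether its command is on PATH."""
    return {
        "package_importable": importlib.util.find_spec(package) is not None,
        "command_on_path": shutil.which(command) is not None,
    }


if __name__ == "__main__":
    for check, passed in diagnose_install().items():
        print(f"{check}: {'ok' if passed else 'MISSING'}")
```

If the package is importable but the command is missing, the scripts directory is probably not on PATH, and `python -m pdf2muse.cli` is the workaround. If the package is not importable, rerun `pip install -e .` in the active environment.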

📜 License

MIT License - see LICENSE file for details.
