8 changes: 8 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/),
and this project adheres to [Semantic Versioning](https://semver.org/).

+## [0.5.1] - 2026-04-12
+
+### Fixed
+
+- Per-track artist display in Lyrion/LMS: the album-artist MusicBrainz ID was being written as the per-track `MUSICBRAINZ_ARTISTID`, causing LMS to dedupe all tracks to a single contributor row and show the first track's artist for every row. The MBID now goes to `MUSICBRAINZ_ALBUMARTISTID` (Picard-canonical), and the per-track key is never written. Jellyfin display is unchanged by this fix (it dedupes by name, not MBID).
+- Per-track artists whose case-insensitive form equals the album artist are now normalized to the album artist's casing (e.g. "AFROJACK - ID" with album artist "Afrojack" → `ARTIST=Afrojack`). This prevents duplicate contributor rows in Lyrion and stray upper/lowercase variants in Jellyfin. It is applied as defense-in-depth so that tier-1 sources and un-cached artists still get clean output.
+- Album-artist MBID is now suppressed for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would merge the collab album into that member's solo discography.
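
The collab guard in the last entry can be sketched as follows. This is a minimal illustration, not the shipped code: the regular expression mirrors the one added to `tagger.py` in this change, while `album_artist_mbid_tags` is a hypothetical helper name introduced only for this example.

```python
import re

# Same pattern as the one added to tagger.py in this change: a collab
# separator only counts when it is a whitespace-delimited token.
COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)


def album_artist_mbid_tags(artist: str, mbid: str) -> dict[str, list[str]]:
    """Hypothetical helper: return the MBID tag only for solo album artists."""
    if mbid and not COLLAB_SEPARATOR_RE.search(artist):
        return {"MUSICBRAINZ_ALBUMARTISTID": [mbid]}
    # Collab album artist or missing MBID: write no MBID tag at all.
    return {}


if __name__ == "__main__":
    for name in ("Axwell", "Martin Garrix x Alesso", "Armin van Buuren vs. Hardwell"):
        print(name, album_artist_mbid_tags(name, "mbid-abc"))
```

Because the separators must be surrounded by whitespace, the "x" inside "Axwell" does not trip the guard, while "Martin Garrix x Alesso" does.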

## [0.5.0] - 2026-04-12

First release with a proper project presence: a hero README, a published docs site, an animated landing page, CI, and a rounded-out CLI UX.
2 changes: 1 addition & 1 deletion README.md
@@ -44,7 +44,7 @@ It pairs naturally with [CrateDigger](https://github.com/Rouzax/CrateDigger), wh

- **Chapter-accurate splitting.** Sample-accurate cuts at chapter boundaries, gapless playback across tracks.
- **Codec-aware output.** FLAC for lossless sources, Opus stream-copy when safe, transparent re-encode when not. Pick with `--format`.
-- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
+- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ALBUMARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
- **Album and artist artwork.** Generates 1:1 cover art (embedded in every track and written to `cover.jpg` / `folder.jpg`) and an artist folder image.
- **Two metadata tiers.** Basic tagging for any chaptered video. Full enrichment when CrateDigger-style tags are present.
- **Re-run detection.** A manifest in each album folder tracks chapter hashes and source mtime, so repeat runs on the same library are near-instant.
10 changes: 9 additions & 1 deletion docs/output.md
@@ -38,10 +38,18 @@ The album folder name depends on the metadata tier (see below). With full CrateD

Vorbis comments written on every track:

-`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.
+`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ALBUMARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.

Most servers only read the common fields (TITLE/ARTIST/ALBUM/TRACKNUMBER/DATE). The custom `FESTIVAL`, `STAGE`, `VENUE` fields preserve festival context for scripts, filters, or smart playlists that care.

+### Artist tagging policy
+
+- `ARTIST` is per-track (the performer of that chapter's track). When a chapter title has no "Artist - Title" separator, `ARTIST` falls back to `ALBUMARTIST`.
+- `ALBUMARTIST` is always the set's headliner (the album-level artist).
+- Per-track artists whose case-insensitive form equals `ALBUMARTIST` are normalized to the `ALBUMARTIST` casing, so "AFROJACK - ID" becomes `ARTIST=Afrojack` when the set is by "Afrojack". This prevents Lyrion from listing two contributor rows and prevents Jellyfin from picking up stray upper/lowercase variants. Only whole-string matches are normalized: "AFROJACK & Steve Aoki" is left as-is.
+- `MUSICBRAINZ_ALBUMARTISTID` holds the album artist's MusicBrainz ID. It is omitted for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would cause media servers to merge the collab album into that member's solo discography.
+- `MUSICBRAINZ_ARTISTID` (the per-track MBID key) is never written. TrackSplit has no per-track-artist MBIDs; writing the album-artist MBID there caused Lyrion to collapse every track to a single contributor row.
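
The `ARTIST` rules above can be sketched as a single function. This is a simplified illustration under stated assumptions: `resolve_track_artist` is a hypothetical name, and in TrackSplit itself the case normalization and the fallback to `ALBUMARTIST` appear to happen at different stages (metadata building and tagging respectively) rather than in one helper.

```python
def resolve_track_artist(track_artist: str, album_artist: str) -> str:
    """Hypothetical helper: return the ARTIST value for one track."""
    if not track_artist:
        # Chapter title had no "Artist - Title" separator: fall back.
        return album_artist
    if album_artist and track_artist.casefold() == album_artist.casefold():
        # Whole-string, case-insensitive match: use the album artist's casing.
        return album_artist
    # Any other string (including multi-artist credits) is kept verbatim.
    return track_artist


if __name__ == "__main__":
    print(resolve_track_artist("AFROJACK", "Afrojack"))
    print(resolve_track_artist("AFROJACK & Steve Aoki", "Afrojack"))
    print(resolve_track_artist("", "Afrojack"))
```

Using `str.casefold()` rather than `str.lower()` matches the comparison used in this change and is the more robust choice for non-ASCII artist names.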

## Metadata tiers

- **Tier 1 (basic):** any chaptered video. TrackSplit infers artist and album from the filename and embedded tags, numbers tracks, and writes whatever metadata it can find.
4 changes: 2 additions & 2 deletions pyproject.toml
@@ -1,12 +1,12 @@
[project]
name = "tracksplit"
-version = "0.5.0"
+version = "0.5.1"
description = "Extract audio from video chapters into FLAC music albums"
readme = "README.md"
license = { text = "GPL-3.0-only" }
requires-python = ">=3.11"
dependencies = [
-    "Pillow>=10.0",
+    "Pillow>=12.2.0",
     "typer>=0.9",
     "mutagen>=1.47",
     "rich>=13.0",
53 changes: 36 additions & 17 deletions src/tracksplit/metadata.py
@@ -119,23 +119,8 @@ def build_album_meta(
     Tier 2: album = "Festival Year (Stage)" with full tag data.
     Tier 1: album = filename_stem, artist/date parsed from filename.
     """
-    # Strip labels, then split artist from title
-    clean_titles = []
-    track_artists = []
-    publishers = []
-    for ch in chapters:
-        title, label = strip_label(ch.title)
-        track_artist, track_title = split_track_artist(title)
-        clean_titles.append(track_title)
-        track_artists.append(track_artist)
-        publishers.append(label)
-
-    # Deduplicate titles
-    clean_titles = deduplicate_titles(clean_titles)
-
-    # Get genres from tags
-    genres = tags.get("genres", [])
-
+    # Resolve album-level fields up front so we can case-normalize per-track
+    # artists against the album artist below (see loop).
     if tier == 2:
         artist = tags.get("artist", "")
         festival = tags.get("festival", "")
@@ -156,6 +141,40 @@ def build_album_meta(
         date = year
         album = filename_stem

+    # Strip labels, then split artist from title.
+    #
+    # Defense-in-depth: if a chapter's per-track artist matches the album
+    # artist case-insensitively (e.g. chapter "AFROJACK - ID" with album
+    # ARTIST "Afrojack"), normalize to the album artist's canonical casing.
+    # Without this, Lyrion treats "AFROJACK" and "Afrojack" as two separate
+    # contributors, and Jellyfin collapses them but keeps the first-scanned
+    # casing as the display name. CrateDigger ideally normalizes upstream,
+    # but 40% of DJs are missing from its cache and tier-1 (non-CrateDigger)
+    # sources make no such guarantee, so we apply the cheap local fix here.
+    # Whole-string match only: "AFROJACK & Steve Aoki" stays as-is because
+    # that's a genuinely different contributor string.
+    clean_titles = []
+    track_artists = []
+    publishers = []
+    for ch in chapters:
+        title, label = strip_label(ch.title)
+        track_artist, track_title = split_track_artist(title)
+        if (
+            track_artist
+            and artist
+            and track_artist.casefold() == artist.casefold()
+        ):
+            track_artist = artist
+        clean_titles.append(track_title)
+        track_artists.append(track_artist)
+        publishers.append(label)
+
+    # Deduplicate titles
+    clean_titles = deduplicate_titles(clean_titles)
+
+    # Get genres from tags
+    genres = tags.get("genres", [])
+
     # Build tracks
     tracks = []
     for i, ch in enumerate(chapters):
31 changes: 29 additions & 2 deletions src/tracksplit/tagger.py
@@ -3,6 +3,7 @@

import base64
import logging
+import re
from collections.abc import Callable, Sequence
from pathlib import Path

@@ -11,10 +12,36 @@

from tracksplit.models import AlbumMeta, TrackMeta

+# Matches collab separators in an album-artist string. A single MBID cannot
+# identify two performers, so we suppress MUSICBRAINZ_ALBUMARTISTID when any
+# of these appear as whitespace-delimited tokens: "X & Y", "X | Y", "X vs Y"
+# (with or without trailing dot), "X x Y". Whitespace-delimited so names like
+# "Axwell", "deadmau5", or "Eric Prydz" do not false-positive.
+_COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)
+
+
+def _is_collab_artist(artist: str) -> bool:
+    return bool(_COLLAB_SEPARATOR_RE.search(artist))
+

 def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]:
     """Build a Vorbis comment dict from album and track metadata.

+    Tag policy (single source of truth for both FLAC and OggOpus):
+
+    - ``TITLE`` / ``ARTIST``: per-track. ``ARTIST`` falls back to ``album.artist``
+      when the chapter title had no "Artist - Title" separator.
+    - ``ALBUMARTIST``: always the album-level artist (the set headliner).
+    - ``MUSICBRAINZ_ALBUMARTISTID``: the album artist's MusicBrainz ID, under the
+      Picard-canonical key. Suppressed for B2B/collab album artists (those
+      containing a whitespace-delimited "&", "|", "vs"/"vs.", or "x") because
+      a single MBID cannot identify two performers; writing it anyway would
+      merge the collab album into one member's solo discography in LMS/Jellyfin.
+    - ``MUSICBRAINZ_ARTISTID`` (per-track MBID) is **never** emitted: TrackSplit
+      has no per-track-artist MBIDs. Writing the album-artist MBID here (the
+      pre-fix behavior) caused Lyrion to dedupe every track to a single
+      contributor row.
+
     All values are lists of strings per the Vorbis comment specification.
     Optional tags are omitted when their source value is empty.
     """
@@ -45,8 +72,8 @@ def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]:
     if album.comment:
         tags["COMMENT"] = [album.comment]

-    if album.musicbrainz_artistid:
-        tags["MUSICBRAINZ_ARTISTID"] = [album.musicbrainz_artistid]
+    if album.musicbrainz_artistid and not _is_collab_artist(album.artist):
+        tags["MUSICBRAINZ_ALBUMARTISTID"] = [album.musicbrainz_artistid]

     if album.festival:
         tags["FESTIVAL"] = [album.festival]
58 changes: 57 additions & 1 deletion tests/test_metadata.py
@@ -333,6 +333,62 @@ def test_probe_to_metadata_to_tagger_contract():
     assert tag_dict["STAGE"] == ["Mainstage"]
     assert tag_dict["VENUE"] == ["Boom"]
     assert tag_dict["COMMENT"] == ["https://1001tl.com/abc"]
-    assert tag_dict["MUSICBRAINZ_ARTISTID"] == ["uuid-456"]
+    assert tag_dict["MUSICBRAINZ_ALBUMARTISTID"] == ["uuid-456"]
     assert tag_dict["PUBLISHER"] == ["Spinnin"]
     assert tag_dict["TITLE"] == ["Animals"]
+
+
+# --- Per-track artist case normalization (defense-in-depth) ---
+
+def test_track_artist_case_normalized_uppercase_chapter():
+    """Chapter 'AFROJACK - ID' with album ARTIST 'Afrojack' → normalized."""
+    tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters(["AFROJACK - ID", "AFROJACK - Bringin It Back"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "Afrojack"
+    assert meta.tracks[1].artist == "Afrojack"
+
+
+def test_track_artist_case_normalized_lowercase_chapter():
+    """Chapter 'deadmau5 - Strobe' with album ARTIST 'Deadmau5' → normalized."""
+    tags = {"artist": "Deadmau5", "festival": "Tomorrowland Brasil", "date": "2025"}
+    chapters = _make_chapters(["deadmau5 - Strobe"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "Deadmau5"
+
+
+def test_track_artist_preserved_when_not_whole_match():
+    """Multi-artist strings containing the album artist stay as-is."""
+    tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "AFROJACK & Steve Aoki ft. Miss Palmer - No Beef",
+        "AFROJACK & Martin Garrix - Turn Up The Speakers",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "AFROJACK & Steve Aoki ft. Miss Palmer"
+    assert meta.tracks[1].artist == "AFROJACK & Martin Garrix"
+
+
+def test_track_artist_empty_when_chapter_has_no_separator():
+    """Chapter 'Intro' (no ' - ') → track.artist stays empty, falls back later."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025"}
+    chapters = _make_chapters(["Intro", "ID"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == ""
+    assert meta.tracks[1].artist == ""
+
+
+def test_unicode_artist_preserved_through_build_album_meta():
+    """Diacritics in artist names must survive the pipeline untouched."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "RÜFÜS DU SOL - Innerbloom",
+        "Kölsch - Grey",
+        "Amél - Birds Of A Feather",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.artist == "Tiësto"
+    assert meta.tracks[0].artist == "RÜFÜS DU SOL"
+    assert meta.tracks[1].artist == "Kölsch"
+    assert meta.tracks[2].artist == "Amél"
+    assert meta.tracks[2].title == "Birds Of A Feather"
108 changes: 106 additions & 2 deletions tests/test_tagger.py
@@ -53,7 +53,7 @@ def test_build_tag_dict_all_fields():
     assert tags["GENRE"] == ["Trance"]
     assert tags["PUBLISHER"] == ["Armada Music"]
     assert tags["COMMENT"] == ["Full set recording"]
-    assert tags["MUSICBRAINZ_ARTISTID"] == ["test-mbid-1234"]
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
     assert tags["FESTIVAL"] == ["Ultra Music Festival"]
     assert tags["STAGE"] == ["Mainstage"]
     assert tags["VENUE"] == ["Bayfront Park"]
@@ -85,7 +85,7 @@ def test_build_tag_dict_minimal():
     # Optional tags absent
     for key in (
         "TRACKTOTAL", "DATE", "GENRE", "PUBLISHER", "COMMENT",
-        "MUSICBRAINZ_ARTISTID", "FESTIVAL", "STAGE", "VENUE",
+        "MUSICBRAINZ_ALBUMARTISTID", "FESTIVAL", "STAGE", "VENUE",
     ):
         assert key not in tags, f"{key} should not be present when empty"

@@ -152,3 +152,107 @@ def test_tag_all_dispatches_by_extension():

mock_flac.assert_called_once()
mock_ogg.assert_called_once()
+
+
+# --- MBID policy: no per-track MBID, collab guard ---
+
+def test_no_musicbrainz_artistid_emitted():
+    """The per-track MBID key must never be written (regression guard).
+
+    Writing album-artist MBID as per-track MBID caused Lyrion to collapse
+    every track to a single contributor row. We never have real per-track
+    MBIDs, so the key stays out of the dict entirely.
+    """
+    album = _full_album()
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ARTISTID" not in tags
+
+
+def test_albumartist_mbid_written_for_solo_artist():
+    album = _full_album()  # artist="Armin van Buuren", MBID="test-mbid-1234"
+    tags = build_tag_dict(album, album.tracks[0])
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
+
+
+def test_albumartist_mbid_suppressed_for_ampersand_collab():
+    """'X & Y' album artists have no single-person MBID; don't write one."""
+    album = AlbumMeta(
+        artist="Armin van Buuren & KI/KI",
+        album="AMF 2025 (Two Is One)",
+        musicbrainz_artistid="477b8c0c-c5fc-4ad2-b5b2-191f0bf2a9df",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_vs_collab():
+    album = AlbumMeta(
+        artist="Armin van Buuren vs. Hardwell",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_x_collab():
+    album = AlbumMeta(
+        artist="Martin Garrix x Alesso",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_collab_guard_does_not_false_positive_on_embedded_letters():
+    """Names like 'Axwell', 'deadmau5', 'Eric Prydz' must not trip the guard."""
+    for name in ("Axwell", "deadmau5", "Eric Prydz", "Tiësto", "R3HAB"):
+        album = AlbumMeta(
+            artist=name,
+            album="Set",
+            musicbrainz_artistid="mbid-abc",
+            tracks=[TrackMeta(number=1, title="T", start=0.0, end=60.0)],
+        )
+        tags = build_tag_dict(album, album.tracks[0])
+        assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-abc"], f"false positive on {name!r}"
+
+
+def test_opus_round_trip_preserves_unicode_tags(tmp_path):
+    """Write an opus, tag with unicode, read it back, expect exact strings."""
+    import shutil
+    import subprocess
+
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is None:
+        import pytest
+        pytest.skip("ffmpeg not available")
+
+    opus_path = tmp_path / "test.opus"
+    subprocess.run(
+        [ffmpeg, "-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
+         "-t", "0.5", "-c:a", "libopus", "-b:a", "32k",
+         str(opus_path), "-y", "-loglevel", "error"],
+        check=True,
+    )
+
+    album = AlbumMeta(
+        artist="Tiësto",
+        album="EDC",
+        musicbrainz_artistid="mbid-ti",
+        tracks=[TrackMeta(number=1, title="Strobe", start=0.0, end=30.0,
+                          artist="RÜFÜS DU SOL")],
+    )
+    from tracksplit.tagger import tag_ogg
+    from mutagen.oggopus import OggOpus
+
+    tag_ogg(opus_path, album, album.tracks[0])
+    reread = OggOpus(str(opus_path))
+
+    assert reread["ARTIST"] == ["RÜFÜS DU SOL"]
+    assert reread["ALBUMARTIST"] == ["Tiësto"]
+    assert reread["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-ti"]
+    assert "MUSICBRAINZ_ARTISTID" not in reread