From f27186c0d595eb933b88946df145b5091e280b78 Mon Sep 17 00:00:00 2001
From: Rouzax
Date: Sun, 12 Apr 2026 16:55:38 +0000
Subject: [PATCH 1/2] fix(tags): correct MBID tagging and case-normalize per-track artists

Album-artist MBID was being written as per-track MUSICBRAINZ_ARTISTID,
which Lyrion uses to dedupe contributors, collapsing every track in a set
to one row. Move to Picard-canonical MUSICBRAINZ_ALBUMARTISTID and suppress
entirely for B2B/collab album artists (a single MBID cannot identify two
performers).

Case-normalize per-track artists against the album artist as
defense-in-depth for sources not normalized upstream.

Added parameterized invariant tests over 79 real MKV sidecars.
---
 CHANGELOG.md                  |   8 ++
 README.md                     |   2 +-
 docs/output.md                |  10 ++-
 pyproject.toml                |   2 +-
 src/tracksplit/metadata.py    |  53 ++++++++-----
 src/tracksplit/tagger.py      |  31 +++++++-
 tests/test_metadata.py        |  58 ++++++++++++++-
 tests/test_tagger.py          | 108 ++++++++++++++++++++++++++-
 tests/test_tagger_fixtures.py | 135 ++++++++++++++++++++++++++++++++++
 9 files changed, 382 insertions(+), 25 deletions(-)
 create mode 100644 tests/test_tagger_fixtures.py

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 10740c4..5fd584d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [0.5.1] - 2026-04-12
+
+### Fixed
+
+- Per-track artist display in Lyrion/LMS: the album-artist MusicBrainz ID was being written as the per-track `MUSICBRAINZ_ARTISTID`, causing LMS to dedupe all tracks to a single contributor row and show the first track's artist for every row. The MBID now goes to `MUSICBRAINZ_ALBUMARTISTID` (Picard-canonical), and the per-track key is never written. Jellyfin display is unchanged by this fix (it dedupes by name, not MBID).
+- Per-track artists whose case-insensitive form equals the album artist are now normalized to the album artist's casing (e.g. "AFROJACK - ID" with album artist "Afrojack" → `ARTIST=Afrojack`). Prevents duplicate contributor rows in Lyrion and stray upper/lowercase variants in Jellyfin. Applied as defense-in-depth so tier-1 sources and un-cached artists still get clean output.
+- Album-artist MBID is now suppressed for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would merge the collab album into that member's solo discography.
+
 ## [0.5.0] - 2026-04-12
 
 First release with a proper project presence: a hero README, a published docs site, an animated landing page, CI, and a rounded-out CLI UX.
diff --git a/README.md b/README.md
index 6bf2e73..c37f666 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ It pairs naturally with [CrateDigger](https://github.com/Rouzax/CrateDigger), wh
 
 - **Chapter-accurate splitting.** Sample-accurate cuts at chapter boundaries, gapless playback across tracks.
 - **Codec-aware output.** FLAC for lossless sources, Opus stream-copy when safe, transparent re-encode when not. Pick with `--format`.
-- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
+- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ALBUMARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
 - **Album and artist artwork.** Generates 1:1 cover art (embedded in every track and written to `cover.jpg` / `folder.jpg`) and an artist folder image.
 - **Two metadata tiers.** Basic tagging for any chaptered video. Full enrichment when CrateDigger-style tags are present.
 - **Re-run detection.** A manifest in each album folder tracks chapter hashes and source mtime, so repeat runs on the same library are near-instant.
diff --git a/docs/output.md b/docs/output.md
index e93ef31..f710076 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -38,10 +38,18 @@ The album folder name depends on the metadata tier (see below). With full CrateD
 
 Vorbis comments written on every track:
 
-`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.
+`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ALBUMARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.
 
 Most servers only read the common fields (TITLE/ARTIST/ALBUM/TRACKNUMBER/DATE). The custom `FESTIVAL`, `STAGE`, `VENUE` fields preserve festival context for scripts, filters, or smart playlists that care.
 
+### Artist tagging policy
+
+- `ARTIST` is per-track (the performer of that chapter's track). When a chapter title has no "Artist - Title" separator, `ARTIST` falls back to `ALBUMARTIST`.
+- `ALBUMARTIST` is always the set's headliner (the album-level artist).
+- Per-track artists whose case-insensitive form equals `ALBUMARTIST` are normalized to the `ALBUMARTIST` casing, so "AFROJACK - ID" becomes `ARTIST=Afrojack` when the set is by "Afrojack". This prevents Lyrion from listing two contributor rows and prevents Jellyfin from picking up stray upper/lowercase variants.
+- `MUSICBRAINZ_ALBUMARTISTID` holds the album artist's MusicBrainz ID. It is omitted for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would cause media servers to merge the collab album into that member's solo discography.
+- `MUSICBRAINZ_ARTISTID` (the per-track MBID key) is never written.
TrackSplit has no per-track-artist MBIDs; writing the album-artist MBID there caused Lyrion to collapse every track to a single contributor row.
+
 ## Metadata tiers
 
 - **Tier 1 (basic):** any chaptered video. TrackSplit infers artist and album from the filename and embedded tags, numbers tracks, and writes whatever metadata it can find.
diff --git a/pyproject.toml b/pyproject.toml
index 2745d4c..ccb1f3b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "tracksplit"
-version = "0.5.0"
+version = "0.5.1"
 description = "Extract audio from video chapters into FLAC music albums"
 readme = "README.md"
 license = { text = "GPL-3.0-only" }
diff --git a/src/tracksplit/metadata.py b/src/tracksplit/metadata.py
index aca0157..0d0fb6b 100644
--- a/src/tracksplit/metadata.py
+++ b/src/tracksplit/metadata.py
@@ -119,23 +119,8 @@ def build_album_meta(
     Tier 2: album = "Festival Year (Stage)" with full tag data.
     Tier 1: album = filename_stem, artist/date parsed from filename.
     """
-    # Strip labels, then split artist from title
-    clean_titles = []
-    track_artists = []
-    publishers = []
-    for ch in chapters:
-        title, label = strip_label(ch.title)
-        track_artist, track_title = split_track_artist(title)
-        clean_titles.append(track_title)
-        track_artists.append(track_artist)
-        publishers.append(label)
-
-    # Deduplicate titles
-    clean_titles = deduplicate_titles(clean_titles)
-
-    # Get genres from tags
-    genres = tags.get("genres", [])
-
+    # Resolve album-level fields up front so we can case-normalize per-track
+    # artists against the album artist below (see loop).
     if tier == 2:
         artist = tags.get("artist", "")
         festival = tags.get("festival", "")
@@ -156,6 +141,40 @@ def build_album_meta(
         date = year
         album = filename_stem
 
+    # Strip labels, then split artist from title.
+    #
+    # Defense-in-depth: if a chapter's per-track artist matches the album
+    # artist case-insensitively (e.g.
chapter "AFROJACK - ID" with album
+    # ARTIST "Afrojack"), normalize to the album artist's canonical casing.
+    # Without this, Lyrion treats "AFROJACK" and "Afrojack" as two separate
+    # contributors, and Jellyfin collapses them but keeps the first-scanned
+    # casing as the display name. CrateDigger ideally normalizes upstream,
+    # but 40% of DJs are missing from its cache and tier-1 (non-CrateDigger)
+    # sources make no such guarantee, so we apply the cheap local fix here.
+    # Whole-string match only: "AFROJACK & Steve Aoki" stays as-is because
+    # that's a genuinely different contributor string.
+    clean_titles = []
+    track_artists = []
+    publishers = []
+    for ch in chapters:
+        title, label = strip_label(ch.title)
+        track_artist, track_title = split_track_artist(title)
+        if (
+            track_artist
+            and artist
+            and track_artist.casefold() == artist.casefold()
+        ):
+            track_artist = artist
+        clean_titles.append(track_title)
+        track_artists.append(track_artist)
+        publishers.append(label)
+
+    # Deduplicate titles
+    clean_titles = deduplicate_titles(clean_titles)
+
+    # Get genres from tags
+    genres = tags.get("genres", [])
+
     # Build tracks
     tracks = []
     for i, ch in enumerate(chapters):
diff --git a/src/tracksplit/tagger.py b/src/tracksplit/tagger.py
index fe21d14..70ea7c5 100644
--- a/src/tracksplit/tagger.py
+++ b/src/tracksplit/tagger.py
@@ -3,6 +3,7 @@
 
 import base64
 import logging
+import re
 from collections.abc import Callable, Sequence
 from pathlib import Path
 
@@ -11,10 +12,36 @@
 
 from tracksplit.models import AlbumMeta, TrackMeta
 
+# Matches collab separators in an album-artist string. A single MBID cannot
+# identify two performers, so we suppress MUSICBRAINZ_ALBUMARTISTID when any
+# of these appear as whitespace-delimited tokens: "X & Y", "X | Y", "X vs Y"
+# (with or without trailing dot), "X x Y". Whitespace-delimited so names like
+# "Axwell", "deadmau5", or "Eric Prydz" do not false-positive.
+_COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)
+
+
+def _is_collab_artist(artist: str) -> bool:
+    return bool(_COLLAB_SEPARATOR_RE.search(artist))
+
 
 def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]:
     """Build a Vorbis comment dict from album and track metadata.
 
+    Tag policy (single source of truth for both FLAC and OggOpus):
+
+    - ``TITLE`` / ``ARTIST``: per-track. ``ARTIST`` falls back to ``album.artist``
+      when the chapter title had no "Artist - Title" separator.
+    - ``ALBUMARTIST``: always the album-level artist (the set headliner).
+    - ``MUSICBRAINZ_ALBUMARTISTID``: the album artist's MusicBrainz ID, under the
+      Picard-canonical key. Suppressed for B2B/collab album artists (those
+      containing "&", "|", "vs.", or " x ") because a single MBID cannot
+      identify two performers; writing it anyway would merge the collab album
+      into one member's solo discography in LMS/Jellyfin.
+    - ``MUSICBRAINZ_ARTISTID`` (per-track MBID) is **never** emitted: TrackSplit
+      has no per-track-artist MBIDs. Writing the album-artist MBID here (the
+      pre-fix behavior) caused Lyrion to dedupe every track to a single
+      contributor row.
+
     All values are lists of strings per the Vorbis comment specification.
     Optional tags are omitted when their source value is empty.
""" @@ -45,8 +72,8 @@ def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]: if album.comment: tags["COMMENT"] = [album.comment] - if album.musicbrainz_artistid: - tags["MUSICBRAINZ_ARTISTID"] = [album.musicbrainz_artistid] + if album.musicbrainz_artistid and not _is_collab_artist(album.artist): + tags["MUSICBRAINZ_ALBUMARTISTID"] = [album.musicbrainz_artistid] if album.festival: tags["FESTIVAL"] = [album.festival] diff --git a/tests/test_metadata.py b/tests/test_metadata.py index 1a6fa62..dcbce80 100644 --- a/tests/test_metadata.py +++ b/tests/test_metadata.py @@ -333,6 +333,62 @@ def test_probe_to_metadata_to_tagger_contract(): assert tag_dict["STAGE"] == ["Mainstage"] assert tag_dict["VENUE"] == ["Boom"] assert tag_dict["COMMENT"] == ["https://1001tl.com/abc"] - assert tag_dict["MUSICBRAINZ_ARTISTID"] == ["uuid-456"] + assert tag_dict["MUSICBRAINZ_ALBUMARTISTID"] == ["uuid-456"] assert tag_dict["PUBLISHER"] == ["Spinnin"] assert tag_dict["TITLE"] == ["Animals"] + + +# --- Per-track artist case normalization (defense-in-depth) --- + +def test_track_artist_case_normalized_uppercase_chapter(): + """Chapter 'AFROJACK - ID' with album ARTIST 'Afrojack' → normalized.""" + tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"} + chapters = _make_chapters(["AFROJACK - ID", "AFROJACK - Bringin It Back"]) + meta = build_album_meta(tags, chapters, "", tier=2) + assert meta.tracks[0].artist == "Afrojack" + assert meta.tracks[1].artist == "Afrojack" + + +def test_track_artist_case_normalized_lowercase_chapter(): + """Chapter 'deadmau5 - Strobe' with album ARTIST 'Deadmau5' → normalized.""" + tags = {"artist": "Deadmau5", "festival": "Tomorrowland Brasil", "date": "2025"} + chapters = _make_chapters(["deadmau5 - Strobe"]) + meta = build_album_meta(tags, chapters, "", tier=2) + assert meta.tracks[0].artist == "Deadmau5" + + +def test_track_artist_preserved_when_not_whole_match(): + """Multi-artist strings containing the album artist 
stay as-is."""
+    tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "AFROJACK & Steve Aoki ft. Miss Palmer - No Beef",
+        "AFROJACK & Martin Garrix - Turn Up The Speakers",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "AFROJACK & Steve Aoki ft. Miss Palmer"
+    assert meta.tracks[1].artist == "AFROJACK & Martin Garrix"
+
+
+def test_track_artist_empty_when_chapter_has_no_separator():
+    """Chapter 'Intro' (no ' - ') → track.artist stays empty, falls back later."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025"}
+    chapters = _make_chapters(["Intro", "ID"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == ""
+    assert meta.tracks[1].artist == ""
+
+
+def test_unicode_artist_preserved_through_build_album_meta():
+    """Diacritics in artist names must survive the pipeline untouched."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "RÜFÜS DU SOL - Innerbloom",
+        "Kölsch - Grey",
+        "Amél - Birds Of A Feather",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.artist == "Tiësto"
+    assert meta.tracks[0].artist == "RÜFÜS DU SOL"
+    assert meta.tracks[1].artist == "Kölsch"
+    assert meta.tracks[2].artist == "Amél"
+    assert meta.tracks[2].title == "Birds Of A Feather"
diff --git a/tests/test_tagger.py b/tests/test_tagger.py
index 7a17712..b010156 100644
--- a/tests/test_tagger.py
+++ b/tests/test_tagger.py
@@ -53,7 +53,7 @@ def test_build_tag_dict_all_fields():
     assert tags["GENRE"] == ["Trance"]
     assert tags["PUBLISHER"] == ["Armada Music"]
     assert tags["COMMENT"] == ["Full set recording"]
-    assert tags["MUSICBRAINZ_ARTISTID"] == ["test-mbid-1234"]
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
     assert tags["FESTIVAL"] == ["Ultra Music Festival"]
     assert tags["STAGE"] == ["Mainstage"]
     assert tags["VENUE"] == ["Bayfront Park"]
@@ -85,7
+85,7 @@ def test_build_tag_dict_minimal():
     # Optional tags absent
     for key in (
         "TRACKTOTAL", "DATE", "GENRE", "PUBLISHER", "COMMENT",
-        "MUSICBRAINZ_ARTISTID", "FESTIVAL", "STAGE", "VENUE",
+        "MUSICBRAINZ_ALBUMARTISTID", "FESTIVAL", "STAGE", "VENUE",
     ):
         assert key not in tags, f"{key} should not be present when empty"
 
@@ -152,3 +152,107 @@ def test_tag_all_dispatches_by_extension():
 
     mock_flac.assert_called_once()
     mock_ogg.assert_called_once()
+
+
+# --- MBID policy: no per-track MBID, collab guard ---
+
+def test_no_musicbrainz_artistid_emitted():
+    """The per-track MBID key must never be written (regression guard).
+
+    Writing album-artist MBID as per-track MBID caused Lyrion to collapse
+    every track to a single contributor row. We never have real per-track
+    MBIDs, so the key stays out of the dict entirely.
+    """
+    album = _full_album()
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ARTISTID" not in tags
+
+
+def test_albumartist_mbid_written_for_solo_artist():
+    album = _full_album()  # artist="Armin van Buuren", MBID="test-mbid-1234"
+    tags = build_tag_dict(album, album.tracks[0])
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
+
+
+def test_albumartist_mbid_suppressed_for_ampersand_collab():
+    """'X & Y' album artists have no single-person MBID; don't write one."""
+    album = AlbumMeta(
+        artist="Armin van Buuren & KI/KI",
+        album="AMF 2025 (Two Is One)",
+        musicbrainz_artistid="477b8c0c-c5fc-4ad2-b5b2-191f0bf2a9df",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_vs_collab():
+    album = AlbumMeta(
+        artist="Armin van Buuren vs.
Hardwell",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_x_collab():
+    album = AlbumMeta(
+        artist="Martin Garrix x Alesso",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_collab_guard_does_not_false_positive_on_embedded_letters():
+    """Names like 'Axwell', 'deadmau5', 'Eric Prydz' must not trip the guard."""
+    for name in ("Axwell", "deadmau5", "Eric Prydz", "Tiësto", "R3HAB"):
+        album = AlbumMeta(
+            artist=name,
+            album="Set",
+            musicbrainz_artistid="mbid-abc",
+            tracks=[TrackMeta(number=1, title="T", start=0.0, end=60.0)],
+        )
+        tags = build_tag_dict(album, album.tracks[0])
+        assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-abc"], f"false positive on {name!r}"
+
+
+def test_opus_round_trip_preserves_unicode_tags(tmp_path):
+    """Write an opus, tag with unicode, read it back, expect exact strings."""
+    import shutil
+    import subprocess
+
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is None:
+        import pytest
+        pytest.skip("ffmpeg not available")
+
+    opus_path = tmp_path / "test.opus"
+    subprocess.run(
+        [ffmpeg, "-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
+         "-t", "0.5", "-c:a", "libopus", "-b:a", "32k",
+         str(opus_path), "-y", "-loglevel", "error"],
+        check=True,
+    )
+
+    album = AlbumMeta(
+        artist="Tiësto",
+        album="EDC",
+        musicbrainz_artistid="mbid-ti",
+        tracks=[TrackMeta(number=1, title="Strobe", start=0.0, end=30.0,
+                          artist="RÜFÜS DU SOL")],
+    )
+    from tracksplit.tagger import tag_ogg
+    from mutagen.oggopus import OggOpus
+
+    tag_ogg(opus_path, album, album.tracks[0])
+    reread = OggOpus(str(opus_path))
+
+    assert reread["ARTIST"] ==
["RÜFÜS DU SOL"]
+    assert reread["ALBUMARTIST"] == ["Tiësto"]
+    assert reread["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-ti"]
+    assert "MUSICBRAINZ_ARTISTID" not in reread
diff --git a/tests/test_tagger_fixtures.py b/tests/test_tagger_fixtures.py
new file mode 100644
index 0000000..bd0f444
--- /dev/null
+++ b/tests/test_tagger_fixtures.py
@@ -0,0 +1,135 @@
+"""Parameterized invariant tests over the real CrateDigger MKV corpus.
+
+Runs only when the local fixture directory is present. Every JSON sidecar
+in the dump exercises the full path:
+
+    ffprobe tags -> parse_tags -> build_album_meta -> build_tag_dict
+
+and asserts invariants that must hold for every set. Gives us continuous
+regression coverage against real data without shipping the corpus.
+"""
+from __future__ import annotations
+
+import json
+import re
+from pathlib import Path
+
+import pytest
+
+from tracksplit.metadata import build_album_meta
+from tracksplit.models import Chapter
+from tracksplit.tagger import build_tag_dict
+
+DUMP_DIR = Path("/home/martijn/_temp/cratedigger/data/mkv-info-dump")
+
+pytestmark = pytest.mark.skipif(
+    not DUMP_DIR.is_dir(),
+    reason=f"MKV dump corpus not present at {DUMP_DIR}",
+)
+
+# Same regex as tagger._COLLAB_SEPARATOR_RE, duplicated here so the test
+# breaks loudly if the two drift.
+_COLLAB_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)
+
+
+def _fixture_ids() -> list[str]:
+    if not DUMP_DIR.is_dir():
+        return []
+    return sorted(p.name for p in DUMP_DIR.glob("*.json"))
+
+
+def _tags_from_extra(extra: dict) -> dict:
+    """Translate the MKV `extra` block into the dict shape parse_tags produces."""
+    genres_raw = extra.get("CRATEDIGGER_1001TL_GENRES", "")
+    return {
+        "artist": extra.get("ARTIST", ""),
+        "festival": extra.get("CRATEDIGGER_1001TL_FESTIVAL", ""),
+        "date": extra.get("CRATEDIGGER_1001TL_DATE", ""),
+        "stage": extra.get("CRATEDIGGER_1001TL_STAGE", ""),
+        "venue": extra.get("CRATEDIGGER_1001TL_VENUE", ""),
+        "genres": [g for g in genres_raw.split("|") if g],
+        "comment": extra.get("CRATEDIGGER_1001TL_URL", ""),
+        "musicbrainz_artistid": extra.get("CRATEDIGGER_MBID", ""),
+    }
+
+
+def _chapters_from_menu(menu_track: dict) -> list[Chapter]:
+    """Extract chapters from the MediaInfo Menu track structure."""
+    chapters: list[Chapter] = []
+    extra = menu_track.get("extra", {})
+    # Keys are timecodes like "_00_00_00000"; values are strings like "en:TITLE".
+    entries = sorted(
+        (k, v) for k, v in extra.items()
+        if isinstance(k, str) and k.startswith("_")
+    )
+    times: list[float] = []
+    titles: list[str] = []
+    for k, v in entries:
+        parts = k.lstrip("_").split("_")
+        if len(parts) < 3:
+            continue
+        h, m, rest = parts[0], parts[1], parts[2]
+        try:
+            seconds = int(h) * 3600 + int(m) * 60 + int(rest[:2]) + int(rest[2:].ljust(3, "0")[:3]) / 1000
+        except ValueError:
+            continue
+        if isinstance(v, str) and ":" in v:
+            v = v.split(":", 1)[1]
+        times.append(float(seconds))
+        titles.append(v)
+
+    if not times:
+        return []
+    for i, (start, title) in enumerate(zip(times, titles)):
+        end = times[i + 1] if i + 1 < len(times) else start + 1.0
+        chapters.append(Chapter(index=i + 1, title=title, start=start, end=end))
+    return chapters
+
+
+def _load_fixture(path: Path) -> tuple[dict, list[Chapter]] | None:
+    raw = json.loads(path.read_text(encoding="utf-8"))
+    tracks = raw.get("mediainfo", {}).get("media", {}).get("track", [])
+    general = next((t for t in tracks if t.get("@type") == "General"), None)
+    menu = next((t for t in tracks if t.get("@type") == "Menu"), None)
+    if not general or not menu:
+        return None
+    tags = _tags_from_extra(general.get("extra", {}))
+    chapters = _chapters_from_menu(menu)
+    if not chapters:
+        return None
+    return tags, chapters
+
+
+@pytest.mark.parametrize("fixture_name", _fixture_ids())
+def test_corpus_invariants(fixture_name):
+    loaded = _load_fixture(DUMP_DIR / fixture_name)
+    if loaded is None:
+        pytest.skip(f"{fixture_name}: no usable General+Menu tracks")
+    tags, chapters = loaded
+    stem = fixture_name.replace(".json", "")
+    meta = build_album_meta(tags, chapters, stem, tier=2)
+
+    assert meta.album, f"{fixture_name}: empty ALBUM"
+    assert meta.artist, f"{fixture_name}: empty ALBUMARTIST"
+
+    for track in meta.tracks:
+        td = build_tag_dict(meta, track)
+
+        # Required tags populated
+        assert td["TITLE"] and td["TITLE"][0], f"{fixture_name} t{track.number}: empty TITLE"
+        assert td["ARTIST"] and
td["ARTIST"][0], f"{fixture_name} t{track.number}: empty ARTIST"
+        assert td["ALBUMARTIST"] and td["ALBUMARTIST"][0], f"{fixture_name} t{track.number}: empty ALBUMARTIST"
+
+        # Regression guard: the old tag key must never reappear
+        assert "MUSICBRAINZ_ARTISTID" not in td, (
+            f"{fixture_name} t{track.number}: MUSICBRAINZ_ARTISTID leaked back"
+        )
+
+        # Collab guard: when album-artist MBID is written, ALBUMARTIST must
+        # unambiguously identify a single performer.
+        if "MUSICBRAINZ_ALBUMARTISTID" in td:
+            aa = td["ALBUMARTIST"][0]
+            assert not _COLLAB_RE.search(aa), (
+                f"{fixture_name} t{track.number}: album MBID written for "
+                f"collab ALBUMARTIST {aa!r}"
+            )

From d61cd4cad6e686e7b7c8f56fcc0cff23363dab1d Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 12 Apr 2026 16:56:50 +0000
Subject: [PATCH 2/2] chore(deps): update pillow requirement from >=10.0 to >=12.2.0

Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) to permit the latest version.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/10.0.0...12.2.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.2.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot]
---
 pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyproject.toml b/pyproject.toml
index ccb1f3b..d39116c 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,7 +6,7 @@ readme = "README.md"
 license = { text = "GPL-3.0-only" }
 requires-python = ">=3.11"
 dependencies = [
-    "Pillow>=10.0",
+    "Pillow>=12.2.0",
     "typer>=0.9",
     "mutagen>=1.47",
     "rich>=13.0",
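
Review note on PATCH 1/2: the two guards it introduces (collab-separator detection and whole-string case normalization) can be exercised in isolation. The following is a minimal standalone sketch, assuming the regex and the `casefold()` comparison quoted in the diff; `is_collab_artist` and `normalize_track_artist` are illustrative names chosen for this note, not functions that exist in TrackSplit under those names.

```python
import re

# Copy of the pattern the patch assigns to _COLLAB_SEPARATOR_RE: a separator
# token ("&", "|", "vs", "vs.", "x") with whitespace on both sides.
COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)


def is_collab_artist(artist: str) -> bool:
    """Illustrative stand-in for the patch's _is_collab_artist helper."""
    return bool(COLLAB_SEPARATOR_RE.search(artist))


def normalize_track_artist(track_artist: str, album_artist: str) -> str:
    """Hypothetical helper mirroring the whole-string check in build_album_meta.

    Returns the album artist's casing only when both strings are non-empty
    and equal under casefold(); anything else is returned unchanged.
    """
    if (
        track_artist
        and album_artist
        and track_artist.casefold() == album_artist.casefold()
    ):
        return album_artist
    return track_artist


if __name__ == "__main__":
    for name in (
        "Armin van Buuren & KI/KI",
        "Martin Garrix x Alesso",
        "Axwell",
        "Above & Beyond",
    ):
        print(f"{name!r}: collab={is_collab_artist(name)}")
    print(normalize_track_artist("AFROJACK", "Afrojack"))
    print(normalize_track_artist("AFROJACK & Steve Aoki", "Afrojack"))
```

One property worth keeping in mind when reading the tests: the pattern is purely lexical, so an established act whose name contains a standalone separator token (for example "Above & Beyond") is also classified as a collab under this sketch and would lose its album-artist MBID. Whether that trade-off is acceptable is a policy decision for the patch author; the sketch only makes the behavior visible.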