From f27186c0d595eb933b88946df145b5091e280b78 Mon Sep 17 00:00:00 2001
From: Rouzax <GitHub@mgdn.nl>
Date: Sun, 12 Apr 2026 16:55:38 +0000
Subject: [PATCH 1/2] fix(tags): correct MBID tagging and case-normalize
 per-track artists

Album-artist MBID was being written as per-track MUSICBRAINZ_ARTISTID,
which Lyrion uses to dedupe contributors, collapsing every track in a
set to one row. Move to Picard-canonical MUSICBRAINZ_ALBUMARTISTID and
suppress entirely for B2B/collab album artists (a single MBID cannot
identify two performers). Case-normalize per-track artists against the
album artist as defense-in-depth for sources not normalized upstream.

Add parameterized invariant tests over 79 real MKV sidecars; they are
skipped when the local fixture corpus is absent.
---
 CHANGELOG.md                  |   8 ++
 README.md                     |   2 +-
 docs/output.md                |  10 ++-
 pyproject.toml                |   2 +-
 src/tracksplit/metadata.py    |  53 ++++++++-----
 src/tracksplit/tagger.py      |  31 +++++++-
 tests/test_metadata.py        |  58 ++++++++++++++-
 tests/test_tagger.py          | 108 ++++++++++++++++++++++++++-
 tests/test_tagger_fixtures.py | 135 ++++++++++++++++++++++++++++++++++
 9 files changed, 382 insertions(+), 25 deletions(-)
 create mode 100644 tests/test_tagger_fixtures.py
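Reviewer note: a minimal sketch of the collab guard added to tagger.py, using the same pattern as `_COLLAB_SEPARATOR_RE` in this patch. The names `COLLAB_SEPARATOR_RE` and `is_collab_artist` and the "Artist One / Artist Two" strings are illustrative and not part of the change; the other sample names come from the tests in this patch.

```python
import re

# Same pattern as tagger._COLLAB_SEPARATOR_RE in this patch: a separator
# token ("&", "|", "vs", "vs.", "x") with whitespace on both sides.
COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)


def is_collab_artist(artist: str) -> bool:
    """Return True when the string contains a whitespace-delimited separator."""
    return bool(COLLAB_SEPARATOR_RE.search(artist))


samples = (
    "Armin van Buuren & KI/KI",
    "Armin van Buuren vs. Hardwell",
    "Martin Garrix x Alesso",
    "Axwell",                         # embedded "x", no surrounding whitespace
    "Eric Prydz",
    "Artist One B2B Artist Two",      # illustrative: literal "B2B" separator
)
for name in samples:
    print(f"{name!r}: collab={is_collab_artist(name)}")
```

As written, the pattern covers "&", "|", "vs"/"vs." and "x" only. A literal "B2B" or "and" between two names does not match, so an album artist spelled that way would still receive `MUSICBRAINZ_ALBUMARTISTID`.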

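Reviewer note: the per-track case normalization in metadata.py can be sketched in isolation as below. `normalize_track_artist` is a hypothetical helper written for this note; in the patch the same comparison is inlined in the chapter loop of `build_album_meta`.

```python
def normalize_track_artist(track_artist: str, album_artist: str) -> str:
    """Adopt the album artist's casing on a whole-string, case-insensitive match.

    Any other value, including an empty string or a multi-artist string that
    merely contains the album artist, is returned unchanged.
    """
    if (
        track_artist
        and album_artist
        and track_artist.casefold() == album_artist.casefold()
    ):
        return album_artist
    return track_artist


# Sample values taken from the tests in this patch.
print(normalize_track_artist("AFROJACK", "Afrojack"))
print(normalize_track_artist("AFROJACK & Steve Aoki", "Afrojack"))
print(repr(normalize_track_artist("", "Tiësto")))
```

Comparing with `str.casefold()` rather than `str.lower()` keeps the match reliable for non-ASCII names, and restricting it to whole-string equality leaves genuine multi-artist credits untouched.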
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 10740c4..5fd584d 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -5,6 +5,14 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/),
 and this project adheres to [Semantic Versioning](https://semver.org/).
 
+## [0.5.1] - 2026-04-12
+
+### Fixed
+
+- Per-track artist display in Lyrion/LMS: the album-artist MusicBrainz ID was being written as the per-track `MUSICBRAINZ_ARTISTID`, causing LMS to dedupe all tracks to a single contributor row and show the first track's artist for every row. The MBID now goes to `MUSICBRAINZ_ALBUMARTISTID` (Picard-canonical), and the per-track key is never written. Jellyfin display is unchanged by this fix (it dedupes by name, not MBID).
+- Per-track artists whose case-insensitive form equals the album artist are now normalized to the album artist's casing (e.g. "AFROJACK - ID" with album artist "Afrojack" → `ARTIST=Afrojack`). Prevents duplicate contributor rows in Lyrion and stray upper/lowercase variants in Jellyfin. Applied as defense-in-depth so tier-1 sources and un-cached artists still get clean output.
+- Album-artist MBID is now suppressed for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would merge the collab album into that member's solo discography.
+
 ## [0.5.0] - 2026-04-12
 
 First release with a proper project presence: a hero README, a published docs site, an animated landing page, CI, and a rounded-out CLI UX.
diff --git a/README.md b/README.md
index 6bf2e73..c37f666 100644
--- a/README.md
+++ b/README.md
@@ -44,7 +44,7 @@ It pairs naturally with [CrateDigger](https://github.com/Rouzax/CrateDigger), wh
 
 - **Chapter-accurate splitting.** Sample-accurate cuts at chapter boundaries, gapless playback across tracks.
 - **Codec-aware output.** FLAC for lossless sources, Opus stream-copy when safe, transparent re-encode when not. Pick with `--format`.
-- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
+- **Rich metadata.** Writes TITLE, ARTIST, ALBUMARTIST, ALBUM, TRACKNUMBER, TRACKTOTAL, DISCNUMBER, DATE, GENRE, PUBLISHER, COMMENT, MUSICBRAINZ_ALBUMARTISTID, FESTIVAL, STAGE, VENUE as Vorbis comments.
 - **Album and artist artwork.** Generates 1:1 cover art (embedded in every track and written to `cover.jpg` / `folder.jpg`) and an artist folder image.
 - **Two metadata tiers.** Basic tagging for any chaptered video. Full enrichment when CrateDigger-style tags are present.
 - **Re-run detection.** A manifest in each album folder tracks chapter hashes and source mtime, so repeat runs on the same library are near-instant.
diff --git a/docs/output.md b/docs/output.md
index e93ef31..f710076 100644
--- a/docs/output.md
+++ b/docs/output.md
@@ -38,10 +38,18 @@ The album folder name depends on the metadata tier (see below). With full CrateD
 
 Vorbis comments written on every track:
 
-`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.
+`TITLE`, `ARTIST`, `ALBUMARTIST`, `ALBUM`, `TRACKNUMBER`, `TRACKTOTAL`, `DISCNUMBER`, `DATE`, `GENRE`, `PUBLISHER`, `COMMENT`, `MUSICBRAINZ_ALBUMARTISTID`, `FESTIVAL`, `STAGE`, `VENUE`.
 
 Most servers only read the common fields (TITLE/ARTIST/ALBUM/TRACKNUMBER/DATE). The custom `FESTIVAL`, `STAGE`, `VENUE` fields preserve festival context for scripts, filters, or smart playlists that care.
 
+### Artist tagging policy
+
+- `ARTIST` is per-track (the performer of that chapter's track). When a chapter title has no "Artist - Title" separator, `ARTIST` falls back to `ALBUMARTIST`.
+- `ALBUMARTIST` is always the set's headliner (the album-level artist).
+- Per-track artists whose case-insensitive form equals `ALBUMARTIST` are normalized to the `ALBUMARTIST` casing, so "AFROJACK - ID" becomes `ARTIST=Afrojack` when the set is by "Afrojack". This prevents Lyrion from listing two contributor rows and prevents Jellyfin from picking up stray upper/lowercase variants.
+- `MUSICBRAINZ_ALBUMARTISTID` holds the album artist's MusicBrainz ID. It is omitted for B2B/collab album artists ("X & Y", "X | Y", "X vs. Y", "X x Y"): a single MBID cannot identify two performers, and emitting only one half's MBID would cause media servers to merge the collab album into that member's solo discography.
+- `MUSICBRAINZ_ARTISTID` (the per-track MBID key) is never written. TrackSplit has no per-track-artist MBIDs; writing the album-artist MBID there caused Lyrion to collapse every track to a single contributor row.
+
 ## Metadata tiers
 
 - **Tier 1 (basic):** any chaptered video. TrackSplit infers artist and album from the filename and embedded tags, numbers tracks, and writes whatever metadata it can find.
diff --git a/pyproject.toml b/pyproject.toml
index 2745d4c..ccb1f3b 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "tracksplit"
-version = "0.5.0"
+version = "0.5.1"
 description = "Extract audio from video chapters into FLAC music albums"
 readme = "README.md"
 license = { text = "GPL-3.0-only" }
diff --git a/src/tracksplit/metadata.py b/src/tracksplit/metadata.py
index aca0157..0d0fb6b 100644
--- a/src/tracksplit/metadata.py
+++ b/src/tracksplit/metadata.py
@@ -119,23 +119,8 @@ def build_album_meta(
     Tier 2: album = "Festival Year (Stage)" with full tag data.
     Tier 1: album = filename_stem, artist/date parsed from filename.
     """
-    # Strip labels, then split artist from title
-    clean_titles = []
-    track_artists = []
-    publishers = []
-    for ch in chapters:
-        title, label = strip_label(ch.title)
-        track_artist, track_title = split_track_artist(title)
-        clean_titles.append(track_title)
-        track_artists.append(track_artist)
-        publishers.append(label)
-
-    # Deduplicate titles
-    clean_titles = deduplicate_titles(clean_titles)
-
-    # Get genres from tags
-    genres = tags.get("genres", [])
-
+    # Resolve album-level fields up front so we can case-normalize per-track
+    # artists against the album artist below (see loop).
     if tier == 2:
         artist = tags.get("artist", "")
         festival = tags.get("festival", "")
@@ -156,6 +141,40 @@ def build_album_meta(
         date = year
         album = filename_stem
 
+    # Strip labels, then split artist from title.
+    #
+    # Defense-in-depth: if a chapter's per-track artist matches the album
+    # artist case-insensitively (e.g. chapter "AFROJACK - ID" with album
+    # ARTIST "Afrojack"), normalize to the album artist's canonical casing.
+    # Without this, Lyrion treats "AFROJACK" and "Afrojack" as two separate
+    # contributors, and Jellyfin collapses them but keeps the first-scanned
+    # casing as the display name. CrateDigger ideally normalizes upstream,
+    # but 40% of DJs are missing from its cache and tier-1 (non-CrateDigger)
+    # sources make no such guarantee, so we apply the cheap local fix here.
+    # Whole-string match only: "AFROJACK & Steve Aoki" stays as-is because
+    # that's a genuinely different contributor string.
+    clean_titles = []
+    track_artists = []
+    publishers = []
+    for ch in chapters:
+        title, label = strip_label(ch.title)
+        track_artist, track_title = split_track_artist(title)
+        if (
+            track_artist
+            and artist
+            and track_artist.casefold() == artist.casefold()
+        ):
+            track_artist = artist
+        clean_titles.append(track_title)
+        track_artists.append(track_artist)
+        publishers.append(label)
+
+    # Deduplicate titles
+    clean_titles = deduplicate_titles(clean_titles)
+
+    # Get genres from tags
+    genres = tags.get("genres", [])
+
     # Build tracks
     tracks = []
     for i, ch in enumerate(chapters):
diff --git a/src/tracksplit/tagger.py b/src/tracksplit/tagger.py
index fe21d14..70ea7c5 100644
--- a/src/tracksplit/tagger.py
+++ b/src/tracksplit/tagger.py
@@ -3,6 +3,7 @@
 
 import base64
 import logging
+import re
 from collections.abc import Callable, Sequence
 from pathlib import Path
 
@@ -11,10 +12,36 @@
 
 from tracksplit.models import AlbumMeta, TrackMeta
 
+# Matches collab separators in an album-artist string. A single MBID cannot
+# identify two performers, so we suppress MUSICBRAINZ_ALBUMARTISTID when any
+# of these appear as whitespace-delimited tokens: "X & Y", "X | Y", "X vs Y"
+# (with or without trailing dot), "X x Y". Whitespace-delimited so names like
+# "Axwell", "deadmau5", or "Eric Prydz" do not false-positive.
+_COLLAB_SEPARATOR_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)
+
+
+def _is_collab_artist(artist: str) -> bool:
+    return bool(_COLLAB_SEPARATOR_RE.search(artist))
+
 
 def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]:
     """Build a Vorbis comment dict from album and track metadata.
 
+    Tag policy (single source of truth for both FLAC and OggOpus):
+
+    - ``TITLE`` / ``ARTIST``: per-track. ``ARTIST`` falls back to ``album.artist``
+      when the chapter title had no "Artist - Title" separator.
+    - ``ALBUMARTIST``: always the album-level artist (the set headliner).
+    - ``MUSICBRAINZ_ALBUMARTISTID``: the album artist's MusicBrainz ID, under the
+      Picard-canonical key. Suppressed for B2B/collab album artists (those with
+      a whitespace-delimited "&", "|", "vs"/"vs.", or "x") because one MBID cannot
+      identify two performers; writing it anyway would merge the collab album
+      into one member's solo discography in LMS/Jellyfin.
+    - ``MUSICBRAINZ_ARTISTID`` (per-track MBID) is **never** emitted: TrackSplit
+      has no per-track-artist MBIDs. Writing the album-artist MBID here (the
+      pre-fix behavior) caused Lyrion to dedupe every track to a single
+      contributor row.
+
     All values are lists of strings per the Vorbis comment specification.
     Optional tags are omitted when their source value is empty.
     """
@@ -45,8 +72,8 @@ def build_tag_dict(album: AlbumMeta, track: TrackMeta) -> dict[str, list[str]]:
     if album.comment:
         tags["COMMENT"] = [album.comment]
 
-    if album.musicbrainz_artistid:
-        tags["MUSICBRAINZ_ARTISTID"] = [album.musicbrainz_artistid]
+    if album.musicbrainz_artistid and not _is_collab_artist(album.artist):
+        tags["MUSICBRAINZ_ALBUMARTISTID"] = [album.musicbrainz_artistid]
 
     if album.festival:
         tags["FESTIVAL"] = [album.festival]
diff --git a/tests/test_metadata.py b/tests/test_metadata.py
index 1a6fa62..dcbce80 100644
--- a/tests/test_metadata.py
+++ b/tests/test_metadata.py
@@ -333,6 +333,62 @@ def test_probe_to_metadata_to_tagger_contract():
     assert tag_dict["STAGE"] == ["Mainstage"]
     assert tag_dict["VENUE"] == ["Boom"]
     assert tag_dict["COMMENT"] == ["https://1001tl.com/abc"]
-    assert tag_dict["MUSICBRAINZ_ARTISTID"] == ["uuid-456"]
+    assert tag_dict["MUSICBRAINZ_ALBUMARTISTID"] == ["uuid-456"]
     assert tag_dict["PUBLISHER"] == ["Spinnin"]
     assert tag_dict["TITLE"] == ["Animals"]
+
+
+# --- Per-track artist case normalization (defense-in-depth) ---
+
+def test_track_artist_case_normalized_uppercase_chapter():
+    """Chapter 'AFROJACK - ID' with album ARTIST 'Afrojack' → normalized."""
+    tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters(["AFROJACK - ID", "AFROJACK - Bringin It Back"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "Afrojack"
+    assert meta.tracks[1].artist == "Afrojack"
+
+
+def test_track_artist_case_normalized_lowercase_chapter():
+    """Chapter 'deadmau5 - Strobe' with album ARTIST 'Deadmau5' → normalized."""
+    tags = {"artist": "Deadmau5", "festival": "Tomorrowland Brasil", "date": "2025"}
+    chapters = _make_chapters(["deadmau5 - Strobe"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "Deadmau5"
+
+
+def test_track_artist_preserved_when_not_whole_match():
+    """Multi-artist strings containing the album artist stay as-is."""
+    tags = {"artist": "Afrojack", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "AFROJACK & Steve Aoki ft. Miss Palmer - No Beef",
+        "AFROJACK & Martin Garrix - Turn Up The Speakers",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == "AFROJACK & Steve Aoki ft. Miss Palmer"
+    assert meta.tracks[1].artist == "AFROJACK & Martin Garrix"
+
+
+def test_track_artist_empty_when_chapter_has_no_separator():
+    """Chapter 'Intro' (no ' - ') → track.artist stays empty, falls back later."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025"}
+    chapters = _make_chapters(["Intro", "ID"])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.tracks[0].artist == ""
+    assert meta.tracks[1].artist == ""
+
+
+def test_unicode_artist_preserved_through_build_album_meta():
+    """Diacritics in artist names must survive the pipeline untouched."""
+    tags = {"artist": "Tiësto", "festival": "EDC", "date": "2025-05-17"}
+    chapters = _make_chapters([
+        "RÜFÜS DU SOL - Innerbloom",
+        "Kölsch - Grey",
+        "Amél - Birds Of A Feather",
+    ])
+    meta = build_album_meta(tags, chapters, "", tier=2)
+    assert meta.artist == "Tiësto"
+    assert meta.tracks[0].artist == "RÜFÜS DU SOL"
+    assert meta.tracks[1].artist == "Kölsch"
+    assert meta.tracks[2].artist == "Amél"
+    assert meta.tracks[2].title == "Birds Of A Feather"
diff --git a/tests/test_tagger.py b/tests/test_tagger.py
index 7a17712..b010156 100644
--- a/tests/test_tagger.py
+++ b/tests/test_tagger.py
@@ -53,7 +53,7 @@ def test_build_tag_dict_all_fields():
     assert tags["GENRE"] == ["Trance"]
     assert tags["PUBLISHER"] == ["Armada Music"]
     assert tags["COMMENT"] == ["Full set recording"]
-    assert tags["MUSICBRAINZ_ARTISTID"] == ["test-mbid-1234"]
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
     assert tags["FESTIVAL"] == ["Ultra Music Festival"]
     assert tags["STAGE"] == ["Mainstage"]
     assert tags["VENUE"] == ["Bayfront Park"]
@@ -85,7 +85,7 @@ def test_build_tag_dict_minimal():
     # Optional tags absent
     for key in (
         "TRACKTOTAL", "DATE", "GENRE", "PUBLISHER", "COMMENT",
-        "MUSICBRAINZ_ARTISTID", "FESTIVAL", "STAGE", "VENUE",
+        "MUSICBRAINZ_ALBUMARTISTID", "FESTIVAL", "STAGE", "VENUE",
     ):
         assert key not in tags, f"{key} should not be present when empty"
 
@@ -152,3 +152,107 @@ def test_tag_all_dispatches_by_extension():
 
     mock_flac.assert_called_once()
     mock_ogg.assert_called_once()
+
+
+# --- MBID policy: no per-track MBID, collab guard ---
+
+def test_no_musicbrainz_artistid_emitted():
+    """The per-track MBID key must never be written (regression guard).
+
+    Writing album-artist MBID as per-track MBID caused Lyrion to collapse
+    every track to a single contributor row. We never have real per-track
+    MBIDs, so the key stays out of the dict entirely.
+    """
+    album = _full_album()
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ARTISTID" not in tags
+
+
+def test_albumartist_mbid_written_for_solo_artist():
+    album = _full_album()  # artist="Armin van Buuren", MBID="test-mbid-1234"
+    tags = build_tag_dict(album, album.tracks[0])
+    assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["test-mbid-1234"]
+
+
+def test_albumartist_mbid_suppressed_for_ampersand_collab():
+    """'X & Y' album artists have no single-person MBID; don't write one."""
+    album = AlbumMeta(
+        artist="Armin van Buuren & KI/KI",
+        album="AMF 2025 (Two Is One)",
+        musicbrainz_artistid="477b8c0c-c5fc-4ad2-b5b2-191f0bf2a9df",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_vs_collab():
+    album = AlbumMeta(
+        artist="Armin van Buuren vs. Hardwell",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_albumartist_mbid_suppressed_for_x_collab():
+    album = AlbumMeta(
+        artist="Martin Garrix x Alesso",
+        album="Collab Set",
+        musicbrainz_artistid="some-mbid",
+        tracks=[TrackMeta(number=1, title="Track", start=0.0, end=60.0)],
+    )
+    tags = build_tag_dict(album, album.tracks[0])
+    assert "MUSICBRAINZ_ALBUMARTISTID" not in tags
+
+
+def test_collab_guard_does_not_false_positive_on_embedded_letters():
+    """Names like 'Axwell', 'deadmau5', 'Eric Prydz' must not trip the guard."""
+    for name in ("Axwell", "deadmau5", "Eric Prydz", "Tiësto", "R3HAB"):
+        album = AlbumMeta(
+            artist=name,
+            album="Set",
+            musicbrainz_artistid="mbid-abc",
+            tracks=[TrackMeta(number=1, title="T", start=0.0, end=60.0)],
+        )
+        tags = build_tag_dict(album, album.tracks[0])
+        assert tags["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-abc"], f"false positive on {name!r}"
+
+
+def test_opus_round_trip_preserves_unicode_tags(tmp_path):
+    """Write an opus, tag with unicode, read it back, expect exact strings."""
+    import shutil
+    import subprocess
+
+    ffmpeg = shutil.which("ffmpeg")
+    if ffmpeg is None:
+        import pytest
+        pytest.skip("ffmpeg not available")
+
+    opus_path = tmp_path / "test.opus"
+    subprocess.run(
+        [ffmpeg, "-f", "lavfi", "-i", "anullsrc=r=48000:cl=stereo",
+         "-t", "0.5", "-c:a", "libopus", "-b:a", "32k",
+         str(opus_path), "-y", "-loglevel", "error"],
+        check=True,
+    )
+
+    album = AlbumMeta(
+        artist="Tiësto",
+        album="EDC",
+        musicbrainz_artistid="mbid-ti",
+        tracks=[TrackMeta(number=1, title="Strobe", start=0.0, end=30.0,
+                          artist="RÜFÜS DU SOL")],
+    )
+    from tracksplit.tagger import tag_ogg
+    from mutagen.oggopus import OggOpus
+
+    tag_ogg(opus_path, album, album.tracks[0])
+    reread = OggOpus(str(opus_path))
+
+    assert reread["ARTIST"] == ["RÜFÜS DU SOL"]
+    assert reread["ALBUMARTIST"] == ["Tiësto"]
+    assert reread["MUSICBRAINZ_ALBUMARTISTID"] == ["mbid-ti"]
+    assert "MUSICBRAINZ_ARTISTID" not in reread
diff --git a/tests/test_tagger_fixtures.py b/tests/test_tagger_fixtures.py
new file mode 100644
index 0000000..bd0f444
--- /dev/null
+++ b/tests/test_tagger_fixtures.py
@@ -0,0 +1,135 @@
+"""Parameterized invariant tests over the real CrateDigger MKV corpus.
+
+Runs only when the local fixture directory is present. Every JSON sidecar
+in the dump exercises the full path:
+
+    ffprobe tags -> parse_tags -> build_album_meta -> build_tag_dict
+
+and asserts invariants that must hold for every set. Gives us continuous
+regression coverage against real data without shipping the corpus.
+"""
+from __future__ import annotations
+
+import json
+import re
+from pathlib import Path
+
+import pytest
+
+from tracksplit.metadata import build_album_meta
+from tracksplit.models import Chapter
+from tracksplit.tagger import build_tag_dict
+
+DUMP_DIR = Path("/home/martijn/_temp/cratedigger/data/mkv-info-dump")
+
+pytestmark = pytest.mark.skipif(
+    not DUMP_DIR.is_dir(),
+    reason=f"MKV dump corpus not present at {DUMP_DIR}",
+)
+
+# Same regex as tagger._COLLAB_SEPARATOR_RE, duplicated rather than imported so
+# the invariant below fails if the tagger stops matching one of these separators.
+_COLLAB_RE = re.compile(r"\s(?:&|\||vs\.?|x)\s", re.IGNORECASE)
+
+
+def _fixture_ids() -> list[str]:
+    if not DUMP_DIR.is_dir():
+        return []
+    return sorted(p.name for p in DUMP_DIR.glob("*.json"))
+
+
+def _tags_from_extra(extra: dict) -> dict:
+    """Translate the MKV `extra` block into the dict shape parse_tags produces."""
+    genres_raw = extra.get("CRATEDIGGER_1001TL_GENRES", "")
+    return {
+        "artist": extra.get("ARTIST", ""),
+        "festival": extra.get("CRATEDIGGER_1001TL_FESTIVAL", ""),
+        "date": extra.get("CRATEDIGGER_1001TL_DATE", ""),
+        "stage": extra.get("CRATEDIGGER_1001TL_STAGE", ""),
+        "venue": extra.get("CRATEDIGGER_1001TL_VENUE", ""),
+        "genres": [g for g in genres_raw.split("|") if g],
+        "comment": extra.get("CRATEDIGGER_1001TL_URL", ""),
+        "musicbrainz_artistid": extra.get("CRATEDIGGER_MBID", ""),
+    }
+
+
+def _chapters_from_menu(menu_track: dict) -> list[Chapter]:
+    """Extract chapters from the MediaInfo Menu track structure."""
+    chapters: list[Chapter] = []
+    extra = menu_track.get("extra", {})
+    # Keys are timecodes like "_00_00_00000"; values are strings like "en:TITLE".
+    entries = sorted(
+        (k, v) for k, v in extra.items()
+        if isinstance(k, str) and k.startswith("_")
+    )
+    times: list[float] = []
+    titles: list[str] = []
+    for k, v in entries:
+        parts = k.lstrip("_").split("_")
+        if len(parts) < 3:
+            continue
+        h, m, rest = parts[0], parts[1], parts[2]
+        try:
+            seconds = int(h) * 3600 + int(m) * 60 + int(rest[:2]) + int(rest[2:].ljust(3, "0")[:3]) / 1000
+        except ValueError:
+            continue
+        if isinstance(v, str) and ":" in v:
+            v = v.split(":", 1)[1]
+        times.append(float(seconds))
+        titles.append(v)
+
+    if not times:
+        return []
+    for i, (start, title) in enumerate(zip(times, titles)):
+        end = times[i + 1] if i + 1 < len(times) else start + 1.0
+        chapters.append(Chapter(index=i + 1, title=title, start=start, end=end))
+    return chapters
+
+
+def _load_fixture(path: Path) -> tuple[dict, list[Chapter]] | None:
+    raw = json.loads(path.read_text(encoding="utf-8"))
+    tracks = raw.get("mediainfo", {}).get("media", {}).get("track", [])
+    general = next((t for t in tracks if t.get("@type") == "General"), None)
+    menu = next((t for t in tracks if t.get("@type") == "Menu"), None)
+    if not general or not menu:
+        return None
+    tags = _tags_from_extra(general.get("extra", {}))
+    chapters = _chapters_from_menu(menu)
+    if not chapters:
+        return None
+    return tags, chapters
+
+
+@pytest.mark.parametrize("fixture_name", _fixture_ids())
+def test_corpus_invariants(fixture_name):
+    loaded = _load_fixture(DUMP_DIR / fixture_name)
+    if loaded is None:
+        pytest.skip(f"{fixture_name}: no usable General+Menu tracks")
+    tags, chapters = loaded
+    stem = fixture_name.replace(".json", "")
+    meta = build_album_meta(tags, chapters, stem, tier=2)
+
+    assert meta.album, f"{fixture_name}: empty ALBUM"
+    assert meta.artist, f"{fixture_name}: empty ALBUMARTIST"
+
+    for track in meta.tracks:
+        td = build_tag_dict(meta, track)
+
+        # Required tags populated
+        assert td["TITLE"] and td["TITLE"][0], f"{fixture_name} t{track.number}: empty TITLE"
+        assert td["ARTIST"] and td["ARTIST"][0], f"{fixture_name} t{track.number}: empty ARTIST"
+        assert td["ALBUMARTIST"] and td["ALBUMARTIST"][0], f"{fixture_name} t{track.number}: empty ALBUMARTIST"
+
+        # Regression guard: the old tag key must never reappear
+        assert "MUSICBRAINZ_ARTISTID" not in td, (
+            f"{fixture_name} t{track.number}: MUSICBRAINZ_ARTISTID leaked back"
+        )
+
+        # Collab guard: when album-artist MBID is written, ALBUMARTIST must
+        # unambiguously identify a single performer.
+        if "MUSICBRAINZ_ALBUMARTISTID" in td:
+            aa = td["ALBUMARTIST"][0]
+            assert not _COLLAB_RE.search(aa), (
+                f"{fixture_name} t{track.number}: album MBID written for "
+                f"collab ALBUMARTIST {aa!r}"
+            )

From d61cd4cad6e686e7b7c8f56fcc0cff23363dab1d Mon Sep 17 00:00:00 2001
From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com>
Date: Sun, 12 Apr 2026 16:56:50 +0000
Subject: [PATCH 2/2] chore(deps): update pillow requirement from >=10.0 to
 >=12.2.0

Updates the requirements on [pillow](https://github.com/python-pillow/Pillow) to permit the latest version.
- [Release notes](https://github.com/python-pillow/Pillow/releases)
- [Changelog](https://github.com/python-pillow/Pillow/blob/main/CHANGES.rst)
- [Commits](https://github.com/python-pillow/Pillow/compare/10.0.0...12.2.0)

---
updated-dependencies:
- dependency-name: pillow
  dependency-version: 12.2.0
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
---
 pyproject.toml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/pyproject.toml b/pyproject.toml
index ccb1f3b..d39116c 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -6,7 +6,7 @@ readme = "README.md"
 license = { text = "GPL-3.0-only" }
 requires-python = ">=3.11"
 dependencies = [
-    "Pillow>=10.0",
+    "Pillow>=12.2.0",
     "typer>=0.9",
     "mutagen>=1.47",
     "rich>=13.0",