sudoStacks/retreivr-community-cache

Retreivr Community Cache

This repository is a transport index dataset for Retreivr.

It stores mappings from canonical MusicBrainz recording MBIDs to known-good transport identifiers.

Scope

Canonical mapping model:

recording_mbid -> transport sources

Examples of transport identifiers:

  • YouTube video IDs
  • SoundCloud track IDs (future)
  • Other supported transport IDs (future)

MusicBrainz remains the authoritative source of metadata. This repository does not replicate MusicBrainz entity metadata.

Data Layout

Current dataset namespace:

  • youtube/recording/<prefix>/<recording_mbid>.json

Where:

  • prefix is the first two characters of recording_mbid
  • filename stem equals recording_mbid
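
As a minimal sketch of the layout rule above, the path for a record can be derived from its MBID alone. The helper name `record_path` is hypothetical, the MBID in the usage note is a placeholder rather than a real recording, and normalising to the lowercase hyphenated UUID form is an assumption that this README does not state.

```python
import uuid
from pathlib import PurePosixPath


def record_path(recording_mbid: str,
                namespace: str = "youtube/recording") -> PurePosixPath:
    """Build <namespace>/<prefix>/<recording_mbid>.json for one record."""
    # MBIDs are UUIDs. Round-tripping through uuid.UUID rejects malformed
    # input and yields the lowercase hyphenated form (assumed here to be
    # the form used for filenames).
    mbid = str(uuid.UUID(recording_mbid))
    prefix = mbid[:2]  # first two characters of recording_mbid
    return PurePosixPath(namespace) / prefix / f"{mbid}.json"
```

For example, `record_path("0a1b2c3d-0000-4000-8000-000000000000")` gives `youtube/recording/0a/0a1b2c3d-0000-4000-8000-000000000000.json`.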

Record Model

Each record contains:

  • recording_mbid
  • schema_version
  • updated_at
  • sources[], a list of candidate transport identifiers with their verification fields

See schema/schema.json for the strict record contract.
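
The shape of a record might be sketched as below. Only the four top-level field names and `video_id` come from this README. The helper `make_record`, the integer `schema_version`, the ISO 8601 timestamp format, and the absence of verification fields are all assumptions, so a record built this way may not satisfy schema/schema.json as-is.

```python
import json
from datetime import datetime, timezone


def make_record(recording_mbid: str, video_ids: list[str]) -> dict:
    """Assemble an illustrative record; not guaranteed to pass the schema."""
    return {
        "recording_mbid": recording_mbid,
        "schema_version": 1,  # assumed value and type
        # Assumed format: UTC timestamp in ISO 8601.
        "updated_at": datetime.now(timezone.utc).isoformat(timespec="seconds"),
        # Real sources also carry verification fields, which the schema
        # defines; they are omitted here because this README does not name them.
        "sources": [{"video_id": video_id} for video_id in video_ids],
    }


if __name__ == "__main__":
    # Placeholder MBID and video ID, for illustration only.
    record = make_record("0a1b2c3d-0000-4000-8000-000000000000", ["VIDEO_ID_1"])
    print(json.dumps(record, indent=2))
```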

Non-Goals

This repository must not contain:

  • scraped metadata dumps
  • platform search result dumps
  • thumbnails
  • ranking heuristics
  • MusicBrainz entity metadata copies
  • media files or download URLs

CI Guarantees

Validation in .github/workflows/validate.yml enforces:

  • JSON parse validity for dataset files
  • JSON Schema compliance
  • shard-path and filename/MBID consistency
  • uniqueness of each MBID within the namespace
  • uniqueness of each video_id within a recording file
  • preview of derived dataset stats during CI
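
Some of the checks above can be approximated locally before opening a PR. The sketch below is not the code in validate.yml. The function name `check_record` is hypothetical, and it covers only JSON parsing, shard-path and filename/MBID consistency, and duplicate `video_id` detection. JSON Schema compliance is left out because it needs a third-party validator.

```python
import json
from pathlib import PurePosixPath


def check_record(rel_path: str, text: str) -> list[str]:
    """Return the problems found in one dataset file (empty list if none)."""
    path = PurePosixPath(rel_path)
    try:
        record = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"{rel_path}: invalid JSON: {exc.msg}"]
    if not isinstance(record, dict):
        return [f"{rel_path}: top-level value is not an object"]

    errors = []
    # The filename stem must equal the MBID stored inside the record.
    if path.stem != record.get("recording_mbid"):
        errors.append(f"{rel_path}: filename does not match recording_mbid")
    # The shard directory must be the first two characters of the MBID.
    if path.parent.name != path.stem[:2]:
        errors.append(f"{rel_path}: shard prefix does not match filename")

    # No video_id may appear twice within one recording file.
    sources = record.get("sources")
    if not isinstance(sources, list):
        sources = []
    seen = set()
    for source in sources:
        video_id = source.get("video_id") if isinstance(source, dict) else None
        if not isinstance(video_id, str):
            continue
        if video_id in seen:
            errors.append(f"{rel_path}: duplicate video_id {video_id}")
        seen.add(video_id)
    return errors
```

Because the filename stem must equal the MBID and the shard prefix is derived from it, a file that passes the path checks can exist at only one location in the namespace.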

Derived stats are maintained automatically on main by .github/workflows/update_stats.yml. Publish PRs are validated against the dataset contract only; stats/dataset.json is regenerated after merges, so out-of-date stats do not block automated publisher PRs.

Trusted PR automation in .github/workflows/trusted_pr_automerge.yml enables auto-merge for same-repo pull requests opened by publishers listed in .github/trusted_publishers.txt, once required checks pass. Additional publish policy lives in .github/publish_policy.json, including the source confidence floor enforced by CI.
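
The auto-merge condition described above amounts to three tests: the PR comes from the same repository, its author is a listed publisher, and required checks have passed. The sketch below illustrates that decision only and is not the workflow itself. Both function names are hypothetical, and the file format (one username per line, with `#` starting a comment) is an assumption.

```python
def parse_trusted(text: str) -> set[str]:
    """Parse a publisher list, assuming one username per line."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # assumed: '#' starts a comment
        if name:
            # GitHub usernames are case-insensitive, so compare in lowercase.
            names.add(name.lower())
    return names


def automerge_eligible(author: str, head_repo: str, base_repo: str,
                       checks_passed: bool, trusted: set[str]) -> bool:
    """True only for a same-repo PR by a trusted author with passing checks."""
    same_repo = head_repo == base_repo
    return same_repo and checks_passed and author.lower() in trusted
```

A PR opened from a fork fails the `same_repo` test even when its author is listed, which matches the same-repo restriction above.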

Trusted Publisher Access

Trusted publisher status controls who can use the fully automated same-repo PR and auto-merge flow.

If you want to become a trusted publisher:

  1. Open a GitHub Issue in this repository.
  2. Title it Trusted Publisher Request: <your-github-username>.
  3. Include:
    • your GitHub username
    • how you run Retreivr
    • whether you are publishing from a personal server, shared instance, or test node
    • links to any prior good publish PRs if you have them
    • anything relevant about how you validate your node outputs
  4. Wait for maintainers to review and, if approved, add your GitHub username to .github/trusted_publishers.txt.

Until then:

  • you can still run Retreivr and generate publish proposals
  • maintainers may still review PRs manually
  • auto-merge is reserved for approved trusted publishers

Purpose

The dataset accelerates transport resolution for Retreivr clients while keeping output deterministic, lightweight, and Git-native.
