
Smart Pre-Processing Pipeline: Validation, Triage, and Corrections #110

@deucebucket


Summary

A comprehensive pre-processing system that validates files, triages folders, and pushes corrections, all while preserving full automation.

Core philosophy: Don't trust garbage folders, DO trust audio. And if we screw up, fix it.


Status

| Part | Feature | Status |
|------|---------|--------|
| 1 | File Validation | Done (merged in #179) |
| 2 | Folder Triage | This issue |
| 3 | Push Corrections | Split to separate issue (blocked on Skaldleita) |

Part 2: Folder Triage (Local, Fast)

Categorize folders by "cleanliness" to decide processing strategy.

Detection patterns

Classify each folder as:

  • Clean: normal Author/Title structure; use path hints normally.
  • Messy: scene tags, torrent markers, or quality indicators; skip path parsing and trust audio only.
  • Garbage: hash names, numbers-only names, or generic folder names; skip path parsing and expect harder, lower-confidence identification.

Pattern examples

| Category | Examples |
|----------|----------|
| Messy | {mb}, [FLAC], 64kbps, -TEAM suffix, .com in name |
| Garbage | a3f8b2c1d4e5, New Folder, tmp, numbers-only names |
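A minimal sketch of a configurable triage function, assuming it runs on the folder name alone (the last path component). The regexes are hypothetical translations of the examples above, the check order (garbage before messy, everything else clean) is an assumption, and `Triage`, `GARBAGE_PATTERNS`, `MESSY_PATTERNS`, and `triage_folder` are illustrative names, not the project's code.

```python
import re
from enum import Enum


class Triage(Enum):
    CLEAN = "clean"
    MESSY = "messy"
    GARBAGE = "garbage"


# Hypothetical defaults built from the examples table; a real setup would load these from config.
# Garbage patterns must match the WHOLE folder name.
GARBAGE_PATTERNS = (
    re.compile(r"(?i)[0-9a-f]{8,}"),             # hash-like names, e.g. a3f8b2c1d4e5
    re.compile(r"\d+"),                          # numbers-only names
    re.compile(r"(?i)new folder(?: \(\d+\))?"),  # "New Folder", "New Folder (2)"
    re.compile(r"(?i)tmp"),                      # generic placeholder name
)

# Messy patterns only need to appear SOMEWHERE in the folder name.
MESSY_PATTERNS = (
    re.compile(r"\{[^}]*\}"),                     # brace tags such as {mb}
    re.compile(r"(?i)\[(?:flac|mp3|m4b|aac)\]"),  # bracketed format markers such as [FLAC]
    re.compile(r"(?i)\b\d{2,3}\s?kbps\b"),        # bitrate indicators such as 64kbps
    re.compile(r"-[A-Z][A-Z0-9]+$"),              # uppercase -TEAM suffix (case-sensitive)
    re.compile(r"(?i)\.(?:com|net|org)\b"),       # site names such as example.com
)


def triage_folder(name: str, garbage=GARBAGE_PATTERNS, messy=MESSY_PATTERNS) -> Triage:
    """Classify one folder name (the last path component, not the full path)."""
    stripped = name.strip()
    # Garbage is checked first: a hash or empty name tells us nothing, whatever else it contains.
    if not stripped or any(p.fullmatch(stripped) for p in garbage):
        return Triage.GARBAGE
    if any(p.search(stripped) for p in messy):
        return Triage.MESSY
    # Anything left over is treated as a normal Author/Title folder.
    return Triage.CLEAN
```

For example, `triage_folder("Mistborn [FLAC]")` returns `Triage.MESSY`. Patterns this blunt will misfire on some real titles: a numbers-only title such as "1984" lands in garbage. Keeping the pattern lists configurable lets users tune that, and a misfire still leaves the folder on the audio-only path rather than dropping it.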

Processing strategy

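| Category | Path Parsing | Audio Processing | Confidence Modifier |
|----------|--------------|------------------|---------------------|
| Clean | Use as hints | Normal | None |
| Messy | SKIP | Audio-only | None |
| Garbage | SKIP | Audio-only | -10% |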
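The strategy table can also be expressed as data, so the pipeline looks up a category instead of branching on it. This sketch assumes confidence is an integer score from 0 to 100 and reads the garbage modifier of -10% as subtracting 10 points; if the project's score is a fraction, or the modifier is meant to be multiplicative (multiply by 0.9), the arithmetic changes accordingly. `Strategy`, `STRATEGIES`, and `adjusted_confidence` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    use_path_hints: bool      # False means audio-only processing
    confidence_modifier: int  # points added to a 0-100 confidence score (assumed scale)


# Mirrors the "Processing strategy" table above.
STRATEGIES = {
    "clean": Strategy(use_path_hints=True, confidence_modifier=0),
    "messy": Strategy(use_path_hints=False, confidence_modifier=0),
    "garbage": Strategy(use_path_hints=False, confidence_modifier=-10),
}


def adjusted_confidence(category: str, confidence: int) -> int:
    """Apply the category's modifier, keeping the result within 0..100."""
    modifier = STRATEGIES[category].confidence_modifier
    return max(0, min(100, confidence + modifier))
```

Under these assumptions, `adjusted_confidence("garbage", 85)` returns 75, while clean and messy scores pass through unchanged.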

Implementation

  • Add triage function with configurable patterns
  • Integrate into scan pipeline (after file validation, before processing)
  • Show triage results on dashboard (clean/messy/garbage counts)
  • Setting: skip_messy_path_parsing (default: true)
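A sketch of how triage could slot into the scan pipeline: it takes folders that already passed file validation, tallies clean/messy/garbage counts for the dashboard, and records whether each folder's path may be parsed. It assumes `skip_messy_path_parsing` only affects messy folders (garbage paths are skipped regardless), and it takes the classifier as a parameter so any triage function returning `"clean"`, `"messy"`, or `"garbage"` fits. `should_parse_path` and `triage_scan` are hypothetical names.

```python
from collections import Counter
from typing import Callable, Iterable


def should_parse_path(category: str, skip_messy_path_parsing: bool = True) -> bool:
    """Decide whether path hints may be used for a folder of this category."""
    if category == "garbage":
        return False  # garbage names are never trusted, regardless of the setting
    if category == "messy":
        return not skip_messy_path_parsing
    return True


def triage_scan(
    folders: Iterable[str],
    classify: Callable[[str], str],
    skip_messy_path_parsing: bool = True,
) -> tuple[dict[str, bool], Counter]:
    """Triage folders that already passed file validation.

    Returns per-folder path-parsing decisions plus clean/messy/garbage
    counts for the dashboard.
    """
    decisions: dict[str, bool] = {}
    # Start every category at zero so the dashboard always shows all three.
    counts: Counter = Counter({"clean": 0, "messy": 0, "garbage": 0})
    for folder in folders:
        category = classify(folder)
        counts[category] += 1
        decisions[folder] = should_parse_path(category, skip_messy_path_parsing)
    return decisions, counts
```

With the default setting, a messy folder gets `False`; passing `skip_messy_path_parsing=False` flips messy folders to `True` while garbage folders stay `False`.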

Related

Suggested by: @Merijeek (triage idea), @deucebucket (corrections concept)
