fix: Persist watch_folder_processed + honor Skaldleita server_notice (#208) (PR #213)

Merged: deucebucket merged 1 commit into develop from fix/issue-208-watch-folder-persistence on Apr 18, 2026.

fix: Persist watch_folder_processed + honor Skaldleita server_notice (#208)#213
deucebucket merged 1 commit intodevelopfrom
fix/issue-208-watch-folder-persistence

Conversation

@deucebucket (Owner)

Closes #208

Summary

Two related fixes that stop the watch-folder retry loop that had one LM instance generating ~48% of Skaldleita `/match` traffic for days.

1. `watch_folder_processed` is now persistent — was an in-memory `set()`, wiped on every restart, so any file that couldn't move (unknown author, ambiguous match, move failure, mtime churn) got re-submitted every scan cycle forever.

2. LM now honors Skaldleita's `server_notice` field (added in deucebucket/skaldleita#129). When the server detects a retry loop it sends `{severity, code, message, action: abort_task, upgrade_url}`. LM logs it, and on `action=abort_task` stops retrying that file immediately.
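
For reference, a `/match` response carrying the notice might look like this. The `server_notice` field names are those listed above (from deucebucket/skaldleita#129); the surrounding payload and the concrete values are illustrative only:

```json
{
  "matches": [],
  "server_notice": {
    "severity": "warning",
    "code": "retry_loop_detected",
    "message": "This client has re-submitted the same file repeatedly.",
    "action": "abort_task",
    "upgrade_url": "https://example.invalid/upgrade"
  }
}
```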

Changes

`library_manager/database.py`

  • New `watch_folder_processed` table: `path` PK, `processed_at`, `outcome`, `error_message`. Created in `init_db()` so existing DBs auto-migrate on next startup.
  • `watch_folder_is_processed(path)` — `SELECT 1 ... LIMIT 1` existence check.
  • `watch_folder_mark_processed(path, outcome, error_message=None)` — `INSERT OR REPLACE`. `outcome` values: `moved`, `move_failed`, `aborted_by_server`.
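
A minimal sketch of the table creation and the two helpers, assuming the raw-sqlite3 style the review notes for `database.py` (the `DB_PATH` constant and `_connect()` wrapper here are stand-ins for however the real module opens connections):

```python
import sqlite3

DB_PATH = "library.db"  # assumption: the real path comes from LM config

def _connect():
    # Mirrors the described database.py pattern: raw connect, 30s timeout.
    return sqlite3.connect(DB_PATH, timeout=30)

def init_watch_folder_table():
    # Called from init_db(); IF NOT EXISTS makes existing DBs auto-migrate
    # on next startup.
    conn = _connect()
    try:
        conn.execute(
            """CREATE TABLE IF NOT EXISTS watch_folder_processed (
                   path TEXT PRIMARY KEY,
                   processed_at TEXT NOT NULL DEFAULT (datetime('now')),
                   outcome TEXT NOT NULL,
                   error_message TEXT
               )"""
        )
        conn.commit()
    finally:
        conn.close()

def watch_folder_is_processed(path):
    # SELECT 1 ... LIMIT 1 existence check.
    conn = _connect()
    try:
        row = conn.execute(
            "SELECT 1 FROM watch_folder_processed WHERE path = ? LIMIT 1",
            (path,),
        ).fetchone()
        return row is not None
    finally:
        conn.close()

def watch_folder_mark_processed(path, outcome, error_message=None):
    # outcome is one of: 'moved', 'move_failed', 'aborted_by_server'
    conn = _connect()
    try:
        conn.execute(
            "INSERT OR REPLACE INTO watch_folder_processed "
            "(path, outcome, error_message) VALUES (?, ?, ?)",
            (path, outcome, error_message),
        )
        conn.commit()
    finally:
        conn.close()
```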

`app.py`

  • Removed `watch_folder_processed = set()`. Removed it from the `global` list in `process_watch_folder`.
  • `get_watch_folder_items` dedup check swapped from `if item_path in watch_folder_processed` to `if watch_folder_is_processed(item_path)`.
  • Success path calls `watch_folder_mark_processed(item_path, 'moved')`.
  • Move-failure path calls `watch_folder_mark_processed(item_path, 'move_failed', error)`.
  • New abort check after the identify chain: reads `get_and_clear_server_abort()`, and on hit marks the item `aborted_by_server` and `continue`s.
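
Put together, the reworked loop body can be sketched roughly like this. All the callables are passed in as stand-ins for the real helpers and pipeline functions (`identify_item` and `move_item` are hypothetical names), so this shows only the ordering: dedup check, identify, abort check, move, mark:

```python
# Shape-only sketch of the reworked watch-folder loop in process_watch_folder.
# is_processed / mark_processed stand in for the database helpers;
# get_abort stands in for get_and_clear_server_abort().

def process_items(items, is_processed, mark_processed, get_abort,
                  identify_item, move_item):
    for item_path in items:
        # Dedup now survives restarts: DB lookup instead of `in set`.
        if is_processed(item_path):
            continue
        identify_item(item_path)
        # Honor a server-side abort_task before touching the pipeline.
        if get_abort() is not None:
            mark_processed(item_path, "aborted_by_server")
            continue
        try:
            move_item(item_path)
        except OSError as exc:
            mark_processed(item_path, "move_failed", str(exc))
            continue
        mark_processed(item_path, "moved")
```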

`library_manager/providers/bookdb.py`

  • Added `threading.local()` slot for the server-abort signal — scope stays in the thread that got the notice (watch worker vs. API endpoint vs. pipeline layer don't cross-contaminate).
  • In `search_bookdb`, right after `data = resp.json()`: logs every `server_notice` (with severity, code, message, upgrade URL), and on `action=abort_task` stashes the notice in the thread-local.
  • New `get_and_clear_server_abort()` accessor for callers.
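
A sketch of the provider-side signal. For illustration the notice handling is factored into a hypothetical `handle_server_notice()` helper; in the PR the equivalent logic sits inline in `search_bookdb` right after `data = resp.json()`:

```python
import logging
import threading

logger = logging.getLogger(__name__)

# Thread-local slot: the abort signal stays visible only to the thread that
# received the notice (watch worker vs. API endpoint vs. pipeline layer).
_server_abort = threading.local()

def handle_server_notice(data):
    # Hypothetical helper showing the logic that runs on every response.
    notice = data.get("server_notice")
    if not notice:
        return
    logger.warning(
        "Skaldleita server_notice [%s/%s]: %s (upgrade: %s)",
        notice.get("severity"), notice.get("code"),
        notice.get("message"), notice.get("upgrade_url"),
    )
    if notice.get("action") == "abort_task":
        _server_abort.notice = notice

def get_and_clear_server_abort():
    # Read-and-clear contract: each abort is observed at most once.
    notice = getattr(_server_abort, "notice", None)
    _server_abort.notice = None
    return notice
```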

Behavior matrix

| Situation | Before | After |
|---|---|---|
| File fails to move, LM restarts | Loop resumes forever | File already marked; skipped |
| Unknown author flagged for review | Re-processed every scan after restart | Marked `move_failed` (via the `needs_attention` path), skipped on next scan |
| Skaldleita sends `abort_task` | Ignored, 30s retry loop continues | Marked `aborted_by_server`, loop stops immediately; upgrade URL in logs |
| Normal successful move | Worked | Works; marked `moved` |

Test plan

  • `python3 -m py_compile app.py library_manager/database.py library_manager/providers/bookdb.py` → SYNTAX OK
  • `ruff check app.py library_manager/database.py library_manager/providers/bookdb.py --select=F821,E9,F63,F7,F82` → All checks passed
  • `venv/bin/python test-env/test-naming-issues.py` → 290 passed, 0 failed
  • Live test on maintainer sandbox or live LM: drop a file in watch, let it process (or fail), restart LM, confirm it is not re-picked up.
  • Manual verification that an existing DB auto-migrates: `sqlite3 library.db '.schema watch_folder_processed'` after a startup should print the new table.
  • Confirm Docker `:latest` build still works on merge to main once this lands.

Notes / follow-ups

  • `identify_audio_with_bookdb` (the audio-identification path) doesn't read `server_notice` yet. Audio requests don't usually retry-loop the way `/match` did, but if we see server-side evidence of that changing we can extend the handler there too.
  • UI surfacing of the `server_notice` (banner + upgrade link) is explicitly deferred to a separate task — the original issue called it "low priority relative to (1) and (2)" and I kept this PR scoped to the server/dedup fix.

References

  • Skaldleita server-side mitigation: deucebucket/skaldleita#129
  • Skaldleita tracking issue: deucebucket/skaldleita#128

Commit message (fix: Persist watch_folder_processed + honor Skaldleita server_notice):

Problem 1: watch_folder_processed was an in-memory set() that got wiped on
every restart. Any file that couldn't be processed (unknown author, ambiguous
match, move failure, mtime churn) got re-submitted every scan cycle after a
restart, forever. One LM instance generated ~48% of all Skaldleita /match
traffic for days on the same filename.

Problem 2: Skaldleita PR #129 added a server_notice field to /match
responses. When the server detects a retry loop it now sends
{severity, code, message, action: abort_task, upgrade_url}. LM ignored it.

Fixes:

- New watch_folder_processed SQLite table (path PK, processed_at, outcome,
  error_message). outcome in {moved, move_failed, aborted_by_server}.
- watch_folder_is_processed() / watch_folder_mark_processed() helpers in
  library_manager/database.py. process_watch_folder swapped from set ops
  to these helpers. Restart no longer resets dedup.
- bookdb.py logs every server_notice (with upgrade_url). On
  action=abort_task it stashes the notice in a threading.local() slot so
  scope is per-thread (watch worker, API endpoint, pipeline layer don't
  cross-contaminate).
- process_watch_folder reads the abort slot after each identify attempt;
  if set, marks the item as aborted_by_server and skips the pipeline.

Bumps APP_VERSION to 0.9.0-beta.148.

@bucket-agent (Bot) left a comment

🔍 Vibe Check Review — PR #213

Verdict: APPROVE · Scope: SCOPE_OK · Docs: ✅ CHANGELOG + README updated

Root-cause fix for Issue #208. Persistent SQLite dedup replaces the in-memory set() that was re-submitting the same failing filename every 30s across restarts (~48% of Skaldleita /match traffic from one instance). A server_notice with action=abort_task now short-circuits the retry loop via a thread-local signal.

Verified:

  • New watch_folder_is_processed / watch_folder_mark_processed follow existing database.py patterns (raw sqlite3.connect(timeout=30) + try/finally, parameterized queries). WAL persists database-wide from get_db().
  • All three watch_folder_processed call-sites in app.py are migrated; global declaration trimmed.
  • threading.local() in bookdb.py is safe — worker.py runs scan and watch in separate threads, so abort state can't cross-contaminate. get_and_clear_server_abort() read-and-clear contract is correct.
  • Notice parsing placed before the confidence < 0.5 early return in search_bookdb, so low-confidence responses still carry the abort.
  • Abort check sits between API lookup and move_to_output_folder, exactly where you want it.

No blocking issues. Only nit (not filed): all notices log at WARNING regardless of severity field — trivial cleanup later, not PR-blocking.

Review saved to pr_213_review.md.

deucebucket merged commit 6cd29e3 into develop on Apr 18, 2026 (3 checks passed), then deleted the fix/issue-208-watch-folder-persistence branch on April 18, 2026 at 00:48.