Merged
28 changes: 28 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,34 @@

All notable changes to Library Manager will be documented in this file.

## [0.9.0-beta.148] - 2026-04-17

### Fixed

- **Issue #208: Watch-folder retry loop survives restarts** — The watch-folder
worker used an in-memory `set()` to remember which files it had already
processed. Every LM restart wiped the set, so whenever a file couldn't be
processed (unknown author, ambiguous match, move failure, mtime churn), the
worker would re-submit it on every scan forever. Server-side evidence showed
one LM instance generating ~48% of all Skaldleita `/match` traffic — 2,840
requests in a single day on the same filename. Fix:
- New `watch_folder_processed` SQLite table (`path`, `processed_at`,
`outcome`, `error_message`) persists dedup across restarts. `outcome`
values: `moved`, `move_failed`, `aborted_by_server`.
- Added `watch_folder_is_processed()` / `watch_folder_mark_processed()`
helpers in `library_manager/database.py`; watch worker switched from
`set()` ops to these helpers.
- **Issue #208: Skaldleita `server_notice` handler** — Skaldleita responses
can now carry a `server_notice` block (severity/code/message/action/
upgrade_url). `library_manager/providers/bookdb.py` logs every notice
(with upgrade URL) and, on `action=abort_task`, stashes it in a
`threading.local()` slot. The watch-folder worker reads that slot after
each identify attempt and, if an abort was signalled, marks the item as
`aborted_by_server` and skips the rest of the pipeline — no 30-second
retry loop.
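
The PR does not show an example `/match` payload; the following is a minimal sketch of what a response carrying such a notice might look like. The field names (`severity`, `code`, `message`, `action`, `upgrade_url`) come from the changelog entry above — the values, the `code` string, and the URL are purely illustrative:

```python
# Hypothetical Skaldleita /match response carrying a server_notice.
# Field names match what the handler reads; values are illustrative.
response = {
    "title": "Example Title",
    "author": "Example Author",
    "confidence": 0.92,
    "server_notice": {
        "severity": "warning",
        "code": "retry_loop_detected",
        "message": "Same file re-submitted on every scan; aborting task.",
        "action": "abort_task",
        "upgrade_url": "https://example.com/upgrade",
    },
}

# The client-side decision reduces to this check:
notice = response.get("server_notice")
should_abort = bool(notice and notice.get("action") == "abort_task")
```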

---

## [0.9.0-beta.147] - 2026-04-17

### Fixed
6 changes: 5 additions & 1 deletion README.md
@@ -4,7 +4,7 @@

**Smart Audiobook Library Organizer with Multi-Source Metadata & AI Verification**

[![Version](https://img.shields.io/badge/version-0.9.0--beta.147-blue.svg)](CHANGELOG.md)
[![Version](https://img.shields.io/badge/version-0.9.0--beta.148-blue.svg)](CHANGELOG.md)
[![Docker](https://img.shields.io/badge/docker-ghcr.io-blue.svg)](https://ghcr.io/deucebucket/library-manager)
[![License](https://img.shields.io/badge/license-AGPL--3.0-blue.svg)](LICENSE)

@@ -16,6 +16,10 @@

## Recent Changes (stable)

> **beta.148** - **Fix: Watch-Folder Retry Loop Across Restarts + Skaldleita server_notice** (Issue #208)
> - **Persistent watch-folder dedup** - `watch_folder_processed` is now a SQLite table instead of an in-memory `set()`. Restarts no longer wipe it, killing the retry loop that had one LM instance hammering Skaldleita's `/match` every 30 seconds on the same file for days.
> - **Honors Skaldleita's abort signal** - When the server detects a retry loop it sends a `server_notice` in the response. LM now logs it (with an upgrade URL) and, on `action=abort_task`, stops retrying that file immediately.

> **beta.147** - **Critical Fix: Hard Link Safety** (Issue #209)
> - **Stop silent copy+delete** - When "Use hard links" was enabled and the watch folder / library sat on different filesystems, LM used to copy every file and delete the originals. That broke torrent seeding and doubled disk use. Now LM fails fast with a clear error and leaves source files untouched.
> - **Pre-check filesystem compatibility** - Verifies `st_dev` match before any file operations when hard links are enabled.
33 changes: 23 additions & 10 deletions app.py
@@ -11,7 +11,7 @@
- Multi-provider AI (Gemini, OpenRouter, Ollama)
"""

APP_VERSION = "0.9.0-beta.147"
APP_VERSION = "0.9.0-beta.148"
GITHUB_REPO = "deucebucket/library-manager" # Your GitHub repo

# Versioning Guide:
@@ -52,7 +52,8 @@
from library_manager.database import (
init_db, get_db, set_db_path, cleanup_garbage_entries,
cleanup_duplicate_history_entries, insert_history_entry,
should_requeue_book
should_requeue_book,
watch_folder_is_processed, watch_folder_mark_processed
)
from library_manager.models.book_profile import (
SOURCE_WEIGHTS, FIELD_WEIGHTS, FieldValue, BookProfile,
@@ -737,7 +738,7 @@
try:
with open(ERROR_REPORTS_PATH, 'r') as f:
reports = json.load(f)
except:
reports = []

# Add new report (keep last 100 reports to avoid file bloat)
@@ -761,7 +762,7 @@
try:
with open(ERROR_REPORTS_PATH, 'r') as f:
return json.load(f)
except:
return []
return []

@@ -1716,7 +1717,7 @@
continue
result = call_gemini(prompt, merged_config)
if result:
logger.info(f"[PROVIDER CHAIN] Success with gemini")
return result

elif provider == 'openrouter':
@@ -1725,13 +1726,13 @@
continue
result = call_openrouter(prompt, merged_config)
if result:
logger.info(f"[PROVIDER CHAIN] Success with openrouter")
return result

elif provider == 'ollama':
result = call_ollama(prompt, merged_config)
if result:
logger.info(f"[PROVIDER CHAIN] Success with ollama")
return result

else:
@@ -1833,7 +1834,7 @@
return result
elif result and result.get('transcript'):
# Got transcript but no match - still useful, return for potential AI fallback
logger.info(f"[AUDIO CHAIN] BookDB returned transcript only")
return result
elif result is None and attempt < max_retries - 1:
# Connection might be down, wait and retry
@@ -2165,11 +2166,11 @@
device = "cuda"
# int8 works on all CUDA devices including GTX 1080 (compute 6.1)
# float16 only works on newer GPUs (compute 7.0+)
logger.info(f"[WHISPER] Using CUDA GPU acceleration (10x faster)")
else:
logger.info(f"[WHISPER] Using CPU (no CUDA GPU detected)")
except ImportError:
logger.info(f"[WHISPER] Using CPU (ctranslate2 not available)")

_whisper_model = WhisperModel(model_name, device=device, compute_type=compute_type)
_whisper_model_name = model_name
@@ -2376,7 +2377,7 @@
if sample_path and os.path.exists(sample_path):
try:
os.unlink(sample_path)
except:
pass

return result
@@ -6432,8 +6433,8 @@
# WATCH FOLDER FUNCTIONALITY
# ============================================================================

# Track processed watch folder items to avoid reprocessing
watch_folder_processed = set()
# Issue #208: watch-folder dedup now lives in the watch_folder_processed
# SQLite table (see library_manager.database) so restarts don't reset state.
watch_folder_last_scan = 0

def get_watch_folder_items(watch_folder: str, min_age_seconds: int = 30) -> list:
@@ -6456,8 +6457,8 @@
for item in watch_path.iterdir():
item_path = str(item.resolve())

# Skip if already processed
if item_path in watch_folder_processed:
# Skip if already processed (persisted in SQLite, Issue #208)
if watch_folder_is_processed(item_path):
continue

# Check if folder contains audio files or is an audio file
@@ -6668,7 +6669,7 @@
Process items in the watch folder.
Returns number of items processed.
"""
global watch_folder_processed, watch_folder_last_scan
global watch_folder_last_scan

watch_folder = config.get('watch_folder', '').strip()
output_folder = config.get('watch_output_folder', '').strip()
@@ -6828,6 +6829,18 @@
except Exception as e:
logger.debug(f"Watch folder: API lookup failed, using path analysis: {e}")

# Issue #208: Skaldleita may have signalled 'abort_task' during the
# lookup above (retry-loop protection). Stop retrying this item and
# persist it so future scans skip it until the user upgrades / fixes
# the source. The warning + upgrade URL are already in the logs.
from library_manager.providers.bookdb import get_and_clear_server_abort
server_abort = get_and_clear_server_abort()
if server_abort:
abort_msg = server_abort.get('message', 'Skaldleita requested task abort')
logger.warning(f"Watch folder: Aborting '{item.name}' per Skaldleita server notice")
watch_folder_mark_processed(item_path, 'aborted_by_server', abort_msg)
continue

# Issue #57: Verify drastic author changes before accepting
if needs_verification and api_author and api_title:
try:
@@ -6880,7 +6893,7 @@

if success:
logger.info(f"Watch folder: Moved to {new_path}")
watch_folder_processed.add(item_path)
watch_folder_mark_processed(item_path, 'moved')
processed += 1

# Add to books table
@@ -6914,8 +6927,8 @@
else:
logger.error(f"Watch folder: Failed to move {item.name}: {error}")
# Issue #49: Track failed items in the database so user can see and fix them
# Add to watch_folder_processed to prevent infinite retry loop
watch_folder_processed.add(item_path)
# Issue #208: persist dedup so the retry loop dies across restarts too
watch_folder_mark_processed(item_path, 'move_failed', error)
try:
# Check if this item is already tracked
c.execute('SELECT id FROM books WHERE path = ?', (item_path,))
53 changes: 53 additions & 0 deletions library_manager/database.py
@@ -175,6 +175,17 @@ def init_db(db_path=None):
api_calls INTEGER DEFAULT 0
)''')

# Issue #208: Persistent watch-folder dedup
# Was an in-memory set(), wiped on restart, which caused the watch worker
# to re-submit the same failing file every cycle (ate ~48% of Skaldleita
# traffic from a single LM instance before server-side cache absorbed it).
c.execute('''CREATE TABLE IF NOT EXISTS watch_folder_processed (
path TEXT PRIMARY KEY,
processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
outcome TEXT,
error_message TEXT
)''')

conn.commit()
conn.close()

@@ -187,6 +198,48 @@ def init_db(db_path=None):
init_plugin_metrics_table(path)


def watch_folder_is_processed(path, db_path=None):
"""Return True if the watch-folder path has already been handled.

Issue #208: replaces the in-memory set. Survives restarts so the worker
doesn't re-submit the same failing file every scan cycle.
"""
p = db_path or _db_path
if not p:
return False
conn = sqlite3.connect(p, timeout=30)
try:
c = conn.execute(
'SELECT 1 FROM watch_folder_processed WHERE path = ? LIMIT 1',
(path,)
)
return c.fetchone() is not None
finally:
conn.close()


def watch_folder_mark_processed(path, outcome, error_message=None, db_path=None):
"""Record that a watch-folder path has been handled.

outcome: 'moved' | 'move_failed' | 'unknown_author' | 'aborted_by_server'
Issue #208.
"""
p = db_path or _db_path
if not p:
return
conn = sqlite3.connect(p, timeout=30)
try:
conn.execute(
'''INSERT OR REPLACE INTO watch_folder_processed
(path, processed_at, outcome, error_message)
VALUES (?, CURRENT_TIMESTAMP, ?, ?)''',
(path, outcome, error_message)
)
conn.commit()
finally:
conn.close()
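
The restart-survival behaviour these helpers buy can be sketched end-to-end with plain `sqlite3` — same table DDL and SQL as above, but with the path, outcome, and error message purely illustrative:

```python
import os
import sqlite3
import tempfile

# Same DDL as init_db() above; the db path is a throwaway temp file.
db = os.path.join(tempfile.mkdtemp(), "lm.db")
conn = sqlite3.connect(db)
conn.execute('''CREATE TABLE IF NOT EXISTS watch_folder_processed (
    path TEXT PRIMARY KEY,
    processed_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    outcome TEXT,
    error_message TEXT
)''')
# Mark a failing item, as watch_folder_mark_processed() would.
conn.execute(
    '''INSERT OR REPLACE INTO watch_folder_processed
       (path, processed_at, outcome, error_message)
       VALUES (?, CURRENT_TIMESTAMP, ?, ?)''',
    ("/watch/Book A", "move_failed", "permission denied"))
conn.commit()
conn.close()

# "Restart": a fresh connection still sees the row, so the worker skips
# the item instead of re-submitting it to Skaldleita on every scan.
conn = sqlite3.connect(db)
row = conn.execute(
    'SELECT outcome FROM watch_folder_processed WHERE path = ?',
    ("/watch/Book A",)).fetchone()
conn.close()
```

With the old in-memory `set()`, the "restart" step would have come back empty and the retry loop would resume.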


def cleanup_garbage_entries(db_path=None):
"""Remove garbage entries from database on startup.

32 changes: 32 additions & 0 deletions library_manager/providers/bookdb.py
@@ -15,6 +15,7 @@
import logging
import subprocess
import tempfile
import threading
import requests
from pathlib import Path

@@ -33,6 +34,22 @@

logger = logging.getLogger(__name__)

# Issue #208: Skaldleita can signal "stop retrying this task" via a server_notice
# in the JSON response. We stash the notice in a thread-local so the caller
# (e.g. the watch-folder worker) can pick it up and mark the item as aborted
# without a 30-second retry loop. Thread-local keeps the signal scoped to the
# thread that issued the matching request.
_abort_state = threading.local()


def get_and_clear_server_abort():
"""Return (and clear) the last server_notice with action=abort_task seen
on this thread, or None. Safe to call when none was set."""
notice = getattr(_abort_state, 'notice', None)
if notice is not None:
_abort_state.notice = None
return notice
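
A quick illustration of the read-and-clear, per-thread semantics of this helper — the function body is the one above; the sample notice dict is illustrative:

```python
import threading

_abort_state = threading.local()

def get_and_clear_server_abort():
    notice = getattr(_abort_state, 'notice', None)
    if notice is not None:
        _abort_state.notice = None
    return notice

# Read-and-clear: the first call returns the stash, the second gets None.
_abort_state.notice = {'action': 'abort_task'}
assert get_and_clear_server_abort() == {'action': 'abort_task'}
assert get_and_clear_server_abort() is None

# Per-thread scoping: a notice set on the main thread is invisible to a
# worker thread, which sees its own (empty) slot.
seen = []
_abort_state.notice = {'action': 'abort_task'}
t = threading.Thread(target=lambda: seen.append(get_and_clear_server_abort()))
t.start()
t.join()
```

This is why the watch-folder worker can only consume aborts for requests it issued itself, and why stale notices never leak between concurrent identify attempts.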

# Skaldleita API endpoint (our metadata service, legacy name: BookDB)
BOOKDB_API_URL = "https://bookdb.deucebucket.com" # URL unchanged for backwards compatibility
# Public API key for Library Manager users (no config needed)
@@ -168,6 +185,21 @@ def search_bookdb(title, author=None, api_key=None, retry_count=0, bookdb_url=No

data = resp.json()

# Issue #208: honor Skaldleita server_notice. Log every notice; on
# action=abort_task, stash in thread-local so the watch-folder worker
# can stop retrying instead of hammering /match every 30s.
notice = data.get('server_notice')
if notice:
code = notice.get('code', 'unknown')
msg = notice.get('message', '')
upgrade_url = notice.get('upgrade_url')
severity = notice.get('severity', 'info')
logger.warning(f"[SKALDLEITA] server notice ({severity}) [{code}]: {msg}")
if upgrade_url:
logger.warning(f"[SKALDLEITA] upgrade: {upgrade_url}")
if notice.get('action') == 'abort_task':
_abort_state.notice = notice

# Check confidence threshold
if data.get('confidence', 0) < 0.5:
logger.debug(f"Skaldleita match below confidence threshold: {data.get('confidence')}")