
Refactor app bootstrap and sync flow; repair Alembic head history#56

Open
serabi wants to merge 19 commits into main from dev

Conversation

Owner

@serabi serabi commented Apr 5, 2026

Summary

This PR groups the recent cleanup/refactor work into a single reviewable branch after stabilizing startup again.

Main changes:

  • split web server bootstrap into smaller runtime/setup/template modules
  • reduce SyncManager by extracting startup and cache-cleanup concerns
  • extract KoSync progress GET/PUT handling into a dedicated service
  • standardize shared JSON error responses and remove module-level route state from Hardcover routes
  • add shared repository helpers and improve database-service facade ergonomics
  • extract a reusable JSON retry helper and apply it to the Hardcover client
  • align dependency declarations and add project-level Pyright config
  • restore the missing Alembic merge revision so upgrades have a single head again
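The restored Alembic merge revision is, by design, a no-op stub whose down_revision points at both divergent heads. A minimal sketch of that shape (the revision ID matches the file named later in this PR; the two parent revision IDs here are illustrative placeholders, not the actual ones):

```python
"""Merge the detected_books head back into the mainline history.

A merge revision carries no schema changes; it only joins two heads so
`alembic heads` reports a single head and `alembic upgrade head` works.
"""

revision = "t9u0v1w2x3y4"
down_revision = ("aaaa1111", "bbbb2222")  # illustrative: the two heads being merged
branch_labels = None
depends_on = None


def upgrade():
    pass  # intentionally empty: a merge point, not a schema change


def downgrade():
    pass
```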

Commit breakdown

  • b9081fb — chore: align Python dependencies and Pyright config
  • 479ace6 — fix: restore Alembic merge revision to resolve multiple heads
  • 566f583 — refactor: split web server bootstrap into setup, runtime, and template modules
  • f55eafb — refactor: extract sync manager startup and cache cleanup services
  • b4480f4 — refactor: extract KoSync progress GET/PUT flow into dedicated service
  • 17ba760 — refactor: standardize route error responses and remove module-level route state
  • a85ba72 — refactor: add shared repository helpers and improve database facade ergonomics
  • f2add3b — refactor: extract shared JSON retry helper and reuse it in Hardcover client

Validation

Validated in the dev container after fixing the dependency pin and Alembic head history:

  • py_compile passes for the touched modules
  • pyright src tests/test_app_runtime.py reports clean with the project config
  • docker compose -f docker-compose.dev.yml up --build -d succeeds
  • dev container starts healthy
  • alembic heads now shows a single head
  • alembic upgrade head succeeds
  • smoke-checked routes:
    • /healthcheck
    • /
    • /settings
    • /reading
    • /logs
    • /api/status

Notes

  • The dev container uses mounted local source, so runtime validation was against the current branch contents.
  • I have not done the deeper manual product-behavior pass yet; this PR is intended to capture the cleanup and startup/upgrade stabilization first.

Summary by CodeRabbit

  • New Features

    • Upload button for books with EPUBs, per-book Refresh and Import actions, and a Re‑sync All Highlights maintenance button.
    • Journal entries now render Markdown and return rendered HTML.
  • Updates

    • BookFusion UI removed as a standalone page; actions moved into the reading detail view.
    • Client UI/behavior simplified for highlight syncing and upload flows; import now reports saved/skipped counts.
  • Bug Fixes

    • Journal deduplication to avoid duplicate entries.
  • Database

    • Removed obsolete BookFusion linkage field; timestamps made timezone-aware.
  • Tests

    • Added coverage for BookFusion routes, markdown rendering, progress APIs, and cache cleanup.

Note

Refactor app bootstrap into dedicated modules and add BookFusion per-book sync endpoint

  • Splits inline startup logic from web_server.py into app_setup.py (dependency/service init), app_runtime.py (runtime reconfiguration, socket listener reconciliation, signal handling), and app_template_context.py (Jinja globals/context processor).
  • Adds a POST /api/bookfusion/sync-book endpoint and removes several legacy BookFusion endpoints (/library, /highlights, /add-to-dashboard, /grimmory-books), narrowing the API surface significantly.
  • Journal entries now render as sanitized markdown HTML (via mistune + nh3) in both the reading detail template and API responses for add/update journal.
  • Removes matched_abs_id from BookfusionHighlight and BookfusionBook models and drops the columns via an Alembic migration; a merge migration repairs a split head in the history.
  • All ORM model timestamps switch from naive datetime.utcnow() to timezone-aware datetime.now(UTC).
  • Standardizes JSON error responses across blueprints using new json_error/json_success helpers in src/utils/http.py.
  • Risk: several BookFusion API endpoints are permanently removed; clients relying on /api/bookfusion/library, /api/bookfusion/highlights, /api/bookfusion/link-highlight, or /api/bookfusion/add-to-dashboard will receive 404s after deployment.
📊 Macroscope summarized f2add3b. 24 files reviewed, 13 issues evaluated, 7 issues filtered, 4 comments posted

🗂️ Filtered Issues

src/api/hardcover_client.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 98: The post_json_with_retries method in JsonHttpClientBase uses logger.warning(...) but the reference code for src/api/http_client_base.py shows only import time and import requests — no import logging and no logger = logging.getLogger(...). When HardcoverClient.query() receives a 429 response and attempts a retry, the call to logger.warning(...) in the base class will raise NameError: name 'logger' is not defined, crashing the retry path. [ Out of scope (triage) ]
src/app_runtime.py — 4 comments posted, 7 evaluated, 1 filtered
  • line 50: reconcile_socket_listener accesses app.config["database_service"] and app.config["sync_manager"] directly (lines 50-51 and 73-74), but initialize_abs_listener — the function that runs at startup — receives these as function parameters and never stores them in app.config. When reconcile_socket_listener is called during hot-reload via apply_settings, it will raise a KeyError because these keys were never set. [ Out of scope (triage) ]
src/blueprints/reading_bp.py — 0 comments posted, 1 evaluated, 1 filtered
  • line 691: Line 691 still uses jsonify({"success": False, "error": ...}), 400 while all other error responses in this function and in update_dates were converted to use json_error(). This appears to be an oversight where one error response was missed during the refactoring, resulting in inconsistent error response formatting. [ Failed validation ]
src/services/kosync_progress_service.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 191: Line 191 uses datetime.min as a fallback for sorting, but if any s.last_updated values are timezone-aware (which they will be, since utc_now() returns datetime.now(UTC)), comparing them with datetime.min (which is naive) will raise TypeError: can't compare offset-naive and offset-aware datetimes. Should use datetime.min.replace(tzinfo=UTC) instead. [ Failed validation ]
  • line 198: Line 198 calls int(latest_state.last_updated) but last_updated is a datetime object. Python cannot convert a datetime directly to int - this will raise TypeError: int() argument must be a string, a bytes-like object or a number, not 'datetime.datetime'. Should be int(latest_state.last_updated.timestamp()) to get the Unix timestamp. [ Failed validation ]
src/services/kosync_service.py — 0 comments posted, 2 evaluated, 2 filtered
  • line 549: Delegation to KosyncProgressService.handle_put_progress() will crash with NameError: name 'logger' is not defined. The kosync_progress_service.py file uses logger throughout (e.g., logger.warning(), logger.info()) but never imports logging or defines logger = logging.getLogger(__name__). [ Out of scope ]
  • line 552: Delegation to KosyncProgressService.handle_get_progress() will crash with NameError: name 'logger' is not defined. The kosync_progress_service.py file uses logger throughout but never imports logging or defines logger. [ Out of scope ]

serabi added 8 commits April 5, 2026 14:21
- Remove dedicated BookFusion page and navbar link
- Refactor upload to work from PageKeeper books (abs_id) instead of Grimmory
- Add per-book sync endpoint for refresh from reading page
- Add Re-sync All button in Settings
- Add deduplication to journal import to prevent duplicates on re-import
- Remove auto-match and manual matching flows (broken)
- Remove matched_abs_id columns (never populated)
- Add upload/refresh buttons to reading detail page service row
- Add Refresh and Import buttons to highlights tab
- Update Settings copy to reflect new model
- Tests for book-based upload (abs_id instead of book_id)
- Tests for per-book sync endpoint
- Tests for journal import with deduplication
- Tests for full resync functionality
- Remove tests for removed routes (grimmory-books, library, link-highlight, etc.)
- Handle book-ID, book-ID, and raw numeric ID formats
- Add already_linked response for upload feedback
Normalize BookFusion highlight dedupe against semantic quote and chapter
content so legacy imports do not duplicate newer markdown-formatted
entries.

Unify markdown rendering across templates and JSON responses so the
reading timeline and client-side note updates use the same safe HTML
path, and update tests plus UTC timestamp handling to remove current
deprecation warnings.
Contributor

coderabbitai Bot commented Apr 5, 2026

📝 Walkthrough

Walkthrough

Removes BookFusion UI and the matched_abs_id DB column, rewires BookFusion flows to PageKeeper-driven upload/sync by abs_id, adds markdown sanitization and rendered entry_html, centralizes KoSync progress handling, standardizes timezone-aware timestamps, and introduces multiple new services, migrations, and tests.

Changes

Cohort / File(s) Summary
Migrations
alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py, alembic/versions/t9u0v1w2x3y4_merge_detected_books_head.py
Drop matched_abs_id from BookFusion tables; add merge head stub.
Models & Repos (DB)
src/db/models.py, src/db/bookfusion_repository.py, src/db/reading_repository.py, src/db/base_repository.py, src/db/database_service.py, src/db/hardcover_repository.py
Remove matched_abs_id fields; add utc_now() for timezone-aware defaults; new repo helpers (get_bookfusion_books_by_book_id, save_bookfusion_book, _expunge_items, _paginate, delegation/dir fixes); adjust unlink/link behaviors and pagination/expunge usage.
Blueprints / API
src/blueprints/bookfusion_bp.py, src/blueprints/reading_bp.py, src/blueprints/api.py, src/blueprints/tbr_bp.py
Major BookFusion blueprint refactor: new /api/bookfusion/upload, /api/bookfusion/sync-book, simplified /api/bookfusion/sync-highlights and deduping /api/bookfusion/save-journal; numerous removed legacy routes and standardized json_error usage; reading blueprint adds bookfusion_upload_eligible and returns entry_html on journal endpoints.
Services & Runtime
src/services/kosync_service.py, src/services/kosync_progress_service.py, src/services/alignment_service.py, src/services/storyteller_submission_service.py, src/services/cache_cleanup_service.py, src/services/sync_manager_startup.py, src/sync_manager.py, src/app_runtime.py, src/app_setup.py
Extract KoSync progress into KosyncProgressService; delegate Kosync progress calls; add CacheCleanupService and SyncManagerStartup; make timestamp handling timezone-aware; add runtime orchestration (app_runtime) and centralized dependency setup (app_setup).
HTTP Utilities & Clients
src/utils/http.py, src/api/http_client_base.py, src/api/hardcover_client.py, src/api/hardcover_routes.py
Add json_error/json_success helpers; introduce JsonHttpClientBase.post_json_with_retries; make HardcoverClient subclass base and use retry helper; remove module-level init in hardcover routes in favor of current_app.config dependency resolution.
Markdown Utilities & Templates
src/utils/markdown.py, src/web_server.py, src/app_template_context.py, templates/reading_detail.html
New markdown rendering and HTML sanitization utilities (render_markdown_html, render_markdown_markup, sanitize_html); wire Jinja filters; reading detail now renders entry_html and adjusts BookFusion UI actions and cover fallback logic.
UI Removal & Frontend
templates/bookfusion.html, templates/partials/navbar.html, static/js/bookfusion.js, static/css/bookfusion.css, static/js/reading.js, static/css/reading.css, static/js/settings.js, templates/settings.html
Removed BookFusion page template, JS, and CSS; navbar link removed; reading template and JS updated for markdown journal rendering and new upload/sync buttons; settings adds “Re-sync All Highlights” button and client handler.
Tests
tests/conftest.py, tests/test_bookfusion_repository.py, tests/test_bookfusion_routes.py, tests/test_bookfusion_save_journal_regressions.py, tests/test_markdown_utils.py, tests/test_reading_journal_markdown.py, tests/test_app_runtime.py, tests/test_cache_cleanup_service.py, tests/test_kosync_progress_service.py
Expanded/refactored tests to cover new BookFusion upload/sync/save-journal flows, markdown utilities, Kosync progress behaviors, cache cleanup, and runtime helpers; added nh3/mistune stubs in test fixtures.
Packaging & Tooling
requirements.txt, pyproject.toml, pyrightconfig.json
Add mistune and pytest-asyncio to requirements; add project dependencies to pyproject; add Pyright config.
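The json_error/json_success helpers referenced in the HTTP Utilities row could look roughly like this. This is a sketch of the shape the PR implies, not the actual src/utils/http.py; Flask's jsonify is replaced with plain dicts so the example stands alone:

```python
def json_error(message: str, status: int = 400):
    """Uniform error payload: body dict plus HTTP status code."""
    return {"success": False, "error": message}, status


def json_success(status: int = 200, **payload):
    """Uniform success payload; extra keyword args merge into the body."""
    return {"success": True, **payload}, status


# In a Flask blueprint these tuples would be returned from a view, e.g.
#   return json_error("Failed to read ebook file", 500)
# giving every blueprint the same error-response shape.
```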

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Browser
    participant Server as Flask
    participant DB
    participant BF as BookFusionAPI

    Client->>Browser: Click Upload (abs_id)
    Browser->>Server: POST /api/bookfusion/upload {abs_id}
    Server->>DB: fetch book by abs_id
    DB-->>Server: book with ebook_filename
    Server->>Server: get_local_epub -> epub bytes
    Server->>BF: upload EPUB bytes
    BF-->>Server: bookfusion_id
    Server->>DB: save BookfusionBook (matched_book_id)
    DB-->>Server: commit
    Server-->>Browser: {success: true, already_linked: false}
    Browser->>Client: show success
sequenceDiagram
    participant Client
    participant Browser
    participant Server as Flask
    participant DB

    Client->>Browser: Click Import to Journal (abs_id)
    Browser->>Server: POST /api/bookfusion/save-journal {abs_id}
    Server->>DB: load Bookfusion highlights for book
    DB-->>Server: highlights list
    Server->>Server: normalize quote + chapter, compute dedupe key
    Server->>DB: get_reading_journal_entries_for_book(book_id, event="highlight")
    DB-->>Server: existing journals
    Server->>Server: skip duplicates, prepare new entries (markdown formatting)
    Server->>DB: add new ReadingJournal rows
    DB-->>Server: commit
    Server-->>Browser: {success: true, saved: N, skipped: M}
    Browser->>Client: update UI counts

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

📚 Old page folds, a new upload sings,
Timestamps don their timezone wings.
Markdown cleans and journals keep,
Highlights sync and dedupe sleep.
Out with the old, in with tidy things.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 30.35%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (4 passed)
  • Title check: ✅ Passed. The title accurately summarizes the main changes: refactoring app bootstrap/sync flow and fixing Alembic migration history.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Description check: ✅ Passed. Check skipped because CodeRabbit's high-level summary is enabled.


Comment thread tests/test_reading_journal_markdown.py
assert resp.status_code == 200
data = resp.get_json()
assert data["journal"]["entry"] == journal.entry
assert data["journal"]["entry_html"] == f"<p>{journal.entry}</p>"

🟢 Low tests/test_reading_journal_markdown.py:40

The test asserts entry_html equals <p>A **bold** note</p>, which contains literal Markdown syntax. If the feature renders Markdown to HTML, the expected value should be <p>A <strong>bold</strong> note</p>. The current assertion passes only if the implementation incorrectly returns unrendered Markdown, or fails against a correct implementation.

-    assert data["journal"]["entry_html"] == f"<p>{journal.entry}</p>"
+    assert data["journal"]["entry_html"] == "<p>A <strong>bold</strong> note</p>"
Also found in 1 other location:

tests/test_markdown_utils.py:7

The assertion == "<p>Hello world</p>" will likely fail because mistune.html() adds a trailing newline after block elements. The actual return value of render_markdown_html("Hello world") is probably "<p>Hello world</p>\n", causing this test to fail.

🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file tests/test_reading_journal_markdown.py around line 40:

The test asserts `entry_html` equals `<p>A **bold** note</p>`, which contains literal Markdown syntax. If the feature renders Markdown to HTML, the expected value should be `<p>A <strong>bold</strong> note</p>`. The current assertion passes only if the implementation incorrectly returns unrendered Markdown, or fails against a correct implementation.

Evidence trail:
tests/test_reading_journal_markdown.py line 25: `journal = _make_journal(entry="A **bold** note")`
tests/test_reading_journal_markdown.py line 40: `assert data["journal"]["entry_html"] == f"<p>{journal.entry}</p>"`
src/blueprints/reading_bp.py lines 633, 700, 720: `entry_html` is populated via `render_markdown_html(journal.entry)`
src/utils/markdown.py lines 33-37: `render_markdown_html` uses `mistune.html(value)` which is a Markdown parser that converts `**bold**` to `<strong>bold</strong>`

Also found in 1 other location(s):
- tests/test_markdown_utils.py:7 -- The assertion `== "<p>Hello world</p>"` will likely fail because `mistune.html()` adds a trailing newline after block elements. The actual return value of `render_markdown_html("Hello world")` is probably `"<p>Hello world</p>\n"`, causing this test to fail.
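The finding above hinges on what a Markdown renderer actually emits. The real code uses mistune plus nh3; as a dependency-free stand-in, a toy renderer makes the expected behavior concrete (illustrative only, and it mimics the trailing newline after block elements that the second finding calls out):

```python
import re


def toy_render_markdown(text: str) -> str:
    """Tiny illustrative renderer: **bold** -> <strong>, wrapped in <p>.

    Real renderers such as mistune.html() behave similarly for this
    simple input, including the trailing newline after the block.
    """
    body = re.sub(r"\*\*(.+?)\*\*", r"<strong>\1</strong>", text)
    return f"<p>{body}</p>\n"
```

Under this behavior, a test asserting `entry_html == f"<p>{journal.entry}</p>"` can only pass if the Markdown was never rendered, which is exactly the review's point.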

Contributor

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/db/models.py (1)

768-768: ⚠️ Potential issue | 🟡 Minor

GrimmoryBook.last_updated still uses deprecated datetime.utcnow.

All other models now use utc_now, but this column was missed. This creates inconsistent timestamp behavior—naive vs. timezone-aware.

🔧 Fix for consistency
-    last_updated = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow)
+    last_updated = Column(DateTime, default=utc_now, onupdate=utc_now)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/models.py` at line 768, Update the GrimmoryBook model's last_updated
Column to use the existing utc_now callable instead of datetime.utcnow for
consistent timezone-aware timestamps; locate the last_updated definition on the
GrimmoryBook class (symbol: GrimmoryBook.last_updated) and change its default
and onupdate to utc_now so it matches other models that use utc_now.
🧹 Nitpick comments (1)
src/blueprints/bookfusion_bp.py (1)

80-93: Consider using lazy logging format instead of f-strings.

F-strings in logging calls (e.g., logger.error(f"...")) evaluate even when the log level is disabled. Minor performance impact, but standard practice is logger.error("message: %s", e).

♻️ Optional: Use lazy logging format
     except Exception as e:
-        logger.error(f"Failed to read ebook file: {e}")
+        logger.error("Failed to read ebook file: %s", e)
         return jsonify({"error": "Failed to read ebook file"}), 500

     title = book.title or ""
     authors = book.author or ""

-    logger.info(f"BookFusion upload request: title='{title}', authors='{authors}', filename='{ebook_filename}'")
+    logger.info("BookFusion upload request: title='%s', authors='%s', filename='%s'", title, authors, ebook_filename)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blueprints/bookfusion_bp.py` around lines 80 - 93, Replace f-strings in
logging calls with lazy logging format: in the except block for reading the
ebook file, change logger.error(f"Failed to read ebook file: {e}") to use
parameterized logging with the exception object (e.g., logger.error("Failed to
read ebook file: %s", e)); likewise change the logger.info call that logs
title/authors/filename to logger.info("BookFusion upload request: title=%s,
authors=%s, filename=%s", title, authors, ebook_filename). Keep the existing
variables (file_bytes, title, authors, ebook_filename) and the
bf_client.upload_book call unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py`:
- Around line 8-9: Imports are unsorted; swap and sort the two top-level imports
so third-party modules are alphabetized (place "from alembic import op" before
"import sqlalchemy as sa"), ensure there's a single blank line after the import
block if required by project linting, then run the project's import linter/isort
to confirm the ordering; verify upgrade() and downgrade() remain unchanged.

In `@static/js/settings.js`:
- Around line 531-563: The click handler for bfResyncBtn leaves the "error" CSS
class set after a failed attempt, so before starting a new sync (inside the
bfResyncBtn.addEventListener callback, before setting textContent to
'Syncing...') remove any stale classes (e.g.,
bfResyncBtn.classList.remove('error','done')) and ensure on successful response
you also clear the 'error' class (and on failure clear or set classes
appropriately); update the success and catch branches to remove/add the correct
classes so retries show correct styling.

In `@templates/reading_detail.html`:
- Around line 912-943: There is a duplicate click handler binding for the
element id "bulk-import-highlights" that causes double API calls to
'/api/bookfusion/save-journal' and uses an unused local variable bookId; remove
the earlier IIFE that binds to '#bulk-import-highlights' so only the single
(later) handler remains, and update that remaining handler to use the global
_rdAbsId (instead of bookId) as the POST body, and ensure importBtn state
handling (disabled, textContent, and error/done classes) is correct and
symmetric on success/failure.

In `@tests/test_markdown_utils.py`:
- Around line 14-16: The test
test_reading_detail_uses_block_container_for_markdown currently reads a
machine-specific absolute path; replace that with a portable relative/resource
lookup so CI and other dev machines can find the template. Update the Path(...)
call in test_reading_detail_uses_block_container_for_markdown to compute the
template location relative to the test file (e.g., Path(__file__).parent /
"templates" / "reading_detail.html") or load it via package/resource utilities,
then read_text() and assert as before so the test no longer relies on an
absolute filesystem path.

---

Outside diff comments:
In `@src/db/models.py`:
- Line 768: Update the GrimmoryBook model's last_updated Column to use the
existing utc_now callable instead of datetime.utcnow for consistent
timezone-aware timestamps; locate the last_updated definition on the
GrimmoryBook class (symbol: GrimmoryBook.last_updated) and change its default
and onupdate to utc_now so it matches other models that use utc_now.

---

Nitpick comments:
In `@src/blueprints/bookfusion_bp.py`:
- Around line 80-93: Replace f-strings in logging calls with lazy logging
format: in the except block for reading the ebook file, change
logger.error(f"Failed to read ebook file: {e}") to use parameterized logging
with the exception object (e.g., logger.error("Failed to read ebook file: %s",
e)); likewise change the logger.info call that logs title/authors/filename to
logger.info("BookFusion upload request: title=%s, authors=%s, filename=%s",
title, authors, ebook_filename). Keep the existing variables (file_bytes, title,
authors, ebook_filename) and the bf_client.upload_book call unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 098c6b0d-c74e-4149-a3ca-3c248f7f7ff6

📥 Commits

Reviewing files that changed from the base of the PR and between 34192d9 and 63ac1ba.

⛔ Files ignored due to path filters (1)
  • .opencode/plans/bookfusion-client-tests.md is excluded by !**/*.md
📒 Files selected for processing (27)
  • alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py
  • requirements.txt
  • src/blueprints/bookfusion_bp.py
  • src/blueprints/reading_bp.py
  • src/db/bookfusion_repository.py
  • src/db/models.py
  • src/db/reading_repository.py
  • src/services/alignment_service.py
  • src/services/kosync_service.py
  • src/services/storyteller_submission_service.py
  • src/utils/markdown.py
  • src/web_server.py
  • static/css/bookfusion.css
  • static/css/reading.css
  • static/js/bookfusion.js
  • static/js/reading.js
  • static/js/settings.js
  • templates/bookfusion.html
  • templates/partials/navbar.html
  • templates/reading_detail.html
  • templates/settings.html
  • tests/conftest.py
  • tests/test_bookfusion_repository.py
  • tests/test_bookfusion_routes.py
  • tests/test_bookfusion_save_journal_regressions.py
  • tests/test_markdown_utils.py
  • tests/test_reading_journal_markdown.py
💤 Files with no reviewable changes (4)
  • templates/partials/navbar.html
  • templates/bookfusion.html
  • static/css/bookfusion.css
  • static/js/bookfusion.js

Comment on lines +8 to +9
import sqlalchemy as sa
from alembic import op
Contributor


⚠️ Potential issue | 🟡 Minor

Fix import ordering to pass linting.

The imports are unsorted, causing the pipeline to fail.

Proposed fix
-import sqlalchemy as sa
 from alembic import op
+import sqlalchemy as sa

As per coding guidelines for alembic/versions/**: Migration files should include both upgrade() and downgrade() functions. ✓ Both are present.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Before:
import sqlalchemy as sa
from alembic import op
After:
from alembic import op
import sqlalchemy as sa
🧰 Tools
🪛 GitHub Actions: Lint

[error] 8-8: Ruff check failed (I001): Import block is un-sorted or un-formatted.

🪛 GitHub Check: ruff

[failure] 8-9: ruff (I001)
alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py:8:1: I001 Import block is un-sorted or un-formatted
help: Organize imports

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py` around
lines 8 - 9, Imports are unsorted; swap and sort the two top-level imports so
third-party modules are alphabetized (place "from alembic import op" before
"import sqlalchemy as sa"), ensure there's a single blank line after the import
block if required by project linting, then run the project's import linter/isort
to confirm the ordering; verify upgrade() and downgrade() remain unchanged.

serabi added 2 commits April 5, 2026 21:47
Remove the malformed Z suffix from timezone-aware KoSync timestamps and apply the Ruff-required import formatting in the related Alembic migration.
Comment thread static/js/settings.js
Comment on lines +549 to +566
if (data.success) {
bfResyncBtn.classList.remove('error');
bfResyncBtn.textContent = 'Synced (' + data.new_highlights + ' new)';
bfResyncBtn.classList.add('done');
} else {
bfResyncBtn.classList.remove('done');
bfResyncBtn.textContent = data.error || 'Failed';
bfResyncBtn.classList.add('error');
bfResyncBtn.disabled = false;
}
})
.catch(function() {
bfResyncBtn.classList.remove('done');
bfResyncBtn.textContent = 'Error';
bfResyncBtn.classList.add('error');
bfResyncBtn.disabled = false;
});
});

🟢 Low js/settings.js:549

After a successful BookFusion sync, the button stays permanently disabled because bfResyncBtn.disabled = false is only set in the error/catch branches (lines 557, 564), not in the success branch. Users must refresh the page to re-sync.

                } else {
                    bfResyncBtn.classList.remove('done');
                    bfResyncBtn.textContent = data.error || 'Failed';
                    bfResyncBtn.classList.add('error');
-                    bfResyncBtn.disabled = false;
                }
            })
            .catch(function() {
                bfResyncBtn.classList.remove('done');
                bfResyncBtn.textContent = 'Error';
                bfResyncBtn.classList.add('error');
-                bfResyncBtn.disabled = false;
            });
+            .finally(function() {
+                bfResyncBtn.disabled = false;
+            });
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file static/js/settings.js around lines 549-566:

After a successful BookFusion sync, the button stays permanently disabled because `bfResyncBtn.disabled = false` is only set in the error/catch branches (lines 557, 564), not in the success branch. Users must refresh the page to re-sync.

Evidence trail:
static/js/settings.js lines 540-569 at REVIEWED_COMMIT:
- Line 540: `bfResyncBtn.disabled = true;` (disables button at start)
- Lines 549-553: success branch - no `disabled = false` statement
- Line 557: error branch has `bfResyncBtn.disabled = false;`
- Line 564: catch branch has `bfResyncBtn.disabled = false;`

serabi changed the title from "Merge dev into main" to "Refactor app bootstrap and sync flow; repair Alembic head history" on Apr 25, 2026

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 13

🧹 Nitpick comments (14)
src/utils/runtime_config.py (1)

6-26: Add type annotations to match the documented signatures.

The AI summary advertises typed signatures (e.g., get_str(key, default="") -> str), but the implementation has none. Adding them improves IDE/type-checker support (Pyright is in your toolchain per the PR).

♻️ Proposed annotations
-def get_str(key, default=""):
+def get_str(key: str, default: str = "") -> str:
     return os.environ.get(key, default)


-def get_bool(key, default=False):
+def get_bool(key: str, default: bool = False) -> bool:
     raw_default = "true" if default else "false"
     return os.environ.get(key, raw_default).strip().lower() in ("true", "1", "yes", "on")


-def get_int(key, default):
+def get_int(key: str, default: int) -> int:
     try:
         return int(os.environ.get(key, str(default)))
     except (TypeError, ValueError):
         return int(default)


-def get_float(key, default):
+def get_float(key: str, default: float) -> float:
     try:
         return float(os.environ.get(key, str(default)))
     except (TypeError, ValueError):
         return float(default)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/runtime_config.py` around lines 6 - 26, Annotate the runtime config
helpers in src/utils/runtime_config.py: give each function a typed signature
(key: str) and explicit parameter and return types — get_str(key: str, default:
str = "") -> str; get_bool(key: str, default: bool = False) -> bool;
get_int(key: str, default: int) -> int; get_float(key: str, default: float) ->
float — so IDEs and type-checkers (Pyright) understand the expected types;
adjust any imports if needed for typing but no behavior changes to the bodies of
get_str, get_bool, get_int, or get_float.
src/api/hardcover_client.py (1)

92-112: Refactor LGTM — one tiny inconsistency.

Delegating retry/backoff to the base class is a clean improvement, and the _mark_retry callback correctly updates _last_request_time after the backoff sleep. Minor nit: line 111 hardcodes "after 3 retries" while max_retries=3 is also inline at line 103 — if either ever changes, they'll drift. Use the same constant:

-        max_retries=3,
+        max_retries=(_max_retries := 3),
...
-                logger.error("Hardcover rate limit persisted after 3 retries, giving up")
+                logger.error("Hardcover rate limit persisted after %d retries, giving up", _max_retries)

(Or just lift MAX_RETRIES = 3 to a class constant.)
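A minimal sketch of the class-constant variant (names are illustrative; the real method presumably passes the constant to `post_json_with_retries` as well):

```python
import logging

logger = logging.getLogger(__name__)


class HardcoverClient:
    # Single source of truth for the retry budget: the retry call and the
    # give-up log line both read this, so they can no longer drift apart.
    MAX_RETRIES = 3

    def _giving_up_message(self) -> str:
        return (
            f"Hardcover rate limit persisted after "
            f"{self.MAX_RETRIES} retries, giving up"
        )
```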

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/hardcover_client.py` around lines 92 - 112, The log message hardcodes
"after 3 retries" which can drift from the max_retries passed to
post_json_with_retries; change the code to refer to a single source of truth
(e.g. introduce a class constant MAX_RETRIES = 3 or reuse the local max_retries
variable) and use that identifier in the retry call and in the logger message
(refer to the post_json_with_retries call, the max_retries argument, the
_mark_retry callback, and the r.status_code == 429 logger.error) so the message
and the retry configuration always stay in sync.
tests/test_cache_cleanup_service.py (1)

7-28: Consider adding edge-case coverage.

Happy path is covered well. A few low-cost additions worth tagging on:

  • cleanup() is a no-op when epub_cache_dir doesn't exist (early return at line 14).
  • suggestion.matches is None / non-list (which today would silently abort the whole pass — see comment in cache_cleanup_service.py).
  • A file whose unlink() raises — verifies the per-file except keeps the loop running.

Not blocking, but these are the paths most likely to regress.
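A self-contained sketch of the unlink-failure case. The `sweep` helper below is a stand-in for the real cleanup loop (the exact `CacheCleanupService` API isn't shown here), so the names are assumptions; the pattern of patching `Path.unlink` carries over directly:

```python
import tempfile
from pathlib import Path
from unittest import mock


def sweep(cache_dir, keep):
    """Stand-in for the cleanup loop: a per-file except keeps the pass going."""
    removed = []
    if not cache_dir.exists():  # early-return no-op when the dir is missing
        return removed
    for f in sorted(cache_dir.iterdir()):
        if f.name in keep:
            continue
        try:
            f.unlink()
            removed.append(f.name)
        except OSError:
            continue  # one undeletable file must not abort the whole sweep
    return removed


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    for name in ("a.epub", "b.epub", "c.epub"):
        (cache / name).touch()

    real_unlink = Path.unlink

    def flaky_unlink(self, *args, **kwargs):
        if self.name == "b.epub":
            raise OSError("simulated EPERM")
        return real_unlink(self, *args, **kwargs)

    # Make one file undeletable and confirm the others are still removed.
    with mock.patch.object(Path, "unlink", flaky_unlink):
        removed = sweep(cache, keep=set())

    assert removed == ["a.epub", "c.epub"]
    assert (cache / "b.epub").exists()
```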

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_cache_cleanup_service.py` around lines 7 - 28, Add unit tests to
cover the noted edge cases for CacheCleanupService.cleanup(): test that
cleanup() returns early (no-op) when epub_cache_dir does not exist; test that
suggestions with suggestion.matches set to None or a non-list do not raise and
are skipped (locate usage around suggestion.matches in
CacheCleanupService.cleanup); and test that when Path.unlink() raises for a
specific cached file the exception is caught and the loop continues (simulate by
mocking Path.unlink or creating a file and patching unlink to raise while
asserting other files are still removed). Ensure tests instantiate
CacheCleanupService with a Mock db (get_all_books /
get_all_actionable_suggestions) and verify expected files remain or are removed
after service.cleanup().
pyrightconfig.json (1)

7-16: Heads up: this config silences a lot of pyright's actual value.

Disabling reportMissingImports, reportCallIssue, reportArgumentType, reportAssignmentType, and reportOptionalMemberAccess simultaneously means pyright will essentially only flag syntax/parse errors. If the goal is to silence noise from optional deps (e.g., alembic), consider narrowing via per-file # pyright: ignore or defineConstant/stub paths rather than globally muting whole categories. At minimum, reportCallIssue and reportAssignmentType catch real bugs and are worth keeping at "warning".
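For instance, a narrower config could silence only the import noise while keeping the bug-catching categories visible (severity values here are illustrative, not a prescription for this repo):

```json
{
  "reportMissingImports": "none",
  "reportMissingModuleSource": "none",
  "reportCallIssue": "warning",
  "reportAssignmentType": "warning",
  "reportArgumentType": "warning"
}
```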

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@pyrightconfig.json` around lines 7 - 16, The pyright configuration currently
mutes many important diagnostics (reportMissingImports,
reportMissingModuleSource, reportAttributeAccessIssue, reportArgumentType,
reportOptionalMemberAccess, reportOptionalOperand, reportOptionalSubscript,
reportAssignmentType, reportCallIssue, reportIncompatibleMethodOverride), which
hides real bugs; change at least reportCallIssue and reportAssignmentType back
to "warning" (instead of "none") in pyrightconfig.json, and avoid globally
disabling reportMissingImports/reportArgumentType — instead add targeted
per-file directives (# pyright: ignore) or use defineConstant/stub paths for
optional deps like alembic to reduce noise while preserving useful checks.
Ensure the unique keys reportCallIssue and reportAssignmentType are updated and
consider selectively re-enabling reportMissingImports or handling imports via
type stubs rather than keeping them set to "none".
src/api/http_client_base.py (2)

1-1: Pyright suppression at file scope is broad.

# pyright: reportMissingImports=false, reportMissingModuleSource=false here is redundant given pyrightconfig.json already disables both globally. Either drop these per-file pragmas (DRY) or, conversely, drop them from the global config and keep them local where actually needed. Mixing both makes future tightening harder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/http_client_base.py` at line 1, The file-scope Pyright pragma "#
pyright: reportMissingImports=false, reportMissingModuleSource=false" is
redundant with the global pyrightconfig.json; remove this per-file pragma to
avoid DRY/redundant suppressions (or alternatively remove those global disables
and keep the per-file pragma if the intent is local-only), so update the file by
deleting the "# pyright: reportMissingImports=false,
reportMissingModuleSource=false" line (or coordinate with the team to remove the
same rules from pyrightconfig.json if you prefer local pragmas).

30-50: Network exceptions bypass the retry loop.

request_func(...) on lines 30 and 47 isn't wrapped, so requests.exceptions.ConnectionError, Timeout, etc. propagate immediately — no retry, no backoff. For the current Hardcover-only caller (which only retries 429), that's arguably intentional. But if this base class is going to be reused (the name suggests it will), transient network failures are usually the first thing you'd want to retry. Consider treating exceptions raised by request_func as a retryable signal too, e.g.:

🔧 Sketch
-        retry_statuses = set(retry_statuses or [])
-        request_func = request_func or requests.post
-        response = request_func(url, json=json_body, headers=headers, timeout=timeout)
-        attempt = 0
-        backoff = backoff_seconds
-
-        while response.status_code in retry_statuses and attempt < max_retries:
+        retry_statuses = set(retry_statuses or [])
+        retry_exceptions = (requests.exceptions.ConnectionError, requests.exceptions.Timeout)
+        request_func = request_func or requests.post
+        attempt = 0
+        backoff = backoff_seconds
+
+        def _do_request():
+            return request_func(url, json=json_body, headers=headers, timeout=timeout)
+
+        try:
+            response = _do_request()
+        except retry_exceptions:
+            if max_retries <= 0:
+                raise
+            response = None
+
+        while attempt < max_retries and (response is None or response.status_code in retry_statuses):
             ...

Also worth considering jitter on backoff *= 2 if multiple workers ever share this client.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/http_client_base.py` around lines 30 - 50, Wrap calls to request_func
in a try/except that catches requests.exceptions.RequestException (or the
broader RequestException family) and treat exceptions the same as retryable
responses: increment attempt, call on_retry with the exception (e.g.,
on_retry(attempt, exception)), sleep using backoff (adding optional jitter),
multiply backoff (backoff *= 2) and continue retrying until max_retries; apply
this handling for both the initial request and subsequent retries in the loop
where retry_statuses, backoff_seconds, max_retries, request_func, and on_retry
are used so transient network errors are retried consistently with status-code
retries.
src/db/base_repository.py (1)

40-47: Optional: rename first flag to avoid shadowing query API name.

first collides conceptually with query.first() in the body and forces a positional bool flag at call sites. A keyword-only param like one=False (matching the AI summary) reads cleaner; or split into _query_one_and_expunge / _query_all_and_expunge to drop the flag entirely. Non-blocking.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/base_repository.py` around lines 40 - 47, Rename the boolean flag on
_query_and_expunge to avoid shadowing query.first() and to make call sites
clearer: change the signature of _query_and_expunge (and its callers) to use a
keyword-only parameter like one=False (or split into two methods
_query_one_and_expunge and _query_all_and_expunge) so the intent isn’t confused
with query.first(); inside the function keep the same logic (use query.first()
and session.expunge(obj) for the single-case, and query.all() with
_expunge_items for the multi-case) but update all invocations to pass the new
keyword or call the new split methods.
src/app_setup.py (1)

5-5: Drop unused Path import (Ruff F401).

♻️ Suggested change
-from pathlib import Path
-
 from dependency_injector import providers
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_setup.py` at line 5, Remove the unused import Path from
src/app_setup.py by deleting the "Path" symbol from the import statement (the
line "from pathlib import Path"); ensure no other code references Path in
functions or classes in this module (if any references exist, either restore
usage or import the needed symbol where it's actually used).
templates/reading_detail.html (1)

549-551: Dead data-book-id attribute on bulk-import button.

The handler at lines 1223-1255 reads from _rdAbsId, not dataset.bookId, so data-book-id="{{ book.id }}" is dead. Either drop the attribute or align the handler to use it. Cosmetic.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@templates/reading_detail.html` around lines 549 - 551, The bulk import button
with id "bulk-import-highlights" currently includes a dead data-book-id="{{
book.id }}" attribute because the click handler uses the _rdAbsId variable (not
dataset.bookId); either remove the unused attribute from the <button> or change
the handler to read event.currentTarget.dataset.bookId and use that value
instead of _rdAbsId (update the handler referenced by its id
"bulk-import-highlights" and any functions that rely on _rdAbsId to accept the
bookId from dataset).
src/services/sync_manager_startup.py (1)

54-61: Optional: drop the try/except around hasattr.

hasattr doesn't raise (since Python 3.2 it swallows exceptions internally), so wrapping it in try/except is unreachable defensive code. Either drop the wrapper or call the methods to actually exercise them.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/sync_manager_startup.py` around lines 54 - 61, Remove the
unreachable try/except wrapper around the hasattr checks: inside the
is_configured() branch, directly test hasattr(self.abs_client,
"get_ebook_files") and hasattr(self.abs_client, "search_ebooks") and log via
logger.info/logger.warning accordingly; delete the surrounding try and the
except that calls sanitize_exception(exc) so the code is simpler and not
guarding against an impossible hasattr exception on self.abs_client.
src/blueprints/reading_bp.py (2)

691-691: Inconsistent error response style.

This branch still uses raw jsonify({"success": False, "error": ...}), 400 while the rest of the function (and file) was migrated to json_error(...). Tighten consistency:

♻️ Suggested change
-            return jsonify({"success": False, "error": "Invalid date format (expected YYYY-MM-DD)"}), 400
+            return json_error("Invalid date format (expected YYYY-MM-DD)", 400)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blueprints/reading_bp.py` at line 691, The branch currently returns
jsonify({"success": False, "error": "Invalid date format (expected
YYYY-MM-DD)"}), 400 which is inconsistent with the file's standard error helper;
replace that raw jsonify return with a call to the common json_error helper
(e.g. json_error("Invalid date format (expected YYYY-MM-DD)", 400)) so the
function uses json_error everywhere instead of jsonify; update the return site
where the current jsonify call appears and remove the old jsonify usage.

3-26: Sort imports (Ruff I001).

Static analysis flags the import block as unsorted; from src.utils.http import json_error was inserted out of order. Run ruff check --fix or move it next to the other src.utils.* imports.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blueprints/reading_bp.py` around lines 3 - 26, Import ordering in
src.blueprints.reading_bp.py is unsorted: move the "from src.utils.http import
json_error" import into the block of other src.utils imports so the src.utils.*
imports are contiguous and alphabetically ordered (e.g., ensure json_error
appears alongside resolve_book_covers and render_markdown_html imports or
grouped with other src.utils imports); run ruff check --fix or manually reorder
the import block to match Ruff I001 expectations while preserving existing
import names like json_error, resolve_book_covers, and render_markdown_html.
src/blueprints/bookfusion_bp.py (1)

178-198: linked_count is a constant — and sync_all_highlights ignores abs_id.

Two nits in /api/bookfusion/sync-book:

  1. The loop just increments linked_count per bf_id, so it's always len(bf_book_ids). Either count actual link-result rows (e.g., return value of link_bookfusion_highlights_by_book_id) or drop it.
  2. bf_client.sync_all_highlights(db_service) syncs the entire BookFusion library on every per-book request. For users with large libraries and the "Re-sync All" UI calling this for many books, this is O(N²) traffic. Consider a per-book sync (or pass bf_book_ids to scope the sync) if BookFusion's API supports it.
♻️ Suggested cleanup for `#1`
-        linked_count = 0
-        for bf_id in bf_book_ids:
-            db_service.link_bookfusion_highlights_by_book_id(bf_id, book.id)
-            linked_count += 1
+        for bf_id in bf_book_ids:
+            db_service.link_bookfusion_highlights_by_book_id(bf_id, book.id)
@@
-                "linked_books": linked_count,
+                "linked_books": len(bf_book_ids),
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/blueprints/bookfusion_bp.py` around lines 178 - 198, The loop increments
linked_count blindly and bf_client.sync_all_highlights(db_service) syncs the
whole library on a per-book endpoint; fix by (A) making
link_bookfusion_highlights_by_book_id return the number of rows linked and use
that return value to accumulate linked_count (replace the manual ++ with adding
the actual result from db_service.link_bookfusion_highlights_by_book_id), or
simply drop linked_count if you don't need it, and (B) avoid calling
bf_client.sync_all_highlights for a single-book request — either add/use a
scoped sync method (e.g., bf_client.sync_highlights_for_book(bookfusion_id,
db_service) or accept bf_book_ids in sync_all_highlights) and call that with
each bf_id (or call one scoped sync for the single book before linking) to
prevent O(N²) full-library syncs.
src/services/kosync_progress_service.py (1)

17-21: Reaching into service._db / _container / _manager couples this class to private internals.

If KosyncService ever renames or refactors those attributes, this service silently breaks at construction (or worse, at first call). Since KosyncProgressService already needs all three, prefer passing them explicitly from the composition root, or expose public accessors on KosyncService.

♻️ Sketch
-    def __init__(self, service):
-        self.service = service
-        self.db = service._db
-        self.container = service._container
-        self.manager = service._manager
+    def __init__(self, service, *, db, container, manager):
+        self.service = service
+        self.db = db
+        self.container = container
+        self.manager = manager
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/kosync_progress_service.py` around lines 17 - 21,
KosyncProgressService currently reaches into KosyncService private internals in
__init__ by accessing service._db, service._container, and service._manager;
update the constructor to avoid coupling by either (A) changing
KosyncProgressService.__init__ to accept the dependencies explicitly (db,
container, manager) and wire them from the composition root where KosyncService
is constructed, or (B) add and use public accessors on KosyncService (e.g.,
get_db(), get_container(), get_manager()) and replace uses of
service._db/_container/_manager with those methods; modify all call sites that
construct KosyncProgressService to pass the new arguments or to call the new
accessors accordingly so no private attributes are accessed directly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/api/hardcover_client.py`:
- Line 20: The file currently imports requests but no longer uses it (causing
F401); remove the unused import statement from src/api/hardcover_client.py
(delete the top-level "import requests") and ensure any HTTP logic uses the
existing post_json_with_retries function or other referenced helpers instead of
adding a new requests usage so no new unused imports are introduced.

In `@src/api/hardcover_routes.py`:
- Around line 218-228: The HardcoverDetails construction coerces None values
into 0 or "" for fields hardcover_pages, hardcover_audio_seconds,
hardcover_slug, and hardcover_edition_id which loses the distinction between
"unknown" and an actual zero/empty value; either (A) revert these conversions so
you pass None (e.g., hardcover_pages=None if input is None,
hardcover_audio_seconds=None, hardcover_slug=None, hardcover_edition_id=None) so
the DB NULL semantics are preserved, or (B) if the model must always store 0/""
for business reasons, add a clear comment and validate that the HardcoverDetails
model and DB columns are non-nullable and that all callers (including the legacy
form flow that builds HardcoverDetails elsewhere) expect this semantics; update
the HardcoverDetails creation sites (the block constructing HardcoverDetails and
the legacy form-data flow that sets these same fields) accordingly so behavior
is consistent.

In `@src/app_runtime.py`:
- Line 4: Remove the unused import "os" from the top of src/app_runtime.py to
resolve the ruff F401 warning; locate the import statement (import os) and
delete it so only actually used modules remain imported.
- Around line 327-329: The code casts sync_port to int unconditionally
(sync_port = get_str("KOSYNC_PORT", ""); if sync_port and int(sync_port) !=
4477: start_split_port_server(...)) which raises ValueError for non-numeric
values; update start_runtime_services (or the function containing sync_port) to
validate sync_port is a valid integer before casting (e.g., use try/except
around int(sync_port) or use a numeric check), log a clear error message (via
the module logger or processLogger) when KOSYNC_PORT is invalid, and skip
calling start_split_port_server instead of letting the exception crash startup.

In `@src/app_template_context.py`:
- Around line 76-83: The context processor silently swallows all exceptions and
performs a DB COUNT on every template render; change the except block to log the
exception at debug (use current_app.logger.debug with the caught exception)
instead of pass, and avoid per-request DB hits by caching the pending count
briefly (e.g., use your app cache or a simple TTL cache keyed like
"pending_suggestion_count" before calling
db_svc.get_pending_suggestion_count()); also confirm or add an index on the
pending_suggestions.status column so get_pending_suggestion_count() is
efficient.
- Around line 62-67: The anchor with href "/bookfusion?tab=library&q={{
mapping.title|urlencode }}" in the service card is a dead link and should be
removed or replaced; locate the HTML anchor element with that exact href (and
class "service-value cursor-pointer color-inherit") and either delete the entire
surrounding service card block that contains it, or update the href to a valid
route/action used by the app (or convert it to a non-navigating element) so it
no longer points to the removed /bookfusion page.

In `@src/blueprints/api.py`:
- Line 10: Remove the unused import json_detail_error from the top of the file
where imports from src.utils.http are declared; keep only json_error to satisfy
linters (remove json_detail_error from the import statement that currently reads
like "from src.utils.http import json_detail_error, json_error"). Ensure no
other references to json_detail_error exist in the file before committing.

In `@src/blueprints/bookfusion_bp.py`:
- Around line 18-20: The dedup-key normalizer _normalize_bookfusion_chapter and
the insert path use different hashing of leading hashes (one caps at 6, the
other uses chapter.lstrip("#")), causing asymmetric keys and duplicate inserts;
update the insert code (the block that currently calls chapter.lstrip("#")
around the _bookfusion_entry_key/insert path) to call
_normalize_bookfusion_chapter(chapter) instead (apply the same change to the
related occurrences around the insert at lines ~255–258) so both dedup key
generation and inserted text use the identical normalization routine.

In `@src/db/database_service.py`:
- Around line 88-90: The import block is unsorted: move "import alembic.command
as alembic_command" into the existing alembic group and alphabetically order the
three alembic imports (e.g., group "from alembic.config import Config", "from
alembic.runtime.migration import MigrationContext", and "import alembic.command
as alembic_command") so third‑party imports are contiguous and alphabetized;
update the import ordering in database_service.py accordingly.

In `@src/services/cache_cleanup_service.py`:
- Around line 25-28: The loop over suggestion.matches can throw if matches is
None or not a list; change the inner iteration in the code that calls
database_service.get_all_actionable_suggestions() so it first asserts/guards
that suggestion.matches is a list (e.g., if not isinstance(suggestion.matches,
list): skip/log and continue), then iterate only over that list to populate
valid_filenames, ensuring a single malformed row cannot abort the entire
collection pass; reference suggestion.matches, valid_filenames, and
database_service.get_all_actionable_suggestions() when making the change.

In `@src/services/kosync_progress_service.py`:
- Around line 60-74: The furthest-wins early-return returns timestamp as an
epoch int while the normal PUT response returns now.isoformat(), causing
inconsistent wire formats; in the branch guarded by (furthest_wins and
kosync_doc and kosync_doc.percentage and not force_update and not same_device)
replace the int(...) timestamp construction with the same ISO string used by the
normal path (use kosync_doc.timestamp.isoformat() if available else
now.isoformat()), except preserve the special-case behavior for booknexus (which
should still return epoch as in the logic around lines 120–121); update the
returned "timestamp" generation to mirror the normal PUT response so both paths
emit the same ISO-formatted string.
- Around line 187-199: The max(...) key mixes datetime and float causing
TypeError when State.last_updated is None; update the logic around kosync_state
and latest_state to either filter out states without last_updated before taking
max or change the key to use a numeric sentinel (e.g., last_updated or
float('-inf')) so comparisons are always between numbers; adjust the selection
in the block that computes latest_state (the variables states, kosync_state,
latest_state and the key using last_updated) and leave the later uses
(percentage, xpath/cfi and int(latest_state.last_updated)) unchanged.

In `@src/utils/runtime_config.py`:
- Around line 10-12: get_bool currently treats an environment variable set to an
empty or whitespace string as a real value and coerces it to False, which
differs from leaving the var unset; update get_bool to first read
os.environ.get(key) into a variable and treat None or value.strip() == "" as
"unset" (return the provided default) and only if non-empty proceed to normalize
and check membership in ("true","1","yes","on"); remove the reliance on
raw_default and ensure get_bool(key, default=True) returns True when the env var
is set but empty/whitespace.

---

Nitpick comments:
In `@pyrightconfig.json`:
- Around line 7-16: The pyright configuration currently mutes many important
diagnostics (reportMissingImports, reportMissingModuleSource,
reportAttributeAccessIssue, reportArgumentType, reportOptionalMemberAccess,
reportOptionalOperand, reportOptionalSubscript, reportAssignmentType,
reportCallIssue, reportIncompatibleMethodOverride), which hides real bugs;
change at least reportCallIssue and reportAssignmentType back to "warning"
(instead of "none") in pyrightconfig.json, and avoid globally disabling
reportMissingImports/reportArgumentType — instead add targeted per-file
directives (# pyright: ignore) or use defineConstant/stub paths for optional
deps like alembic to reduce noise while preserving useful checks. Ensure the
unique keys reportCallIssue and reportAssignmentType are updated and consider
selectively re-enabling reportMissingImports or handling imports via type stubs
rather than keeping them set to "none".

In `@src/api/hardcover_client.py`:
- Around line 92-112: The log message hardcodes "after 3 retries" which can
drift from the max_retries passed to post_json_with_retries; change the code to
refer to a single source of truth (e.g. introduce a class constant MAX_RETRIES =
3 or reuse the local max_retries variable) and use that identifier in the retry
call and in the logger message (refer to the post_json_with_retries call, the
max_retries argument, the _mark_retry callback, and the r.status_code == 429
logger.error) so the message and the retry configuration always stay in sync.

In `@src/api/http_client_base.py`:
- Line 1: The file-scope Pyright pragma "# pyright: reportMissingImports=false,
reportMissingModuleSource=false" is redundant with the global
pyrightconfig.json; remove this per-file pragma to avoid DRY/redundant
suppressions (or alternatively remove those global disables and keep the
per-file pragma if the intent is local-only), so update the file by deleting the
"# pyright: reportMissingImports=false, reportMissingModuleSource=false" line
(or coordinate with the team to remove the same rules from pyrightconfig.json if
you prefer local pragmas).
- Around line 30-50: Wrap calls to request_func in a try/except that catches
requests.exceptions.RequestException (or the broader RequestException family)
and treat exceptions the same as retryable responses: increment attempt, call
on_retry with the exception (e.g., on_retry(attempt, exception)), sleep using
backoff (adding optional jitter), multiply backoff (backoff *= 2) and continue
retrying until max_retries; apply this handling for both the initial request and
subsequent retries in the loop where retry_statuses, backoff_seconds,
max_retries, request_func, and on_retry are used so transient network errors are
retried consistently with status-code retries.

In `@src/app_setup.py`:
- Line 5: Remove the unused import Path from src/app_setup.py by deleting the
"Path" symbol from the import statement (the line "from pathlib import Path");
ensure no other code references Path in functions or classes in this module (if
any references exist, either restore usage or import the needed symbol where
it's actually used).

In `@src/blueprints/bookfusion_bp.py`:
- Around line 178-198: The loop increments linked_count blindly and
bf_client.sync_all_highlights(db_service) syncs the whole library on a per-book
endpoint; fix by (A) making link_bookfusion_highlights_by_book_id return the
number of rows linked and use that return value to accumulate linked_count
(replace the manual ++ with adding the actual result from
db_service.link_bookfusion_highlights_by_book_id), or simply drop linked_count
if you don't need it, and (B) avoid calling bf_client.sync_all_highlights for a
single-book request — either add/use a scoped sync method (e.g.,
bf_client.sync_highlights_for_book(bookfusion_id, db_service) or accept
bf_book_ids in sync_all_highlights) and call that with each bf_id (or call one
scoped sync for the single book before linking) to prevent O(N²) full-library
syncs.

In `@src/blueprints/reading_bp.py`:
- Line 691: The branch currently returns jsonify({"success": False, "error":
"Invalid date format (expected YYYY-MM-DD)"}), 400 which is inconsistent with
the file's standard error helper; replace that raw jsonify return with a call to
the common json_error helper (e.g. json_error("Invalid date format (expected
YYYY-MM-DD)", 400)) so the function uses json_error everywhere instead of
jsonify; update the return site where the current jsonify call appears and
remove the old jsonify usage.
- Around line 3-26: Import ordering in src.blueprints.reading_bp.py is unsorted:
move the "from src.utils.http import json_error" import into the block of other
src.utils imports so the src.utils.* imports are contiguous and alphabetically
ordered (e.g., ensure json_error appears alongside resolve_book_covers and
render_markdown_html imports or grouped with other src.utils imports); run ruff
check --fix or manually reorder the import block to match Ruff I001 expectations
while preserving existing import names like json_error, resolve_book_covers, and
render_markdown_html.

In `@src/db/base_repository.py`:
- Around line 40-47: Rename the boolean flag on _query_and_expunge to avoid
shadowing query.first() and to make call sites clearer: change the signature of
_query_and_expunge (and its callers) to use a keyword-only parameter like
one=False (or split into two methods _query_one_and_expunge and
_query_all_and_expunge) so the intent isn’t confused with query.first(); inside
the function keep the same logic (use query.first() and session.expunge(obj) for
the single-case, and query.all() with _expunge_items for the multi-case) but
update all invocations to pass the new keyword or call the new split methods.

In `@src/services/kosync_progress_service.py`:
- Around line 17-21: KosyncProgressService currently reaches into KosyncService
private internals in __init__ by accessing service._db, service._container, and
service._manager; update the constructor to avoid coupling by either (A)
changing KosyncProgressService.__init__ to accept the dependencies explicitly
(db, container, manager) and wire them from the composition root where
KosyncService is constructed, or (B) add and use public accessors on
KosyncService (e.g., get_db(), get_container(), get_manager()) and replace uses
of service._db/_container/_manager with those methods; modify all call sites
that construct KosyncProgressService to pass the new arguments or to call the
new accessors accordingly so no private attributes are accessed directly.
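Option (A) can be sketched as follows; the attribute names mirror the private ones cited in the review, and the wiring comment uses hypothetical composition-root names:

```python
class KosyncProgressService:
    """Sketch of explicit constructor injection (option A from the review)."""

    def __init__(self, db, container, manager):
        # dependencies arrive explicitly instead of being pulled out of
        # KosyncService._db / _container / _manager
        self._db = db
        self._container = container
        self._manager = manager


# Composition root wires it where KosyncService is built (names hypothetical):
# progress_service = KosyncProgressService(
#     db=database_service, container=container, manager=sync_manager
# )
svc = KosyncProgressService(db="db", container="container", manager="manager")
```

Option (B) keeps the single-argument constructor but trades it for three public accessors on KosyncService; (A) is usually preferable because the dependency list is visible at the call site.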

In `@src/services/sync_manager_startup.py`:
- Around line 54-61: Remove the unreachable try/except wrapper around the
hasattr checks: inside the is_configured() branch, directly test
hasattr(self.abs_client, "get_ebook_files") and hasattr(self.abs_client,
"search_ebooks") and log via logger.info/logger.warning accordingly; delete the
surrounding try and the except that calls sanitize_exception(exc) so the code is
simpler and not guarding against an impossible hasattr exception on
self.abs_client.
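The flattened check could read roughly like this; the method names come from the comment, while the function name and its boolean return are invented for illustration:

```python
import logging

logger = logging.getLogger(__name__)


def log_abs_capabilities(abs_client):
    """Sketch of the simplified probe: hasattr() on a plain attribute only
    suppresses AttributeError, so no try/except wrapper is needed."""
    supported = hasattr(abs_client, "get_ebook_files") and hasattr(
        abs_client, "search_ebooks"
    )
    if supported:
        logger.info("ABS client exposes ebook file APIs")
    else:
        logger.warning("ABS client missing ebook file APIs")
    return supported


class _FakeClient:
    def get_ebook_files(self): ...
    def search_ebooks(self): ...


full = log_abs_capabilities(_FakeClient())
bare = log_abs_capabilities(object())
```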

In `@src/utils/runtime_config.py`:
- Around line 6-26: Annotate the runtime config helpers in
src/utils/runtime_config.py: give each function a typed signature (key: str) and
explicit parameter and return types — get_str(key: str, default: str = "") ->
str; get_bool(key: str, default: bool = False) -> bool; get_int(key: str,
default: int) -> int; get_float(key: str, default: float) -> float — so IDEs and
type-checkers (Pyright) understand the expected types; adjust any imports if
needed for typing but no behavior changes to the bodies of get_str, get_bool,
get_int, or get_float.
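The typed signatures might look like the sketch below; the get_bool body matches the snippet quoted later in this review, while the other bodies are guesses at the existing behavior (including get_int/get_float falling back to the default on a bad value):

```python
import os


def get_str(key: str, default: str = "") -> str:
    return os.environ.get(key, default)


def get_bool(key: str, default: bool = False) -> bool:
    raw_default = "true" if default else "false"
    return os.environ.get(key, raw_default).strip().lower() in ("true", "1", "yes", "on")


def get_int(key: str, default: int) -> int:
    try:
        return int(os.environ.get(key, "").strip())
    except ValueError:
        return default


def get_float(key: str, default: float) -> float:
    try:
        return float(os.environ.get(key, "").strip())
    except ValueError:
        return default


# demo with a hypothetical variable name
os.environ.pop("DEMO_SYNC_PERIOD", None)
fallback = get_int("DEMO_SYNC_PERIOD", 60)    # unset -> default
os.environ["DEMO_SYNC_PERIOD"] = "15"
configured = get_int("DEMO_SYNC_PERIOD", 60)  # set -> parsed value
```

With these annotations Pyright can flag call sites that pass the wrong type without any runtime change.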

In `@templates/reading_detail.html`:
- Around line 549-551: The bulk import button with id "bulk-import-highlights"
currently includes a dead data-book-id="{{ book.id }}" attribute because the
click handler uses the _rdAbsId variable (not dataset.bookId); either remove the
unused attribute from the <button> or change the handler to read
event.currentTarget.dataset.bookId and use that value instead of _rdAbsId
(update the handler referenced by its id "bulk-import-highlights" and any
functions that rely on _rdAbsId to accept the bookId from dataset).

In `@tests/test_cache_cleanup_service.py`:
- Around line 7-28: Add unit tests to cover the noted edge cases for
CacheCleanupService.cleanup(): test that cleanup() returns early (no-op) when
epub_cache_dir does not exist; test that suggestions with suggestion.matches set
to None or a non-list do not raise and are skipped (locate usage around
suggestion.matches in CacheCleanupService.cleanup); and test that when
Path.unlink() raises for a specific cached file the exception is caught and the
loop continues (simulate by mocking Path.unlink or creating a file and patching
unlink to raise while asserting other files are still removed). Ensure tests
instantiate CacheCleanupService with a Mock db (get_all_books /
get_all_actionable_suggestions) and verify expected files remain or are removed
after service.cleanup().
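A self-contained sketch of those three cases against a stand-in cleanup() (the real CacheCleanupService API is assumed from the comment above, so treat this as illustrative rather than drop-in tests):

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace
from unittest.mock import Mock


def cleanup(db, cache_dir: Path) -> None:
    """Stand-in mirroring the assumed CacheCleanupService.cleanup() flow."""
    if not cache_dir.exists():
        return  # case 1: missing cache dir is a no-op
    valid = set()
    for s in db.get_all_actionable_suggestions():
        matches = s.matches if isinstance(s.matches, list) else []  # case 2
        for m in matches:
            if isinstance(m, dict) and m.get("filename"):
                valid.add(m["filename"])
    for f in cache_dir.iterdir():
        if f.name not in valid:
            try:
                f.unlink()
            except OSError:
                continue  # case 3: one failing unlink doesn't abort the pass


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp)
    (cache / "keep.epub").touch()
    (cache / "orphan.epub").touch()
    db = Mock()
    db.get_all_actionable_suggestions.return_value = [
        SimpleNamespace(matches=[{"filename": "keep.epub"}]),
        SimpleNamespace(matches=None),  # malformed row must be skipped
    ]
    cleanup(db, cache / "missing")  # case 1: must not raise
    cleanup(db, cache)
    kept = sorted(p.name for p in cache.iterdir())
```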


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 94bff790-45a0-47a0-9e53-a1bae6471175

📥 Commits

Reviewing files that changed from the base of the PR and between 63ac1ba and f2add3b.

📒 Files selected for processing (33)
  • alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py
  • alembic/versions/t9u0v1w2x3y4_merge_detected_books_head.py
  • pyproject.toml
  • pyrightconfig.json
  • requirements.txt
  • src/api/hardcover_client.py
  • src/api/hardcover_routes.py
  • src/api/http_client_base.py
  • src/app_runtime.py
  • src/app_setup.py
  • src/app_template_context.py
  • src/blueprints/api.py
  • src/blueprints/bookfusion_bp.py
  • src/blueprints/reading_bp.py
  • src/blueprints/tbr_bp.py
  • src/db/base_repository.py
  • src/db/database_service.py
  • src/db/hardcover_repository.py
  • src/db/models.py
  • src/services/cache_cleanup_service.py
  • src/services/kosync_progress_service.py
  • src/services/kosync_service.py
  • src/services/sync_manager_startup.py
  • src/sync_manager.py
  • src/utils/http.py
  • src/utils/runtime_config.py
  • src/web_server.py
  • static/js/settings.js
  • templates/reading_detail.html
  • tests/test_app_runtime.py
  • tests/test_cache_cleanup_service.py
  • tests/test_kosync_progress_service.py
  • tests/test_markdown_utils.py
✅ Files skipped from review due to trivial changes (5)
  • pyproject.toml
  • src/utils/http.py
  • tests/test_markdown_utils.py
  • requirements.txt
  • alembic/versions/r5s6t7u8v9w0_remove_bookfusion_matched_abs_id.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • static/js/settings.js

@@ -19,12 +19,13 @@

import requests

⚠️ Potential issue | 🔴 Critical

CI failure: requests is now unused.

Lint is breaking on F401. Now that the POST goes through post_json_with_retries, the local requests import isn't referenced anywhere in this module.

🔧 Fix
-import requests
-
 from src.api.http_client_base import JsonHttpClientBase
🧰 Tools
🪛 GitHub Actions: Lint

[error] 20-20: ruff check failed: F401 requests imported but unused

🪛 GitHub Check: ruff

[failure] 20-20: ruff (F401)
src/api/hardcover_client.py:20:8: F401 requests imported but unused
help: Remove unused import: requests

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/hardcover_client.py` at line 20, The file currently imports requests
but no longer uses it (causing F401); remove the unused import statement from
src/api/hardcover_client.py (delete the top-level "import requests") and ensure
any HTTP logic uses the existing post_json_with_retries function or other
referenced helpers instead of adding a new requests usage so no new unused
imports are introduced.

Comment on lines 218 to 228
 hardcover_details = HardcoverDetails(
     abs_id=abs_id,
-    book_id=book.id if book else None,
+    book_id=book.id,
     hardcover_book_id=str(book_id),
-    hardcover_slug=slug,
-    hardcover_edition_id=str(edition_id) if edition_id else None,
-    hardcover_pages=hardcover_pages,
-    hardcover_audio_seconds=audio_seconds if audio_seconds else None,
+    hardcover_slug=slug or "",
+    hardcover_edition_id=str(edition_id) if edition_id else "",
+    hardcover_pages=int(hardcover_pages) if hardcover_pages is not None else 0,
+    hardcover_audio_seconds=int(audio_seconds) if audio_seconds is not None else 0,
     hardcover_cover_url=cover_url,
     matched_by="manual",
 )

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect HardcoverDetails column nullability
ast-grep --pattern 'class HardcoverDetails($_):
  $$$
'
# Check downstream consumers comparing these fields to None
rg -nP --type=py -C2 '\bhardcover_(pages|audio_seconds|slug|edition_id)\b\s*(is|==)\s*(None|0|"")'

Repository: serabi/pagekeeper

Length of output: 4089


🏁 Script executed:

# Read the full context around lines 218-228 and 268-275 in hardcover_routes.py
sed -n '200,280p' src/api/hardcover_routes.py | cat -n

Repository: serabi/pagekeeper

Length of output: 4257


🏁 Script executed:

# Search for all usages of HardcoverDetails fields across codebase
rg -nP --type=py '(hardcover_pages|hardcover_audio_seconds|hardcover_slug|hardcover_edition_id)' -A1 -B1 | head -100

Repository: serabi/pagekeeper

Length of output: 6130


🏁 Script executed:

# Check if there are any queries filtering on these fields being NULL
rg -nP --type=py '(hardcover_pages|hardcover_audio_seconds|hardcover_slug|hardcover_edition_id).*is\s+None|is\s+not\s+None' -B2 -A2

Repository: serabi/pagekeeper

Length of output: 50373


Verify semantic shift from None to 0/"" in HardcoverDetails fields is intentional.

Lines 223–226 and 275 coerce missing values to 0 or empty string, shifting the encoding of "unknown" fields. Specifically:

  • hardcover_pages: None (no info) → 0 (via line 225)
  • hardcover_audio_seconds: None → 0
  • hardcover_slug and hardcover_edition_id: None → ""

The database schema allows NULL (columns lack nullable=False), and the -1 case for audiobooks (line 208) still works. However, this change loses the distinction between "unset" and "actually zero/empty", which could affect filtering, UI display, or alignment logic if any code checks for NULL or expects the original sentinel values.

No downstream comparisons (is None, == 0, == "") were found in production code, but the absence of evidence doesn't confirm safety. Confirm the model fields are meant to always store 0/"" rather than NULL, or revert problematic conversions (lines 223–226, 275) to preserve None for unset values.

Also applies to: 268–275 (legacy form data flow)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/api/hardcover_routes.py` around lines 218 - 228, The HardcoverDetails
construction coerces None values into 0 or "" for fields hardcover_pages,
hardcover_audio_seconds, hardcover_slug, and hardcover_edition_id which loses
the distinction between "unknown" and an actual zero/empty value; either (A)
revert these conversions so you pass None (e.g., hardcover_pages=None if input
is None, hardcover_audio_seconds=None, hardcover_slug=None,
hardcover_edition_id=None) so the DB NULL semantics are preserved, or (B) if the
model must always store 0/"" for business reasons, add a clear comment and
validate that the HardcoverDetails model and DB columns are non-nullable and
that all callers (including the legacy form flow that builds HardcoverDetails
elsewhere) expect this semantics; update the HardcoverDetails creation sites
(the block constructing HardcoverDetails and the legacy form-data flow that sets
these same fields) accordingly so behavior is consistent.
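A plain-Python illustration of the distinction being lost (no ORM involved; coerce mirrors the PR's conversion):

```python
def coerce(pages):
    # mirrors the PR's conversion: "unknown" collapses into a real zero
    return int(pages) if pages is not None else 0


before = [None, 0, 312]                # unknown, genuinely zero, known
after = [coerce(p) for p in before]
unknown_before = [p for p in before if p is None]  # still detectable
unknown_after = [p for p in after if p is None]    # no longer detectable
```

After coercion, a NULL-based query such as "books missing page counts" can never distinguish unset rows from true zeros, which is the risk the comment above flags.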

Comment thread src/app_runtime.py
# pyright: reportMissingImports=false, reportMissingModuleSource=false

import logging
import os

⚠️ Potential issue | 🟡 Minor

Drop the unused os import (ruff F401).

♻️ Diff
 import logging
-import os
 import secrets
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-import os
 import logging
 import secrets
🧰 Tools
🪛 GitHub Check: ruff

[failure] 4-4: ruff (F401)
src/app_runtime.py:4:8: F401 os imported but unused
help: Remove unused import: os

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_runtime.py` at line 4, Remove the unused import "os" from the top of
src/app_runtime.py to resolve the ruff F401 warning; locate the import statement
(import os) and delete it so only actually used modules remain imported.

Comment thread src/app_runtime.py
Comment on lines +327 to +329
sync_port = get_str("KOSYNC_PORT", "")
if sync_port and int(sync_port) != 4477:
start_split_port_server(app, sync_port)

⚠️ Potential issue | 🟡 Minor

int(sync_port) will crash startup on a non-numeric KOSYNC_PORT.

If a user sets KOSYNC_PORT=foo (typo, env templating mishap), start_runtime_services raises ValueError and the whole runtime fails to come up. Validate before casting and log a clear error instead.

♻️ Diff
-    sync_port = get_str("KOSYNC_PORT", "")
-    if sync_port and int(sync_port) != 4477:
-        start_split_port_server(app, sync_port)
+    sync_port = get_str("KOSYNC_PORT", "")
+    if sync_port:
+        try:
+            port_int = int(sync_port)
+        except ValueError:
+            logger.error("KOSYNC_PORT=%r is not a valid integer; skipping split-port server", sync_port)
+        else:
+            if port_int != 4477:
+                start_split_port_server(app, port_int)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_runtime.py` around lines 327 - 329, The code casts sync_port to int
unconditionally (sync_port = get_str("KOSYNC_PORT", ""); if sync_port and
int(sync_port) != 4477: start_split_port_server(...)) which raises ValueError
for non-numeric values; update start_runtime_services (or the function
containing sync_port) to validate sync_port is a valid integer before casting
(e.g., use try/except around int(sync_port) or use a numeric check), log a clear
error message (via the module logger or processLogger) when KOSYNC_PORT is
invalid, and skip calling start_split_port_server instead of letting the
exception crash startup.

Comment on lines +62 to +67
def _is_active_path(path):
req_path = request.path.rstrip("/") or "/"
target_path = path.rstrip("/") or "/"
if target_path == "/":
return req_path == "/"
return req_path == target_path or req_path.startswith(f"{target_path}/")

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find any remaining /bookfusion references in templates and route registrations
rg -nP --type=html -C2 'bookfusion'
rg -nP --type=py  -C2 "['\"]/bookfusion"

Repository: serabi/pagekeeper

Length of output: 14009


🏁 Script executed:

# Search for is_active_path calls with '/bookfusion'
rg -n "is_active_path.*bookfusion" --type=html

# Search for /bookfusion route registration in Flask app
rg -n "route.*bookfusion|@.*\.route.*bookfusion" --type=py

# Search for all /bookfusion references in Python
rg -n "/bookfusion" --type=py

Repository: serabi/pagekeeper

Length of output: 3631


🏁 Script executed:

# Search for is_active_path function calls with any bookfusion-related logic
rg -n "is_active_path" --type=html -A3 -B3

# Check the app_template_context.py file context around the code snippet
head -n 80 src/app_template_context.py | tail -n 30

Repository: serabi/pagekeeper

Length of output: 2547


🏁 Script executed:

# Extract the exact context of the dead link in book_card.html
sed -n '290,310p' templates/partials/book_card.html

# Search for any non-API /bookfusion href links
rg -n 'href="[^"]*bookfusion[^"]*"' --type=html | grep -v '/api/'

Repository: serabi/pagekeeper

Length of output: 2122


Remove dead /bookfusion link in book_card.html (line 296).

The /bookfusion page route was removed, but templates/partials/book_card.html still contains a link to /bookfusion?tab=library&q=... (line 296). This link now points to a 404. Either remove the entire service card block or replace it with a valid action.

Dead link location
<a href="/bookfusion?tab=library&q={{ mapping.title|urlencode }}" class="service-value cursor-pointer color-inherit">
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/app_template_context.py` around lines 62 - 67, The anchor with href
"/bookfusion?tab=library&q={{ mapping.title|urlencode }}" in the service card is
a dead link and should be removed or replaced; locate the HTML anchor element
with that exact href (and class "service-value cursor-pointer color-inherit")
and either delete the entire surrounding service card block that contains it, or
update the href to a valid route/action used by the app (or convert it to a
non-navigating element) so it no longer points to the removed /bookfusion page.

Comment on lines 88 to +90
 from alembic.config import Config
 from alembic.runtime.migration import MigrationContext

-from alembic import command
+import alembic.command as alembic_command

⚠️ Potential issue | 🟡 Minor

Ruff I001: import block needs sorting.

The CI lint is failing on this hunk because import alembic.command as alembic_command belongs grouped with the other third‑party from alembic.* imports in alphabetical order.

🔧 Suggested fix
-            from alembic.config import Config
-            from alembic.runtime.migration import MigrationContext
-            import alembic.command as alembic_command
+            import alembic.command as alembic_command
+            from alembic.config import Config
+            from alembic.runtime.migration import MigrationContext
🧰 Tools
🪛 GitHub Check: ruff

[failure] 88-90: ruff (I001)
src/db/database_service.py:88:13: I001 Import block is un-sorted or un-formatted
help: Organize imports

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/db/database_service.py` around lines 88 - 90, The import block is
unsorted: move "import alembic.command as alembic_command" into the existing
alembic group and alphabetically order the three alembic imports (e.g., group
"from alembic.config import Config", "from alembic.runtime.migration import
MigrationContext", and "import alembic.command as alembic_command") so
third‑party imports are contiguous and alphabetized; update the import ordering
in database_service.py accordingly.

Comment on lines +25 to +28
for suggestion in self.database_service.get_all_actionable_suggestions():
for match in suggestion.matches:
if match.get("filename"):
valid_filenames.add(match["filename"])

⚠️ Potential issue | 🟡 Minor

Defensive iteration over suggestion.matches.

matches is a JSON column and can realistically be None or even a non‑list shape for older rows. As written, a single bad row raises TypeError/AttributeError, which the outer try swallows — but the cleanup pass aborts mid‑collection and no orphans get deleted at all until the offending row is fixed. Worth a small guard:

🔧 Suggested fix
-            for suggestion in self.database_service.get_all_actionable_suggestions():
-                for match in suggestion.matches:
-                    if match.get("filename"):
-                        valid_filenames.add(match["filename"])
+            for suggestion in self.database_service.get_all_actionable_suggestions():
+                matches = suggestion.matches or []
+                if not isinstance(matches, list):
+                    continue
+                for match in matches:
+                    if isinstance(match, dict) and match.get("filename"):
+                        valid_filenames.add(match["filename"])
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/cache_cleanup_service.py` around lines 25 - 28, The loop over
suggestion.matches can throw if matches is None or not a list; change the inner
iteration in the code that calls
database_service.get_all_actionable_suggestions() so it first asserts/guards
that suggestion.matches is a list (e.g., if not isinstance(suggestion.matches,
list): skip/log and continue), then iterate only over that list to populate
valid_filenames, ensuring a single malformed row cannot abort the entire
collection pass; reference suggestion.matches, valid_filenames, and
database_service.get_all_actionable_suggestions() when making the change.

Comment on lines +60 to +74
if furthest_wins and kosync_doc and kosync_doc.percentage and not force_update and not same_device:
existing_pct = float(kosync_doc.percentage)
new_pct = float(percentage)
if new_pct < existing_pct - 0.0001:
logger.info(
"KOSync: Ignored progress from '%s' for doc %s... (server has higher: %.2f%% vs new %.2f%%)",
device,
doc_hash[:8],
existing_pct,
new_pct,
)
return {
"document": doc_hash,
"timestamp": int(kosync_doc.timestamp.timestamp()) if kosync_doc.timestamp else int(now.timestamp()),
}, 200

⚠️ Potential issue | 🟠 Major

Inconsistent timestamp shape between the furthest-wins early return and the normal PUT response.

Normal path returns now.isoformat() (string, e.g. 2026-04-25T...+00:00), but the furthest-wins-ignore branch returns int(... .timestamp()) (epoch int). Same endpoint, same caller, two wire formats depending on whether the server kept its own progress — clients/tests that parse the field will break sporadically.

The PR notes already call out the wider Z → +00:00 switch; align this branch with the same ISO format (only booknexus should diverge to epoch, mirroring the logic at lines 120–121).

♻️ Diff
-            if new_pct < existing_pct - 0.0001:
+            if new_pct < existing_pct - 0.0001:
                 logger.info(
                     "KOSync: Ignored progress from '%s' for doc %s... (server has higher: %.2f%% vs new %.2f%%)",
                     device,
                     doc_hash[:8],
                     existing_pct,
                     new_pct,
                 )
-                return {
-                    "document": doc_hash,
-                    "timestamp": int(kosync_doc.timestamp.timestamp()) if kosync_doc.timestamp else int(now.timestamp()),
-                }, 200
+                ts_source = kosync_doc.timestamp or now
+                if device and device.lower() == "booknexus":
+                    ignored_ts = int(calendar.timegm(ts_source.timetuple()))
+                else:
+                    ignored_ts = ts_source.isoformat()
+                return {"document": doc_hash, "timestamp": ignored_ts}, 200

Also applies to: 119-123

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/kosync_progress_service.py` around lines 60 - 74, The
furthest-wins early-return returns timestamp as an epoch int while the normal
PUT response returns now.isoformat(), causing inconsistent wire formats; in the
branch guarded by (furthest_wins and kosync_doc and kosync_doc.percentage and
not force_update and not same_device) replace the int(...) timestamp
construction with the same ISO string used by the normal path (use
kosync_doc.timestamp.isoformat() if available else now.isoformat()), except
preserve the special-case behavior for booknexus (which should still return
epoch as in the logic around lines 120–121); update the returned "timestamp"
generation to mirror the normal PUT response so both paths emit the same
ISO-formatted string.
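Both paths could share one formatter; the device name and the epoch special-case come from the review, while the helper itself is hypothetical:

```python
import calendar
from datetime import datetime, timezone


def progress_timestamp(ts, device):
    """Hypothetical shared formatter for both PUT response paths."""
    if device and device.lower() == "booknexus":
        # epoch int, preserved only for the booknexus client
        return int(calendar.timegm(ts.timetuple()))
    # ISO-8601 string for everyone else, matching the normal PUT path
    return ts.isoformat()


ts = datetime(2026, 4, 25, tzinfo=timezone.utc)
iso = progress_timestamp(ts, "KOReader")
epoch = progress_timestamp(ts, "BookNexus")
```

Calling this helper from both the furthest-wins early return and the normal response keeps the wire format in one place.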

Comment on lines +187 to +199
if not states:
return {"message": "Document not found on server"}, 502

kosync_state = next((s for s in states if s.client_name.lower() == "kosync"), None)
latest_state = kosync_state or max(states, key=lambda s: s.last_updated or datetime.min)
return {
"device": "pagekeeper",
"device_id": "pagekeeper",
"document": doc_id,
"percentage": float(latest_state.percentage) if latest_state.percentage else 0,
"progress": (latest_state.xpath or latest_state.cfi) if hasattr(latest_state, "xpath") else "",
"timestamp": int(latest_state.last_updated) if latest_state.last_updated else 0,
}, 200

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Confirm last_updated column type on the state model.
ast-grep --pattern 'class $_(Base):
  $$$
  last_updated = $_
  $$$'
rg -nP -C2 '\blast_updated\b\s*=\s*Column' --type=py

Repository: serabi/pagekeeper

Length of output: 14038


🏁 Script executed:

# First, let's look at the function containing lines 187-199
sed -n '170,210p' src/services/kosync_progress_service.py

Repository: serabi/pagekeeper

Length of output: 2039


🏁 Script executed:

# Also search for where states is defined/queried in this function
rg -n 'states\s*=' src/services/kosync_progress_service.py -B 5 -A 2

Repository: serabi/pagekeeper

Length of output: 473


🏁 Script executed:

# Check imports and the full context around line 191
head -n 20 src/services/kosync_progress_service.py

Repository: serabi/pagekeeper

Length of output: 539


🏁 Script executed:

# Check if datetime is imported and how datetime.min is used
rg 'import.*datetime|datetime.min|from datetime' src/services/kosync_progress_service.py

Repository: serabi/pagekeeper

Length of output: 193


🏁 Script executed:

# Let's verify that the State model indeed has Float, not DateTime
sed -n '393,435p' src/db/models.py

Repository: serabi/pagekeeper

Length of output: 1374


🏁 Script executed:

# Also check the full function to see if states could be something other than State objects
sed -n '165,210p' src/services/kosync_progress_service.py

Repository: serabi/pagekeeper

Length of output: 2285


max(states, key=...) fails with TypeError when states contain None values for last_updated.

Line 191 mixes types in the key function: when s.last_updated is falsy, it returns datetime.min (a datetime object), but otherwise returns a float. Python's max() will crash trying to compare floats against datetime.min:

TypeError: '<' not supported between instances of 'float' and 'datetime.datetime'

Since State.last_updated is stored as Column(Float) (Unix timestamp), use float('-inf') or filter None values instead of datetime.min:

latest_state = kosync_state or max((s for s in states if s.last_updated), key=lambda s: s.last_updated, default=states[0])

or

latest_state = kosync_state or max(states, key=lambda s: s.last_updated or float('-inf'))

Line 198's int(latest_state.last_updated) is fine—it's already a float and needs no .timestamp() call.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/services/kosync_progress_service.py` around lines 187 - 199, The max(...)
key mixes datetime and float causing TypeError when State.last_updated is None;
update the logic around kosync_state and latest_state to either filter out
states without last_updated before taking max or change the key to use a numeric
sentinel (e.g., last_updated or float('-inf')) so comparisons are always between
numbers; adjust the selection in the block that computes latest_state (the
variables states, kosync_state, latest_state and the key using last_updated) and
leave the later uses (percentage, xpath/cfi and int(latest_state.last_updated))
unchanged.
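A quick repro of the crash and the numeric-sentinel fix, using stand-in objects rather than the real State model:

```python
from datetime import datetime
from types import SimpleNamespace

states = [
    SimpleNamespace(last_updated=1700000000.0),
    SimpleNamespace(last_updated=None),
]

# broken: the key mixes a float and datetime.min, so the comparison raises
crashed = False
try:
    max(states, key=lambda s: s.last_updated or datetime.min)
except TypeError:
    crashed = True

# fixed: a numeric sentinel keeps every key a float
latest = max(states, key=lambda s: s.last_updated or float("-inf"))
```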

Comment on lines +10 to +12
def get_bool(key, default=False):
raw_default = "true" if default else "false"
return os.environ.get(key, raw_default).strip().lower() in ("true", "1", "yes", "on")

⚠️ Potential issue | 🟡 Minor

Heads-up: empty-string env var silently overrides default=True.

If the env var is set but empty (FOO=""), get_bool("FOO", default=True) returns False because the empty string isn't in the truthy set, while an unset var correctly honors the default. This asymmetry is a common deployment foot-gun (e.g., shell scripts that export blank vars). Consider treating empty/whitespace as "unset":

🛡️ Suggested guard
 def get_bool(key, default=False):
-    raw_default = "true" if default else "false"
-    return os.environ.get(key, raw_default).strip().lower() in ("true", "1", "yes", "on")
+    raw = os.environ.get(key, "").strip().lower()
+    if not raw:
+        return default
+    return raw in ("true", "1", "yes", "on")

The same caveat applies in spirit to get_int/get_float (empty string → ValueError → falls back to default, which is actually the desired behavior there, so no change needed).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/runtime_config.py` around lines 10 - 12, get_bool currently treats
an environment variable set to an empty or whitespace string as a real value and
coerces it to False, which differs from leaving the var unset; update get_bool
to first read os.environ.get(key) into a variable and treat None or
value.strip() == "" as "unset" (return the provided default) and only if
non-empty proceed to normalize and check membership in ("true","1","yes","on");
remove the reliance on raw_default and ensure get_bool(key, default=True)
returns True when the env var is set but empty/whitespace.
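The foot-gun is easy to reproduce with the implementation quoted above (get_bool body copied verbatim; the variable name is just a demo):

```python
import os


def get_bool(key, default=False):
    raw_default = "true" if default else "false"
    return os.environ.get(key, raw_default).strip().lower() in ("true", "1", "yes", "on")


os.environ.pop("DEMO_FLAG", None)
unset_result = get_bool("DEMO_FLAG", default=True)   # unset: default honored
os.environ["DEMO_FLAG"] = ""
empty_result = get_bool("DEMO_FLAG", default=True)   # set-but-empty: default lost
```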

Comment thread src/app_runtime.py
new_level_str = get_str("LOG_LEVEL", "INFO").upper()
new_level = getattr(logging, new_level_str, logging.INFO)
logging.getLogger().setLevel(new_level)
logger.info("Logging level updated to %s", new_level_str)

🟢 Low src/app_runtime.py:27

When LOG_LEVEL is set to an invalid value like "NOTAREALEVEL", getattr(logging, new_level_str, logging.INFO) silently falls back to logging.INFO, but line 27 still logs "Logging level updated to NOTAREALEVEL". The message misreports which level was actually applied. Use logging.getLevelName(new_level) to log the resolved level.

Suggested change
-logger.info("Logging level updated to %s", new_level_str)
+logger.info("Logging level updated to %s", logging.getLevelName(new_level))
🚀 Reply "fix it for me" or copy this AI Prompt for your agent:
In file src/app_runtime.py around line 27:

When `LOG_LEVEL` is set to an invalid value like `"NOTAREALEVEL"`, `getattr(logging, new_level_str, logging.INFO)` silently falls back to `logging.INFO`, but line 27 still logs `"Logging level updated to NOTAREALEVEL"`. The message misreports which level was actually applied. Use `logging.getLevelName(new_level)` to log the resolved level.

Evidence trail:
src/app_runtime.py lines 22-28 at REVIEWED_COMMIT: Line 24 gets `new_level_str` from config, line 25 uses `getattr(logging, new_level_str, logging.INFO)` which falls back to INFO for invalid values, line 27 logs `new_level_str` (the user-provided string) rather than the resolved `new_level`.

Comment thread src/app_runtime.py
Comment on lines +96 to +98
if sync_mgr:
schedule.every(new_period).minutes.do(sync_mgr.sync_cycle).tag("sync_cycle")
logger.info("Sync schedule updated to every %s minutes", new_period)

🟢 Low src/app_runtime.py:96

When sync_mgr is None, schedule.clear("sync_cycle") removes the existing sync schedule but the log on line 98 still claims "Sync schedule updated to every %s minutes". This message is misleading because no schedule exists after the clear when sync_mgr is falsy. Consider moving the log inside the if sync_mgr: block so it only logs when a schedule is actually created.

-        if sync_mgr:
-            schedule.every(new_period).minutes.do(sync_mgr.sync_cycle).tag("sync_cycle")
-        logger.info("Sync schedule updated to every %s minutes", new_period)
+        if sync_mgr:
+            schedule.every(new_period).minutes.do(sync_mgr.sync_cycle).tag("sync_cycle")
+            logger.info("Sync schedule updated to every %s minutes", new_period)

Evidence trail:
src/app_runtime.py lines 85-110 viewed at REVIEWED_COMMIT. Line 95 shows `schedule.clear("sync_cycle")` runs unconditionally. Lines 96-97 show `if sync_mgr:` guards the schedule creation. Line 98 shows `logger.info("Sync schedule updated to every %s minutes", new_period)` is outside the `if sync_mgr:` block.

Comment thread src/app_runtime.py
Comment on lines +251 to +255
if not instant_sync_enabled:
logger.info("ABS Socket.IO listener disabled (INSTANT_SYNC_ENABLED=false)")
elif not abs_socket_enabled:
logger.info("ABS Socket.IO listener disabled (ABS_SOCKET_ENABLED=false)")
return None

🟢 Low src/app_runtime.py:251

When instant_sync_enabled and abs_socket_enabled are both True but container.abs_client().is_configured() returns False, the function silently returns None without logging why the listener was disabled. Lines 251-254 only log when one of the two flags is false, leaving users with no indication that ABS configuration is the actual problem.

    if not instant_sync_enabled:
        logger.info("ABS Socket.IO listener disabled (INSTANT_SYNC_ENABLED=false)")
    elif not abs_socket_enabled:
        logger.info("ABS Socket.IO listener disabled (ABS_SOCKET_ENABLED=false)")
+   elif not container.abs_client().is_configured():
+       logger.info("ABS Socket.IO listener disabled (ABS client not configured)")
    return None

Evidence trail:
src/app_runtime.py lines 230-255 at REVIEWED_COMMIT: Line 233 checks three conditions (`instant_sync_enabled`, `abs_socket_enabled`, `container.abs_client().is_configured()`). Lines 251-254 only log for the first two conditions being false via `if not instant_sync_enabled:` and `elif not abs_socket_enabled:`. No logging exists for when `is_configured()` returns False while both flags are True.
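The full decision table is easier to audit as a small pure function (a sketch under assumptions: the function name and flat boolean arguments are illustrative — the real code reads these values from the config and the container):

```python
from typing import Optional

def socketio_disable_reason(
    instant_sync_enabled: bool,
    abs_socket_enabled: bool,
    abs_client_configured: bool,
) -> Optional[str]:
    """Return why the ABS Socket.IO listener is disabled, or None if it can start.

    Covering all three gate conditions ensures every disabled path
    produces a loggable reason instead of a silent early return.
    """
    if not instant_sync_enabled:
        return "INSTANT_SYNC_ENABLED=false"
    if not abs_socket_enabled:
        return "ABS_SOCKET_ENABLED=false"
    if not abs_client_configured:
        return "ABS client not configured"
    return None
```

A caller can then log the reason whenever it is not `None`, so no disabled path goes unexplained.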

Comment thread src/app_runtime.py
Comment on lines +327 to +329
sync_port = get_str("KOSYNC_PORT", "")
if sync_port and int(sync_port) != 4477:
start_split_port_server(app, sync_port)

🟡 Medium src/app_runtime.py:327

When KOSYNC_PORT is set to a non-integer string like "abc", int(sync_port) on line 328 raises a ValueError and crashes startup. Consider validating the value or catching the exception before conversion.

     sync_port = get_str("KOSYNC_PORT", "")
-    if sync_port and int(sync_port) != 4477:
-        start_split_port_server(app, sync_port)
+    if sync_port and sync_port.isdigit() and int(sync_port) != 4477:
+        start_split_port_server(app, sync_port)

Evidence trail:
src/app_runtime.py lines 327-330 (REVIEWED_COMMIT) - shows `sync_port = get_str("KOSYNC_PORT", "")` followed by unprotected `int(sync_port)` call
src/utils/runtime_config.py lines 6-8 (REVIEWED_COMMIT) - confirms `get_str` returns raw `os.environ.get()` with no validation
src/utils/runtime_config.py lines 15-19 (REVIEWED_COMMIT) - shows `get_int` helper exists with proper ValueError handling, but was not used here
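A defensive parse in the spirit of the existing `get_int` helper might look like this (a sketch; `parse_sync_port` and `DEFAULT_KOSYNC_PORT` are hypothetical names, with 4477 taken from the comparison in the reviewed code):

```python
import logging
from typing import Optional

logger = logging.getLogger(__name__)

DEFAULT_KOSYNC_PORT = 4477  # port already served by the main app

def parse_sync_port(raw: str) -> Optional[int]:
    """Return a validated custom KoSync port, or None when no split-port
    server should start (empty, non-numeric, out of range, or the default)."""
    if not raw:
        return None
    try:
        port = int(raw)
    except ValueError:
        # e.g. KOSYNC_PORT="abc" — log and skip instead of crashing startup
        logger.warning("Ignoring invalid KOSYNC_PORT value %r", raw)
        return None
    if not 1 <= port <= 65535:
        logger.warning("KOSYNC_PORT %d is out of range; ignoring", port)
        return None
    return None if port == DEFAULT_KOSYNC_PORT else port
```

With this shape, the caller becomes `if (port := parse_sync_port(get_str("KOSYNC_PORT", ""))): start_split_port_server(app, str(port))`, and a bad environment value degrades to a warning rather than a startup crash.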
