Architecture upgrade: multi-layer viewer, derived revisions, and processing pipeline scaffolding #38

Merged
gitgrahamdunn merged 1 commit into main from
codex/update-desktop-first-architecture-and-codebase
Mar 9, 2026

@gitgrahamdunn
Owner

Motivation

  • Unlock document transformation operations (extract/insert/delete/reorder/combine) as a proper engine that creates derived revisions instead of mutating originals.
  • Introduce a processing pipeline for text extraction and future OCR jobs with page-addressable extracted text persisted for search and comment linking.
  • Move the viewer from a single raster assumption to an explicit, multi-layer render scene to support overlay comparison, markup and selection layers without coupling UI to PDF.js.
  • Keep renderer implementations swappable and preserve the existing desktop import/open/view experience while adding clear boundaries for future work.

Description

  • Added renderer abstraction and multi-layer scene model in packages/viewer-core including RenderLayer, LayerKind, RenderScene, createBaseRenderScene, and withOverlayPdfLayer to represent stacked layers.
  • Expanded packages/viewer-pdfjs to implement the extended contract (getPageInfo, renderLayer, etc.) while keeping PDF.js confined to the adapter.
  • Introduced transform and processing contracts: packages/document-transform-core, packages/document-transform-pdflib (adapter slot), packages/processing-core, and packages/text-extraction-core to model transformation requests/results and processing jobs/providers.
  • Updated shared domain contracts in packages/shared-types and packages/persistence-core to include derived revision fields, ProcessingJob types and statuses, ExtractedPageText DTOs, and persistence gateway APIs (e.g. extractPagesToDerivedRevision, triggerTextExtraction).
  • Implemented concrete desktop-side persistence and proof paths in Tauri Rust (apps/desktop/src-tauri/src/lib.rs): extended the SQLite schema for document_revisions (lineage + derivation_type), processing_jobs, extracted_page_text, and audit_events; added extract_pages_to_derived_revision, which creates a new derived revision using lopdf and writes audit events; and added trigger_text_extraction, which extracts text from text PDFs into extracted_page_text and updates job status.
  • Updated frontend wiring in apps/desktop to consume the new persistence contract (apps/desktop/src/lib/tauriGateway.ts) and surfaced minimal dev scaffolding controls in the UI (Trigger text extraction, Extract page 1 to derived revision) plus a PdfViewer update to show scene layer info and accept optional overlay bytes for comparison proof.
  • Added ADR docs/adr/0002-multi-layer-transform-processing-boundaries.md and updated the architecture package and README to document the new boundaries and how they should be used going forward.
  • Added tests and test scaffolding: UI tests updated/added (PdfViewer.test.tsx, App.test.tsx) to assert scene layering and isolation of PDF.js from the UI; Rust unit tests for the schema, the transform extraction helper, and processing/job table behaviors were added to the Tauri service.
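
For readers who have not opened packages/viewer-core, the layered scene model described above might look roughly like the sketch below. The type and function names come from this description; every field, the LayerKind values, and the default overlay opacity are assumptions made for illustration, not the shapes that were merged.

```typescript
// Hypothetical shapes: the PR names these types and helpers but does not show
// their fields, so everything below is an illustrative assumption.
type LayerKind = "base-pdf" | "overlay-pdf" | "markup" | "selection";

interface RenderLayer {
  id: string;
  kind: LayerKind;
  zIndex: number; // stacking order; higher values draw on top
  opacity: number; // 0..1
  source?: Uint8Array; // document bytes for PDF-backed layers
}

interface RenderScene {
  layers: readonly RenderLayer[];
}

// A scene with a single base layer holding the document being viewed.
function createBaseRenderScene(bytes: Uint8Array): RenderScene {
  return {
    layers: [{ id: "base", kind: "base-pdf", zIndex: 0, opacity: 1, source: bytes }],
  };
}

// Returns a new scene with an overlay stacked above every existing layer.
// The input scene is left untouched.
function withOverlayPdfLayer(
  scene: RenderScene,
  overlayBytes: Uint8Array,
  opacity = 0.5,
): RenderScene {
  const top = scene.layers.reduce((max, layer) => Math.max(max, layer.zIndex), -1);
  const overlay: RenderLayer = {
    id: `overlay-${top + 1}`,
    kind: "overlay-pdf",
    zIndex: top + 1,
    opacity,
    source: overlayBytes,
  };
  return { layers: [...scene.layers, overlay] };
}
```

Because the scene is plain data, a UI component can list or reorder layers without importing PDF.js; only the adapter needs to turn a layer into pixels.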

Testing

  • UI unit tests: npm test (vitest) was attempted but could not run in this environment; workspace dependencies were missing and npm install was blocked by a registry 403.
  • Typecheck: npm run typecheck was attempted and failed because TypeScript type dependencies could not be installed (missing @testing-library/jest-dom and vitest/globals).
  • Rust tests: cargo test was attempted but failed because crates.io access was blocked (network/registry 403), so dependencies could not be fetched; the Rust unit tests added here (schema checks, extract_pages proof, processing/job table tests) could not execute.
  • Summary of automated outcomes: the added tests could not complete in this CI-like sandbox because of external registry/network restrictions, so they are unverified here; they are expected to pass once npm install and cargo test can fetch dependencies and run locally or in CI.

To validate locally: run npm install at repo root, then npm test and npm run typecheck, and run cargo test under apps/desktop/src-tauri to exercise Rust-side unit tests and proof paths.



@gitgrahamdunn gitgrahamdunn merged commit 43f6a72 into main Mar 9, 2026
2 checks passed

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee201b3b13

)
.map_err(|e| e.to_string())?;
conn.execute(
"INSERT INTO document_revisions (id, document_id, revision_number, managed_file_path, original_file_name, page_count, file_size_bytes, imported_at, source_revision_id, derivation_type) VALUES (?1, ?2, 1, ?3, ?4, NULL, ?5, ?6, NULL, 'imported_original')",

P1: Migrate existing revision schema before writing new columns

This insert assumes document_revisions already has source_revision_id and derivation_type, but init_schema only uses CREATE TABLE IF NOT EXISTS and does not migrate databases created by the previous schema (which used source_path instead). On upgraded installs with an existing gitplant.db, this statement fails with no column named source_revision_id, blocking imports and any workflow that creates a new revision until the DB is manually reset.
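
A minimal sketch of the kind of migration step this comment asks for, written in TypeScript to show the logic only; the real change would live in the Rust init_schema. It assumes the existing column names are read with `PRAGMA table_info(document_revisions)`. The function name, the column types, and the default value are hypothetical, and the sketch does not address the older source_path column.

```typescript
// Columns the new INSERT relies on. The SQL types and the default are
// assumptions; the PR only names the columns.
const REQUIRED_REVISION_COLUMNS: ReadonlyArray<{ name: string; ddl: string }> = [
  { name: "source_revision_id", ddl: "TEXT" },
  { name: "derivation_type", ddl: "TEXT NOT NULL DEFAULT 'imported_original'" },
];

// `existingColumns` is expected to hold the `name` values returned by
// `PRAGMA table_info(document_revisions)` on the user's current database.
// Returns the ALTER TABLE statements needed to reach the new schema, so a
// fresh or already-migrated database yields an empty plan.
function planRevisionMigration(existingColumns: readonly string[]): string[] {
  const present = new Set(existingColumns);
  return REQUIRED_REVISION_COLUMNS
    .filter((column) => !present.has(column.name))
    .map((column) => `ALTER TABLE document_revisions ADD COLUMN ${column.name} ${column.ddl}`);
}
```

Running the planned statements after `CREATE TABLE IF NOT EXISTS` and before the first insert would let upgraded installs keep their existing gitplant.db instead of requiring a manual reset.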

Comment thread apps/desktop/src/App.tsx
Comment on lines +56 to +58
setBytes(new Uint8Array(base));
setOverlayBytes(new Uint8Array(overlay));
setCurrentRevisionId(derived.derivedRevisionId);

P1: Keep displayed document bytes aligned with active revision

In the extract-proof flow you render the source bytes (base) but set currentRevisionId to the derived revision, and the onPageCount callback writes page counts to updatePageCount(currentRevisionId, n). That means opening the proof view can overwrite the derived revision's page_count with the source document's page count, corrupting revision metadata after extractPagesToDerivedRevision runs.
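
One way to rule this out, sketched under assumptions: keep the displayed bytes and the id of the revision they belong to in a single value, and build the page-count callback from that value rather than from a separately tracked currentRevisionId. DisplayedRevision and makeOnPageCount are hypothetical names, and revision ids are assumed to be strings.

```typescript
// Hypothetical state shape: the bytes on screen always travel with the id of
// the revision they were loaded from.
interface DisplayedRevision {
  revisionId: string;
  bytes: Uint8Array;
}

type UpdatePageCount = (revisionId: string, pageCount: number) => void;

// Builds the viewer's onPageCount callback so that the count is always written
// to the revision whose bytes are rendered, never to a different "current" id.
function makeOnPageCount(
  displayed: DisplayedRevision | null,
  updatePageCount: UpdatePageCount,
): (pageCount: number) => void {
  return (pageCount) => {
    if (displayed !== null) {
      updatePageCount(displayed.revisionId, pageCount);
    }
  };
}
```

In the extract-proof flow this would mean either displaying the derived revision's own bytes as the base, or keeping the source revision's id attached to the base bytes while the derived revision is shown only as the overlay.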