Architecture upgrade: multi-layer viewer, derived revisions, and processing pipeline scaffolding by gitgrahamdunn · Pull Request #38 · gitgrahamdunn/GITPLANT

gitgrahamdunn · 2026-03-09T16:20:04Z

Motivation

Provide document transformation operations (extract/insert/delete/reorder/combine) through a dedicated engine that creates derived revisions instead of mutating the original document.
Introduce a processing pipeline for text extraction and, later, OCR jobs, persisting extracted text per page so it can be used for search and comment linking.
Move the viewer from a single-raster assumption to an explicit multi-layer render scene that supports overlay comparison, markup, and selection layers without coupling the UI to PDF.js.
Keep renderer implementations swappable and preserve the existing desktop import/open/view experience while adding clear boundaries for future work.

Description

Added a renderer abstraction and a multi-layer scene model in packages/viewer-core, including RenderLayer, LayerKind, RenderScene, createBaseRenderScene, and withOverlayPdfLayer, to represent stacked layers.
Expanded packages/viewer-pdfjs to implement the extended contract (getPageInfo, renderLayer, etc.) while keeping PDF.js confined to the adapter.
Introduced transform and processing contracts: packages/document-transform-core, packages/document-transform-pdflib (adapter slot), packages/processing-core, and packages/text-extraction-core to model transformation requests/results and processing jobs/providers.
Updated shared domain contracts in packages/shared-types and packages/persistence-core to include derived-revision fields, ProcessingJob types and status, ExtractedPageText DTOs, and persistence gateway APIs (e.g. extractPagesToDerivedRevision, triggerTextExtraction).
Implemented concrete desktop-side persistence and proof paths in Tauri Rust (apps/desktop/src-tauri/src/lib.rs): extended the SQLite schema for document_revisions (lineage and derivation_type), processing_jobs, extracted_page_text, and audit_events; added extract_pages_to_derived_revision, which creates a new derived revision using lopdf and writes audit events; and added trigger_text_extraction, which extracts text from text-based PDFs into extracted_page_text and updates the job status.
Updated frontend wiring in apps/desktop to consume the new persistence contract (apps/desktop/src/lib/tauriGateway.ts), surfaced minimal dev scaffolding controls in the UI (Trigger text extraction, Extract page 1 to derived revision), and updated PdfViewer to show scene layer info and accept optional overlay bytes for the comparison proof.
Added ADR docs/adr/0002-multi-layer-transform-processing-boundaries.md and updated the architecture package and README to document the new boundaries and how they should be used going forward.
Added tests and test scaffolding: UI tests were added or updated (PdfViewer.test.tsx, App.test.tsx) to assert scene layering and the isolation of PDF.js from the UI, and Rust unit tests covering the schema, the transform extraction helper, and processing/job table behavior were added to the Tauri service.
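For readers new to the viewer-core contract, here is a minimal sketch of how a stacked scene could be modeled. RenderLayer, LayerKind, RenderScene, createBaseRenderScene, and withOverlayPdfLayer are names from this PR, but every field, the LayerKind values, and the zIndex/opacity convention below are assumptions for illustration, not the package's actual definitions.

```typescript
// Hypothetical layer kinds; the real LayerKind union in viewer-core may differ.
type LayerKind = "base-pdf" | "overlay-pdf" | "markup" | "selection";

interface RenderLayer {
  id: string;
  kind: LayerKind;
  zIndex: number; // higher values draw on top (assumed convention)
  opacity: number; // 0..1
  visible: boolean;
}

interface RenderScene {
  pageNumber: number;
  layers: readonly RenderLayer[]; // kept sorted by zIndex
}

// Assumed shape: a scene starts with a single opaque base PDF layer.
function createBaseRenderScene(pageNumber: number): RenderScene {
  return {
    pageNumber,
    layers: [{ id: "base", kind: "base-pdf", zIndex: 0, opacity: 1, visible: true }],
  };
}

// Returns a new scene with a semi-transparent overlay stacked above every
// existing layer; the input scene is not mutated.
function withOverlayPdfLayer(scene: RenderScene, opacity = 0.5): RenderScene {
  const top = scene.layers.reduce((max, l) => Math.max(max, l.zIndex), -1);
  const overlay: RenderLayer = {
    id: "overlay",
    kind: "overlay-pdf",
    zIndex: top + 1,
    opacity,
    visible: true,
  };
  return { ...scene, layers: [...scene.layers, overlay] };
}

const scene = withOverlayPdfLayer(createBaseRenderScene(1));
console.log(scene.layers.map((l) => `${l.kind}@${l.zIndex}`).join(", "));
// prints "base-pdf@0, overlay-pdf@1"
```

Keeping scenes as immutable data means the UI only composes layers, while the PDF.js adapter decides how each layer is actually drawn.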
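The transform contracts in packages/document-transform-core model requests and results. A hypothetical TypeScript shape for the five operations listed under Motivation, with a pre-flight validation step that runs before any adapter (such as the pdf-lib slot) touches the PDF, could look like this. The request fields, the validateRequest helper, and its error messages are all assumptions.

```typescript
// Hypothetical request shapes; document-transform-core's real types may differ.
type TransformationRequest =
  | { kind: "extract"; sourceRevisionId: string; pages: number[] }
  | { kind: "delete"; sourceRevisionId: string; pages: number[] }
  | { kind: "reorder"; sourceRevisionId: string; order: number[] }
  | { kind: "insert"; sourceRevisionId: string; insertRevisionId: string; afterPage: number }
  | { kind: "combine"; sourceRevisionIds: string[] };

// Every successful transform yields a new revision; sources are never mutated.
interface TransformationResult {
  derivedRevisionId: string;
  derivationType: TransformationRequest["kind"];
  sourceRevisionIds: string[];
}

// Checks 1-based page numbers against the source page count.
// Returns an error message, or null when the request is valid.
function validateRequest(req: TransformationRequest, pageCount: number): string | null {
  const inRange = (p: number) => Number.isInteger(p) && p >= 1 && p <= pageCount;
  switch (req.kind) {
    case "extract":
    case "delete":
      if (req.pages.length === 0) return "no pages selected";
      if (!req.pages.every(inRange)) return "page out of range";
      if (req.kind === "delete" && new Set(req.pages).size >= pageCount) {
        return "cannot delete every page";
      }
      return null;
    case "reorder": {
      // The new order must be a permutation of 1..pageCount.
      const sorted = [...req.order].sort((a, b) => a - b);
      const isPermutation = sorted.length === pageCount && sorted.every((p, i) => p === i + 1);
      return isPermutation ? null : "order must be a permutation of all pages";
    }
    case "insert":
      // afterPage 0 means "insert before the first page".
      return Number.isInteger(req.afterPage) && req.afterPage >= 0 && req.afterPage <= pageCount
        ? null
        : "insert position out of range";
    case "combine":
      return req.sourceRevisionIds.length >= 2 ? null : "combine needs at least two revisions";
  }
}
```

Validating up front keeps the adapter focused on PDF manipulation and ensures that a rejected request never produces a half-written derived revision.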
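ProcessingJob status and ExtractedPageText are part of the shared contracts, but their exact members are not spelled out in this description. The following sketch assumes the status values queued, running, succeeded, and failed, and shows how job transitions could be guarded and how page-addressable text supports search. The members of ProcessingJobStatus, as well as allowedTransitions, transition, and pagesContaining, are hypothetical.

```typescript
// Hypothetical status values; the real ProcessingJob status type may differ.
type ProcessingJobStatus = "queued" | "running" | "succeeded" | "failed";

interface ProcessingJob {
  id: string;
  revisionId: string;
  jobType: "text_extraction" | "ocr";
  status: ProcessingJobStatus;
  error?: string | undefined;
}

interface ExtractedPageText {
  revisionId: string;
  pageNumber: number; // 1-based, so search hits and comments can address a page
  text: string;
}

// queued -> running -> succeeded | failed; failed jobs may be re-queued.
const allowedTransitions: Record<ProcessingJobStatus, readonly ProcessingJobStatus[]> = {
  queued: ["running"],
  running: ["succeeded", "failed"],
  succeeded: [],
  failed: ["queued"],
};

// Returns the job in its new status, or throws if the transition is not allowed.
// The error message is kept only while the job is in the failed state.
function transition(job: ProcessingJob, next: ProcessingJobStatus, error?: string): ProcessingJob {
  if (!allowedTransitions[job.status].includes(next)) {
    throw new Error(`invalid transition ${job.status} -> ${next}`);
  }
  return { ...job, status: next, error: next === "failed" ? error : undefined };
}

// Case-insensitive lookup of the pages whose extracted text contains a term.
function pagesContaining(pages: readonly ExtractedPageText[], term: string): number[] {
  const needle = term.toLowerCase();
  return pages
    .filter((p) => p.text.toLowerCase().includes(needle))
    .map((p) => p.pageNumber)
    .sort((a, b) => a - b);
}
```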
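Because a derived revision points at its parent through source_revision_id instead of replacing it, tracing where a revision came from means following those links back to the imported original. This sketch assumes the rows have already been loaded into a map keyed by revision id. The column names come from this PR, while RevisionRow and lineageOf are hypothetical.

```typescript
// Mirrors the lineage columns on document_revisions; other columns omitted.
interface RevisionRow {
  id: string;
  sourceRevisionId: string | null; // null for the imported original
  derivationType: string; // e.g. "imported_original"
}

// Walks source_revision_id links from a revision back to its root.
// Returns ids ordered from the given revision to the original; throws on a
// dangling link or a cycle, either of which indicates corrupted lineage.
function lineageOf(revisionId: string, rows: ReadonlyMap<string, RevisionRow>): string[] {
  const chain: string[] = [];
  const seen = new Set<string>();
  let current: string | null = revisionId;
  while (current !== null) {
    if (seen.has(current)) throw new Error(`cycle at ${current}`);
    // Explicit annotation avoids a circular type inference inside the loop.
    const row: RevisionRow | undefined = rows.get(current);
    if (!row) throw new Error(`missing revision ${current}`);
    seen.add(current);
    chain.push(current);
    current = row.sourceRevisionId;
  }
  return chain;
}
```

Throwing on a cycle or dangling link, rather than returning a partial chain, surfaces corrupted lineage early instead of silently hiding it.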

Testing

UI unit tests: running vitest via npm test was attempted but could not run in this environment because workspace dependencies were missing and registry access was blocked; outcome: npm test failed (vitest missing; npm install blocked by registry 403).
Typecheck: npm run typecheck was attempted and failed because TypeScript type dependencies could not be installed in the environment (missing @testing-library/jest-dom / vitest/globals).
Rust tests: cargo test was attempted but failed due to blocked crates.io access (network/registry 403), preventing fetching dependencies; Rust unit tests were added (schema checks, extract_pages proof, processing/job table tests) but could not execute here.
Summary of automated outcomes: tests were added but could not complete in this CI-like sandbox because of external registry/network restrictions; they are expected to pass once npm install and cargo test can fetch dependencies, locally or in CI.

To validate locally: run npm install at the repo root, then npm test and npm run typecheck, and run cargo test under apps/desktop/src-tauri to exercise the Rust-side unit tests and proof paths.


chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ee201b3b13


chatgpt-codex-connector · 2026-03-09T16:24:18Z

+    )
+    .map_err(|e| e.to_string())?;
+    conn.execute(
+      "INSERT INTO document_revisions (id, document_id, revision_number, managed_file_path, original_file_name, page_count, file_size_bytes, imported_at, source_revision_id, derivation_type) VALUES (?1, ?2, 1, ?3, ?4, NULL, ?5, ?6, NULL, 'imported_original')",


Migrate existing revision schema before writing new columns

This insert assumes document_revisions already has source_revision_id and derivation_type, but init_schema only uses CREATE TABLE IF NOT EXISTS and does not migrate databases created by the previous schema (which used source_path instead). On upgraded installs with an existing gitplant.db, this statement fails with a "no column named source_revision_id" error, blocking imports and any workflow that creates a new revision until the DB is manually reset.
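One way to address this is an additive migration that runs after CREATE TABLE IF NOT EXISTS: read the existing columns (SQLite's PRAGMA table_info(document_revisions) returns one row per column, including its name) and add only the ones that are missing. The real fix would live in init_schema in Rust; the sketch below shows only the column-diff logic, in TypeScript. The column types, the 'imported_original' default for existing rows, and the planRevisionMigration name are assumptions, and handling of the legacy source_path column (for example, if it is NOT NULL) is not covered.

```typescript
// Columns the new insert path expects on document_revisions (types assumed).
const requiredRevisionColumns: ReadonlyArray<{ name: string; ddl: string }> = [
  { name: "source_revision_id", ddl: "TEXT" },
  // SQLite only allows ADD COLUMN ... NOT NULL when a non-NULL default is given.
  { name: "derivation_type", ddl: "TEXT NOT NULL DEFAULT 'imported_original'" },
];

// Given the column names reported by PRAGMA table_info(document_revisions),
// returns the ALTER TABLE statements needed to bring an older database up to date.
// An empty result means the schema is already current, so re-running is harmless.
function planRevisionMigration(existingColumns: readonly string[]): string[] {
  const present = new Set(existingColumns.map((c) => c.toLowerCase()));
  return requiredRevisionColumns
    .filter((col) => !present.has(col.name))
    .map((col) => `ALTER TABLE document_revisions ADD COLUMN ${col.name} ${col.ddl}`);
}
```

Rows created under the old schema predate derived revisions, so backfilling them as originals with a NULL source_revision_id is assumed to be the correct default.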


chatgpt-codex-connector · 2026-03-09T16:24:18Z

+    setBytes(new Uint8Array(base));
+    setOverlayBytes(new Uint8Array(overlay));
+    setCurrentRevisionId(derived.derivedRevisionId);


Keep displayed document bytes aligned with active revision

In the extract-proof flow you render the source bytes (base) but set currentRevisionId to the derived revision, and the onPageCount callback writes page counts to updatePageCount(currentRevisionId, n). That means opening the proof view can overwrite the derived revision's page_count with the source document's page count, corrupting revision metadata after extractPagesToDerivedRevision runs.
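One way to keep the two in step is to store each byte buffer together with the id of the revision it was loaded from, and to attribute page counts from that pairing instead of a separately tracked currentRevisionId. This sketch assumes the proof view shows the source revision as the base layer and the derived revision as the overlay. DisplayedDocument, ViewerState, proofViewState, and pageCountUpdate are hypothetical names, while updatePageCount and onPageCount come from the review.

```typescript
// Hypothetical state shape: bytes and the revision they came from travel together.
interface DisplayedDocument {
  revisionId: string;
  bytes: Uint8Array;
}

interface ViewerState {
  base: DisplayedDocument;
  overlay: DisplayedDocument | null;
}

// Proof view: the source revision is the base layer and the derived revision
// is the overlay, each tagged with its own id.
function proofViewState(source: DisplayedDocument, derived: DisplayedDocument): ViewerState {
  return { base: source, overlay: derived };
}

// The page count reported by the viewer describes the base bytes, so it is
// attributed to base.revisionId rather than to the derived revision.
function pageCountUpdate(
  state: ViewerState,
  pageCount: number,
): { revisionId: string; pageCount: number } {
  return { revisionId: state.base.revisionId, pageCount };
}
```

onPageCount would then pass the result to updatePageCount(update.revisionId, update.pageCount). An alternative is to render the derived bytes as the base layer, in which case setting currentRevisionId to derived.derivedRevisionId would already be correct.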


Add multi-layer viewer, transform and processing architecture scaffolding (ee201b3)

gitgrahamdunn added the codex label on Mar 9, 2026 (with ChatGPT Codex Connector)

gitgrahamdunn merged commit 43f6a72 into main Mar 9, 2026
2 checks passed

gitgrahamdunn merged 1 commit into main from codex/update-desktop-first-architecture-and-codebase

gitgrahamdunn commented Mar 9, 2026

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

gitgrahamdunn commented Mar 9, 2026

Motivation

Description

Testing

Uh oh!

Uh oh!

chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

chatgpt-codex-connector bot Mar 9, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant