cyrusagent (bot) commented Oct 19, 2025

Summary

Implements the core file processing system for importing and syncing Logseq markdown directories using a pragmatic DDD architecture suitable for a personal project.

Resolves #PER-5

Core Features Implemented

🗂️ ImportService

  • ✅ Import entire Logseq directories (pages/ and journals/)
  • ✅ Bounded concurrency (4-6 files at once, configurable)
  • ✅ Real-time progress tracking with callbacks
  • ✅ Graceful error handling (continues on individual file failures)
  • ✅ Returns ImportSummary with detailed statistics
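The following is a minimal sketch of the bounded-concurrency, continue-on-error import loop described above. To stay dependency-free, it swaps in OS threads (`std::thread::scope`) and a small counting semaphore built from `Mutex` and `Condvar`. The PR itself uses Tokio tasks and `tokio::sync::Semaphore`. The names `import_all`, the `process` closure, and the `ImportSummary` fields are hypothetical and are not the PR's API.

```rust
use std::sync::{Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (stand-in for tokio::sync::Semaphore).
pub struct Semaphore {
    permits: Mutex<usize>,
    cvar: Condvar,
}

impl Semaphore {
    pub fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), cvar: Condvar::new() }
    }

    /// Block until a permit is available, then take it.
    pub fn acquire(&self) -> Permit<'_> {
        let mut free = self.permits.lock().unwrap();
        while *free == 0 {
            free = self.cvar.wait(free).unwrap();
        }
        *free -= 1;
        Permit { sem: self }
    }
}

/// Gives its permit back when dropped and wakes one waiting thread.
pub struct Permit<'a> {
    sem: &'a Semaphore,
}

impl Drop for Permit<'_> {
    fn drop(&mut self) {
        *self.sem.permits.lock().unwrap() += 1;
        self.sem.cvar.notify_one();
    }
}

/// Hypothetical summary: success count plus (file, error) pairs.
#[derive(Debug, Default)]
pub struct ImportSummary {
    pub processed: usize,
    pub failed: Vec<(String, String)>,
}

/// Process every file with at most `max_concurrent` running at once.
/// A failure is recorded and the remaining files are still processed.
pub fn import_all<F>(files: &[String], max_concurrent: usize, process: F) -> ImportSummary
where
    F: Fn(&str) -> Result<(), String> + Sync,
{
    let sem = Semaphore::new(max_concurrent);
    let summary = Mutex::new(ImportSummary::default());
    thread::scope(|s| {
        for file in files {
            let (sem, summary, process) = (&sem, &summary, &process);
            s.spawn(move || {
                let _permit = sem.acquire(); // released when this closure returns
                let result = process(file.as_str());
                let mut sum = summary.lock().unwrap();
                match result {
                    Ok(()) => sum.processed += 1,
                    Err(e) => sum.failed.push((file.clone(), e)),
                }
            });
        }
    });
    summary.into_inner().unwrap()
}
```

Each worker holds its permit until it finishes, so at most `max_concurrent` files are in flight at once. With Tokio, the analogous pattern is to keep a permit from `Semaphore::acquire_owned` alive inside each spawned task.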

🔄 SyncService

  • ✅ Incremental file synchronization with file watching
  • ✅ 500ms debouncing window (configurable)
  • ✅ Auto-sync on file changes (create, update, delete)
  • ✅ Event callbacks for sync operations
  • ✅ Runs indefinitely, watching for changes

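The debouncing behaviour can be illustrated with a trailing-edge debouncer. Each new event for a path restarts that path's quiet period, and the path is emitted only once the window (500 ms by default in this PR) elapses without further events. These semantics are an assumption, and the sketch uses std only. The PR delegates the actual timing to `notify-debouncer-mini`, and `Debouncer`, `on_event`, and `drain_ready` are hypothetical names. Timestamps are passed in explicitly so the logic can be tested without sleeping.

```rust
use std::collections::HashMap;
use std::path::{Path, PathBuf};
use std::time::{Duration, Instant};

/// Trailing-edge debouncer: a path is released only after `window` has
/// passed with no further events for it.
pub struct Debouncer {
    window: Duration,
    pending: HashMap<PathBuf, Instant>, // path -> time of its latest event
}

impl Debouncer {
    pub fn new(window: Duration) -> Self {
        Debouncer { window, pending: HashMap::new() }
    }

    /// Record a raw event; non-.md paths are ignored, and a repeat event
    /// for the same path restarts its quiet period.
    pub fn on_event(&mut self, path: &Path, at: Instant) {
        if path.extension().map_or(false, |ext| ext == "md") {
            self.pending.insert(path.to_path_buf(), at);
        }
    }

    /// Remove and return every path that has been quiet for at least `window`.
    pub fn drain_ready(&mut self, now: Instant) -> Vec<PathBuf> {
        let window = self.window;
        let mut ready: Vec<PathBuf> = self
            .pending
            .iter()
            .filter(|(_, last)| now.duration_since(**last) >= window)
            .map(|(path, _)| path.clone())
            .collect();
        for path in &ready {
            self.pending.remove(path);
        }
        ready.sort(); // deterministic order for callers and tests
        ready
    }
}
```

A burst of saves to the same page therefore produces one sync instead of several.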
📝 Logseq Markdown Parser

  • ✅ Async file parsing with Tokio
  • ✅ Indentation-based hierarchy parsing (tabs or 2-space indents)
  • ✅ Automatic URL extraction from content
  • ✅ Page reference ([[page]]) and tag (#tag) extraction
  • ✅ Converts markdown files to Page and Block domain objects

📂 File System Utilities

  • ✅ Recursive markdown file discovery
  • ✅ Logseq-specific directory filtering
  • ✅ Cross-platform file watching with debouncing
  • ✅ Event filtering to .md files only
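The discovery step is sketched below as a synchronous function. It reuses the function names from the commit message but not their async signatures. The sketch walks a directory tree with `std::fs::read_dir`, keeps `.md` files, and restricts the Logseq variant to `pages/` and `journals/`. Because this version is not async, the recursion needs no `Box::pin`; boxing is only required for recursive async functions.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Recursively collect `.md` files under `dir` (sync sketch; the PR's version is async).
pub fn discover_markdown_files(dir: &Path) -> io::Result<Vec<PathBuf>> {
    let mut found = Vec::new();
    for entry in fs::read_dir(dir)? {
        let path = entry?.path();
        if path.is_dir() {
            found.extend(discover_markdown_files(&path)?);
        } else if path.extension().map_or(false, |ext| ext == "md") {
            found.push(path);
        }
    }
    found.sort(); // read_dir order is platform-dependent
    Ok(found)
}

/// Only look inside Logseq's content folders, skipping e.g. logseq/ and assets/.
pub fn discover_logseq_files(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut files = Vec::new();
    for sub in ["pages", "journals"] {
        let dir = root.join(sub);
        if dir.is_dir() {
            files.extend(discover_markdown_files(&dir)?);
        }
    }
    Ok(files)
}
```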

Architecture

This implementation follows a three-layer DDD architecture:

  • Domain Layer: Value objects (LogseqDirectoryPath, ImportProgress) and events
  • Application Layer: Services (ImportService, SyncService)
  • Infrastructure Layer: File system operations, markdown parsing

See IMPLEMENTATION.md for detailed architecture documentation.

Key Design Decisions

Following simplified DDD for personal projects:

  • ✅ No complex event sourcing (events for notifications only)
  • ✅ Direct callbacks (no event bus/CQRS complexity)
  • ✅ Simple error handling (continue on error, collect failures)
  • ✅ File system as source of truth (no conflict resolution)
  • ✅ Bounded concurrency using tokio::sync::Semaphore
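In practice, the "direct callbacks, no event bus" decision means the service holds an optional closure and calls it synchronously for each notification event. The sketch below works under that assumption. `SyncEvent`, its variants, and the `apply` method are illustrative stand-ins loosely named after the domain events in the commit message, not the PR's actual types.

```rust
use std::path::PathBuf;

/// Hypothetical notification events, loosely named after the PR's domain events.
#[derive(Debug, Clone, PartialEq)]
pub enum SyncEvent {
    SyncStarted,
    FileCreated(PathBuf),
    FileUpdated(PathBuf),
    FileDeleted(PathBuf),
    SyncCompleted { changed: usize },
}

/// Called synchronously, once per event.
type Callback = Box<dyn Fn(&SyncEvent) + Send + Sync>;

pub struct SyncService {
    on_event: Option<Callback>, // no event bus: at most one direct listener
}

impl SyncService {
    pub fn new() -> Self {
        SyncService { on_event: None }
    }

    pub fn with_callback(mut self, f: impl Fn(&SyncEvent) + Send + Sync + 'static) -> Self {
        self.on_event = Some(Box::new(f));
        self
    }

    fn emit(&self, event: SyncEvent) {
        if let Some(callback) = &self.on_event {
            callback(&event);
        }
    }

    /// Stand-in for real sync work: report each change, then a completion event.
    pub fn apply(&self, created: Vec<PathBuf>, updated: Vec<PathBuf>, deleted: Vec<PathBuf>) {
        self.emit(SyncEvent::SyncStarted);
        let changed = created.len() + updated.len() + deleted.len();
        created.into_iter().for_each(|p| self.emit(SyncEvent::FileCreated(p)));
        updated.into_iter().for_each(|p| self.emit(SyncEvent::FileUpdated(p)));
        deleted.into_iter().for_each(|p| self.emit(SyncEvent::FileDeleted(p)));
        self.emit(SyncEvent::SyncCompleted { changed });
    }
}
```

A UI layer (for example, a future Tauri event emitter) would then only need to supply the closure.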

Dependencies Added

  • notify (6.1): Cross-platform file watching
  • notify-debouncer-mini (0.4): Event debouncing
  • tokio (1.41): Async runtime
  • uuid (1.11): UUID generation
  • thiserror, anyhow: Error handling
  • tracing: Structured logging
  • tempfile (dev): Testing utilities

Testing

  • ✅ Comprehensive unit tests for all components
  • ✅ Domain layer: Value objects and events
  • ✅ Infrastructure layer: Parser, file discovery, watcher
  • ✅ Application layer: Import service statistics
  • ✅ Integration test structure ready

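Documentation

  • ✅ IMPLEMENTATION.md covering architecture, components, and usage examples
  • ✅ CHANGELOG.md documenting all changes
  • ✅ Inline code documentation and tests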

What's Next

Future enhancements (not included in this PR):

  • SQLite persistence for PageRepository
  • File→Page mapping for proper deletion handling
  • Tauri integration (commands and event emitters)
  • Full-text search with Tantivy
  • UI components for import/sync status

Testing Instructions

  1. Build the project:

    cargo build
  2. Run tests:

    cargo test
  3. Run with logging:

    RUST_LOG=debug cargo test

Review Focus Areas

  1. Architecture: Is the simplified DDD approach appropriate?
  2. Error Handling: Does the "continue on error" strategy make sense?
  3. Concurrency: Is the bounded concurrency (4-6 files) reasonable?
  4. API Design: Are the service interfaces intuitive?
  5. Testing: Is test coverage adequate?

🤖 Generated with Claude Code

Cyrus AI and others added 6 commits October 19, 2025 03:41
… DDD architecture

This commit implements the core file processing system for importing and syncing
Logseq markdown directories, following a pragmatic DDD approach suitable for a
personal project.

## Core Features

### ImportService
- Import entire Logseq directories (pages/ and journals/)
- Bounded concurrency (4-6 files at once, configurable)
- Real-time progress tracking with callbacks
- Graceful error handling (continues on individual file failures)
- Returns ImportSummary with statistics

### SyncService
- Incremental file synchronization with file watching
- 500ms debouncing window (configurable)
- Auto-sync on file changes (create, update, delete)
- Event callbacks for sync operations
- Runs indefinitely, watching for changes

## Domain Layer Changes

### New Value Objects
- LogseqDirectoryPath: Validated directory with pages/ and journals/
- ImportProgress: Tracks import progress (files, percentage)
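The value objects can be sketched as types whose constructors enforce their invariants. A `LogseqDirectoryPath` can only be built for a directory that contains both `pages/` and `journals/`, and `ImportProgress` derives its percentage from the file counts. The error type, the accessor names, and the choice to report 100% for an empty import are assumptions for illustration.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical validation errors.
#[derive(Debug, PartialEq)]
pub enum PathError {
    NotADirectory(PathBuf),
    MissingSubdir(&'static str),
}

/// A directory known to contain pages/ and journals/. The only constructor
/// is `new`, so any holder of this type can rely on that invariant.
#[derive(Debug, Clone, PartialEq)]
pub struct LogseqDirectoryPath(PathBuf);

impl LogseqDirectoryPath {
    pub fn new(path: impl AsRef<Path>) -> Result<Self, PathError> {
        let path = path.as_ref();
        if !path.is_dir() {
            return Err(PathError::NotADirectory(path.to_path_buf()));
        }
        for sub in ["pages", "journals"] {
            if !path.join(sub).is_dir() {
                return Err(PathError::MissingSubdir(sub));
            }
        }
        Ok(LogseqDirectoryPath(path.to_path_buf()))
    }

    pub fn pages(&self) -> PathBuf {
        self.0.join("pages")
    }

    pub fn journals(&self) -> PathBuf {
        self.0.join("journals")
    }
}

/// Immutable progress snapshot; the percentage is derived, not stored.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct ImportProgress {
    pub processed: usize,
    pub total: usize,
}

impl ImportProgress {
    pub fn percentage(&self) -> f64 {
        if self.total == 0 {
            100.0 // assumption: an empty import counts as complete
        } else {
            self.processed as f64 * 100.0 / self.total as f64
        }
    }
}
```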

### New Domain Events
- Import events: ImportStarted, FileProcessed, ImportCompleted, ImportFailed
- Sync events: SyncStarted, FileCreatedEvent, FileUpdatedEvent, FileDeletedEvent,
  SyncCompleted

## Infrastructure Layer (New)

### Logseq Markdown Parser
- Async file parsing with Tokio
- Indentation-based hierarchy (tabs or 2-space indents)
- URL extraction (http://, https://)
- Page reference ([[page]]) and tag (#tag) extraction
- Converts markdown to Page/Block domain objects

### File System Utilities
- discover_markdown_files(): Recursive .md file discovery
- discover_logseq_files(): Find files in pages/ and journals/
- LogseqFileWatcher: Cross-platform file watching with debouncing
- Filters to only .md files in Logseq directories

## Dependencies Added
- notify (6.1): Cross-platform file watching
- notify-debouncer-mini (0.4): Event debouncing
- tokio (1.41): Async runtime with fs, rt-multi-thread, macros, sync, time
- serde, serde_json: Serialization
- thiserror, anyhow: Error handling
- tracing, tracing-subscriber: Structured logging
- uuid (1.11): UUID generation for IDs
- tempfile (3.14): Dev dependency for tests

## Architecture Decisions

Following simplified DDD for personal projects:
- No complex event sourcing (events for notifications only)
- Direct callbacks (no event bus/CQRS complexity)
- Simple error handling (continue on error, collect failures)
- File system as source of truth (no conflict resolution)
- No import session persistence

## Documentation
- Comprehensive IMPLEMENTATION.md with architecture, components, usage examples
- CHANGELOG.md documenting all changes
- Inline code documentation and tests

## Testing
- Unit tests for all components
- Integration test structure ready
- Test coverage for domain, infrastructure, and application layers

Resolves: PER-5

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
- Fix partial move error in sync_service.rs by borrowing operation in match
- Add Entity trait import to import_service.rs tests
- Update watcher.rs to use DebouncedEventKind from notify-debouncer-mini
- Fix Block constructor calls to match correct signatures (new_root, new_child)
- Fix iterator issues in tests by collecting all_blocks() before indexing
- Add Box::pin to discover_markdown_files for recursive async function
- Remove unused imports and variables to eliminate warnings
- Fix type mismatch in sync_service.rs: clone PathBuf when passing to event
- Fix non-exhaustive pattern match in watcher.rs: add wildcard pattern
- Remove unused imports from import_service.rs, entities.rs, and aggregates.rs
- Remove unused variable path_buf from import_service.rs

These fixes address all compilation errors reported by the GitHub Actions CI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Fixed the extract_page_references function to preserve the order of
appearance of [[page references]] and #tags as they appear in the
markdown content. Previously, all [[brackets]] were extracted first,
then all #tags, which broke the expected ordering in tests.

Rewrote the parser to use a single-pass character-by-character approach
that maintains proper ordering.

Fixes test_extract_page_references test failure.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
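The following scanner is in the spirit of that fix. It makes one left-to-right pass over the characters (with a look-ahead for the closing brackets) and emits `[[page]]` references and `#tags` as they are encountered, which keeps them in document order. The tokenization rules are assumptions and not necessarily the parser's exact rules: a tag must start a word and continues over letters, digits, `-`, `_`, and `/`. As a side effect, Logseq's `#[[multi word]]` tag form falls through to the bracket branch.

```rust
/// Characters allowed inside a #tag (assumed rule).
fn is_tag_char(c: char) -> bool {
    c.is_alphanumeric() || matches!(c, '-' | '_' | '/')
}

/// Return [[page]] references and #tags in the order they appear.
pub fn extract_page_references(text: &str) -> Vec<String> {
    let chars: Vec<char> = text.chars().collect();
    let mut refs: Vec<String> = Vec::new();
    let mut i = 0;
    while i < chars.len() {
        if chars[i] == '[' && chars.get(i + 1) == Some(&'[') {
            // Find the closing "]]"; an unclosed "[[" is treated as plain text.
            let close = (i + 2..chars.len().saturating_sub(1))
                .find(|&j| chars[j] == ']' && chars[j + 1] == ']');
            if let Some(end) = close {
                refs.push(chars[i + 2..end].iter().collect());
                i = end + 2;
                continue;
            }
        } else if chars[i] == '#' && (i == 0 || chars[i - 1].is_whitespace()) {
            // A tag must start a word, so "issue#5" is not a tag.
            let start = i + 1;
            let mut end = start;
            while end < chars.len() && is_tag_char(chars[end]) {
                end += 1;
            }
            if end > start {
                refs.push(chars[start..end].iter().collect());
                i = end;
                continue;
            }
        }
        i += 1;
    }
    refs
}
```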
Implemented sync_once() method that performs a one-time synchronization
of a Logseq directory, detecting and handling:
- New files (creates pages)
- Updated files (compares modification time and updates pages)
- Deleted files (removes from repository)
- Unchanged files (skips processing)

Features:
- Maintains sync registry to track file metadata and modification times
- Intelligent change detection using file modification timestamps
- Proper deletion handling with title-based lookup
- Support for optional callbacks to track sync progress
- Returns detailed SyncSummary with operation counts and errors

Added comprehensive test coverage:
- test_sync_once_new_files
- test_sync_once_updated_files
- test_sync_once_unchanged_files
- test_sync_once_deleted_files
- test_sync_once_mixed_operations
- test_sync_once_with_journals
- test_sync_once_with_callback

All 125 tests passing.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
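The change detection in `sync_once()` can be modelled as a diff between two maps from file path to modification time: the registry saved by the previous sync and a fresh scan. The sketch below illustrates this under stated assumptions. `plan_sync` and `SyncPlan` are hypothetical, and the sketch treats any mtime difference (not only a newer timestamp) as an update.

```rust
use std::collections::HashMap;
use std::path::PathBuf;
use std::time::SystemTime;

/// Hypothetical result of comparing two snapshots.
#[derive(Debug, Default, PartialEq)]
pub struct SyncPlan {
    pub created: Vec<PathBuf>,
    pub updated: Vec<PathBuf>,
    pub deleted: Vec<PathBuf>,
    pub unchanged: usize,
}

/// Classify files by comparing the previous registry (path -> mtime)
/// with a fresh scan of the directory.
pub fn plan_sync(
    registry: &HashMap<PathBuf, SystemTime>,
    scan: &HashMap<PathBuf, SystemTime>,
) -> SyncPlan {
    let mut plan = SyncPlan::default();
    for (path, &mtime) in scan {
        match registry.get(path) {
            None => plan.created.push(path.clone()),
            // Any mtime difference counts as an update (assumption).
            Some(&old) if old != mtime => plan.updated.push(path.clone()),
            Some(_) => plan.unchanged += 1,
        }
    }
    // Anything registered but no longer on disk was deleted.
    for path in registry.keys() {
        if !scan.contains_key(path) {
            plan.deleted.push(path.clone());
        }
    }
    // HashMap iteration order is arbitrary; sort for reproducible output.
    plan.created.sort();
    plan.updated.sort();
    plan.deleted.sort();
    plan
}
```

After applying the plan, the caller would replace the registry with the scan so that the next call only sees newer changes.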
Fixed test_parse_with_urls_and_references to use root_blocks() which
preserves insertion order, instead of all_blocks() which iterates over
a HashMap with non-deterministic order.

Also added more detailed assertions to verify the parsed content
includes correct URLs and page references.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
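The underlying issue is that `HashMap` iteration order is unspecified, and with Rust's randomized hashing it can differ from run to run. A test that indexes into it can therefore pass or fail unpredictably. The sketch below shows the shape of the fix. It assumes a page keeps its blocks in a `HashMap` for lookup plus a `Vec` of root ids for order; the `u32` ids and this `Page` struct are simplified stand-ins for the real domain types.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the domain Page: id -> content, plus root order.
#[derive(Default)]
pub struct Page {
    blocks: HashMap<u32, String>,
    root_order: Vec<u32>,
}

impl Page {
    pub fn add_root_block(&mut self, id: u32, content: &str) {
        self.blocks.insert(id, content.to_string());
        self.root_order.push(id);
    }

    /// Unspecified order (HashMap): fine for lookups and counts, not for indexing.
    pub fn all_blocks(&self) -> impl Iterator<Item = &String> + '_ {
        self.blocks.values()
    }

    /// Insertion order, driven by the Vec of root ids.
    pub fn root_blocks(&self) -> impl Iterator<Item = &String> + '_ {
        self.root_order.iter().map(move |id| &self.blocks[id])
    }
}
```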
weswalla merged commit b6b3991 into main on Oct 19, 2025
1 check passed