Skip to content

feat(storage): add path locking and selective crash recovery for write operations#431

Merged
zhoujh01 merged 29 commits intomainfrom
feature/transaction-support
Mar 18, 2026
Merged

feat(storage): add path locking and selective crash recovery for write operations#431
zhoujh01 merged 29 commits intomainfrom
feature/transaction-support

Conversation

@qin-ctx
Copy link
Collaborator

@qin-ctx qin-ctx commented Mar 5, 2026

Description

Implement path-level locking and selective crash recovery for VikingFS storage operations. Core write operations (rm, mv, add_resource, session.commit) are coordinated through LockManager with fencing-token-based path locks and a RedoLog for session memory crash recovery.

Note

This PR went through a major design iteration: the initial undo-based TransactionManager/Journal/UndoLog system (~4000 lines) was replaced with a lightweight LockManager + RedoLog architecture (~700 lines), reducing complexity by ~82%. See commit 4e44a5d for the refactor rationale.

cc @r266-tech

Related Issue

Closes #390

RFC Discussion: #115

Type of Change

  • Bug fix
  • New feature (non-breaking change that adds functionality)
  • Documentation update
  • Test update

Scope & Limitations

Important

This design is for single-node deployments only and does not support multi-node/distributed scenarios.

Component Current Approach Multi-node Requirements
PathLock File-based locks (fencing token in .path.ovlock) Distributed locks (etcd/ZooKeeper) with atomic CAS + session-bound leases
RedoLog AGFS-persisted markers at /local/_system/redo/ Persist to distributed consistent store
QueueFS (SQLite) SQLite + WAL mode, RecoverStale on startup Distributed message queue or TiDB/MySQL backend

Architecture

Overview

LockContext (async context manager — user-facing entry point)
  └── LockManager (global singleton)
        ├── PathLock (fencing-token-based path locks via .path.ovlock files)
        ├── RedoLog (AGFS-persisted crash recovery for session_memory)
        └── Background cleanup (stale lock detection every 60s)

Design Decisions

  1. Inline error handling, not rollback: VikingFS rm/mv use sequential operations with explicit error cleanup instead of undo-log rollback. Failures are observable, not hidden.

  2. Selective crash recovery (RedoLog): Only session.commit memory extraction is redo-protected — the one operation where data loss is unacceptable and the operation is idempotent enough to safely replay. General FS operations don't need redo because partial failures are detectable.

  3. Fencing tokens for locks: Lock files contain {owner_id}:{monotonic_ns}:{lock_type} for stale lock detection and ABA prevention.

  4. Fail-fast by default: lock_timeout=0 means operations fail immediately on lock conflict. Set to positive value for blocking retries.

Lock Design

Two lock types:

Lock Type Purpose Conflict Detection
POINT (P) Single directory (write/semantic) Checks ancestors for SUBTREE locks
SUBTREE (S) Entire subtree (rm/mv-source) Scans descendants for any locks

Three locking modes (LockContext):

Mode Lock Combination Used By
point POINT on each path add_resource, session.commit
subtree SUBTREE on each path rm (directories)
mv SUBTREE on source + SUBTREE on destination mv

Livelock prevention: Deterministic backoff via fencing token timestamp comparison.

Stale lock cleanup: Background loop (every 60s) detects locks held >3600s; lock_expire config (default 300s) enables force-release on acquisition conflict.

Per-Command Design

Command Lock Mode Error Handling
rm() Dir: subtree; File: point(parent) VectorDB delete failure → log warning, still delete FS (eventual consistency)
mv() mv: SUBTREE(src) + SUBTREE(dst) VectorDB update failure → delete copy, propagate error (source untouched)
add_resource (finalize) point(parent dir) FS or enqueue failure → propagate error
session.commit Redo-log protected (no lock on archive phase) Phase 1 failure → no redo marker; Phase 2 failure → redo marker persists for recovery

Session Commit — Two-Phase with Redo Protection

Phase 1 (Archive — no lock):
  increment index → generate summary → write archive → clear messages

Phase 2 (Memory — redo-protected):
  write redo marker → extract memories → write memories → enqueue semantic → mark done

If the process crashes during Phase 2, the redo marker persists. On restart, LockManager.start() scans /local/_system/redo/ and replays memory extraction.

Crash Recovery

On LockManager.start():

  1. Scan /local/_system/redo/ for pending markers
  2. For each pending task: re-extract memories from archived messages, write, enqueue semantic
  3. Delete marker on success
  4. Start background stale lock cleanup loop

Changes Made

  • LockManager (lock_manager.py): Global singleton managing lock lifecycle, redo recovery, background stale cleanup
  • LockContext (lock_context.py): Async context manager for lock acquisition/release
  • LockHandle (lock_handle.py): Lock ownership tracking with LockOwner protocol
  • PathLock (path_lock.py): Fencing-token-based path locks with POINT/SUBTREE types; fixed TOCTOU race in lock acquisition
  • RedoLog (redo_log.py): Selective crash recovery for session_memory operations
  • LockObserver (lock_observer.py): Monitoring for active/hanging/stale locks
  • VikingFS: rm()/mv() use LockContext with inline error handling
  • ResourceProcessor: add_resource finalize phase uses point lock on parent directory
  • Session: commit_async() uses redo-log for memory extraction crash safety
  • QueueFS: New SQLite backend (WAL + RecoverStale at-least-once)
  • Documentation: Updated transaction mechanism docs (en/zh)
  • Tests: 6 test files covering lock_context, lock_manager, path_lock, redo_log, concurrent_lock, and e2e

Recent Fixes

  • Removed checkpoint dead code and fixed TOCTOU race condition in lock acquisition (fe33516)
  • Fixed resource lock — add_resource finalize now acquires point lock on parent directory (715739d)
  • Fixed path handling edge cases (1010bd4)

Testing

  • I have added tests that prove my fix is effective or that my feature works
  • New and existing unit tests pass locally with my changes
  • I have tested this on the following platforms:
    • Linux
    • macOS
    • Windows

Checklist

  • My code follows the project's coding style
  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • Any dependent changes have been merged and published

Additional Notes

  • 77 files changed, +3732/-1429 lines
  • Design went through a major iteration: initial undo-based system replaced with lightweight lock + redo-log (commit 4e44a5d), reducing transaction code from ~4000 lines to ~700 lines

qin-ctx and others added 3 commits March 5, 2026 15:10
…recovery

Implement a full transaction system for VikingFS storage operations including
write-ahead journal, path locking, undo/rollback, context manager API, and
crash recovery. Includes comprehensive tests and documentation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…tions

Add end-to-end tests covering rollback scenarios that were missing:
- mv rollback: file moved back to original location on failure
- mv commit: file persists at new location
- Multi-step rollback: mkdir + write + mkdir all reversed in order
- Partial step rollback: only completed entries are reversed
- Nested directory rollback: child removed before parent
- Best-effort rollback: single step failure does not block others

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@CLAassistant
Copy link

CLAassistant commented Mar 5, 2026

CLA assistant check
All committers have signed the CLA.

@qin-ctx
Copy link
Collaborator Author

qin-ctx commented Mar 5, 2026

/review

@github-actions
Copy link

github-actions bot commented Mar 5, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🎫 Ticket compliance analysis 🔶

390 - Partially compliant

Compliant requirements:

  • Transaction mechanism with journal and undo log implemented
  • Rollback and crash recovery added
  • Path locking with fencing tokens implemented
  • Tests added for various scenarios

Non-compliant requirements:

  • (Cannot fully assess integration with all core operations without full diff)

Requires further human verification:

  • Integration with core operations (rm, mv, add_resource, session.commit)
  • Multi-node deployment limitations and fallback behavior
⏱️ Estimated effort to review: 4 🔵🔵🔵🔵⚪
🧪 PR contains tests
🔒 No security concerns identified
⚡ Recommended focus areas for review

Potential Typo in Status Enum

Verify that TransactionStatus.AQUIRE is the intended enum value (check for possible typo: ACQUIRE).

elif tx.status in (TransactionStatus.INIT, TransactionStatus.AQUIRE):
Async Method Calling Sync I/O

Ensure that AGFS client calls (cat, write, ls, rm, stat) are properly awaited if they are async, or document that they are synchronous to avoid event loop blocking.

        content = self._agfs.cat(lock_path)
        if isinstance(content, bytes):
            return content.decode("utf-8").strip()
        return str(content).strip()
    except Exception:
        return None

async def _is_locked_by_other(self, lock_path: str, transaction_id: str) -> bool:
    """Check if path is locked by another transaction (any lock type)."""
    token = self._read_token(lock_path)
    if token is None:
        return False
    lock_owner, _, _ = _parse_fencing_token(token)
    return lock_owner != transaction_id

async def _create_lock_file(
    self, lock_path: str, transaction_id: str, lock_type: str = LOCK_TYPE_POINT
) -> None:
    """Create lock file with fencing token."""
    token = _make_fencing_token(transaction_id, lock_type)
    self._agfs.write(lock_path, token.encode("utf-8"))

@MaojiaSheng
Copy link
Collaborator

related discussion in #472

# Conflicts:
#	openviking/parse/tree_builder.py
#	openviking/session/session.py
@chuanbao666
Copy link
Collaborator

关于隔离级别是如何设计的?感觉像是Read Uncommitted?

Implement transaction system for VikingFS with ACID-like guarantees:
- TransactionManager with configurable lock timeout and journal-based recovery
- PathLock supporting point, subtree, and mv lock modes
- Refactor VikingFS mv to use cp+rm to prevent lock files from being carried
- Fix stale lock detection returning false for missing lock files
- Update ragas eval to use LangchainLLMWrapper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
qin-ctx and others added 8 commits March 15, 2026 22:48
- Reconstruct RequestContext from undo params for vectordb_delete/update_uri
  rollback (previously skipped silently due to missing ctx)
- Serialize ctx fields into undo params in rm/mv operations
- Fix Phase 1 undo path to target archive dir instead of session root
- Remove Phase 2 fs_write_new undo (overwrites are idempotent, checkpoint
  handles recovery)
- Add ancestor SUBTREE recheck after lock creation in acquire_subtree
- Move _collect_uris inside TransactionContext in rm/mv to close race window
- Log journal persistence failures instead of silently swallowing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ests with real backends

Remove all optional/fallback code paths where tx_manager could be None. get_transaction_manager()
now raises RuntimeError if not initialized. Fix undo rollback to reconstruct ctx for vectordb_upsert
and use correct agent_id default. Replace mock-based transaction tests with integration tests using
real AGFS and VectorDB backends.
…mmit path

- Convert execute_rollback/rollback_entry to async, removing sync run_async wrappers
- Unify Session.commit() to delegate to commit_async(), removing duplicate phase methods
- Fix SUBTREE lock to conflict with ancestor SUBTREE locks (was previously missing)
- Fix mv lock mode: directory moves now use SUBTREE on both source and destination
- Replace deprecated asyncio.get_event_loop() with get_running_loop()
- Remove max_parallel_locks config option
- Update docs (en/zh) and tests to match new async rollback signatures
@qin-ctx qin-ctx changed the title WIP: feat(storage): add transaction support with journal, undo, and crash recovery feat(storage): add transaction support with journal, undo, and crash recovery Mar 16, 2026
qin-ctx added 4 commits March 16, 2026 21:15
…sh recovery

Session commit no longer wraps archive phase in a transaction. Phase 2 uses
redo semantics so crashed memory-extraction can be replayed from archive.
PathLock stale-lock cleanup no longer redundantly re-checks timeout.
Semantic processor vectorization runs concurrently via asyncio.gather.
qin-ctx added 4 commits March 17, 2026 15:32
…ghtweight lock + redo-log

Remove the heavyweight TransactionManager/Journal/UndoEntry system (~4000 lines) and
replace it with a simpler architecture: LockManager for path locking, LockContext as
the async context manager, LockHandle/LockOwner protocol, and a RedoLog for crash
recovery of session_memory operations. VikingFS rm/mv now use inline error handling
instead of rollback semantics. Updated docs, observers, and tests accordingly.

Co-Authored-By: Claude Opus 4.6
@qin-ctx qin-ctx changed the title feat(storage): add transaction support with journal, undo, and crash recovery feat(storage): add path locking and selective crash recovery for write operations Mar 17, 2026
qin-ctx added 7 commits March 17, 2026 20:57
…fy mv lock param

- Remove unused _write_checkpoint/_write_checkpoint_async/_read_checkpoint
  from Session (superseded by redo-log)
- Re-resolve URI inside lock in resource_processor Phase 3.5 to prevent
  concurrent add_resource calls from resolving to the same final_uri
- Rename acquire_mv dst_path to dst_parent_path with docstring to clarify
  that callers pass the destination parent directory
@zhoujh01 zhoujh01 merged commit 1823a7c into main Mar 18, 2026
30 of 31 checks passed
@zhoujh01 zhoujh01 deleted the feature/transaction-support branch March 18, 2026 06:50
@github-project-automation github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 18, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

[Feature]: Path locking and selective crash recovery for storage operations

5 participants