
[Feature]: Path locking and selective crash recovery for storage operations #390

@qin-ctx


RFC Discussion: #115

Problem Statement

OpenViking's core write operations (rm, mv, add_resource, session.commit) span multiple subsystems: VikingFS, VectorDB, and QueueManager. Without coordination between them:

  • Concurrent operations on the same path can corrupt data
  • If session.commit() crashes midway through memory extraction, the extracted memories are permanently lost
  • rm() / mv() partial failures leave inconsistent state between FS and VectorDB

Solution

Implement path-level locking to coordinate concurrent access, plus selective crash recovery (a redo log) for session memory extraction, the one critical operation where data loss is unacceptable.

Note

Design evolution: The initial proposal was a heavyweight undo-based TransactionManager/Journal/UndoLog system (~4000 lines). After implementation and review, this was replaced with a lightweight LockManager + RedoLog architecture (~700 lines). The undo/rollback approach was over-engineered for OpenViking's actual consistency needs. See commit 4e44a5d for the refactor rationale.

Architecture

LockContext (async context manager — user-facing entry point)
  └── LockManager (global singleton)
        ├── PathLock (fencing-token-based path locks via .path.ovlock files)
        ├── RedoLog (AGFS-persisted crash recovery for session_memory)
        └── Background cleanup (stale lock detection every 60s)
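A minimal sketch of how a caller might use LockContext with the fail-fast default. Everything here is a hypothetical stand-in, not OpenViking's API: the in-memory lock set, the acquire/release methods, LockConflictError, and the LockContext(manager, path, lock_timeout) constructor are assumptions, and the real PathLock persists locks as .path.ovlock files rather than holding them in memory.

```python
import asyncio


class LockConflictError(RuntimeError):
    """Hypothetical error raised when a path lock cannot be acquired."""


class LockManager:
    """Toy in-memory stand-in for the global LockManager (hypothetical API)."""

    def __init__(self) -> None:
        self.held: set[str] = set()

    async def acquire(self, path: str, lock_timeout: float = 0.0) -> None:
        loop = asyncio.get_running_loop()
        deadline = loop.time() + lock_timeout
        while path in self.held:
            # With lock_timeout=0 the deadline has already passed: fail fast.
            if loop.time() >= deadline:
                raise LockConflictError(f"path is locked: {path}")
            await asyncio.sleep(0.01)  # blocking retry when lock_timeout > 0
        self.held.add(path)

    def release(self, path: str) -> None:
        self.held.discard(path)


class LockContext:
    """Async context manager: acquire on entry, always release on exit."""

    def __init__(self, manager: LockManager, path: str, lock_timeout: float = 0.0) -> None:
        self._manager = manager
        self._path = path
        self._lock_timeout = lock_timeout

    async def __aenter__(self) -> "LockContext":
        await self._manager.acquire(self._path, self._lock_timeout)
        return self

    async def __aexit__(self, exc_type, exc, tb) -> bool:
        self._manager.release(self._path)
        return False  # never swallow the caller's exception


async def demo() -> list[str]:
    manager = LockManager()
    events: list[str] = []
    async with LockContext(manager, "/resources/docs"):
        events.append("acquired")
        try:
            # A second acquisition of the same path conflicts immediately.
            async with LockContext(manager, "/resources/docs"):
                events.append("unexpected")
        except LockConflictError:
            events.append("conflict")
    events.append("released" if not manager.held else "leaked")
    return events


print(asyncio.run(demo()))  # ['acquired', 'conflict', 'released']
```

Putting the release in `__aexit__` means the lock is freed even when the body raises, which is the main reason to expose locking through a context manager rather than paired acquire/release calls.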

Key Design Decisions

  1. Inline error handling, not rollback: VikingFS rm/mv use sequential operations with explicit error cleanup. Failures are observable and handled at the call site. This is simpler and more predictable than implicit rollback.

  2. Selective crash recovery (RedoLog): Only session.commit memory extraction is redo-protected. General FS operations do not need redo because partial failures are detectable and VectorDB inconsistencies self-heal through re-indexing.

  3. Fencing tokens for locks: Lock files contain {owner_id}:{monotonic_ns}:{lock_type}, which supports stale-lock detection and ABA prevention. Livelocks are resolved deterministically by comparing timestamps.

  4. Fail-fast by default: with lock_timeout=0, an operation fails immediately on a lock conflict. Setting a positive lock_timeout enables blocking retries instead.
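The fencing-token format from decision 3 can be sketched as a small value type. The FencingToken class, the "P"/"S" encoding of lock_type, the max_age_ns staleness threshold, the owner_id tie-break in winner(), and the owns() ABA check are illustrative assumptions; the issue only specifies the {owner_id}:{monotonic_ns}:{lock_type} layout and that livelocks are resolved by timestamp comparison.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class FencingToken:
    """Hypothetical parsed form of a lock file's contents."""

    owner_id: str
    monotonic_ns: int
    lock_type: str  # assumed encoding: "P" (POINT) or "S" (SUBTREE)

    def encode(self) -> str:
        """Serialize to the on-disk {owner_id}:{monotonic_ns}:{lock_type} layout."""
        return f"{self.owner_id}:{self.monotonic_ns}:{self.lock_type}"

    @classmethod
    def decode(cls, raw: str) -> "FencingToken":
        # rsplit keeps any ':' that appears inside owner_id intact.
        owner_id, ns, lock_type = raw.strip().rsplit(":", 2)
        if lock_type not in ("P", "S"):
            raise ValueError(f"unknown lock type: {lock_type!r}")
        return cls(owner_id, int(ns), lock_type)

    def is_stale(self, now_ns: int, max_age_ns: int) -> bool:
        """A lock older than max_age_ns is treated as abandoned (assumed policy)."""
        return now_ns - self.monotonic_ns > max_age_ns


def winner(a: FencingToken, b: FencingToken) -> FencingToken:
    """Deterministic tie-break: every contender computes the same winner.

    The earlier timestamp wins; owner_id breaks exact ties (assumption).
    """
    return min(a, b, key=lambda t: (t.monotonic_ns, t.owner_id))


def owns(current_raw: str, mine: FencingToken) -> bool:
    """ABA guard: release only if the lock file still holds our exact token.

    A lock that was reclaimed as stale and re-acquired carries a different
    timestamp, so the comparison fails even if the owner_id matches.
    """
    return FencingToken.decode(current_raw) == mine


token = FencingToken("worker-1", time.monotonic_ns(), "S")
assert FencingToken.decode(token.encode()) == token
```

Because both contenders see the same pair of tokens, both compute the same winner and the loser backs off, so resolution does not depend on retry timing. Comparing monotonic timestamps across processes assumes they share one clock (true for CLOCK_MONOTONIC on Linux), which fits the single-node scope.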

Lock Design

| Lock Type | Purpose | Conflict Detection |
| --- | --- | --- |
| POINT (P) | Single directory lock | Checks ancestors for SUBTREE locks |
| SUBTREE (S) | Recursive directory lock | Scans descendants for any locks |
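The two conflict rules can be illustrated with an in-memory lock table. This is a simplified, hypothetical model: the real PathLock writes .path.ovlock files, whereas here a dict stands in for the filesystem, and LockTable, try_acquire, and release are invented names. Two rules go beyond the table and are assumptions: a lock already held on the same path always conflicts, and a SUBTREE request also checks ancestors for SUBTREE locks so recursive locks cannot nest.

```python
from pathlib import PurePosixPath

POINT, SUBTREE = "P", "S"


class LockTable:
    """Hypothetical in-memory model of POINT/SUBTREE conflict detection."""

    def __init__(self) -> None:
        self._locks: dict[PurePosixPath, str] = {}

    def _conflicts(self, path: PurePosixPath, lock_type: str) -> bool:
        if path in self._locks:  # assumption: same-path locks always conflict
            return True
        # POINT (and, by assumption, SUBTREE): an ancestor SUBTREE lock covers us.
        if any(self._locks.get(parent) == SUBTREE for parent in path.parents):
            return True
        if lock_type == SUBTREE:
            # SUBTREE: any lock on a descendant conflicts with a recursive lock.
            return any(path in held.parents for held in self._locks)
        return False

    def try_acquire(self, path: str, lock_type: str) -> bool:
        """Fail-fast acquisition: returns False instead of waiting."""
        p = PurePosixPath(path)
        if self._conflicts(p, lock_type):
            return False
        self._locks[p] = lock_type
        return True

    def release(self, path: str) -> None:
        self._locks.pop(PurePosixPath(path), None)


table = LockTable()
assert table.try_acquire("/res/project", SUBTREE)
assert not table.try_acquire("/res/project/docs", POINT)  # under a SUBTREE lock
assert not table.try_acquire("/res", SUBTREE)  # a descendant is locked
assert table.try_acquire("/res/other", POINT)  # unrelated sibling
```

In a real multi-process implementation the check and the write must be atomic; the gap between `_conflicts()` and the dict assignment here is exactly the kind of TOCTOU window the issue reports fixing in lock acquisition.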

| Operation | Lock Mode | Error Handling |
| --- | --- | --- |
| rm() | Directory: SUBTREE; file: POINT (parent) | VectorDB failure → log, still delete from FS |
| mv() | SUBTREE (src) + SUBTREE (dst) | VectorDB failure → delete copy, propagate |
| add_resource | POINT (parent dir) | Propagate error |
| session.commit | Redo log (no path lock on archive) | Crash → redo marker replayed on restart |
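The error-handling column for rm() and mv() can be sketched with toy stand-ins for VikingFS and VectorDB. The dict-backed filesystem, ToyVectorDB, VectorDBError, and the copy-then-delete structure of mv are hypothetical (the "delete copy" wording suggests mv copies before removing the source); only the per-operation failure behavior comes from the table.

```python
import logging

log = logging.getLogger("vikingfs_sketch")


class VectorDBError(RuntimeError):
    """Hypothetical VectorDB failure."""


class ToyVectorDB:
    """Hypothetical index keyed by URI; `fail` simulates an outage."""

    def __init__(self, uris: set[str], fail: bool = False) -> None:
        self.uris = set(uris)
        self.fail = fail

    def delete(self, uri: str) -> None:
        if self.fail:
            raise VectorDBError(f"delete failed: {uri}")
        self.uris.discard(uri)

    def rename(self, src: str, dst: str) -> None:
        if self.fail:
            raise VectorDBError(f"rename failed: {src} -> {dst}")
        self.uris.discard(src)
        self.uris.add(dst)


def rm(fs: dict[str, bytes], vdb: ToyVectorDB, uri: str) -> None:
    try:
        vdb.delete(uri)
    except VectorDBError:
        # Log and continue: a stale index entry is repaired by re-indexing.
        log.warning("VectorDB delete failed for %s; deleting from FS anyway", uri)
    del fs[uri]


def mv(fs: dict[str, bytes], vdb: ToyVectorDB, src: str, dst: str) -> None:
    fs[dst] = fs[src]  # 1. copy to the destination
    try:
        vdb.rename(src, dst)  # 2. move the index entry
    except VectorDBError:
        del fs[dst]  # clean up the copy so the source stays authoritative
        raise  # propagate to the caller
    del fs[src]  # 3. only now remove the source
```

Both failures surface at the call site, in line with decision 1: rm() tolerates a temporarily stale index, while mv() refuses to leave the data present under both paths.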

Session Commit — Two-Phase with Redo Protection

Phase 1 (Archive): write archive → clear messages
Phase 2 (Memory — redo-protected): write redo marker → extract memories → write → enqueue → mark done

On crash during Phase 2, LockManager.start() replays from the redo marker.
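A minimal sketch of the two-phase commit, using a local directory in place of AGFS and plain lists in place of the memory store and QueueManager. The commit_session function, the JSON marker layout, and deleting the marker to "mark done" are hypothetical. In this sketch, extraction reads from the archive rather than the live session, because Phase 1 has already cleared the messages.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Callable

Extractor = Callable[[list[str]], list[str]]


def write_json_atomic(path: Path, data: dict) -> None:
    """Write to a temp file, then rename, so readers never see half a file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(data))
    os.replace(tmp, path)


def commit_session(session: dict, root: Path, extract: Extractor,
                   memories: list[str], queue: list[str]) -> None:
    """Hypothetical session.commit() sketch with a redo-protected Phase 2."""
    sid = session["id"]

    # Phase 1 (archive): persist the messages, then clear the live session.
    archive = root / "archive" / f"{sid}.json"
    write_json_atomic(archive, {"messages": session["messages"]})
    session["messages"] = []

    # Phase 2 (memory, redo-protected): the marker is written before any work.
    marker = root / "redo" / f"{sid}.json"
    write_json_atomic(marker, {"session_id": sid, "archive": str(archive)})
    messages = json.loads(archive.read_text())["messages"]
    extracted = extract(messages)  # a crash here leaves the marker behind
    memories.extend(extracted)  # write memories
    queue.extend(extracted)  # enqueue for indexing
    marker.unlink()  # mark done


root = Path(tempfile.mkdtemp())
session = {"id": "s1", "messages": ["I prefer dark mode"]}
memories: list[str] = []
queue: list[str] = []
commit_session(session, root, lambda msgs: [f"pref: {m}" for m in msgs], memories, queue)
print(memories)  # ['pref: I prefer dark mode']
```

If the process dies after memories are written but before the marker is removed, replay runs extraction again, so the write and enqueue steps would need to be idempotent (for example, keyed by session ID). The issue does not say how OpenViking handles this case.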

Crash Recovery

On startup, LockManager.start():

  1. Scans /local/_system/redo/ for pending markers
  2. Replays session memory extraction for each
  3. Starts background stale lock cleanup (every 60s)
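The startup sequence can be sketched as follows, assuming each redo marker is a JSON file holding the session ID and the path of that session's archived messages. replay_pending, cleanup_loop, start, and the sweep callback are hypothetical names, and a local directory stands in for /local/_system/redo/.

```python
import asyncio
import json
import tempfile
from pathlib import Path
from typing import Callable


def replay_pending(redo_dir: Path, extract: Callable[[list[str]], list[str]],
                   memories: list[str], queue: list[str]) -> list[str]:
    """Steps 1-2 (hypothetical): replay extraction for every pending marker."""
    replayed: list[str] = []
    if not redo_dir.exists():
        return replayed
    # Only *.json markers count; leftover *.tmp files from torn writes are ignored.
    for marker in sorted(redo_dir.glob("*.json")):
        entry = json.loads(marker.read_text())
        messages = json.loads(Path(entry["archive"]).read_text())["messages"]
        extracted = extract(messages)
        memories.extend(extracted)
        queue.extend(extracted)
        marker.unlink()  # done: this session will not be replayed again
        replayed.append(entry["session_id"])
    return replayed


async def cleanup_loop(sweep: Callable[[], None], interval: float) -> None:
    """Step 3 (hypothetical): periodically remove stale locks."""
    while True:
        await asyncio.sleep(interval)
        sweep()


def start(redo_dir: Path, extract, memories, queue,
          sweep: Callable[[], None], interval: float = 60.0) -> asyncio.Task:
    """Must be called from a running event loop (create_task requires one)."""
    replay_pending(redo_dir, extract, memories, queue)
    return asyncio.create_task(cleanup_loop(sweep, interval))


root = Path(tempfile.mkdtemp())
(root / "archive").mkdir()
(root / "redo").mkdir()
(root / "archive" / "s1.json").write_text(json.dumps({"messages": ["hello"]}))
(root / "redo" / "s1.json").write_text(
    json.dumps({"session_id": "s1", "archive": str(root / "archive" / "s1.json")}))
memories: list[str] = []
print(replay_pending(root / "redo", lambda m: [x.upper() for x in m], memories, []))  # ['s1']
```

Replay finishes before the cleanup task is scheduled, so recovered sessions are processed before normal operation resumes.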

Scope & Limitations

Important

Single-node only. Distributed scenarios require replacing PathLock with distributed locks (etcd/ZooKeeper) and persisting RedoLog to a consistent distributed store.

Alternatives Considered

  • Full undo-based transactions (WAL + UndoLog): Implemented first, then removed. It was over-engineered for actual needs: ~4000 lines of code for rollback semantics that OpenViking's eventual consistency model for VectorDB does not require.
  • Database-level transactions (SQLite): Only covers VectorDB, not AGFS filesystem operations.
  • Saga pattern: More complex orchestration than needed.
  • Event sourcing: Overkill for the current write patterns.

Implementation

PR: #431

  • 77 files changed, +3732/-1429 lines
  • 6 test files covering lock_context, lock_manager, path_lock, redo_log, concurrent_lock, and e2e
  • Transaction code reduced from ~4000 lines to ~700 lines (82% reduction)
  • Post-refactor fixes: TOCTOU race in lock acquisition, resource lock for add_resource, path handling edge cases

Contribution

  • I am willing to contribute to implementing this feature

Metadata

Labels: enhancement (New feature or request)

Project status: Done
