Description
RFC Discussion: #115
Problem Statement
OpenViking's core write operations (rm, mv, add_resource, session.commit) coordinate across multiple subsystems — VikingFS, VectorDB, and QueueManager. Without coordination:
- Concurrent operations on the same path can corrupt data
- `session.commit()` crashes mid-way through memory extraction → extracted memories lost permanently
- `rm()`/`mv()` partial failures leave inconsistent state between FS and VectorDB
Solution
Implement path-level locking for concurrent access coordination and selective crash recovery (redo-log) for session memory extraction — the one critical operation where data loss is unacceptable.
Note
Design evolution: The initial proposal was a heavyweight undo-based TransactionManager/Journal/UndoLog system (~4000 lines). After implementation and review, this was replaced with a lightweight LockManager + RedoLog architecture (~700 lines). The undo/rollback approach was over-engineered for OpenViking's actual consistency needs. See commit 4e44a5d for the refactor rationale.
Architecture
LockContext (async context manager — user-facing entry point)
└── LockManager (global singleton)
    ├── PathLock (fencing-token-based path locks via .path.ovlock files)
    ├── RedoLog (AGFS-persisted crash recovery for session_memory)
    └── Background cleanup (stale lock detection every 60s)
Key Design Decisions
- Inline error handling, not rollback: VikingFS `rm`/`mv` use sequential operations with explicit error cleanup. Failures are observable and handled at the call site. This is simpler and more predictable than implicit rollback.
- Selective crash recovery (RedoLog): Only `session.commit` memory extraction is redo-protected. General FS operations do not need redo because partial failures are detectable and VectorDB inconsistencies self-heal through re-indexing.
- Fencing tokens for locks: Lock files contain `{owner_id}:{monotonic_ns}:{lock_type}` for stale lock detection and ABA prevention. Deterministic livelock resolution via timestamp comparison.
- Fail-fast by default: `lock_timeout=0` means operations fail immediately on lock conflict. Configurable for blocking retries.
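The fencing-token format above can be sketched as a small value type. The `{owner_id}:{monotonic_ns}:{lock_type}` layout is from this RFC; everything else is an assumption for illustration: the names `LockToken`, `is_stale`, and `wins_conflict`, the staleness threshold, and the rule that the earlier timestamp wins (the RFC only says "timestamp comparison").

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LockToken:
    owner_id: str
    monotonic_ns: int
    lock_type: str  # "P" (POINT) or "S" (SUBTREE)

    def encode(self) -> str:
        return f"{self.owner_id}:{self.monotonic_ns}:{self.lock_type}"

    @classmethod
    def parse(cls, raw: str) -> "LockToken":
        # Split from the right so an owner_id containing ":" still parses.
        owner_id, ns, lock_type = raw.strip().rsplit(":", 2)
        return cls(owner_id, int(ns), lock_type)


def is_stale(token: LockToken, now_ns: int, max_age_ns: int) -> bool:
    """Assumed staleness rule: the token is older than max_age_ns."""
    return now_ns - token.monotonic_ns > max_age_ns


def wins_conflict(mine: LockToken, theirs: LockToken) -> bool:
    """Assumed tie-break: earlier timestamp wins, then owner_id.

    Both sides compute the same answer, so exactly one backs off.
    """
    return (mine.monotonic_ns, mine.owner_id) < (theirs.monotonic_ns, theirs.owner_id)
```

Because the token records who took the lock and when, a cleaner can compare the token it read against the file's current content before deleting, which is what prevents removing a lock that was released and re-acquired in between (the ABA case).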
Lock Design
| Lock Type | Purpose | Conflict Detection |
|---|---|---|
| POINT (P) | Single directory lock | Checks ancestors for SUBTREE locks |
| SUBTREE (S) | Recursive directory lock | Scans descendants for any locks |

| Operation | Lock Mode | Error Handling |
|---|---|---|
| `rm()` | Dir: SUBTREE; File: POINT (parent) | VectorDB failure → log, still delete FS |
| `mv()` | SUBTREE(src) + SUBTREE(dst) | VectorDB failure → delete copy, propagate |
| `add_resource` | POINT (parent dir) | Propagate error |
| `session.commit` | Redo-log (no path lock on archive) | Crash → redo marker replayed on restart |
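The conflict rules in the first table can be sketched over an in-memory map of held locks. This is an illustration, not the PathLock code: the `conflicts` function and the `held` dict are hypothetical, and two rules are assumptions beyond the table (a lock on the exact same path always conflicts, and the ancestor SUBTREE check is applied to SUBTREE requests as well as POINT requests).

```python
from pathlib import PurePosixPath

POINT, SUBTREE = "P", "S"


def conflicts(held: dict[str, str], path: str, lock_type: str) -> bool:
    """Return True if taking lock_type on path conflicts with held locks.

    held maps an already-locked path to its lock type.
    """
    target = PurePosixPath(path)
    for held_path, held_type in held.items():
        other = PurePosixPath(held_path)
        if other == target:
            return True  # assumed: same path always conflicts
        # An ancestor holding SUBTREE covers the target.
        if held_type == SUBTREE and other in target.parents:
            return True
        # A SUBTREE request conflicts with any lock on a descendant.
        if lock_type == SUBTREE and target in other.parents:
            return True
    return False
```

For example, with `/res/a` held as SUBTREE, a POINT request on `/res/a/x` conflicts, while a POINT request on a sibling `/res/b` does not.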
Session Commit — Two-Phase with Redo Protection
Phase 1 (Archive): write archive → clear messages
Phase 2 (Memory — redo-protected): write redo marker → extract memories → write → enqueue → mark done
On crash during Phase 2, LockManager.start() replays from the redo marker.
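Phase 2 can be sketched as "persist intent, do the work, clear intent". The sketch below uses the local filesystem rather than AGFS, and the function name, marker file layout, and JSON fields are hypothetical. Treating "mark done" as deleting the marker is also an assumption.

```python
import json
import uuid
from pathlib import Path
from typing import Callable


def commit_memory_phase(
    redo_dir: Path,
    session_id: str,
    archive_ref: str,
    extract_and_store: Callable[[str, str], None],
) -> None:
    """Run Phase 2 under a redo marker (illustrative, local-FS only)."""
    redo_dir.mkdir(parents=True, exist_ok=True)
    marker = redo_dir / f"{uuid.uuid4().hex}.json"

    # 1. Write the redo marker before any extraction work starts.
    marker.write_text(json.dumps({"session_id": session_id, "archive": archive_ref}))

    # 2. Extract memories, write them, enqueue. If this raises or the
    #    process dies here, the marker stays behind for replay.
    extract_and_store(session_id, archive_ref)

    # 3. Mark done (assumed here to mean: remove the marker).
    marker.unlink()
```

Since a replay may repeat work that partly succeeded before the crash, the extraction step needs to be safe to run twice for the same archive.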
Crash Recovery
On startup, LockManager.start():
- Scans `/local/_system/redo/` for pending markers
- Replays session memory extraction for each
- Starts background stale lock cleanup (every 60s)
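The scan-and-replay part of startup can be sketched as follows. This is a local-filesystem illustration under assumptions: the function name `replay_pending`, the one-JSON-file-per-marker layout, and removing a marker after a successful replay are not taken from the PR.

```python
import json
from pathlib import Path
from typing import Callable


def replay_pending(redo_dir: Path, replay: Callable[[dict], None]) -> int:
    """Replay every pending redo marker and return how many were replayed."""
    if not redo_dir.is_dir():
        return 0  # nothing was in flight at the last shutdown
    replayed = 0
    # Sorted for a deterministic replay order across restarts.
    for marker in sorted(redo_dir.glob("*.json")):
        record = json.loads(marker.read_text())
        replay(record)   # should be idempotent: it may rerun partial work
        marker.unlink()  # only removed once the replay has succeeded
        replayed += 1
    return replayed
```

If `replay` raises, the marker is left in place, so the same record is picked up again on the next start.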
Scope & Limitations
Important
Single-node only. Distributed scenarios require replacing PathLock with distributed locks (etcd/ZooKeeper) and persisting RedoLog to a consistent distributed store.
Alternatives Considered
- Full undo-based transactions (WAL + UndoLog): Implemented first, then removed. Over-engineered for actual needs — ~4000 lines of code for rollback semantics that were not necessary given OpenViking's eventual consistency model for VectorDB.
- Database-level transactions (SQLite): Only covers VectorDB, not AGFS filesystem operations.
- Saga pattern: More complex orchestration than needed.
- Event sourcing: Overkill for the current write patterns.
Implementation
PR: #431
- 77 files changed, +3732/-1429 lines
- 6 test files covering lock_context, lock_manager, path_lock, redo_log, concurrent_lock, and e2e
- Transaction code reduced from ~4000 lines to ~700 lines (82% reduction)
- Post-refactor fixes: TOCTOU race in lock acquisition, resource lock for `add_resource`, path handling edge cases
Contribution
- I am willing to contribute to implementing this feature