Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
29 commits
Select commit Hold shift + click to select a range
bb53102
feat(storage): add transaction support with journal, undo, and crash …
qin-ctx Mar 5, 2026
ee4d29c
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 5, 2026
b9a51c2
test(transaction): add e2e rollback tests for mv and multi-step opera…
qin-ctx Mar 5, 2026
5a8ab10
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 5, 2026
d228ad5
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 10, 2026
b52ccbf
feat(storage): add transaction support with path locking and journal
qin-ctx Mar 13, 2026
9a8b3f5
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 15, 2026
0d10b1f
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 15, 2026
6cab58f
fix: tests
qin-ctx Mar 16, 2026
5a9ffb5
fix(transaction): fix rollback and race condition bugs
qin-ctx Mar 16, 2026
273efbc
refactor(transaction): make TransactionManager required and rewrite t…
qin-ctx Mar 16, 2026
0122a3b
refactor(transaction): make rollback fully async and unify session co…
qin-ctx Mar 16, 2026
74cba6b
fix: tests
qin-ctx Mar 16, 2026
cd2990b
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 16, 2026
c460ac6
refactor(transaction): simplify session commit and add redo-based cra…
qin-ctx Mar 16, 2026
e494f38
fix: transaction
qin-ctx Mar 16, 2026
d60e613
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 16, 2026
30e1856
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 16, 2026
c61978f
fix: UserIdentifier
qin-ctx Mar 17, 2026
34e3cae
Merge remote-tracking branch 'origin/main' into feature/transaction-s…
qin-ctx Mar 17, 2026
4e44a5d
refactor(transaction): replace undo-based transaction manager with li…
qin-ctx Mar 17, 2026
20d2cf3
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 17, 2026
fe33516
fix(transaction): remove checkpoint dead code, fix TOCTOU race, clari…
qin-ctx Mar 17, 2026
1010bd4
fix: path
qin-ctx Mar 17, 2026
715739d
fix: resource lock
qin-ctx Mar 17, 2026
5de24ab
fix: test
qin-ctx Mar 17, 2026
08c3673
docs: update
qin-ctx Mar 17, 2026
c43efae
Merge branch 'main' into feature/transaction-support
qin-ctx Mar 18, 2026
cdd222b
fix: tests
qin-ctx Mar 18, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/design/multi-tenant-design.md
Original file line number Diff line number Diff line change
Expand Up @@ -283,7 +283,7 @@ def agent_space_name(self) -> str:
| `agent/instructions` | `/{account_id}/agent/{agent_space}/instructions/` | account + user + agent | agent 的行为规则,每用户独立 |
| `resources/` | `/{account_id}/resources/` | account | account 内共享的知识资源 |
| `session/` | `/{account_id}/session/{user_space}/{session_id}/` | account + user | 用户的对话记录 |
| `transactions/` | `/{account_id}/transactions/` | account | 账户级事务记录 |
| `redo/` | `/{account_id}/_system/redo/` | account | 崩溃恢复 redo 标记 |
| `_system/`(全局) | `/_system/` | 系统级 | 全局工作区列表 |
| `_system/`(per-account) | `/{account_id}/_system/` | account | 用户注册表 |

Expand Down
363 changes: 363 additions & 0 deletions docs/en/concepts/09-transaction.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,363 @@
# Path Locks and Crash Recovery

OpenViking uses two simple primitives — **path locks** and **redo log** — to protect the consistency of core write operations (`rm`, `mv`, `add_resource`, `session.commit`), ensuring that VikingFS, VectorDB, and QueueManager remain consistent even when failures occur.

## Design Philosophy

OpenViking is a context database where FS is the source of truth and VectorDB is a derived index. A lost index can be rebuilt from source data, but lost source data is unrecoverable. Therefore:

> **Better to miss a search result than to return a bad one.**

## Design Principles

1. **Write-exclusive**: Path locks ensure only one write operation can operate on a path at a time
2. **On by default**: All data operations automatically acquire locks; no extra configuration needed
3. **Lock as protection**: LockContext acquires locks on entry, releases on exit — no undo/journal/commit semantics
4. **Only session_memory needs crash recovery**: RedoLog re-executes memory extraction after a process crash
5. **Queue operations run outside locks**: SemanticQueue/EmbeddingQueue enqueue operations are idempotent and retriable

## Architecture

```
Service Layer (rm / mv / add_resource / session.commit)
|
v
+--[LockContext async context manager]--+
| |
| 1. Create LockHandle |
| 2. Acquire path lock (poll+timeout) |
| 3. Execute operations (FS+VectorDB) |
| 4. Release lock |
| |
| On exception: auto-release lock, |
| exception propagates unchanged |
+---------------------------------------+
|
v
Storage Layer (VikingFS, VectorDB, QueueManager)
```

## Two Core Components

### Component 1: PathLock + LockManager + LockContext (Path Lock System)

**PathLock** implements file-based distributed locks with two lock types — POINT and SUBTREE — using fencing tokens to prevent TOCTOU races and automatic stale lock detection and cleanup.

**LockHandle** is a lightweight lock holder token:

```python
@dataclass
class LockHandle:
id: str # Unique ID used to generate fencing tokens
locks: list[str] # Acquired lock file paths
created_at: float # Creation time
```

**LockManager** is a global singleton managing lock lifecycle:
- Creates/releases LockHandles
- Background cleanup of leaked locks (in-process safety net)
- Executes RedoLog recovery on startup

**LockContext** is an async context manager encapsulating the lock/unlock lifecycle:

```python
from openviking.storage.transaction import LockContext, get_lock_manager

async with LockContext(get_lock_manager(), [path], lock_mode="point") as handle:
# Perform operations under lock protection
...
# Lock automatically released on exit (including exceptions)
```

### Component 2: RedoLog (Crash Recovery)

Used only for the memory extraction phase of `session.commit`. Writes a marker before the operation, deletes it after success, and scans for leftover markers on startup to redo.

```
/local/_system/redo/{task_id}/redo.json
```

Memory extraction is idempotent — re-extracting from the same archive produces the same result.

## Consistency Issues and Solutions

### rm(uri)

| Problem | Solution |
|---------|----------|
| Delete file first, then index -> file gone but index remains -> search returns non-existent file | **Reverse order**: delete index first, then file. Index deletion failure -> both file and index intact |

**Locking strategy** (depends on target type):
- Deleting a **directory**: `lock_mode="subtree"`, locks the directory itself
- Deleting a **file**: `lock_mode="point"`, locks the file's parent directory

Operation flow:

```
1. Check whether target is a directory or file, choose lock mode
2. Acquire lock
3. Delete VectorDB index -> immediately invisible to search
4. Delete FS file
5. Release lock
```

VectorDB deletion fails -> exception thrown, lock auto-released, file and index both intact. FS deletion fails -> VectorDB already deleted but file remains, retry is safe.

### mv(old_uri, new_uri)

| Problem | Solution |
|---------|----------|
| File moved to new path but index points to old path -> search returns old path (doesn't exist) | Copy first then update index; clean up copy on failure |

**Locking strategy** (handled automatically via `lock_mode="mv"`):
- Moving a **directory**: SUBTREE lock on both source path and destination parent
- Moving a **file**: POINT lock on both source's parent and destination parent

Operation flow:

```
1. Check whether source is a directory or file, set src_is_dir
2. Acquire mv lock (internally chooses SUBTREE or POINT based on src_is_dir)
3. Copy to new location (source still intact, safe)
4. If directory, remove the lock file carried over by cp into the copy
5. Update VectorDB URIs
- Failure -> clean up copy, source and old index intact, consistent state
6. Delete source
7. Release lock
```

### add_resource

| Problem | Solution |
|---------|----------|
| File moved from temp to final directory, then crash -> file exists but never searchable | Two separate paths for first-time add vs incremental update |
| Resource already on disk but rm deletes it while semantic processing / vectorization is still running -> wasted work | Lifecycle SUBTREE lock held from finalization through processing completion |

**First-time add** (target does not exist) — handled in `ResourceProcessor.process_resource` Phase 3.5:

```
1. Acquire POINT lock on parent of final_uri
2. agfs.mv temp directory -> final location
3. Acquire SUBTREE lock on final_uri (inside POINT lock, eliminating race window)
4. Release POINT lock
5. Clean up temp directory
6. Enqueue SemanticMsg(lifecycle_lock_handle_id=...) -> DAG runs on final
7. DAG starts lock refresh loop (refreshes timestamp every lock_expire/2 seconds)
8. DAG complete + all embeddings done -> release SUBTREE lock
```

During this period, `rm` attempting to acquire a SUBTREE lock on the same path will fail with `ResourceBusyError`.

**Incremental update** (target already exists) — temp stays in place:

```
1. Acquire SUBTREE lock on target_uri (protect existing resource)
2. Enqueue SemanticMsg(uri=temp, target_uri=final, lifecycle_lock_handle_id=...)
3. DAG runs on temp, lock refresh loop active
4. DAG completion triggers sync_diff_callback or move_temp_to_target_callback
5. Callback completes -> release SUBTREE lock
```

Note: DAG callbacks do NOT wrap operations in an outer lock. Each `VikingFS.rm` and `VikingFS.mv` has its own lock internally. An outer lock would conflict with these inner locks causing deadlock.

**Server restart recovery**: SemanticMsg is persisted in QueueFS. On restart, `SemanticProcessor` detects that the `lifecycle_lock_handle_id` handle is missing from the in-memory LockManager and re-acquires a SUBTREE lock.

### session.commit()

| Problem | Solution |
|---------|----------|
| Messages cleared but archive not written -> conversation data lost | Phase 1 without lock (incomplete archive has no side effects) + Phase 2 with RedoLog |

LLM calls have unpredictable latency (5s~60s+) and cannot be inside a lock-holding operation. The design splits into two phases:

```
Phase 1 — Archive (no lock):
1. Generate archive summary (LLM)
2. Write archive (history/archive_N/messages.jsonl + summaries)
3. Clear messages.jsonl
4. Clear in-memory message list

Phase 2 — Memory extraction + write (RedoLog):
1. Write redo marker (archive_uri, session_uri, user identity)
2. Extract memories from archived messages (LLM)
3. Write current message state
4. Write relations
5. Directly enqueue SemanticQueue
6. Delete redo marker
```

**Crash recovery analysis**:

| Crash point | State | Recovery action |
|------------|-------|----------------|
| During Phase 1 archive write | No marker | Incomplete archive; next commit scans history/ for index, unaffected |
| Phase 1 archive complete but messages not cleared | No marker | Archive complete + messages still present = redundant but safe |
| During Phase 2 memory extraction/write | Redo marker exists | On startup: redo extraction + write + enqueue from archive |
| Phase 2 complete | Redo marker deleted | No recovery needed |

## LockContext

`LockContext` is an **async** context manager that encapsulates lock acquisition and release:

```python
from openviking.storage.transaction import LockContext, get_lock_manager

lock_manager = get_lock_manager()

# Point lock (write operations, semantic processing)
async with LockContext(lock_manager, [path], lock_mode="point"):
# Perform operations...
pass

# Subtree lock (delete operations)
async with LockContext(lock_manager, [path], lock_mode="subtree"):
# Perform operations...
pass

# MV lock (move operations)
async with LockContext(lock_manager, [src], lock_mode="mv", mv_dst_parent_path=dst):
# Perform operations...
pass
```

**Lock modes**:

| lock_mode | Use case | Behavior |
|-----------|----------|----------|
| `point` | Write operations, semantic processing | Lock the specified path; conflicts with any lock on the same path and any SUBTREE lock on ancestors |
| `subtree` | Delete operations | Lock the subtree root; conflicts with any lock on the same path, any lock on descendants, and any SUBTREE lock on ancestors |
| `mv` | Move operations | Directory move: SUBTREE lock on both source and destination; File move: POINT lock on source parent and destination (controlled by `src_is_dir`) |

**Exception handling**: `__aexit__` always releases locks and does not swallow exceptions. Lock acquisition failure raises `LockAcquisitionError`.

## Lock Types (POINT vs SUBTREE)

The lock mechanism uses two lock types to handle different conflict patterns:

| | POINT on same path | SUBTREE on same path | POINT on descendant | SUBTREE on ancestor |
|---|---|---|---|---|
| **POINT** | Conflict | Conflict | — | Conflict |
| **SUBTREE** | Conflict | Conflict | Conflict | Conflict |

- **POINT (P)**: Used for write and semantic-processing operations. Only locks a single directory. Blocks if any ancestor holds a SUBTREE lock.
- **SUBTREE (S)**: Used for rm and mv operations. Logically covers the entire subtree but only writes **one lock file** at the root. Before acquiring, scans all descendants and ancestor directories for conflicting locks.

## Lock Mechanism

### Lock Protocol

Lock file path: `{path}/.path.ovlock`

Lock file content (Fencing Token):
```
{handle_id}:{time_ns}:{lock_type}
```

Where `lock_type` is `P` (POINT) or `S` (SUBTREE).

### Lock Acquisition (POINT mode)

```
loop until timeout (poll interval: 200ms):
1. Check target directory exists
2. Check if target directory is locked by another operation
- Stale lock? -> remove and retry
- Active lock? -> wait
3. Check all ancestor directories for SUBTREE locks
- Stale lock? -> remove and retry
- Active lock? -> wait
4. Write POINT (P) lock file
5. TOCTOU double-check: re-scan ancestors for SUBTREE locks
- Conflict found: compare (timestamp, handle_id)
- Later one (larger timestamp/handle_id) backs off (removes own lock) to prevent livelock
- Wait and retry
6. Verify lock file ownership (fencing token matches)
7. Success

Timeout (default 0 = no-wait) raises LockAcquisitionError
```

### Lock Acquisition (SUBTREE mode)

```
loop until timeout (poll interval: 200ms):
1. Check target directory exists
2. Check if target directory is locked by another operation
- Stale lock? -> remove and retry
- Active lock? -> wait
3. Check all ancestor directories for SUBTREE locks
- Stale lock? -> remove and retry
- Active lock? -> wait
4. Scan all descendant directories for any locks by other operations
- Stale lock? -> remove and retry
- Active lock? -> wait
5. Write SUBTREE (S) lock file (only one file, at the root path)
6. TOCTOU double-check: re-scan descendants and ancestors
- Conflict found: compare (timestamp, handle_id)
- Later one (larger timestamp/handle_id) backs off (removes own lock) to prevent livelock
- Wait and retry
7. Verify lock file ownership (fencing token matches)
8. Success

Timeout (default 0 = no-wait) raises LockAcquisitionError
```

### Lock Expiry Cleanup

**Stale lock detection**: PathLock checks the fencing token timestamp. Locks older than `lock_expire` (default 300s) are considered stale and are removed automatically during acquisition.

**In-process cleanup**: LockManager checks active LockHandles every 60 seconds. Handles created more than 3600 seconds ago are force-released.

**Orphan locks**: Lock files left behind after a process crash are automatically removed via stale lock detection when any operation next attempts to acquire a lock on the same path.

## Crash Recovery

`LockManager.start()` automatically scans for leftover markers in `/local/_system/redo/` on startup:

| Scenario | Recovery action |
|----------|----------------|
| session_memory extraction crash | Redo memory extraction + write + enqueue from archive |
| Crash while holding lock | Lock file remains in AGFS; stale detection auto-cleans on next acquisition (default 300s expiry) |
| Crash after enqueue, before worker processes | QueueFS SQLite persistence; worker auto-pulls after restart |
| Orphan index | Cleaned on L2 on-demand load |

### Defense Summary

| Failure scenario | Defense | Recovery timing |
|-----------------|--------|-----------------|
| Crash during operation | Lock auto-expires + stale detection | Next acquisition of same path lock |
| Crash during add_resource semantic processing | Lifecycle lock expires + SemanticProcessor re-acquires on restart | Worker restart |
| Crash during session.commit Phase 2 | RedoLog marker + redo | On restart |
| Crash after enqueue, before worker | QueueFS SQLite persistence | Worker restart |
| Orphan index | L2 on-demand load cleanup | When user accesses |

## Configuration

Path locks are enabled by default with no extra configuration needed. **The default behavior is no-wait**: if the path is locked, `LockAcquisitionError` is raised immediately. To allow wait/retry, configure the `storage.transaction` section:

```json
{
"storage": {
"transaction": {
"lock_timeout": 5.0,
"lock_expire": 300.0
}
}
}
```

| Parameter | Type | Description | Default |
|-----------|------|-------------|---------|
| `lock_timeout` | float | Lock acquisition timeout (seconds). `0` = fail immediately if locked (default). `> 0` = wait/retry up to this many seconds. | `0.0` |
| `lock_expire` | float | Stale lock expiry threshold (seconds). Locks held longer than this by a crashed process are force-released. | `300.0` |

### QueueFS Persistence

The lock mechanism relies on QueueFS using the SQLite backend to ensure enqueued tasks survive process restarts. This is the default configuration and requires no manual setup.

## Related Documentation

- [Architecture](./01-architecture.md) - System architecture overview
- [Storage](./05-storage.md) - AGFS and vector store
- [Session Management](./08-session.md) - Session and memory management
- [Configuration](../guides/01-configuration.md) - Configuration reference
Loading
Loading