fix: clear live messages.jsonl during async commit to prevent duplicate commits#585
Conversation
…te commits When commit_async() archives messages to a temp directory, the live session's messages.jsonl remains untouched until SemanticProcessor completes the atomic temp→live switch. If the session is evicted from memory and reloaded from disk before that switch (or if the process restarts), the old messages.jsonl is read back — causing a subsequent commit to re-archive the same messages. Fix: write an empty messages.jsonl to the live session directory immediately after Phase 2 (temp write) completes and before Phase 3 (semantic enqueue). This closes the race window: even if the session is reloaded before the semantic switch, it finds no messages to re-commit. Fixes volcengine#580
|
|
|
Hi, thanks for the detailed writeup and the fix! After digging into the current codebase, I think this PR may no longer be needed. PR #584 reverted the COW pattern that was introduced in #535, which is the root cause of the race condition described in #580. The # 2.5 Update active_count context this diff targets was removed as part of that revert and no longer exists in main. In the current commit_async(), the race window no longer exists — after self._messages.clear() (session.py:328), _write_to_agfs_async(self._messages) is called immediately (session.py:349), which writes empty content directly to the live messages.jsonl via a synchronous write_file call (viking_fs.py:1268), no async queue involved. |
Summary
Fixes #580 — async session commit can re-commit old messages before live session switch completes.
Root cause
commit_async()uses a Copy-on-Write pattern:self._messagesin memorySemanticProcessorfor atomic temp→live directory switchThe problem: after Phase 2 clears the in-memory message list, the live session directory still contains the original
messages.jsonl. The atomic switch only happens later whenSemanticProcessorprocesses the queue. If the session object is evicted from memory and reloaded from disk before that switch (or if the process restarts), the oldmessages.jsonlis read back — causing a subsequent commit to re-archive the same messages.Fix
Write an empty
messages.jsonlto the live session directory immediately after Phase 2 completes (step 2.6), before enqueuing to the semantic queue. This closes the race window:self._messagesis already cleared (step 2.1)messages.jsonlis now also cleared (step 2.6)The fix is intentionally minimal — one
write_filecall + logging. It uses the existing_viking_fs.write_file()API that is already used during session initialization (line 151).What about the task completion semantics?
The issue also mentions that
task.status=completedshould imply the live session has been switched. This PR addresses the data safety concern (no duplicate commits) but does not change the task tracking semantics. A follow-up could defertracker.complete()until the semantic switch finishes, if stricter task completion guarantees are desired.