fix(session): propagate extractor failures to async task error by dr3243636-ops · Pull Request #511 · volcengine/OpenViking

dr3243636-ops · 2026-03-10T07:39:36Z

Background

In async session commit (wait=false), extractor exceptions were swallowed by MemoryExtractor.extract() and converted to an empty list.

This caused task tracking to incorrectly report:

status=completed
memories_extracted=0
error=null

which made downstream monitoring (e.g. Slack alerting via task polling) unable to detect real extraction failures.

Changes

Added MemoryExtractor.extract_strict()
- same extraction logic as extract()
- raises RuntimeError("memory_extraction_failed: ...") on extraction errors
Extended SessionCompressor.extract_long_term_memories(..., strict_extract_errors=False)
- when enabled, it calls extract_strict()
Enabled strict mode in async commit path
- Session.commit_async() now passes strict_extract_errors=True
Added test coverage
- tests/test_session_task_tracking.py::test_task_failed_when_memory_extraction_raises
- verifies task transitions to failed and exposes error message
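
The effect of these changes on task tracking can be illustrated with a minimal, self-contained sketch. The names `failing_extract` and `run_commit_task` and the shape of the task dict are hypothetical stand-ins, not OpenViking's actual code; only the `memory_extraction_failed` prefix and the reported fields come from this PR.

```python
import asyncio


async def failing_extract(*, strict: bool = False) -> list:
    """Hypothetical extractor whose backend call always fails."""
    try:
        raise ValueError("model not found")  # simulated backend failure
    except Exception as e:
        if strict:
            # Strict mode: surface the failure to the caller.
            raise RuntimeError(f"memory_extraction_failed: {e}") from e
        return []  # Legacy behaviour: swallow the error.


async def run_commit_task(*, strict: bool) -> dict:
    """Hypothetical background-task wrapper that records the outcome."""
    task = {"status": "running", "memories_extracted": None, "error": None}
    try:
        memories = await failing_extract(strict=strict)
        task.update(status="completed", memories_extracted=len(memories))
    except Exception as e:
        task.update(status="failed", error=str(e))
    return task


lenient_task = asyncio.run(run_commit_task(strict=False))
strict_task = asyncio.run(run_commit_task(strict=True))
print(lenient_task)
print(strict_task)
```

With the error swallowed, the sketch reports a completed task with zero memories and no error, which is the misleading state described in the background section. In strict mode the same failure ends up in the task's `error` field.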

Validation

Unit/integration tests

Ran:

python -m pytest tests/test_task_tracker.py tests/test_session_task_tracking.py tests/test_session_async_commit.py -v

Result: 40 passed

Local production verification

Verified against a locally running OV service before opening this PR:

Normal model config -> async task completed
Forced invalid model (claude-haiku-4-5-20251001) -> async task failed with error containing memory_extraction_failed

This confirms failure signals now propagate to task API as expected.

Compatibility

Existing extract() behavior is unchanged (still non-throwing).
Strict error propagation is only enabled in async commit path.

Follow-up to volcengine#472. When `wait=false`, background commit failures were silently lost: callers had no way to know if memory extraction succeeded. This adds a lightweight in-memory TaskTracker that returns a `task_id` on async commit, which callers can poll via new `/tasks` endpoints to check completion status, results, or errors.

Key changes:
- New TaskTracker singleton with TTL-based cleanup (24h completed, 7d failed)
- New API: GET /api/v1/tasks/{task_id} and GET /api/v1/tasks (with filters)
- Atomic duplicate commit detection (eliminates race condition)
- Error message sanitization (keys/tokens redacted)
- Defensive copies on all public reads (thread safety)
- 35 tests (26 unit + 9 integration), all existing tests pass

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
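
As a rough illustration of the TTL-based cleanup and defensive copies listed in the commit message, here is a minimal sketch. `TinyTaskTracker` and its methods are hypothetical and do not reflect the real TaskTracker API; only the retention windows (24h for completed, 7d for failed) are taken from the text.

```python
import copy
import threading
import time
from typing import Optional

# Retention windows from the commit message: 24h completed, 7d failed.
TTL_SECONDS = {"completed": 24 * 3600, "failed": 7 * 24 * 3600}


class TinyTaskTracker:
    """Hypothetical in-memory tracker; not the real TaskTracker."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._tasks = {}

    def finish(self, task_id: str, status: str, *, result=None, error=None,
               now: Optional[float] = None) -> None:
        """Record a finished task together with its completion time."""
        finished_at = time.time() if now is None else now
        with self._lock:
            self._tasks[task_id] = {
                "status": status,
                "result": result,
                "error": error,
                "finished_at": finished_at,
            }

    def get(self, task_id: str) -> Optional[dict]:
        with self._lock:
            # Defensive copy so callers cannot mutate tracker state.
            return copy.deepcopy(self._tasks.get(task_id))

    def cleanup(self, now: Optional[float] = None) -> int:
        """Drop finished tasks older than their TTL; return the number removed."""
        current = time.time() if now is None else now
        with self._lock:
            expired = [
                tid for tid, t in self._tasks.items()
                if current - t["finished_at"] > TTL_SECONDS[t["status"]]
            ]
            for tid in expired:
                del self._tasks[tid]
        return len(expired)


tracker = TinyTaskTracker()
tracker.finish("a", "completed", result={"memories_extracted": 3}, now=0.0)
tracker.finish("b", "failed", error="memory_extraction_failed: boom", now=0.0)
removed = tracker.cleanup(now=2 * 24 * 3600)  # two days later
```

Two days after both tasks finish, only the completed task has outlived its window, so the failed one remains available for polling.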

CLAassistant · 2026-03-10T09:00:13Z

All committers have signed the CLA.

qin-ctx · 2026-03-10T09:15:38Z

openviking/session/memory_extractor.py

+
    async def create_memory(
        self,
        candidate: CandidateMemory,


[Bug] extract_strict() is a complete copy-paste of extract() (~115 lines); the only behavioral difference is in the except block (raise RuntimeError vs. return []).

This means any future bug fix or feature change to extract() must also be applied to extract_strict(); otherwise the two will drift apart (silent divergence).

I suggest eliminating the duplication by adding a strict parameter to extract():

    async def extract(
        self, context, user, session_id, *, strict: bool = False
    ) -> List[CandidateMemory]:
        try:
            # ... existing logic ...
        except Exception as e:
            logger.error(f"Memory extraction failed: {e}")
            if strict:
                raise RuntimeError(f"memory_extraction_failed: {e}") from e
            return []

extract_strict can then be a one-line delegate, or the compressor can pass strict=True directly.

qin-ctx · 2026-03-10T09:15:38Z

openviking/session/compressor.py

            return []

-        candidates = await self.extractor.extract(context, user, session_id)
+        if strict_extract_errors:


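[Suggestion] When strict_extract_errors=True, exceptions propagate directly out of extract_long_term_memories. The caller, commit_async() (session.py:338), has no try/catch and instead relies on the outer task tracker to catch them.

The assumption itself is reasonable, but I suggest adding at least a one-line comment here or in commit_async stating "extraction errors intentionally propagate to caller (task tracker)", so that future maintainers do not mistake this for missing exception handling and add a try/catch that swallows the error.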
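
One way the suggested comment might look in context is sketched below. `FailingExtractor` and `CompressorSketch` are hypothetical stand-ins for the real extractor and SessionCompressor, and the method body is reduced to the single call under discussion.

```python
import asyncio


class FailingExtractor:
    """Hypothetical extractor whose strict mode always raises."""

    async def extract(self, context, user, session_id, *, strict=False):
        if strict:
            raise RuntimeError("memory_extraction_failed: simulated")
        return []


class CompressorSketch:
    """Hypothetical stand-in for SessionCompressor."""

    def __init__(self, extractor):
        self.extractor = extractor

    async def extract_long_term_memories(
        self, context, user, session_id, *, strict_extract_errors=False
    ):
        # NOTE: no try/except here on purpose. In strict mode, extraction
        # errors intentionally propagate to the caller (task tracker).
        return await self.extractor.extract(
            context, user, session_id, strict=strict_extract_errors
        )


compressor = CompressorSketch(FailingExtractor())
try:
    asyncio.run(
        compressor.extract_long_term_memories(
            {}, "user", "s1", strict_extract_errors=True
        )
    )
    propagated = None
except RuntimeError as e:
    propagated = str(e)
```

The comment sits next to the call that can raise, so a maintainer reading the method sees that the missing handler is deliberate.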

dr3243636-ops · 2026-03-10T09:35:05Z

Addressed review feedback in 39c036d:

Removed duplicated extraction logic by adding strict flag to MemoryExtractor.extract(...) and making extract_strict(...) a thin compatibility wrapper.
Added explicit comment in SessionCompressor.extract_long_term_memories clarifying that strict extraction errors are intentionally propagated to task tracker.
Ran formatter + regression tests: tests/test_task_tracker.py, tests/test_session_task_tracking.py, tests/test_session_async_commit.py (40 passed).

Thanks for the review.
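
The refactor described in this comment could take roughly the following shape. This is a sketch under assumptions: `ExtractorSketch` and `_run_extraction` are hypothetical, and the real method returns CandidateMemory objects rather than strings.

```python
import asyncio
import logging
from typing import List

logger = logging.getLogger(__name__)


class ExtractorSketch:
    """Hypothetical stand-in for MemoryExtractor after the refactor."""

    async def _run_extraction(self, context, user, session_id) -> List[str]:
        # Placeholder for the real extraction logic; fails when asked to.
        if context.get("fail"):
            raise ValueError("invalid model")
        return ["memory-1"]

    async def extract(
        self, context, user, session_id, *, strict: bool = False
    ) -> List[str]:
        try:
            return await self._run_extraction(context, user, session_id)
        except Exception as e:
            logger.error("Memory extraction failed: %s", e)
            if strict:
                raise RuntimeError(f"memory_extraction_failed: {e}") from e
            return []

    async def extract_strict(self, context, user, session_id) -> List[str]:
        # Thin compatibility wrapper: all logic lives in extract().
        return await self.extract(context, user, session_id, strict=True)


extractor = ExtractorSketch()
ok = asyncio.run(extractor.extract({}, "user", "s1"))
swallowed = asyncio.run(extractor.extract({"fail": True}, "user", "s1"))
try:
    asyncio.run(extractor.extract_strict({"fail": True}, "user", "s1"))
    strict_error = None
except RuntimeError as e:
    strict_error = str(e)
```

Because the wrapper only forwards to `extract()`, a later fix to the shared logic applies to both entry points.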

dr3243636-ops and others added 3 commits March 8, 2026 13:27

fix: resolve CI lint failures (ruff format + unused imports)

798a7d1

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix(session): propagate extraction failures to async task error

645bcc8

github-project-automation bot added this to OpenViking project Mar 10, 2026

github-project-automation bot moved this to Backlog in OpenViking project Mar 10, 2026

Merge branch 'main' into fix/session-commit-propagate-extraction-errors

769b31f

qin-ctx self-assigned this Mar 10, 2026

qin-ctx reviewed Mar 10, 2026

View reviewed changes

refactor(session): dedupe strict extraction path

39c036d

qin-ctx approved these changes Mar 12, 2026

View reviewed changes

qin-ctx merged commit afccbae into volcengine:main Mar 12, 2026
5 of 6 checks passed

github-project-automation bot moved this from Backlog to Done in OpenViking project Mar 12, 2026

qin-ctx merged 5 commits into volcengine:main from dr3243636-ops:fix/session-commit-propagate-extraction-errors

3 participants
