fix(daemon): guard session resume when agent runtime changes by bzqzheng · Pull Request #1905 · multica-ai/multica

bzqzheng · 2026-04-29T19:30:20Z

Summary

Guard issue and chat session resume pointers by comparing the prior session's runtime_id with the claiming task's runtime_id. When they differ, the claim endpoint returns an empty PriorSessionID so the daemon starts a fresh session instead of trying to resume a cross-runtime session.
Add chat_session.runtime_id column (nullable UUID → agent_runtime.id) so chat resume pointers remain self-sufficient instead of relying only on task-row history. Backfilled from the most recent completed/failed task per chat session; legacy rows with NULL runtime_id fall back to the task-row lookup.
Preserve work_dir reuse even when PriorSessionID is cleared — working directories are runtime-portable and should survive migrations.
Comparison uses task.RuntimeID (not agent.current_runtime_id) so old-runtime tasks can still resume old-runtime sessions during migration windows.
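
The issue-task guard described above can be sketched as follows. This is a minimal illustration, not the code in handler/daemon.go: the `PriorSession` and `ClaimResume` types and the `resumeForTask` helper are hypothetical names chosen for the example. Only the rule itself comes from the summary: compare the prior session's runtime with the claiming task's runtime, clear `PriorSessionID` on mismatch, and keep the work directory either way.

```go
package main

import "fmt"

// PriorSession is a hypothetical view of the last session recorded for an issue.
type PriorSession struct {
	SessionID string
	RuntimeID string
	WorkDir   string
}

// ClaimResume is a hypothetical subset of the claim response.
type ClaimResume struct {
	PriorSessionID string
	WorkDir        string
}

// resumeForTask compares the prior session's runtime with the runtime of the
// task being claimed (not the agent's current runtime). On mismatch the
// session ID is dropped so the daemon starts fresh, but the work directory
// is still returned because it is treated as runtime-portable.
func resumeForTask(taskRuntimeID string, prior *PriorSession) ClaimResume {
	if prior == nil {
		return ClaimResume{}
	}
	out := ClaimResume{WorkDir: prior.WorkDir}
	if prior.RuntimeID == taskRuntimeID {
		out.PriorSessionID = prior.SessionID
	}
	return out
}

func main() {
	prior := &PriorSession{SessionID: "sess-1", RuntimeID: "rt-old", WorkDir: "/work/issue-1"}

	// Same runtime: the session pointer is kept.
	fmt.Printf("%+v\n", resumeForTask("rt-old", prior))

	// Different runtime: the session pointer is cleared, the work dir survives.
	fmt.Printf("%+v\n", resumeForTask("rt-new", prior))
}
```

Keying the comparison on the task's runtime rather than the agent's current one is what lets a task still queued on the old runtime resume its old-runtime session during a migration window.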

What changed

handler/daemon.go — ClaimTaskByRuntime: skip PriorSessionID when prior.RuntimeID != task.RuntimeID for issue tasks; for chat tasks, skip when chat_session.runtime_id is NULL or doesn't match, then fall back to task-row lookup with the same runtime guard.
service/task.go — CompleteTask / FailTask: persist runtime_id into chat_session on task completion so the pointer stays current.
Migration 060_chat_session_runtime_id — adds chat_session.runtime_id with FK to agent_runtime, backfills from latest completed/failed task per session.
SQL queries (agent.sql, chat.sql) — GetLastTaskSession and GetLastChatTaskSession now return runtime_id. CreateChatSession auto-populates runtime_id from the agent. UpdateChatSessionSession accepts runtime_id.
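
The chat-task path has one extra step, which the sketch below tries to make concrete. The types and the `chatPriorSessionID` function are hypothetical and are not taken from the PR diff. The sketch assumes only what the list above states: the chat_session pointer is used when its runtime_id is non-NULL and matches the task's runtime, and otherwise the lookup falls back to the last task row under the same runtime guard.

```go
package main

import "fmt"

// ChatSession models the chat_session resume pointer (hypothetical shape).
// A nil RuntimeID stands in for a NULL runtime_id on a legacy row.
type ChatSession struct {
	SessionID string
	RuntimeID *string
}

// TaskSession models the last task row for the chat session (hypothetical shape).
type TaskSession struct {
	SessionID string
	RuntimeID string
}

// chatPriorSessionID returns the session to resume, or "" to start fresh.
func chatPriorSessionID(taskRuntimeID string, cs ChatSession, lastTask *TaskSession) string {
	// Step 1: trust the chat_session pointer only when its runtime is known and matches.
	if cs.SessionID != "" && cs.RuntimeID != nil && *cs.RuntimeID == taskRuntimeID {
		return cs.SessionID
	}
	// Step 2: fall back to the task-row lookup, applying the same runtime guard.
	if lastTask != nil && lastTask.SessionID != "" && lastTask.RuntimeID == taskRuntimeID {
		return lastTask.SessionID
	}
	return ""
}

func main() {
	oldRT := "rt-old"

	// Legacy row: runtime_id is NULL, so the task row decides.
	legacy := ChatSession{SessionID: "sess-cs"}
	fmt.Printf("%q\n", chatPriorSessionID("rt-old", legacy, &TaskSession{SessionID: "sess-task", RuntimeID: "rt-old"}))

	// Pointer and task row both belong to another runtime: nothing to resume.
	pinned := ChatSession{SessionID: "sess-cs", RuntimeID: &oldRT}
	fmt.Printf("%q\n", chatPriorSessionID("rt-new", pinned, &TaskSession{SessionID: "sess-task", RuntimeID: "rt-old"}))
}
```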

Test plan

go test ./internal/handler -run 'TestClaimTask_(IssuePriorSessionRuntimeGuard|ChatPriorSessionRuntimeGuard)$' — new tests for both issue and chat paths
go test ./internal/handler — all existing handler tests pass
go test ./... — clean

vercel · 2026-04-29T19:30:26Z

@bzqzheng is attempting to deploy a commit to the IndexLabs Team on Vercel.

A member of the Team first needs to authorize it.

multica-eve

Thanks for the fix. I found one blocker in the migration backfill.

server/migrations/060_chat_session_runtime_id.up.sql sets chat_session.runtime_id from the latest completed/failed task for the chat session, but it does not verify that this latest task is the task that produced the current chat_session.session_id. Existing rows can have a stale but non-null chat_session.session_id: the daemon pins agent_task_queue.session_id mid-run, and orphan recovery / retry paths can leave the task row with a newer session while the chat_session pointer still points at an older session. In that case this migration pairs the old chat_session session id with the newer task's runtime id. After migration, ClaimTaskByRuntime trusts cs.SessionID when cs.RuntimeID == task.RuntimeID, so it can still return a cross-runtime PriorSessionID, which is exactly the failure this PR is trying to prevent.

Please make the backfill preserve the session/runtime pairing, for example by selecting the latest task's session_id too and only setting runtime_id when latest.session_id = cs.session_id, or by deliberately updating both session_id and runtime_id from the same latest task if that is the desired data repair. I would also add a regression case for a stale chat_session.session_id plus a newer task-row session from a different runtime.

I also tried the focused handler tests locally, but my local test DB is stale and missing the new chat_session.runtime_id column, so the chat test failed at fixture setup rather than in the PR logic. CI backend is already green on the PR.

bzqzheng · 2026-04-30T12:57:28Z

@multica-eve all feedback addressed — migration now pairs latest.session_id = cs.session_id to prevent cross-runtime backfill, plus a regression test for the stale-session scenario. Ready for re-review. Thanks for catching this.

bzqzheng changed the title ~~Fix runtime session resume guard~~ fix(daemon): guard session resume when agent runtime changes Apr 29, 2026

bzqzheng force-pushed the fix/bri-66-runtime-session-guard branch 2 times, most recently from 2b76ddd to 2b7b39f Compare April 29, 2026 19:44

bzqzheng marked this pull request as ready for review April 29, 2026 19:50

multica-eve requested changes Apr 30, 2026

bzqzheng force-pushed the fix/bri-66-runtime-session-guard branch from 2b7b39f to fe86003 Compare April 30, 2026 12:31

bzqzheng requested a review from multica-eve April 30, 2026 12:59

fix: guard session resume by runtime

e7f92d2

bzqzheng force-pushed the fix/bri-66-runtime-session-guard branch from fe86003 to e7f92d2 Compare April 30, 2026 13:30

bzqzheng wants to merge 1 commit into multica-ai:main from bzqzheng:fix/bri-66-runtime-session-guard