Skip to content

fix(acp): persist thread→session mapping to survive pod restarts#459

Open
JARVIS-coding-Agent wants to merge 4 commits intoopenabdev:mainfrom
JARVIS-coding-Agent:fix/persist-thread-session-mapping
Open

fix(acp): persist thread→session mapping to survive pod restarts#459
JARVIS-coding-Agent wants to merge 4 commits intoopenabdev:mainfrom
JARVIS-coding-Agent:fix/persist-thread-session-mapping

Conversation

@JARVIS-coding-Agent
Copy link
Copy Markdown
Contributor

Context

Closes #456

After a pod restart, all Discord thread-to-session mappings are lost because the suspended HashMap in SessionPool is purely in-memory. Even though kiro-cli correctly persists session state to PVC, OpenAB cannot associate returning threads with their previous sessions — every thread gets a brand-new session instead of resuming.

Summary

Persist the thread_id → session_id mapping (suspended HashMap) to a JSON file on PVC so it survives pod restarts. On startup, the mapping is reloaded and threads can resume their previous sessions via session/load.

Changes

  • Added mapping_path: PathBuf field to SessionPool, derived from config.working_dir
  • load_mapping(): reads thread_map.json on startup; falls back to empty map if file is missing or corrupt
  • save_mapping(): writes the suspended map to disk after every mutation (3 save points: get_or_create, cleanup_idle, shutdown)
  • shutdown() now persists active sessions into the mapping before clearing, so graceful restarts preserve all thread associations

Design Decisions

  • JSON file on PVC — simplest approach that matches the existing persistence model (kiro-cli already writes session JSON to PVC). No new dependencies needed (serde_json already in Cargo.toml).
  • Save on every mutation — ensures mapping is always up-to-date even on ungraceful shutdown. The write frequency is low (only on session eviction, idle cleanup, or shutdown) so disk I/O is negligible.
  • Graceful degradation — corrupt or missing thread_map.json logs a warning and starts fresh instead of crashing. Failed writes also warn without panicking.
  • No cleanup of stale entries — out of scope for this fix. Stale mappings pointing to deleted session files are harmless: session/load will fail and fall back to session/new.

Discord Discussion URL: https://discord.com/channels/1491295327620169908/1495245278205182093

@JARVIS-coding-Agent
Copy link
Copy Markdown
Contributor Author

Review Feedback 處理狀態

🟡 NIT — 全部修了

# Feedback 狀態
1 unwrap_or_default() 靜默寫空檔 ✅ 改成 match,序列化失敗 log warning + early return
2 &PathBuf&Path ✅ 改了
3 to_stringto_string_pretty ✅ 改了

🔴 SUGGESTED CHANGES

# Feedback 狀態 說明
1 Atomic write ✅ 改了 先寫 .json.tmpfs::rename,同 filesystem 上 rename 是 atomic
2 shutdown()try_lock() skip 問題 ✅ 改了 改用 .lock().await,shutdown 路徑確保所有 active session 都存到
3 Stale entry 清理 ❌ 不改 mapping 只存兩個短字串(thread_id → session_id),1000 條也才幾 KB。加 pruning 需要知道 kiro-cli session file 路徑格式,會增加跨模組耦合,不適合在這個 bug fix 裡做。可以開 follow-up issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Thread→Session mapping lost on pod restart (suspended HashMap not persisted)

2 participants