fix: preserve session on timeout instead of destroying it #163
gabmfranco-ds wants to merge 2 commits into RichardAtCT:main
Timeouts are transient: the Claude session is still valid on the CLI side. Previously, any error during resume (including timeout) would delete the session and start fresh, causing the bot to lose all conversation context. Now ClaudeTimeoutError is caught separately and the session is kept intact so auto-resume works on the next message.

Also bumps default timeout from 300s to 1200s for long-running operations.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Previously, any error during session resume (timeout, process crash,
connection issue) would destroy the session and start fresh. Now the
session is only destroyed when the error message explicitly says the
session is gone ("session not found", "invalid session", etc.).
All other errors preserve the session and touch last_used so
auto-resume works on the next user message.
Also updates .env CLAUDE_TIMEOUT_SECONDS from 600 to 1200 (the
constants.py default is overridden by the env var).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
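
The rule in this second commit (destroy the session only when the error text says it is gone, otherwise keep it and refresh last_used) can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: `Session`, `is_session_gone`, and `handle_resume_error` are hypothetical names, and the hint tuple contains only the two phrases quoted in the commit message, while the real list may be longer.

```python
from __future__ import annotations

import time
from dataclasses import dataclass, field

# Only these two phrases are quoted in the commit message; the PR's list may differ.
SESSION_GONE_HINTS: tuple[str, ...] = ("session not found", "invalid session")


@dataclass
class Session:
    """Hypothetical stand-in for the bot's stored session record."""

    session_id: str
    last_used: float = field(default_factory=time.monotonic)


def is_session_gone(error: Exception) -> bool:
    """True when the error message explicitly says the session no longer exists."""
    message = str(error).lower()
    return any(hint in message for hint in SESSION_GONE_HINTS)


def handle_resume_error(
    sessions: dict[str, Session], session_id: str, error: Exception
) -> bool:
    """Return True if the session was preserved, False if it was destroyed."""
    if is_session_gone(error):
        sessions.pop(session_id, None)
        return False
    session = sessions.get(session_id)
    if session is not None:
        # Touch last_used so the session does not expire before the next message.
        session.last_used = time.monotonic()
    return True
```

Matching is case-insensitive here so that "Session not found" and "session not found" are treated alike; whether the PR does the same is not stated in the commit message.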
Good fix for a real bug: silent session destruction on timeout is nasty. A few things worth addressing before merge:

1. String matching is fragile (biggest concern)

The PR title says "Catch […]

    from claude_agent_sdk import ClaudeTimeoutError, ClaudeSessionExpiredError

    try:
        ...
    except ClaudeSessionExpiredError:
        pass  # session is gone, rebuild
    except ClaudeTimeoutError:
        pass  # transient, preserve
    except Exception:
        pass  # unknown: preserve or reraise?

If typed exceptions don't exist yet, that's worth a […]

2. The […]

When […] The four combinations should be explicit:
Suggest adding a comment or separate branch for […]

3. Hoist to module level with a type annotation (mypy strict requires it):

    _SESSION_GONE_HINTS: tuple[str, ...] = (
        "session not found",
        "invalid session",
        "session expired",
        "no such session",
    )

4. The 300→1200s timeout bump

A 4× increase is a big jump. This will hold a Telegram handler open for up to 20 minutes. Questions:
Warrants a comment explaining the rationale, and ideally a per-request configurable timeout.

5. No tests

No new tests for a regression fix. At minimum:

These are the exact two paths now split; they're the most important things to pin.

Summary: The intent is correct and the bug is real. But the implementation diverges from the described approach (typed exception vs. string matching), has a logic gap in the […]

Friday, AI assistant to @RichardAtCT (posted as @RichardAtCT; FridayOpenClawBot access pending)
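
One possible reading of the "per-request configurable timeout" suggested in item 4 is a wrapper that falls back to the module default only when the caller does not pass a limit. The sketch below is an assumption about how that could look, not code from this repository: `run_with_timeout` is a hypothetical helper, and only the constant name and its value of 1200 come from the PR.

```python
import asyncio
from typing import Awaitable, Optional, TypeVar

T = TypeVar("T")

# Default taken from this PR (previously 300).
DEFAULT_CLAUDE_TIMEOUT_SECONDS = 1200


async def run_with_timeout(
    operation: Awaitable[T], timeout_seconds: Optional[float] = None
) -> T:
    """Await `operation`, using a per-request limit when one is given.

    Raises asyncio.TimeoutError when the limit is exceeded.
    """
    limit = (
        DEFAULT_CLAUDE_TIMEOUT_SECONDS if timeout_seconds is None else timeout_seconds
    )
    return await asyncio.wait_for(operation, timeout=limit)
```

With a helper like this, a short chat reply could pass a small `timeout_seconds` while a long-running operation keeps the 1200s default, so not every handler is held open for up to 20 minutes.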
Summary
ClaudeTimeoutError is now caught separately from other resume errors. Timeouts are transient: the Claude session is still valid on the CLI side. The session is preserved so auto-resume works on the next message.

Root cause

In facade.py, the except Exception block (line 92) caught all errors during session resume, including timeouts, and ran the "stale session" recovery path: delete session → create new → retry without history. But ClaudeTimeoutError ≠ session expired. The session is likely still alive on Claude's side; it just took too long to respond.

Changes

src/claude/facade.py: catch ClaudeTimeoutError before the generic Exception handler; preserve session and touch last_used to prevent expiration
src/utils/constants.py: DEFAULT_CLAUDE_TIMEOUT_SECONDS: 300 → 1200

Test plan
🤖 Generated with Claude Code
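
The handler ordering described under Root cause and Changes, plus the two regression paths the review asks to pin, might look like the sketch below. It is not the PR's code and reflects only the first commit's behaviour: `ClaudeTimeoutError` is defined locally as a stand-in for the real exception, and `SessionStore`, `resume`, and the `run` callables are hypothetical names invented for illustration.

```python
from __future__ import annotations

import time
from typing import Callable


class ClaudeTimeoutError(Exception):
    """Local stand-in for the project's timeout exception (assumption)."""


class SessionStore:
    """Hypothetical in-memory store mapping session id to last-used time."""

    def __init__(self) -> None:
        self.last_used: dict[str, float] = {}

    def touch(self, session_id: str) -> None:
        self.last_used[session_id] = time.monotonic()

    def delete(self, session_id: str) -> None:
        self.last_used.pop(session_id, None)


def resume(
    store: SessionStore, session_id: str, run: Callable[[str | None], str]
) -> str | None:
    """Try to resume a session; return the reply, or None after a timeout."""
    try:
        reply = run(session_id)
    except ClaudeTimeoutError:
        # Transient: keep the session and refresh it so the next message resumes.
        store.touch(session_id)
        return None
    except Exception:
        # Stale-session path: delete the session, then retry without history.
        store.delete(session_id)
        return run(None)
    store.touch(session_id)
    return reply


def test_timeout_preserves_session() -> None:
    store = SessionStore()
    store.touch("s1")

    def run(session_id: str | None) -> str:
        raise ClaudeTimeoutError("timed out")

    assert resume(store, "s1", run) is None
    assert "s1" in store.last_used


def test_other_error_recreates_session() -> None:
    store = SessionStore()
    store.touch("s1")

    def run(session_id: str | None) -> str:
        if session_id is not None:
            raise RuntimeError("session not found")
        return "fresh reply"

    assert resume(store, "s1", run) == "fresh reply"
    assert "s1" not in store.last_used
```

The specific `except ClaudeTimeoutError` clause has to come before `except Exception`, because Python runs the first matching handler; with the order reversed, the timeout branch would never be reached.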