Summary
When a session fails healthcheck, the daemon restarts it immediately with no exponential backoff and no maximum retry count. If the failure is structural (e.g. missing binary), this creates an infinite restart loop that churns through pane IDs and log entries indefinitely.
Observed behavior
23:58:14 session demo/squad-agent-g1-0 running in pane %6
23:58:49 health: restarting session demo/squad-agent-g1-0 (failures=3, restarts=0)
23:58:49 session demo/squad-agent-g1-0 deleted
23:58:49 session demo/squad-agent-g1-0 running in pane %10
23:59:25 health: restarting session demo/squad-agent-g1-0 (failures=3, restarts=0)
... repeats forever ...
The restarts=0 counter never increments because each restart creates a new session object, resetting the counter.
Suggested improvements
- Exponential backoff: After each restart, increase the delay before the next attempt (e.g. 30s, 60s, 120s, 300s)
- Max restart limit: After N restarts (configurable in manifest), transition to a
failed state and stop retrying
- CrashLoopBackOff state: Similar to Kubernetes — when a session keeps failing, mark it as
CrashLoopBackOff so operators can see at a glance
- Preserve restart count across session replacement: Track restarts at the role/replica level, not the session level
Environment
- marvel 0.1.0-alpha.20260417.040512.b72898e, aarch64, Debian 13
Summary
When a session fails healthcheck, the daemon restarts it immediately with no exponential backoff and no maximum retry count. If the failure is structural (e.g. missing binary), this creates an infinite restart loop that churns through pane IDs and log entries indefinitely.
Observed behavior
The
restarts=0counter never increments because each restart creates a new session object, resetting the counter.Suggested improvements
failedstate and stop retryingCrashLoopBackOffso operators can see at a glanceEnvironment