Summary
After marvel delete workspace <name> followed by marvel work <same-manifest>, the team is re-created but no sessions ever spawn. No events are emitted in the 60+ second window observed.
Reproduce (deterministic on this cluster)
- Start fresh daemon:
marvel daemon --mrvl
- Apply crashloop manifest (any role that exits quickly works):
workspace:
name: crashloop
teams:
- name: flaky
roles:
- name: exit-fast
replicas: 1
runtime:
image: shell
command: sh
args: ["-c", "exit 1"]
restart_policy: always
healthcheck:
type: process-alive
timeout: "2s"
failure_threshold: 1
- Observe one
session.created + one session.crashed event. Session sticks in crashed state (separate question whether restart_policy=always should retry in this case, but that's not the primary bug).
marvel delete workspace crashloop — observe session.deleted event, workspace cleared.
marvel work <same-manifest> — workspace/team is reported ready.
- Wait 60 seconds.
Expected
A fresh session.created event, new session in get sessions output.
Actual
marvel get teams shows the team (replicas: 1).
marvel describe team crashloop/flaky shows Generation 1, CreatedAt correctly updated, Role config correct.
marvel get sessions returns empty.
marvel events shows zero new events since the apply. Only the pre-delete events remain visible.
Suspected cause
RoleHealth/backoff state keyed by workspace/team/role may survive workspace delete+recreate, leaving BackoffUntil set far in the future (especially if the previous generation hit saturation in noteCrashAndBackoff). The reconciler's gate then refuses to spawn. But this is a guess — the symptom is "team exists, zero replicas actual, no events, no progress."
Alternatively the Session manager or reconcile queue is stalling after cascade delete (#15) — check whether the cascade cleared everything it needed to.
Environment
- marvel 0.1.0-alpha.20260419.014522.b8f1f4b (commit b8f1f4b)
- Fresh daemon, Linux aarch64 Pi, tmux 3.5a
- mrvl:// (reproduces via Unix socket too)
Impact
Blocks testing of PR #21 / PR #24 crash-loop backoff telemetry — can't accumulate health.crashloop-backoff events if each workspace can only be applied once.
Workaround
Restart daemon between manifest applies.
Summary
After
marvel delete workspace <name>followed bymarvel work <same-manifest>, the team is re-created but no sessions ever spawn. No events are emitted in the 60+ second window observed.Reproduce (deterministic on this cluster)
marvel daemon --mrvlsession.created+ onesession.crashedevent. Session sticks incrashedstate (separate question whether restart_policy=always should retry in this case, but that's not the primary bug).marvel delete workspace crashloop— observesession.deletedevent, workspace cleared.marvel work <same-manifest>— workspace/team is reportedready.Expected
A fresh
session.createdevent, new session inget sessionsoutput.Actual
marvel get teamsshows the team (replicas: 1).marvel describe team crashloop/flakyshows Generation 1, CreatedAt correctly updated, Role config correct.marvel get sessionsreturns empty.marvel eventsshows zero new events since the apply. Only the pre-delete events remain visible.Suspected cause
RoleHealth/backoff state keyed by
workspace/team/rolemay survive workspace delete+recreate, leavingBackoffUntilset far in the future (especially if the previous generation hit saturation innoteCrashAndBackoff). The reconciler's gate then refuses to spawn. But this is a guess — the symptom is "team exists, zero replicas actual, no events, no progress."Alternatively the Session manager or reconcile queue is stalling after cascade delete (
#15) — check whether the cascade cleared everything it needed to.Environment
Impact
Blocks testing of PR #21 / PR #24 crash-loop backoff telemetry — can't accumulate
health.crashloop-backoffevents if each workspace can only be applied once.Workaround
Restart daemon between manifest applies.