Skip to content

test: MaxRestarts saturation freezes role permanently until delete+re-apply #37

@arcaven

Description

@arcaven

What to test

PR #21 added the max_restarts cap. When the count is reached, noteCrashAndBackoff sets BackoffUntil = saturationFreezeUntil (far future) and the reconciler's gate refuses spawns forever. PR #33 fixed #29 so delete+re-apply clears that saturation state.

Test plan

  1. Apply:
workspace:
  name: saturation-test
teams:
  - name: doomed
    roles:
      - name: exit-fast
        replicas: 1
        max_restarts: 3
        runtime:
          command: sh
          args: ["-c", "exit 1"]
        restart_policy: always
  1. Watch marvel events until restart chore(deps): bump go.opentelemetry.io/otel/metric from 1.42.0 to 1.43.0 #3 (wall time: 30s + 60s + 120s ≈ 3.5 min).
  2. Wait an additional 10 minutes. No restart should fire.
  3. marvel describe team saturation-test/doomed — verify RoleHealth.BackoffUntil is far-future.
  4. marvel delete workspace saturation-test.
  5. Re-apply the same manifest.
  6. Expect a fresh session.created event within one reconcile tick, not a 10+ minute delay.

Pass

  • Exactly 3 restart attempts before saturation. No 4th spawn even after hours of idle.
  • describe team surfaces a far-future BackoffUntil.
  • After delete + re-apply, a new session spawns within one reconcile tick.
  • No stale SessionCrashed marker from the prior generation visible in get sessions.

Fail

Related: PR #21, PR #33.

Metadata

Metadata

Assignees

No one assigned

    Labels

    priority.p2Medium — should address this sprintscope.in-scopeAccepted, will work ontriage.completeTriage done, ready for worktype.testTest-only changes or a request to verify shipped behavior

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions