Skip to content

feat(health): gate connection info on container health check#39

Draft
shiftysheep wants to merge 6 commits intomasterfrom
feat/container-health-check
Draft

feat(health): gate connection info on container health check#39
shiftysheep wants to merge 6 commits intomasterfrom
feat/container-health-check

Conversation

@shiftysheep
Copy link
Copy Markdown
Owner

Summary

  • Adds a healthy column to DockerChallengeTracker; new containers start with healthy=False so connection info is withheld until Docker confirms they're running
  • Frontend shows a spinner during startup with rapid 3s polling, switching to normal 30s polling once healthy
  • Status API returns status: "starting" (no ports/host) or status: "running" (with connection info), backed by a batched Docker state fetch with 5s TTL cache
  • Fixes create_service returning 500 on name conflicts: recovers the existing service ID on Docker 409 (mirrors existing create_container behaviour), and fixes the underlying if not r: bug that was swallowing 4xx responses before status checks could fire
  • Adds Running / Starting health badge to the admin Docker Status page

Changes

Commit Change
5dba0f4 feat: health check gate — model column, migration, API, JS, HTML
9b6dbae fix: SQLAlchemy 1.4 migration (engine.begin())
8aa440a fix: treat created/not-found containers as starting, not dead
b615ce7 fix: recover existing service on 409 name conflict
823610d fix: if not r: swallowed 4xx before 409 check could fire
dcfa67d feat: Running/Starting badge on admin docker status page

Test plan

  • Launch a docker challenge — spinner appears while container starts, connection info appears once healthy
  • Relaunch within 5 min — "Container creation not allowed" (no duplicate)
  • Revert after 5 min — new container spins up with fresh healthy gate
  • Admin /admin/docker_status shows Running / Starting badge per container
  • Re-launch after tracker entry lost but service still on Docker — 409 is recovered gracefully, no 500
  • pytest tests/ -v — all 282 tests pass
  • pre-commit run --all-files — all checks pass

Ben Lucas added 6 commits March 2, 2026 17:41
Add a lazy health-check gate that delays showing connection info until
Docker confirms the container/service is accepting connections.

- Model: add `healthy` Boolean column to DockerChallengeTracker
  (default False for new inserts; server_default 1 preserves existing rows)
- Migration: _ensure_healthy_column() adds the column on plugin upgrade
- Functions: add get_container_states(), get_service_states(), and
  _CachedDockerState (5s TTL) for batched Docker API calls — O(1)
  calls per poll window regardless of user count
- API: DockerStatus.get() checks health via batched state lookup;
  returns status='starting' (no ports/host) or status='running' (with
  connection info); cleans up dead tracker entries
- Frontend: containerStarting state shows spinner with 3s rapid poll
  during startup; connection info appears only once status='running'
- Tests: 30 new tests covering state mapping, cache TTL, and API
  endpoint behaviour for starting/running/dead container scenarios
…t dead

Container state mapping was too aggressive:
- 'created' state (Docker container just started) mapped to 'stopped' → deletion
- '' (not found in states dict) also triggered deletion

Fixes:
- _fetch_container_states: map 'created'/'restarting' to 'starting'; only
  'exited'/'dead'/'removing'/'paused' map to 'stopped'
- _process_unhealthy_entry: treat '' (not found) as 'starting' — container
  may simply not be visible in Docker yet immediately after creation
- Tests updated + new test for not-found-shows-starting behaviour
When a service exists on Docker but the tracker entry is gone (e.g.
after CTFd restart with in-flight containers), creation returns 409.
Previously this silently returned (None, None) → 500 to the user.

Now mirrors the find_existing() pattern used by create_container():
on 409, query the existing service by name and return its ID and
actual port configuration so the tracker entry can be recreated.
…sponses

requests.Response.__bool__ returns False for any 4xx response, so
'if not r:' was short-circuiting the 409 conflict-recovery code before
it could fire. Both create_service and create_container were affected.

Change 'if not r:' to 'if r is None:' so only true network failures
(where do_request returns None) trigger the early return, while 409
responses correctly fall through to the name-conflict recovery path.
Adds a Status column to the admin docker status table showing
'Running' (green) or 'Starting' (yellow) based on the healthy flag.
No backend changes required — healthy is already in the tracker model.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant