Fix flaky CI swarm tests and self-hosted runner reliability by Ultimate-Storm · Pull Request #261 · KatherLab/MediSwarm

Ultimate-Storm · 2026-04-05T16:43:20Z

Summary

Replace fixed sleep 120 / sleep 3600 in swarm integration tests with polling loops that check for "Server runner finished." in the server log every 10s/30s, preventing premature assertion failures when the self-hosted runner is under load
Add concurrency group to pr-test.yaml so only one CI run executes on the self-hosted GPU runner at a time, preventing resource contention
Add docker system prune step at the start of each workflow run to reclaim disk space from stale containers/images and prevent "no space left on device" failures
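The polling approach in the first item can be sketched as follows. This is a minimal Python illustration, not the code from the PR: the actual tests presumably implement the loop in shell, and the function name `wait_for_log_line`, its parameters, and the defaults are assumptions chosen for the example. Only the marker string "Server runner finished." and the 10s/30s intervals come from the description above.

```python
import time
from pathlib import Path


def wait_for_log_line(log_path, marker, interval_s=10.0, timeout_s=3600.0):
    """Poll log_path until marker appears or timeout_s elapses.

    Returns True if the marker was seen, False on timeout.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    deadline = time.monotonic() + timeout_s
    path = Path(log_path)
    while True:
        # The log may not exist yet while the server is still starting.
        if path.exists() and marker in path.read_text(errors="replace"):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Compared with a fixed sleep, the loop returns as soon as the marker shows up and only uses the full timeout when the run is genuinely stuck, so a slow runner delays the assertion instead of failing it.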

Test plan

CI pipeline runs successfully on this PR (self-validating — the fixes apply to the workflow that tests this PR)
Verify concurrency control works by pushing two commits in quick succession — second run should cancel the first
Verify disk cleanup step runs before checkout (check "Reclaim disk space" step output in workflow logs)

🤖 Generated with Claude Code

…r image Docker COPY preserves symlinks, but NVFlare's os.walk()-based job signing and zip utilities do not follow symlinks. This caused job submission to fail with "job signature verification failed" because the custom/ symlink directories were empty in the signed zip. Resolve all symlinks to actual file/directory copies after COPY so os.walk() can traverse them. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
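The effect described in this commit can be reproduced in isolation: `os.walk()` defaults to `followlinks=False`, so a symlinked directory is listed but never descended into. The sketch below shows one way to replace symlinks with real copies so that a plain `os.walk()` sees the files. It is an illustration in Python only; the commit does this in the Dockerfile, whose exact commands are not shown here, and `resolve_symlinks` is a hypothetical name.

```python
import os
import shutil


def resolve_symlinks(root):
    """Replace every symlink under root with a copy of its target.

    Hypothetical helper for illustration. Dangling links are left alone,
    and symlink cycles are not handled.
    """
    for dirpath, dirnames, filenames in os.walk(root):
        for name in list(dirnames) + filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                continue
            target = os.path.realpath(path)
            if not os.path.exists(target):
                continue  # dangling link: nothing to copy
            os.unlink(path)
            if os.path.isdir(target):
                # symlinks=False copies the contents of nested links too.
                shutil.copytree(target, path, symlinks=False)
            else:
                shutil.copy2(target, path)
```

After this runs, a tool that walks the tree without following links, as the commit message says NVFlare's signing and zip utilities do, finds regular files where the links used to be.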

…smatch The server-side PTFileModelPersistor creates models without class weights, so _class_weight is a plain attribute (absent from state_dict). Client-side training computes class weights from data, registering _class_weight as a buffer (present in state_dict). When the aggregated model is sent back, load_state_dict fails with "Missing key(s): _class_weight". Fix: pass loss_kwargs through models_config.py to all challenge factory functions, so both server and client models have consistent buffers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
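In PyTorch, tensors registered with `register_buffer` appear in `state_dict()`, while plain attributes do not, which is why the two models disagree on `_class_weight`. A small key comparison like the one below can surface such a mismatch before `load_state_dict` raises. This is a hedged sketch: `diff_state_dict_keys` is a hypothetical helper, it works on any iterable of key names (including `model.state_dict().keys()`), and the `fc.*` keys in the usage example are invented for illustration.

```python
def diff_state_dict_keys(model_keys, incoming_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    keys the receiving model expects but the incoming state lacks
    unexpected: keys in the incoming state that the model does not have
    (Hypothetical helper for illustration.)
    """
    model_keys, incoming_keys = set(model_keys), set(incoming_keys)
    missing = sorted(model_keys - incoming_keys)
    unexpected = sorted(incoming_keys - model_keys)
    return missing, unexpected


# Client model registered _class_weight as a buffer; the server model did not.
client_keys = ["fc.weight", "fc.bias", "_class_weight"]
server_keys = ["fc.weight", "fc.bias"]
missing, unexpected = diff_state_dict_keys(client_keys, server_keys)
print(missing, unexpected)
```

With the fix described above, both sides build the model with the same `loss_kwargs`, so both key sets contain `_class_weight` and the comparison returns two empty lists.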

models_config.py: Add fallback to /MediSwarm/pretrained_weights/ for challenge model checkpoints. The build script stores weights there to avoid bloating NVFlare job transfers, but the path resolution only looked inside the job folder where the weights don't exist. deploy_and_test.sh: Pass --model_name flag to docker.sh when starting clients so the correct challenge model is used instead of defaulting to MST. Add job_to_model_name() mapping function. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
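The fallback lookup can be sketched as below. This is an assumption-laden illustration rather than the code in `models_config.py`: the function name `resolve_checkpoint`, its signature, and the search order (job folder first, then the shared directory) are guesses based on the commit message. Only the directory `/MediSwarm/pretrained_weights/` comes from the source.

```python
from pathlib import Path

# Directory named in the commit message; everything else here is illustrative.
FALLBACK_DIR = Path("/MediSwarm/pretrained_weights")


def resolve_checkpoint(job_dir, filename, fallback_dir=FALLBACK_DIR):
    """Return the first existing checkpoint path, job folder first.

    Raises FileNotFoundError listing every location that was tried.
    (Hypothetical helper for illustration.)
    """
    candidates = [Path(job_dir) / filename, Path(fallback_dir) / filename]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    tried = ", ".join(str(c) for c in candidates)
    raise FileNotFoundError(f"checkpoint {filename!r} not found; tried: {tried}")
```

Listing every attempted path in the error makes a missing checkpoint easier to diagnose in CI logs than a bare "file not found".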

Replace fixed sleep durations with polling loops that check for "Server runner finished." in the server log, preventing premature assertion failures when the runner is under load. Add concurrency control so only one CI run uses the self-hosted GPU runner at a time, and prune stale Docker resources before each run to avoid "no space left on device" errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The _generateStartupKitArchives.sh script requires zip to create startup kit archives. The ODELIA Dockerfile includes zip/unzip but the STAMP Dockerfile was missing them, causing CI to fail with "zip: command not found" during the STAMP Docker build step. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Ultimate-Storm and others added 5 commits April 5, 2026 16:42

Ultimate-Storm merged commit cd65d6b into main Apr 5, 2026
6 checks passed

Ultimate-Storm deleted the fix/ci-swarm-test-flakiness branch April 5, 2026 18:25

Ultimate-Storm merged 5 commits into main from fix/ci-swarm-test-flakiness
