Add Duke benchmark pipeline and configurable deploy infrastructure #259

Merged
Ultimate-Storm merged 1 commit into main from feature/duke-benchmark-deploy
Apr 5, 2026

@Ultimate-Storm
Contributor

Summary

  • Add scripts/evaluation/run_duke_benchmark.sh (~298 lines) — automated end-to-end benchmarking on the Duke Breast MRI dataset:

    1. Build Docker image with startup kits
    2. Deploy to configured sites (dl0/dl2/dl3)
    3. Run swarm training (start server, clients, submit job)
    4. Collect swarm results (checkpoints, CSVs, run predict.py)
    5. Run local benchmark comparison (benchmark_models.py)
    6. Generate summary with timestamped result directories
    • Supports flags: --skip-build, --skip-deploy, --skip-swarm, --skip-local, --collect-only, --dry-run, --output-dir
  • Make deploy_and_test.sh infrastructure configurable:

    • SITES array is now read from deploy_sites.conf (previously hardcoded as MHA and RSH)
    • SERVER_NAME is now configurable (previously hardcoded as dl3.tud.de)
    • Container matching in the status/stop commands broadened to odelia|stamp|nvflare (previously odelia only)
    • Help text updated with configuration instructions
  • Add deploy_sites.conf.example — configuration template with sanitized credentials and dl0/dl2/dl3 entries for the Duke benchmark

  • Add docs/DUKE_BENCHMARK_RESULTS.md — results template with infrastructure table, reproduction instructions, and placeholder comparison tables

  • Update scripts/evaluation/README.md with run_duke_benchmark.sh documentation

  • Add duke_results/ to .gitignore
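
The flags listed for run_duke_benchmark.sh could be handled roughly as in the sketch below. This is not the script's actual code: the function names (parse_benchmark_flags, run_step), the variable names, the default output directory, and the choice of a space-separated value for --output-dir are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of flag handling for a multi-step benchmark script.
# None of these names are taken from run_duke_benchmark.sh itself.

parse_benchmark_flags() {
    # Reset every switch so repeated calls start from a clean state.
    SKIP_BUILD=0; SKIP_DEPLOY=0; SKIP_SWARM=0; SKIP_LOCAL=0
    COLLECT_ONLY=0; DRY_RUN=0
    OUTPUT_DIR="duke_results"   # assumed default, chosen to match the .gitignore entry

    while [[ $# -gt 0 ]]; do
        case "$1" in
            --skip-build)   SKIP_BUILD=1 ;;
            --skip-deploy)  SKIP_DEPLOY=1 ;;
            --skip-swarm)   SKIP_SWARM=1 ;;
            --skip-local)   SKIP_LOCAL=1 ;;
            --collect-only) COLLECT_ONLY=1 ;;
            --dry-run)      DRY_RUN=1 ;;
            --output-dir)
                # Assumes the value is passed as a separate argument.
                if [[ $# -lt 2 ]]; then
                    echo "--output-dir requires a value" >&2
                    return 2
                fi
                OUTPUT_DIR="$2"
                shift
                ;;
            *)
                echo "unknown flag: $1" >&2
                return 2
                ;;
        esac
        shift
    done
}

# Print the command under --dry-run, execute it otherwise.
run_step() {
    if [[ "$DRY_RUN" -eq 1 ]]; then
        echo "[dry-run] $*"
    else
        "$@"
    fi
}
```

With this shape, each pipeline step can be wrapped as run_step followed by its command and guarded by the matching SKIP_ variable, so --dry-run prints the planned steps without executing them.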

Deploy infrastructure changes

# Before (hardcoded):
SITES=(MHA RSH)
local server_startup="$prod_dir/dl3.tud.de/startup"

# After (configurable via deploy_sites.conf):
if [[ -z "${SITES+x}" || ${#SITES[@]} -eq 0 ]]; then
    SITES=(MHA RSH)  # backward-compatible default
fi
local server_name="${SERVER_NAME:-dl3.tud.de}"
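
To show how the fallback above interacts with an optional config file, here is a minimal self-contained sketch. The helper name load_deploy_config and the example site names DL0, DL2, DL3 are assumptions for illustration; the PR does not show how deploy_and_test.sh actually loads deploy_sites.conf.

```shell
#!/usr/bin/env bash
# Hypothetical helper: source an optional config file, then apply the
# backward-compatible defaults described in the PR.

load_deploy_config() {
    local conf_file="${1:-deploy_sites.conf}"

    # The config file is optional; when present it may set SITES and SERVER_NAME.
    if [[ -f "$conf_file" ]]; then
        # shellcheck source=/dev/null
        source "$conf_file"
    fi

    # ${SITES+x} expands to nothing when SITES is unset or an empty array,
    # so either case falls back to the previous hardcoded sites.
    if [[ -z "${SITES+x}" || ${#SITES[@]} -eq 0 ]]; then
        SITES=(MHA RSH)
    fi

    # Keep the previous server name unless the config overrides it.
    SERVER_NAME="${SERVER_NAME:-dl3.tud.de}"
}

# Example config content (illustrative values only):
#   SITES=(DL0 DL2 DL3)
#   SERVER_NAME=dl3.tud.de
```

Because the defaults are applied after sourcing, an existing checkout without a deploy_sites.conf behaves exactly as before the change.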

Test plan

  • deploy_and_test.sh works with default SITES (backward compatible)
  • deploy_and_test.sh works with custom SITES/SERVER_NAME from deploy_sites.conf
  • run_duke_benchmark.sh --dry-run shows planned steps without executing
  • deploy_sites.conf.example contains all required variables with documentation
  • Status/stop commands detect stamp and nvflare containers alongside odelia
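
The last check could be exercised with a small filter like the one below. It is only a sketch: filter_benchmark_containers is a hypothetical name, and the PR does not show whether the script matches container names with grep or some other mechanism. Only the odelia|stamp|nvflare pattern comes from the PR.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep only container names that match the
# broadened pattern. Reads names on stdin, one per line.

filter_benchmark_containers() {
    # "|| true" keeps the exit status at zero when nothing matches,
    # which matters under "set -e".
    grep -E 'odelia|stamp|nvflare' || true
}

# Typical use against running containers:
#   docker ps --format '{{.Names}}' | filter_benchmark_containers
```

Feeding the function a fixed list of names makes the pattern testable without a running Docker daemon.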

🤖 Generated with Claude Code

Make deploy_and_test.sh read SITES and SERVER_NAME from deploy_sites.conf
instead of hardcoding them, enabling multi-site Duke benchmarks on dl0/dl2/dl3.
Add deploy_sites.conf.example with DL0/DL2/DL3 templates. Create
run_duke_benchmark.sh for end-to-end benchmark orchestration (build, deploy,
swarm train, collect results, local benchmark). Add results template at
docs/DUKE_BENCHMARK_RESULTS.md. Broaden container name matching to include
stamp and nvflare alongside odelia.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ultimate-Storm merged commit f3cfdf8 into main on Apr 5, 2026
6 checks passed
Ultimate-Storm deleted the feature/duke-benchmark-deploy branch on April 5, 2026 14:09