STAMP: Validate federated model quality against local-only baseline #275

@Ultimate-Storm

Context

The STAMP deploy test currently validates that swarm training completes successfully, but does not compare the federated model quality against a local-only training baseline.

In the initial deploy test run (2 rounds, VIT, synthetic data), both clients achieved an AUROC of 1.0. However, the synthetic data has easily separable classes, so this result is not a meaningful quality benchmark.

Proposed Solution

  1. Automated local baseline: Before swarm training, run training with the `--local_training` flag on each client's data and record the final metrics (`val_loss`, `val_auroc`)
  2. Comparison: After swarm training completes, compare federated metrics against local baselines
  3. Reporting: Output a comparison table in the deploy test results JSON:
    ```json
    {
      "federated_auroc": 0.85,
      "local_auroc_site_1": 0.78,
      "local_auroc_site_2": 0.72,
      "improvement": "+9.0%"
    }
    ```
  4. Regression check: Optionally fail the test if the federated model's AUROC falls below that of the best local model by more than a configurable margin
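
The comparison and regression check could be sketched roughly as follows. This is a minimal illustration, not the deploy test's actual implementation: `build_comparison`, the `tolerance` parameter, and the extra `regression` key are hypothetical names, and computing `improvement` relative to the best local AUROC is an assumption about how that field would be defined.

```python
import json


def build_comparison(federated_auroc, local_aurocs, tolerance=0.02):
    """Compare a federated AUROC against per-site local baselines.

    local_aurocs maps a site name (e.g. "site_1") to that site's local AUROC.
    tolerance is a hypothetical allowed AUROC drop before flagging a regression.
    """
    if not local_aurocs:
        raise ValueError("at least one local baseline is required")
    best_local = max(local_aurocs.values())
    if best_local <= 0:
        raise ValueError("best local AUROC must be positive")

    # Assumption: improvement is the relative change versus the best local model.
    improvement_pct = (federated_auroc - best_local) / best_local * 100.0

    report = {"federated_auroc": federated_auroc}
    for site, auroc in sorted(local_aurocs.items()):
        report[f"local_auroc_{site}"] = auroc
    report["improvement"] = f"{improvement_pct:+.1f}%"
    # Hypothetical extra field: True when the federated model trails the
    # best local model by more than the tolerance.
    report["regression"] = federated_auroc < best_local - tolerance
    return report


if __name__ == "__main__":
    # Values taken from the example JSON in this issue.
    report = build_comparison(0.85, {"site_1": 0.78, "site_2": 0.72})
    print(json.dumps(report, indent=2))
```

The deploy test script could then exit non-zero when `regression` is true, which would keep the check optional behind a flag or a tolerance setting.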

Why This Matters

The core value proposition of swarm learning is that the federated model should perform at least as well as (ideally better than) any single site's local model. Without this check, we can't validate that model aggregation is actually beneficial.

Related Files

  • `scripts/deploy/run_stamp_deploy_test.sh` — deploy test orchestrator
  • `docker_config/master_template_STAMP.yml` — has `--local_training` flag
  • `application/jobs/STAMP_classification/app/custom/main.py` — supports `local_training` mode
