STAMP: Validate federated model quality against local-only baseline #275

## Context

The STAMP deploy test currently validates that swarm training **completes** successfully, but does not compare the federated model quality against a local-only training baseline.

In the initial deploy test run (2 rounds, VIT, synthetic data), both clients achieved an AUROC of 1.0. However, the synthetic classes are easy to separate, so this result is not a meaningful quality benchmark.
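
To illustrate why a perfect score on trivially separable data says little, here is a minimal sketch in plain Python. The `auroc` helper and the two synthetic score distributions are hypothetical and not taken from the STAMP code; they only show that non-overlapping class scores always yield AUROC 1.0, while overlapping ones do not.

```python
import random


def auroc(labels, scores):
    """AUROC as the probability that a random positive outranks a random negative.

    Ties count as half a win. Illustrative helper, not part of STAMP.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)
labels = [0] * 50 + [1] * 50

# Hypothetical "easy" data: the score ranges of the two classes do not overlap.
easy = [rng.uniform(0.0, 0.4) for _ in range(50)] + [rng.uniform(0.6, 1.0) for _ in range(50)]

# Hypothetical "hard" data: the score ranges overlap between 0.3 and 0.7.
hard = [rng.uniform(0.0, 0.7) for _ in range(50)] + [rng.uniform(0.3, 1.0) for _ in range(50)]

easy_auroc = auroc(labels, easy)
hard_auroc = auroc(labels, hard)
print(f"easy: {easy_auroc:.3f}, hard: {hard_auroc:.3f}")
```

On the non-overlapping data every positive outranks every negative, so the score is exactly 1.0 regardless of how training went.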

## Proposed Solution

1. **Automated local baseline**: Before swarm training, run training with the `--local_training` flag on each client's data and record the final metrics (`val_loss`, `val_auroc`)
2. **Comparison**: After swarm training completes, compare federated metrics against local baselines
3. **Reporting**: Output a comparison table in the deploy test results JSON:
   ```json
   {
     "federated_auroc": 0.85,
     "local_auroc_site_1": 0.78,
     "local_auroc_site_2": 0.72,
     "improvement": "+9.0%"
   }
   ```
4. **Regression check** (optional): Fail the test if the federated model is significantly **worse** than the best local model
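
The comparison and regression check could be sketched roughly as follows. This is not the existing deploy test code: `build_comparison`, the `tolerance` parameter, and the `best_local_site` and `regression` keys are hypothetical, and computing `improvement` relative to the best local baseline is an assumption, since the issue does not define it.

```python
import json


def build_comparison(federated_auroc, local_aurocs, tolerance=0.02):
    """Compare a federated AUROC against per-site local-only baselines.

    local_aurocs: dict mapping a site name to its local-only validation AUROC.
    tolerance: hypothetical absolute AUROC drop allowed before flagging a regression.
    """
    if not local_aurocs:
        raise ValueError("at least one local baseline is required")

    # The best local model is the reference for both improvement and regression.
    best_site, best_local = max(local_aurocs.items(), key=lambda kv: kv[1])

    report = {"federated_auroc": federated_auroc}
    for site, value in sorted(local_aurocs.items()):
        report[f"local_auroc_{site}"] = value
    report["best_local_site"] = best_site

    # Relative change versus the best local baseline (assumed definition).
    if best_local > 0:
        relative = (federated_auroc - best_local) / best_local * 100.0
        report["improvement"] = f"{relative:+.1f}%"
    else:
        report["improvement"] = "n/a"

    # Flag a regression only when the drop exceeds the tolerance.
    report["regression"] = federated_auroc < best_local - tolerance
    return report


report = build_comparison(0.85, {"site_1": 0.78, "site_2": 0.72})
print(json.dumps(report, indent=2))
```

A caller could exit non-zero when `report["regression"]` is true, which keeps the failure behaviour optional and lets the tolerance be tuned per dataset.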

## Why This Matters

The core value proposition of swarm learning is that the federated model should perform at least as well as (ideally better than) any single site's local model. Without this check, we can't validate that model aggregation is actually beneficial.

## Related Files

- `scripts/deploy/run_stamp_deploy_test.sh`: deploy test orchestrator
- `docker_config/master_template_STAMP.yml`: has the `--local_training` flag
- `application/jobs/STAMP_classification/app/custom/main.py`: supports `local_training` mode
