STAMP: Add post-training evaluation/prediction step to deploy test #270

@Ultimate-Storm

Description

Context

The ODELIA deploy test (run_deploy_test.sh) has a full evaluate_model() step that:

  1. Collects checkpoint files from remote client workspaces
  2. Runs predict.py inside a Docker container to generate predictions
  3. Validates model quality by checking prediction output
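The checkpoint-collection part of this flow can be sketched in Python. This is a minimal illustration only: the actual ODELIA step lives in run_deploy_test.sh, and the function name, workspace layout, and checkpoint pattern here are assumptions.

```python
from pathlib import Path
import shutil

def collect_checkpoints(workspace_root: str, dest: str, pattern: str = "*.pt") -> list:
    """Copy every checkpoint matching `pattern` from per-client workspace
    directories under `workspace_root` into a single `dest` directory.

    Hypothetical sketch of the collection step; the real deploy test
    performs this in shell against remote client workspaces.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for ckpt in sorted(Path(workspace_root).rglob(pattern)):
        # Prefix with the client directory name so checkpoints from
        # different clients don't collide in the destination directory.
        target = dest_dir / f"{ckpt.parent.name}_{ckpt.name}"
        shutil.copy2(ckpt, target)
        collected.append(target)
    return collected
```

Prefixing with the parent directory name keeps one flat output directory while preserving which client each checkpoint came from.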

The STAMP deploy test (run_stamp_deploy_test.sh) currently has no evaluation step — it only checks that training completes without errors ("Server runner finished.").

Proposed Solution

  1. Implement a STAMP equivalent of predict.py (e.g. a stamp_predict.py) that loads a trained STAMP model checkpoint and runs inference on held-out test data
  2. Add collect_checkpoints() and evaluate_model() functions to run_stamp_deploy_test.sh
  3. Optionally verify AUROC or other classification metrics against a minimum threshold on synthetic data
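The metric check in step 3 could look like the following minimal sketch. The function names and the 0.7 threshold are assumptions for illustration; a real stamp_predict.py would more likely compute AUROC with an existing library such as scikit-learn, but the rank-based definition below is self-contained and shows what the deploy test would assert.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    sample is scored higher than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def check_model_quality(labels, scores, threshold=0.7):
    """Return (passed, auroc); a deploy test would fail if `passed` is False.
    The threshold is a hypothetical value for synthetic data."""
    score = auroc(labels, scores)
    return score >= threshold, score
```

On synthetic data the threshold would be calibrated loosely (well above chance at 0.5, well below a production bar) so the test catches broken models without being flaky.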

Why This Matters

Without a prediction/evaluation step, the deploy test only validates that training runs — not that the resulting model is usable. This is a weaker guarantee than the ODELIA deploy test provides.

Related Files

  • scripts/deploy/run_stamp_deploy_test.sh — STAMP deploy test orchestrator (no evaluation step)
  • scripts/deploy/run_deploy_test.sh — ODELIA deploy test (has evaluate_model() and collect_checkpoints())
  • scripts/evaluation/predict.py — ODELIA prediction script (no STAMP equivalent exists)
  • application/jobs/STAMP_classification/app/custom/stamp_training.py — STAMP training code
