Context
The ODELIA deploy test (run_deploy_test.sh) has a full evaluate_model() step that:
- Collects checkpoint files from remote client workspaces
- Runs predict.py inside a Docker container to generate predictions
- Validates model quality by checking prediction output
The STAMP deploy test (run_stamp_deploy_test.sh) currently has no evaluation step — it only checks that training completes without errors ("Server runner finished.").
Proposed Solution
- Implement a STAMP equivalent of predict.py (or a stamp_predict.py) that loads a trained STAMP model checkpoint and runs inference on held-out test data
- Add collect_checkpoints() and evaluate_model() functions to run_stamp_deploy_test.sh
- Optionally verify AUROC or other classification metrics against a minimum threshold on synthetic data
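As a starting point for the optional metric check, here is a minimal sketch of an AUROC gate that a STAMP prediction script could call once it has labels and predicted scores. It uses only the standard library and computes AUROC through the rank-sum (Mann-Whitney U) formulation. The function names auroc() and passes_threshold() and the min_auroc parameter are hypothetical and do not exist in the repository. The threshold value is deliberately left to the caller, since this issue does not fix one.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Binary AUROC via the Mann-Whitney U statistic (ties get the average rank)."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUROC is undefined unless both classes are present")

    # Sort sample indices by score, then assign 1-based ranks,
    # averaging the rank within each group of tied scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    # U statistic of the positive class, normalised to [0, 1].
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = pos_rank_sum - n_pos * (n_pos + 1) / 2.0
    return u / (n_pos * n_neg)


def passes_threshold(
    labels: Sequence[int], scores: Sequence[float], min_auroc: float
) -> bool:
    """Return True if the model clears the (caller-chosen) minimum AUROC."""
    return auroc(labels, scores) >= min_auroc
```

A prediction script could exit non-zero when passes_threshold() returns False, so that evaluate_model() in the shell script only has to check the exit code. On synthetic data the threshold probably needs to be lenient, since only a few training rounds are run.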
Why This Matters
Without a prediction/evaluation step, the deploy test only validates that training runs — not that the resulting model is usable. This is a weaker guarantee than the ODELIA deploy test provides.
Related Files
- scripts/deploy/run_stamp_deploy_test.sh: STAMP deploy test orchestrator (no evaluation step)
- scripts/deploy/run_deploy_test.sh: ODELIA deploy test (has evaluate_model() and collect_checkpoints())
- scripts/evaluation/predict.py: ODELIA prediction script (no STAMP equivalent exists)
- application/jobs/STAMP_classification/app/custom/stamp_training.py: STAMP training code