STAMP: Add post-training evaluation/prediction step to deploy test #270

@Ultimate-Storm

Description

Context

The ODELIA deploy test (run_deploy_test.sh) has a full evaluate_model() step that:

  1. Collects checkpoint files from remote client workspaces
  2. Runs predict.py inside a Docker container to generate predictions
  3. Validates model quality by checking prediction output
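The checkpoint-collection part of this flow can be sketched in Python. This is a minimal illustration only: the actual ODELIA step lives in run_deploy_test.sh, and the function name, workspace layout, and checkpoint pattern here are assumptions.

```python
from pathlib import Path
import shutil

def collect_checkpoints(workspace_root: str, dest: str, pattern: str = "*.pt") -> list:
    """Copy every checkpoint matching `pattern` from per-client workspace
    directories under `workspace_root` into a single `dest` directory.

    Hypothetical sketch of the collection step; the real deploy test
    performs this in shell against remote client workspaces.
    """
    dest_dir = Path(dest)
    dest_dir.mkdir(parents=True, exist_ok=True)
    collected = []
    for ckpt in sorted(Path(workspace_root).rglob(pattern)):
        # Prefix with the client directory name so checkpoints from
        # different clients don't collide in the destination directory.
        target = dest_dir / f"{ckpt.parent.name}_{ckpt.name}"
        shutil.copy2(ckpt, target)
        collected.append(target)
    return collected
```

Prefixing with the parent directory name keeps one flat output directory while preserving which client each checkpoint came from.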

The STAMP deploy test (run_stamp_deploy_test.sh) currently has no evaluation step — it only checks that training completes without errors ("Server runner finished.").

Proposed Solution

  1. Implement a STAMP equivalent of predict.py (e.g. a stamp_predict.py) that loads a trained STAMP model checkpoint and runs inference on held-out test data
  2. Add collect_checkpoints() and evaluate_model() functions to run_stamp_deploy_test.sh
  3. Optionally verify AUROC or other classification metrics against a minimum threshold on synthetic data
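The metric check in step 3 could look like the following minimal sketch. The function names and the 0.7 threshold are assumptions for illustration; a real stamp_predict.py would more likely compute AUROC with an existing library such as scikit-learn, but the rank-based definition below is self-contained and shows what the deploy test would assert.

```python
def auroc(labels, scores):
    """Rank-based AUROC: the probability that a randomly chosen positive
    sample is scored higher than a randomly chosen negative one
    (ties count as half a win)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def check_model_quality(labels, scores, threshold=0.7):
    """Return (passed, auroc); a deploy test would fail if `passed` is False.
    The threshold is a hypothetical value for synthetic data."""
    score = auroc(labels, scores)
    return score >= threshold, score
```

On synthetic data the threshold would be calibrated loosely (well above chance at 0.5, well below a production bar) so the test catches broken models without being flaky.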

Why This Matters

Without a prediction/evaluation step, the deploy test only validates that training runs — not that the resulting model is usable. This is a weaker guarantee than the ODELIA deploy test provides.

Related Files

  • scripts/deploy/run_stamp_deploy_test.sh — STAMP deploy test orchestrator (no evaluation step)
  • scripts/deploy/run_deploy_test.sh — ODELIA deploy test (has evaluate_model() and collect_checkpoints())
  • scripts/evaluation/predict.py — ODELIA prediction script (no STAMP equivalent exists)
  • application/jobs/STAMP_classification/app/custom/stamp_training.py — STAMP training code
