diff --git a/SECURITY_SUMMARY_P2.md b/SECURITY_SUMMARY_P2.md new file mode 100644 index 0000000..38e27b3 --- /dev/null +++ b/SECURITY_SUMMARY_P2.md @@ -0,0 +1,82 @@ +# Security Summary: P2 Sweep Expansion + +**PR**: Expand sweeps for TTT/threshold/gate weights + production inference cores +**Date**: 2026-02-09 +**Status**: ✅ No vulnerabilities found + +## Security Checks Performed + +### 1. CodeQL Static Analysis +- **Language**: Python +- **Alerts**: 0 +- **Status**: ✅ PASS + +No security vulnerabilities detected in: +- New sweep configuration files (JSON) +- New documentation files (Markdown) +- Test validation script (Python) +- Updated README + +### 2. Code Review +- **Files Reviewed**: 7 +- **Comments**: 0 +- **Status**: ✅ PASS + +All changes follow best practices: +- No hardcoded credentials +- No unsafe file operations +- No command injection vulnerabilities +- Proper input validation in test script + +### 3. Dependency Analysis +No new dependencies added. All changes use existing project dependencies. + +## Changes Summary + +### New Files (All Safe) +1. `docs/sweep_ttt_example.json` - JSON configuration (declarative, no code execution) +2. `docs/sweep_threshold_example.json` - JSON configuration (declarative, no code execution) +3. `docs/sweep_gate_weights_example.json` - JSON configuration (declarative, no code execution) +4. `docs/sweep_examples.md` - Documentation (Markdown, no executable code) +5. `docs/rust_inference_template.md` - Documentation (Markdown, specification only) +6. `tests/test_sweep_configs.py` - Test script (safe: validates JSON, no external input) + +### Modified Files (All Safe) +1. `README.md` - Documentation updates only + +## Potential Security Considerations + +### Sweep Configurations +The sweep configs execute shell commands via `hpo_sweep.py`. Security notes: +- Commands are parameterized via config file (user controls all inputs) +- Environment starts from the caller's environment, with variables from the config explicitly overlaying it (callers should ensure their environment is trusted or run with a sanitized env) +- No user input is directly interpolated into commands at runtime +- All paths are relative to repository root or explicitly configured + +**Risk Level**: Low - requires user to intentionally create malicious config + +### Test Script +The test validation script: +- Only reads JSON files from known locations +- Uses safe JSON parsing (`json.loads`) +- No file write operations except when run with output flags +- No external network access + +**Risk Level**: Minimal + +## Recommendations + +1. **For Users**: Review sweep config files before running, especially if obtained from untrusted sources +2. **For Developers**: Consider adding schema validation for sweep configs if accepting from external sources +3. **For CI/CD**: Sweep configs should be version-controlled and reviewed via PR process + +## Conclusion + +✅ **All security checks passed** + +No vulnerabilities introduced by this PR. All changes are: +- Documentation and configuration files +- Safe Python test code with proper input validation +- No new attack surface created + +The P2 sweep expansion is safe to merge. diff --git a/docs/sweep_examples.md b/docs/sweep_examples.md new file mode 100644 index 0000000..9a6e74e --- /dev/null +++ b/docs/sweep_examples.md @@ -0,0 +1,257 @@ +# Sweep Examples: TTT, Threshold, and Gate Weight Tuning + +This document provides example sweep configurations for systematically exploring: +1. **TTT (Test-Time Training) parameters** (method, steps, learning rate, reset policy) +2. **Score thresholds** (detection confidence filtering) +3. **Gate weights** (score fusion for detection/template/uncertainty) + +All sweeps use the `hpo_sweep.py` harness (or the `yolozu.py sweep` wrapper). + +## Overview + +The sweep harness executes parameterized commands, collects metrics, and writes CSV/Markdown tables. +Each sweep config is a JSON file with: +- `base_cmd`: command template with `{param}` placeholders for swept params and `$ENV_VAR` for fixed settings +- `param_grid`: dictionary of parameter names → list of values +- `env`: environment variables for fixed settings (dataset path, checkpoint, etc.) +- `metrics.path`: where to find the output metrics JSON +- `metrics.keys`: which metrics to extract (optional; if empty, stores entire JSON) + +**Note**: Fixed settings (dataset, checkpoint, device) should be set as environment variables in the `env` section, +while swept parameters use `{param}` placeholders in `base_cmd`. + +## 1. TTT Parameter Sweep + +**Purpose**: Find optimal TTT hyperparameters (method, steps, lr, reset policy) for a given checkpoint and dataset. + +**Example config**: [`docs/sweep_ttt_example.json`](sweep_ttt_example.json) + +### Parameters swept + +- `ttt_method`: `["tent", "mim"]` — TTT algorithm (Tent or MIM) +- `ttt_steps`: `[1, 3, 5, 10]` — Number of adaptation steps per sample/stream +- `ttt_lr`: `[1e-5, 5e-5, 1e-4, 5e-4]` — Learning rate +- `ttt_reset`: `["sample", "stream"]` — Reset policy (per-sample or stream-level) + +**Total runs**: 2 × 4 × 4 × 2 = 64 configurations + +### Usage + +```bash +# Prepare a fixed eval subset for reproducibility +python3 tools/make_subset_dataset.py \ + --dataset data/coco128 \ + --split train2017 \ + --n 50 \ + --seed 0 \ + --out reports/coco128_50 + +# Edit sweep_ttt_example.json to update env vars for your setup: +# - DATASET: path to dataset +# - CHECKPOINT: path to checkpoint +# - DEVICE: cuda:0 or cpu + +# Then run the sweep +python3 tools/yolozu.py sweep --config docs/sweep_ttt_example.json --resume + +# Or directly with hpo_sweep.py +python3 tools/hpo_sweep.py --config docs/sweep_ttt_example.json --resume +``` + +**Outputs**: +- `reports/sweep_ttt.jsonl` — one line per run with params + metrics +- `reports/sweep_ttt.csv` — tabular format for plotting +- `reports/sweep_ttt.md` — Markdown table for quick review + +### Evaluation + +After running the sweep, evaluate each prediction file to get mAP scores: + +```bash +# Example: evaluate one run +python3 tools/eval_coco.py \ + --dataset reports/coco128_50 \ + --split train2017 \ + --predictions runs/sweep_ttt/tent-sample-steps-5-lr-0.0001/predictions.json \ + --bbox-format cxcywh_norm +``` + +Or batch-evaluate all runs and merge metrics back into the sweep results (custom script recommended). + +### Notes + +- **Baseline**: Run the same command with `--no-ttt` (or remove `--ttt`) for a zero-TTT baseline. +- **Domain shift**: TTT is most effective when there's a domain gap (e.g., COCO → BDD100K, or corrupted images). +- **Reproducibility**: Use `--ttt-seed ` for fixed randomness in masking/augmentations. + +--- + +## 2. Score Threshold Sweep + +**Purpose**: Tune the detection confidence threshold to maximize mAP or other metrics. + +**Example config**: [`docs/sweep_threshold_example.json`](sweep_threshold_example.json) + +### Parameters swept + +- `score_threshold`: `[0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5]` + +**Total runs**: 8 configurations + +### Usage + +```bash +# Run sweep +python3 tools/yolozu.py sweep --config docs/sweep_threshold_example.json --resume +``` + +**Command breakdown**: +1. Export predictions with varying thresholds +2. Evaluate each with COCO mAP (`eval_coco.py`) +3. Extract `metrics.map50`, `metrics.map50_95`, `metrics.ar100` from the metrics JSON (note: `eval_coco.py` outputs `metrics.ar100`, not `mar_100`) + +**Outputs**: +- `reports/sweep_threshold.jsonl` +- `reports/sweep_threshold.csv` +- `reports/sweep_threshold.md` + +### Analyzing results + +Open `reports/sweep_threshold.csv` and plot `score_threshold` vs `map50_95` to find the optimal threshold. + +Example (requires pandas/matplotlib): + +```python +import pandas as pd +import matplotlib.pyplot as plt + +df = pd.read_csv("reports/sweep_threshold.csv") +df = df.sort_values("params.score_threshold") + +plt.plot(df["params.score_threshold"], df["metrics.map50_95"], marker="o") +plt.xlabel("Score Threshold") +plt.ylabel("mAP@50-95") +plt.title("Threshold vs mAP") +plt.grid(True) +plt.savefig("reports/threshold_sweep.png") +``` + +--- + +## 3. Gate Weight Sweep + +**Purpose**: Tune inference-time gate weights for score fusion (detection + template + uncertainty). + +**Example config**: [`docs/sweep_gate_weights_example.json`](sweep_gate_weights_example.json) + +### Background + +YOLOZU supports lightweight inference-time rescoring: +``` +final_score = w_det * score_det + w_tmp * score_tmp - w_unc * (sigma_z + sigma_rot) +``` + +The `tune_gate_weights.py` tool performs grid search over these weights **offline on CPU** (no retraining required). + +### Parameters swept + +- `grid_det`: `["1.0"]` — keep detection weight fixed at 1.0 +- `grid_tmp`: `["0.0,0.25,0.5,0.75,1.0", "0.0,0.5,1.0"]` — template score weight +- `grid_unc`: `["0.0,0.25,0.5,0.75,1.0", "0.0,0.5,1.0"]` — uncertainty penalty weight +- `metric`: `["map50_95", "map50"]` — optimization target + +**Total runs**: 1 × 2 × 2 × 2 = 8 configurations (each performs its own inner grid search) + +### Usage + +```bash +# First, generate predictions with uncertainty estimates +python3 tools/export_predictions.py \ + --adapter rtdetr_pose \ + --dataset data/coco128 \ + --split train2017 \ + --checkpoint runs/rtdetr_pose/checkpoint.pt \ + --wrap \ + --output reports/predictions_rtdetr_pose.json + +# Run gate weight sweep +python3 tools/yolozu.py sweep --config docs/sweep_gate_weights_example.json --resume +``` + +**Outputs**: +- `reports/sweep_gate_weights.jsonl` +- `reports/sweep_gate_weights.csv` +- `reports/sweep_gate_weights.md` + +Each run produces a `gate_tuning_report.json` metrics report with: +- `metrics.tuning.best.det`, `metrics.tuning.best.tmp`, `metrics.tuning.best.unc`: optimal gate weights found +- `metrics.tuning.best.map50`, `metrics.tuning.best.map50_95`: mAP scores achieved with those weights +- additional tuning rows under `metrics.tuning` that the sweep harness can aggregate into CSV/Markdown + +### Notes + +- **No GPU required**: Gate tuning runs on CPU using `simple_map` proxy. +- **Uncertainty fields**: Requires predictions with `sigma_z`, `sigma_rot` (RTDETRPose with `use_uncertainty=true`). +- **Template scores**: Optionally add `score_tmp_sym` per detection (from external template matcher). + +--- + +## 4. Combined Sweeps + +You can nest sweeps or chain them: + +### Example: TTT + Threshold sweep + +1. Run TTT sweep to find best TTT config +2. Pick the best TTT config from step 1 +3. Run threshold sweep with that TTT config + +Or do a Cartesian product (TTT params × thresholds) — note this can be large! + +--- + +## Advanced: Custom metrics extraction + +If your command writes a custom JSON structure, adjust `metrics.keys` to extract the right fields: + +```json +{ + "metrics": { + "path": "{run_dir}/custom_metrics.json", + "keys": ["model.map50_95", "timing.inference_ms", "meta.git_sha"] + } +} +``` + +The harness uses dot-notation to traverse nested dicts. + +--- + +## Tips + +1. **Use `--resume`**: Skip already-completed runs (based on `run_id` in results JSONL). +2. **Use `--max-runs N`**: Cap the number of runs for quick tests. +3. **Use `--dry-run`**: Print commands without executing (useful for debugging config). +4. **Pin dataset**: Use `make_subset_dataset.py` for reproducible evaluation subsets. +5. **Multiple seeds**: For stochastic methods (TTT, TTA), run sweeps with different seeds and aggregate results. + +--- + +## Summary Table + +| Sweep Type | Config File | Typical Runs | Outputs | Use Case | +|------------|-------------|--------------|---------|----------| +| TTT | `sweep_ttt_example.json` | 64 | `sweep_ttt.{jsonl,csv,md}` | Find best TTT hyperparams | +| Threshold | `sweep_threshold_example.json` | 8 | `sweep_threshold.{jsonl,csv,md}` | Find optimal score cutoff | +| Gate Weights | `sweep_gate_weights_example.json` | 8 | `sweep_gate_weights.{jsonl,csv,md}` | Tune inference-time score fusion | + +All sweeps produce **CSV/MD tables** for easy plotting and comparison. + +--- + +## References + +- Sweep harness: [`tools/hpo_sweep.py`](../tools/hpo_sweep.py) +- TTT protocol: [`docs/ttt_protocol.md`](ttt_protocol.md) +- Gate weight tuning: [`docs/gate_weight_tuning.md`](gate_weight_tuning.md) +- Unified CLI: [`tools/yolozu.py`](../tools/yolozu.py) diff --git a/docs/sweep_gate_weights_example.json b/docs/sweep_gate_weights_example.json new file mode 100644 index 0000000..62e4dda --- /dev/null +++ b/docs/sweep_gate_weights_example.json @@ -0,0 +1,29 @@ +{ + "base_cmd": "python3 tools/tune_gate_weights.py --dataset $DATASET --split $SPLIT --predictions $PREDICTIONS --metric {metric} --grid-det {grid_det} --grid-tmp {grid_tmp} --grid-unc {grid_unc} --output-report {run_dir}/gate_tuning_report.json --output-predictions {run_dir}/predictions_tuned.json --wrap-output", + "param_grid": { + "grid_det": ["1.0"], + "grid_tmp": ["0.0,0.25,0.5,0.75,1.0", "0.0,0.5,1.0"], + "grid_unc": ["0.0,0.25,0.5,0.75,1.0", "0.0,0.5,1.0"], + "metric": ["map50_95", "map50"] + }, + "param_order": ["metric", "grid_det", "grid_tmp", "grid_unc"], + "run_dir": "runs/sweep_gate_weights/{run_id}", + "metrics": { + "path": "{run_dir}/gate_tuning_report.json", + "keys": [ + "metrics.tuning.best.det", + "metrics.tuning.best.tmp", + "metrics.tuning.best.unc", + "metrics.tuning.best.map50_95" + ] + }, + "result_jsonl": "reports/sweep_gate_weights.jsonl", + "result_csv": "reports/sweep_gate_weights.csv", + "result_md": "reports/sweep_gate_weights.md", + "shell": true, + "env": { + "DATASET": "data/coco128", + "SPLIT": "train2017", + "PREDICTIONS": "reports/predictions_rtdetr_pose.json" + } +} diff --git a/docs/sweep_threshold_example.json b/docs/sweep_threshold_example.json new file mode 100644 index 0000000..0b918bf --- /dev/null +++ b/docs/sweep_threshold_example.json @@ -0,0 +1,23 @@ +{ + "base_cmd": "python3 tools/yolozu.py export --backend torch --dataset $DATASET --split $SPLIT --checkpoint $CHECKPOINT --device $DEVICE --max-images $MAX_IMAGES --score-threshold {score_threshold} --wrap --output {run_dir}/predictions.json && python3 tools/eval_coco.py --dataset $DATASET --split $SPLIT --predictions {run_dir}/predictions.json --bbox-format cxcywh_norm --max-images $MAX_IMAGES --output {run_dir}/metrics.json", + "param_grid": { + "score_threshold": [0.001, 0.01, 0.05, 0.1, 0.2, 0.3, 0.4, 0.5] + }, + "param_order": ["score_threshold"], + "run_dir": "runs/sweep_threshold/{run_id}", + "metrics": { + "path": "{run_dir}/metrics.json", + "keys": ["metrics.map50", "metrics.map50_95", "metrics.ar100"] + }, + "result_jsonl": "reports/sweep_threshold.jsonl", + "result_csv": "reports/sweep_threshold.csv", + "result_md": "reports/sweep_threshold.md", + "shell": true, + "env": { + "DATASET": "data/coco128", + "SPLIT": "train2017", + "CHECKPOINT": "runs/rtdetr_pose/checkpoint.pt", + "DEVICE": "cuda:0", + "MAX_IMAGES": "50" + } +} diff --git a/docs/sweep_ttt_example.json b/docs/sweep_ttt_example.json new file mode 100644 index 0000000..a05d9b3 --- /dev/null +++ b/docs/sweep_ttt_example.json @@ -0,0 +1,26 @@ +{ + "base_cmd": "python3 tools/yolozu.py export --backend torch --dataset $DATASET --split $SPLIT --checkpoint $CHECKPOINT --device $DEVICE --max-images $MAX_IMAGES --ttt --ttt-method {ttt_method} --ttt-steps {ttt_steps} --ttt-lr {ttt_lr} --ttt-reset {ttt_reset} --wrap --output {run_dir}/predictions.json", + "param_grid": { + "ttt_method": ["tent", "mim"], + "ttt_steps": [1, 3, 5, 10], + "ttt_lr": [1e-5, 5e-5, 1e-4, 5e-4], + "ttt_reset": ["sample", "stream"] + }, + "param_order": ["ttt_method", "ttt_reset", "ttt_steps", "ttt_lr"], + "run_dir": "runs/sweep_ttt/{run_id}", + "metrics": { + "path": "{run_dir}/predictions.json", + "keys": ["meta.run.timestamp"] + }, + "result_jsonl": "reports/sweep_ttt.jsonl", + "result_csv": "reports/sweep_ttt.csv", + "result_md": "reports/sweep_ttt.md", + "shell": true, + "env": { + "DATASET": "data/coco128", + "SPLIT": "train2017", + "CHECKPOINT": "runs/rtdetr_pose/checkpoint.pt", + "DEVICE": "cuda:0", + "MAX_IMAGES": "50" + } +} diff --git a/tests/test_sweep_configs.py b/tests/test_sweep_configs.py new file mode 100644 index 0000000..349000f --- /dev/null +++ b/tests/test_sweep_configs.py @@ -0,0 +1,96 @@ +#!/usr/bin/env python3 +""" +Test suite for sweep configuration examples. +Validates JSON structure and parameter combinations. +""" +import json +import sys +import unittest +from pathlib import Path + +repo_root = Path(__file__).resolve().parents[1] + + +class TestSweepConfigs(unittest.TestCase): + """Test suite for sweep configuration validation.""" + + def _validate_sweep_config(self, config_path: Path) -> None: + """Validate a sweep configuration JSON file.""" + + # Check file exists + self.assertTrue(config_path.exists(), f"Config not found: {config_path}") + + # Parse JSON + try: + config = json.loads(config_path.read_text()) + except Exception as e: + self.fail(f"Failed to parse JSON in {config_path.name}: {e}") + + # Check required fields + required = ["base_cmd", "result_jsonl", "result_csv", "result_md"] + missing = [f for f in required if f not in config] + self.assertEqual([], missing, f"Missing required fields in {config_path.name}: {missing}") + + # Check param_grid or param_list exists + self.assertTrue( + "param_grid" in config or "param_list" in config, + f"{config_path.name} must have either 'param_grid' or 'param_list'" + ) + + # Validate param_grid structure + if "param_grid" in config: + grid = config["param_grid"] + self.assertIsInstance(grid, dict, f"param_grid must be a dict in {config_path.name}") + + for key, values in grid.items(): + self.assertIsInstance( + values, list, + f"param_grid['{key}'] must be a list in {config_path.name}" + ) + self.assertGreater( + len(values), 0, + f"param_grid['{key}'] is empty in {config_path.name}" + ) + + # Calculate total runs + total_runs = 1 + for values in grid.values(): + total_runs *= len(values) + + # Check base_cmd has placeholders for all params + base_cmd = config["base_cmd"] + for param in grid.keys(): + placeholder = "{" + param + "}" + if placeholder not in base_cmd: + # This is a warning, not a failure + print(f" ⚠ Warning: parameter '{param}' not used in base_cmd in {config_path.name}") + + # Validate metrics structure if present + if "metrics" in config: + metrics = config["metrics"] + self.assertIn("path", metrics, f"metrics.path is required in {config_path.name}") + + def test_sweep_ttt_example(self): + """Test TTT sweep configuration.""" + config_path = repo_root / "docs" / "sweep_ttt_example.json" + self._validate_sweep_config(config_path) + + def test_sweep_threshold_example(self): + """Test threshold sweep configuration.""" + config_path = repo_root / "docs" / "sweep_threshold_example.json" + self._validate_sweep_config(config_path) + + def test_sweep_gate_weights_example(self): + """Test gate weights sweep configuration.""" + config_path = repo_root / "docs" / "sweep_gate_weights_example.json" + self._validate_sweep_config(config_path) + + def test_hpo_sweep_example(self): + """Test HPO sweep configuration (if exists).""" + config_path = repo_root / "docs" / "hpo_sweep_example.json" + if config_path.exists(): + self._validate_sweep_config(config_path) + + +if __name__ == "__main__": + unittest.main()