Fix PR#14: Align sweep configs and docs with actual tool output schemas#15
Fix PR#14: Align sweep configs and docs with actual tool output schemas#15
Conversation
- Fix SECURITY_SUMMARY_P2.md env security note - Fix sweep_threshold_example.json metrics.keys paths - Fix sweep_gate_weights_example.json param_order and metrics.keys - Fix sweep_examples.md docs to match actual output schemas - Convert test_sweep_configs.py to proper unittest.TestCase Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Replace non-existent 'score' field with 'map50_95' to match actual tune_gate_weights.py output Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Simplify parenthetical to focus on key distinction Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Adds/updates sweep example configs and documentation, plus a CI-friendly unittest to validate the example JSON sweep schemas (addressing prior PR#14 review feedback).
Changes:
- Added a unittest-based validator for the sweep example JSON configs.
- Added JSON sweep configuration examples for TTT, threshold sweeps, and gate-weight tuning.
- Added/updated sweep documentation (including metric key paths) and a P2 security summary write-up.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
tests/test_sweep_configs.py |
New unittest to validate sweep config example JSON structure. |
docs/sweep_ttt_example.json |
New example sweep config for TTT parameter exploration. |
docs/sweep_threshold_example.json |
New example sweep config for threshold tuning + COCO eval metric extraction. |
docs/sweep_gate_weights_example.json |
New example sweep config for gate-weight tuning report metric extraction. |
docs/sweep_examples.md |
New/updated documentation explaining sweep configs and usage patterns. |
SECURITY_SUMMARY_P2.md |
New security summary documenting checks and security considerations for the sweep expansion. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| Validates JSON structure and parameter combinations. | ||
| """ | ||
| import json | ||
| import sys |
There was a problem hiding this comment.
sys is imported but never used, which triggers Ruff F401 and will fail CI (ruff selects F). Remove the unused import (or use it if needed).
| import sys |
| # Calculate total runs | ||
| total_runs = 1 | ||
| for values in grid.values(): | ||
| total_runs *= len(values) |
There was a problem hiding this comment.
total_runs is computed but never used, which triggers Ruff F841 and will fail CI. Either remove this variable or assert against an expected run count so the value is actually used.
| total_runs *= len(values) | |
| total_runs *= len(values) | |
| self.assertGreaterEqual( | |
| total_runs, | |
| 1, | |
| f"Total runs must be at least 1 in {config_path.name}", | |
| ) |
| | Sweep Type | Config File | Typical Runs | Outputs | Use Case | | ||
| |------------|-------------|--------------|---------|----------| | ||
| | TTT | `sweep_ttt_example.json` | 64 | `sweep_ttt.{jsonl,csv,md}` | Find best TTT hyperparams | | ||
| | Threshold | `sweep_threshold_example.json` | 8 | `sweep_threshold.{jsonl,csv,md}` | Find optimal score cutoff | | ||
| | Gate Weights | `sweep_gate_weights_example.json` | 8 | `sweep_gate_weights.{jsonl,csv,md}` | Tune inference-time score fusion | |
There was a problem hiding this comment.
The summary table rows start with ||, which breaks standard Markdown table formatting (it creates an extra empty column in most renderers). Use a single leading | for each row and keep the header/separator row column counts consistent.
| The test validation script: | ||
| - Only reads JSON files from known locations | ||
| - Uses safe JSON parsing (`json.loads`) | ||
| - No file write operations except when run with output flags |
There was a problem hiding this comment.
This section says the test script has "output flags" and may write files, but tests/test_sweep_configs.py only reads JSON files and doesn't implement any output/write flags. Please update this wording to reflect the actual behavior so the security summary doesn't contain inaccurate claims.
| - No file write operations except when run with output flags | |
| - No file write operations (read-only JSON validation) |
PR#14 introduced sweep configurations with incorrect field paths and documentation that didn't match actual tool outputs. This would cause null extractions and user confusion.
Changes
Config schema alignment
sweep_threshold_example.json: Fixed metrics paths from["map50", "mar_100"]to["metrics.map50", "metrics.ar100"](eval_coco.py nests undermetrics.*)sweep_gate_weights_example.json: Fixed paths from["best_weights.det", "best_score"]to["metrics.tuning.best.det", "metrics.tuning.best.map50_95"](tune_gate_weights.py structure)sweep_gate_weights_example.json: Addedgrid_dettoparam_orderto prevent run_id collisionsDocumentation corrections
sweep_examples.md: Updated threshold and gate weight sections to reflect actual JSON output structuressweep_examples.md: Fixed markdown table syntax (||→|)SECURITY_SUMMARY_P2.md: Corrected environment handling note (hpo_sweep.py inherits caller's env and overlays config, not isolated)Test infrastructure
test_sweep_configs.py: Converted to properunittest.TestCaseso CI can discover/run tests viapython -m unittestAll paths verified against actual tool implementations (
eval_coco.py,tune_gate_weights.py).Original prompt
💬 We'd love your input! Share your thoughts on Copilot coding agent in our 2 minute survey.