Fix PR#14: Align sweep configs and docs with actual tool output schemas by Copilot · Pull Request #15 · thinksyncs/YOLOZU

Copilot · 2026-02-09T23:49:03Z

PR#14 introduced sweep configurations with incorrect metric field paths, along with documentation that did not match the actual tool outputs. Left as-is, the sweeps would extract null metric values and the docs would confuse users.

Changes

Config schema alignment

sweep_threshold_example.json: Fixed metric key paths from ["map50", "mar_100"] to ["metrics.map50", "metrics.ar100"] (eval_coco.py nests its results under metrics.*)
sweep_gate_weights_example.json: Fixed metric key paths from ["best_weights.det", "best_score"] to ["metrics.tuning.best.det", "metrics.tuning.best.map50_95"] (matching the tune_gate_weights.py output structure)
sweep_gate_weights_example.json: Added grid_det to param_order to prevent run_id collisions
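The path fixes above rely on each `metrics.keys` entry being a dot-separated path into the tool's JSON report. The sketch below assumes the sweep resolves keys by walking nested dicts (hpo_sweep.py's real lookup is not shown in this PR) to illustrate why a bare `map50` comes back null against eval_coco.py-style output. Under the same caveat, it also shows how a run_id built only from `param_order` collides when a grid key such as `grid_det` is left out. `resolve_metric`, `make_run_id`, the key `grid_other`, and all numeric values are hypothetical placeholders.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, Optional


def resolve_metric(report: Mapping[str, Any], path: str) -> Optional[Any]:
    """Follow a dot-separated path such as "metrics.map50" into nested dicts.

    Returns None when any segment is missing, which is how a wrong path
    quietly becomes a null column in the sweep results.
    """
    node: Any = report
    for part in path.split("."):
        if not isinstance(node, Mapping) or part not in node:
            return None
        node = node[part]
    return node


def make_run_id(params: Mapping[str, Any], param_order: Sequence[str]) -> str:
    """Build a run_id from the parameters named in param_order only (hypothetical scheme)."""
    return "_".join(f"{name}={params[name]}" for name in param_order)


# Placeholder reports shaped like the nesting described above (values are dummies).
coco_report = {"metrics": {"map50": 0.5, "ar100": 0.4}}
gate_report = {"metrics": {"tuning": {"best": {"det": 1.0, "map50_95": 0.3}}}}

print(resolve_metric(coco_report, "map50"))                          # None
print(resolve_metric(coco_report, "metrics.map50"))                  # 0.5
print(resolve_metric(gate_report, "best_score"))                     # None
print(resolve_metric(gate_report, "metrics.tuning.best.map50_95"))   # 0.3

# Two runs that differ only in grid_det ("grid_other" is a hypothetical key).
run_a = {"grid_other": 1.0, "grid_det": 0.5}
run_b = {"grid_other": 1.0, "grid_det": 1.0}
print(make_run_id(run_a, ["grid_other"]) == make_run_id(run_b, ["grid_other"]))  # True: collision
print(make_run_id(run_a, ["grid_other", "grid_det"]) == make_run_id(run_b, ["grid_other", "grid_det"]))  # False
```

Returning None instead of raising mirrors the "null extraction" symptom described in the PR: the sweep keeps running, but the metric column is empty.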

Documentation corrections

sweep_examples.md: Updated the threshold and gate-weight sections to reflect the actual JSON output structures
sweep_examples.md: Fixed markdown table syntax (|| → |)
SECURITY_SUMMARY_P2.md: Corrected environment handling note (hpo_sweep.py inherits caller's env and overlays config, not isolated)
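The corrected note means a sweep step sees the caller's full environment with the config's variables layered on top, rather than a clean environment. The sketch below shows that overlay pattern. It assumes the config supplies a flat mapping of string variables. The helper names `build_child_env` and `run_step` and the variable `SWEEP_EXAMPLE_VAR` are hypothetical and are not hpo_sweep.py's actual API.

```python
import os
import subprocess
import sys
from typing import Dict, Mapping, Sequence


def build_child_env(config_env: Mapping[str, str]) -> Dict[str, str]:
    """Start from the caller's environment, then let config values win.

    Anything the caller already exported (tokens, paths, proxies) stays
    visible to the child process: this is inheritance, not isolation.
    """
    env = dict(os.environ)
    env.update({key: str(value) for key, value in config_env.items()})
    return env


def run_step(cmd: Sequence[str], config_env: Mapping[str, str]) -> str:
    """Run one sweep step with the overlaid environment and return its stdout."""
    result = subprocess.run(
        list(cmd), env=build_child_env(config_env), capture_output=True, text=True, check=True
    )
    return result.stdout


if __name__ == "__main__":
    # The child sees the config value even though the parent never exported it.
    probe = [sys.executable, "-c", "import os; print(os.environ['SWEEP_EXAMPLE_VAR'])"]
    print(run_step(probe, {"SWEEP_EXAMPLE_VAR": "overlay"}).strip())
```

Because `dict(os.environ)` copies the parent environment, the parent's own `os.environ` is never modified. Isolation would require starting from an empty or allow-listed dict instead.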

Test infrastructure

test_sweep_configs.py: Converted to proper unittest.TestCase so CI can discover/run tests via python -m unittest
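The conversion matters because `python -m unittest` only collects `test*` methods on `unittest.TestCase` subclasses. Bare functions and module-level asserts are not discovered. The sketch below shows that shape. It assumes each sweep config has a `grid` mapping of parameter names to value lists, a `param_order` list, and `metrics.keys`, a structure inferred from this PR's description and review comments. The real test reads the JSON files under docs/. This version embeds a hypothetical config (`grid_a` is a made-up key) so that it runs standalone.

```python
import json
import unittest

# Hypothetical config used only to keep the sketch self-contained.
EXAMPLE_CONFIG = json.loads("""
{
  "param_order": ["grid_a", "grid_det"],
  "grid": {"grid_a": [0.5, 1.0], "grid_det": [0.5, 1.0, 2.0]},
  "metrics": {"keys": ["metrics.tuning.best.det", "metrics.tuning.best.map50_95"]}
}
""")


class TestSweepConfigShape(unittest.TestCase):
    def test_param_order_covers_grid(self):
        # Assumed rule: every grid key appears in param_order, or run_ids can collide.
        self.assertEqual(set(EXAMPLE_CONFIG["param_order"]), set(EXAMPLE_CONFIG["grid"]))

    def test_grid_values_are_nonempty_lists(self):
        for name, values in EXAMPLE_CONFIG["grid"].items():
            with self.subTest(param=name):
                self.assertIsInstance(values, list)
                self.assertGreater(len(values), 0)

    def test_metric_keys_are_dotted_paths(self):
        # Tool outputs nest metrics, so a bare top-level key is likely a mistake.
        for key in EXAMPLE_CONFIG["metrics"]["keys"]:
            self.assertIn(".", key)
```

Saved as a `test_*.py` file under tests/, this would be picked up by, for example, `python -m unittest discover -s tests`.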

All paths were verified against the actual tool implementations (eval_coco.py, tune_gate_weights.py).

Original prompt

fix PR#14 and make new PR

- Fix SECURITY_SUMMARY_P2.md env security note
- Fix sweep_threshold_example.json metrics.keys paths
- Fix sweep_gate_weights_example.json param_order and metrics.keys
- Fix sweep_examples.md docs to match actual output schemas
- Convert test_sweep_configs.py to proper unittest.TestCase

Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>

Replace non-existent 'score' field with 'map50_95' to match actual tune_gate_weights.py output Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>

Simplify parenthetical to focus on key distinction Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>

Copilot

Pull request overview

Adds/updates sweep example configs and documentation, plus a CI-friendly unittest to validate the example JSON sweep schemas (addressing prior PR#14 review feedback).

Changes:

Added a unittest-based validator for the sweep example JSON configs.
Added JSON sweep configuration examples for TTT, threshold sweeps, and gate-weight tuning.
Added/updated sweep documentation (including metric key paths) and a P2 security summary write-up.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

File	Description
`tests/test_sweep_configs.py`	New unittest to validate sweep config example JSON structure.
`docs/sweep_ttt_example.json`	New example sweep config for TTT parameter exploration.
`docs/sweep_threshold_example.json`	New example sweep config for threshold tuning + COCO eval metric extraction.
`docs/sweep_gate_weights_example.json`	New example sweep config for gate-weight tuning report metric extraction.
`docs/sweep_examples.md`	New/updated documentation explaining sweep configs and usage patterns.
`SECURITY_SUMMARY_P2.md`	New security summary documenting checks and security considerations for the sweep expansion.

Copilot · 2026-02-10T00:02:59Z

tests/test_sweep_configs.py

+Validates JSON structure and parameter combinations.
+"""
+import json
+import sys


sys is imported but never used, which triggers Ruff F401 and will fail CI (the Ruff config selects the F rules). Remove the unused import, or use it if it is needed.

Suggested change

-import sys

Copilot · 2026-02-10T00:02:59Z

tests/test_sweep_configs.py

+            # Calculate total runs
+            total_runs = 1
+            for values in grid.values():
+                total_runs *= len(values)


total_runs is computed but never used, which triggers Ruff F841 and will fail CI. Either remove this variable or assert against an expected run count so the value is actually used.

Suggested change

-                total_runs *= len(values)
+                total_runs *= len(values)
+            self.assertGreaterEqual(
+                total_runs,
+                1,
+                f"Total runs must be at least 1 in {config_path.name}",
+            )
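The count this assertion checks is the size of the grid's Cartesian product, which is the product of each parameter's value count. The short sketch below uses a hypothetical grid (`score_threshold` is a made-up key). It shows that the manual loop, `math.prod`, and `itertools.product` agree. It also shows that a single empty value list drives the total to zero, which is the case the `>= 1` assertion catches.

```python
import itertools
import math

# Hypothetical grid; the real example configs define their own parameters.
grid = {"score_threshold": [0.1, 0.2, 0.3, 0.4], "grid_det": [0.5, 1.0]}

total_runs = 1
for values in grid.values():
    total_runs *= len(values)

# Same count without the loop (math.prod needs Python 3.8+).
assert total_runs == math.prod(len(v) for v in grid.values())

# Enumerating the actual combinations gives the same number.
combos = list(itertools.product(*grid.values()))
assert len(combos) == total_runs
print(total_runs)  # 8

# One empty value list collapses the whole sweep to zero runs.
broken = {"score_threshold": [], "grid_det": [0.5, 1.0]}
print(math.prod(len(v) for v in broken.values()))  # 0
```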

Copilot · 2026-02-10T00:02:59Z

docs/sweep_examples.md

+| Sweep Type | Config File | Typical Runs | Outputs | Use Case |
+|------------|-------------|--------------|---------|----------|
+| TTT | `sweep_ttt_example.json` | 64 | `sweep_ttt.{jsonl,csv,md}` | Find best TTT hyperparams |
+| Threshold | `sweep_threshold_example.json` | 8 | `sweep_threshold.{jsonl,csv,md}` | Find optimal score cutoff |
+| Gate Weights | `sweep_gate_weights_example.json` | 8 | `sweep_gate_weights.{jsonl,csv,md}` | Tune inference-time score fusion |


The summary table rows start with ||, which breaks standard Markdown table formatting (it creates an extra empty column in most renderers). Use a single leading | for each row and keep the header/separator row column counts consistent.
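This class of mistake can be caught before review by checking that every table row starts with exactly one pipe and has the same number of cells as the header. The sketch below assumes simple GFM-style tables with leading and trailing pipes and no escaped pipes (`\|`) or pipes inside code spans. `cell_count` and `check_table` are hypothetical helpers and are not part of the repository.

```python
from typing import List


def cell_count(row: str) -> int:
    """Count cells in a pipe-table row that has leading and trailing pipes."""
    stripped = row.strip()
    # Drop one leading and one trailing pipe, then split on the rest.
    return len(stripped[1:-1].split("|"))


def check_table(lines: List[str]) -> List[str]:
    """Return human-readable problems found in a Markdown pipe table."""
    problems = []
    expected = cell_count(lines[0])
    for lineno, row in enumerate(lines, start=1):
        if row.lstrip().startswith("||"):
            problems.append(f"row {lineno}: starts with '||' (adds an empty column)")
        elif cell_count(row) != expected:
            problems.append(f"row {lineno}: {cell_count(row)} cells, expected {expected}")
    return problems


good = [
    "| Sweep Type | Config File |",
    "|------------|-------------|",
    "| TTT | `sweep_ttt_example.json` |",
]
bad = good[:2] + ["|| TTT | `sweep_ttt_example.json` |"]
print(check_table(good))  # []
print(check_table(bad))
```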

Copilot · 2026-02-10T00:03:00Z

SECURITY_SUMMARY_P2.md

+The test validation script:
+- Only reads JSON files from known locations
+- Uses safe JSON parsing (`json.loads`)
+- No file write operations except when run with output flags


This section says the test script has "output flags" and may write files, but tests/test_sweep_configs.py only reads JSON files and doesn't implement any output/write flags. Please update this wording to reflect the actual behavior so the security summary doesn't contain inaccurate claims.

Suggested change

-- No file write operations except when run with output flags
+- No file write operations (read-only JSON validation)

Initial plan

13240e3

Copilot AI assigned Copilot and thinksyncs Feb 9, 2026

Copilot started work on behalf of thinksyncs February 9, 2026 23:49

Copilot AI and others added 3 commits February 9, 2026 23:53

Fix sweep_gate_weights_example.json metrics.keys

e759a1a

Improve clarity of sweep_examples.md note

15426c6

thinksyncs requested a review from Copilot February 9, 2026 23:57

Copilot started reviewing on behalf of thinksyncs February 9, 2026 23:58

Copilot AI changed the title ~~[WIP] Fix issues in PR#14 and create new pull request~~ Fix PR#14: Align sweep configs and docs with actual tool output schemas Feb 9, 2026

Copilot AI requested a review from thinksyncs February 9, 2026 23:58

Copilot finished work on behalf of thinksyncs February 9, 2026 23:58

Copilot AI reviewed Feb 10, 2026

Copilot wants to merge 4 commits into main from copilot/fix-pr14-issues
