This repository was archived by the owner on Feb 21, 2026. It is now read-only.

P2: Sweep configs for TTT/threshold/gate tuning + Rust inference spec #14

Draft
Copilot wants to merge 4 commits into main from copilot/expand-sweeps-for-ttt

Conversation

Contributor

Copilot AI commented Feb 9, 2026

Completes the P2 roadmap item: extends the existing sweep harness for hyperparameter exploration and documents production inference paths.

Sweep Configurations

Added three parameterized sweep configs building on hpo_sweep.py:

  • TTT sweep (sweep_ttt_example.json): 64 runs exploring tent/mim methods, adaptation steps (1-10), learning rates (1e-5 to 5e-4), reset policies
  • Threshold sweep (sweep_threshold_example.json): 8 runs spanning score thresholds 0.001-0.5, chains export + COCO eval
  • Gate weight sweep (sweep_gate_weights_example.json): 8 runs grid-searching detection/template/uncertainty fusion weights

All configs use environment variables for fixed settings (dataset, checkpoint, device), support `--resume` for incremental runs, and emit CSV/MD tables.

Example TTT sweep invocation:

python3 tools/hpo_sweep.py --config docs/sweep_ttt_example.json --resume
# Outputs: reports/sweep_ttt.{jsonl,csv,md}

Production Inference

  • C++ template (examples/infer_cpp/): Verified complete—stub/ONNXRuntime/TensorRT runners with CMake build
  • Rust spec (docs/rust_inference_template.md): Implementation guide covering ort/tract/candle backends, preprocessing/postprocessing, schema compliance

Documentation

  • docs/sweep_examples.md: Usage guide with plotting examples, tips for reproducible comparisons
  • Updated README: sweep examples section, production cores section, P2 marked done

Cache/re-run is already implemented via the `--cache` flag (SHA256 config fingerprinting; runs are organized automatically under `runs/yolozu_runs/<hash>/`).
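The fingerprinting step can be sketched roughly as follows. This is a minimal illustration: `fingerprint_config` and the exact canonicalization are assumptions; only the SHA256-over-config idea and the `runs/yolozu_runs/<hash>/` layout come from the description above.

```python
import hashlib
import json


def fingerprint_config(config: dict) -> str:
    """Hash a canonical JSON serialization of a sweep config.

    Sorting keys and fixing separators makes the fingerprint stable
    across dict orderings, so identical configs map to the same hash.
    """
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_dir_for(config: dict) -> str:
    # Cached runs are organized as runs/yolozu_runs/<hash>/
    return f"runs/yolozu_runs/{fingerprint_config(config)}"
```

With this scheme, re-running an unchanged config resolves to the same directory, which is what makes cache hits and `--resume` possible.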

Original prompt

Please check the code and complete the task below.

P2 (partial): cache/re-run (fingerprinted runs) + sweeps (wrapper exists; expand sweeps for TTT/threshold/gate weights) + production inference cores (C++/Rust) as needed.



Copilot AI and others added 3 commits February 9, 2026 01:12
Co-authored-by: thinksyncs <42225585+thinksyncs@users.noreply.github.com>
Copilot AI changed the title [WIP] Implement caching and expand sweeps for TTT and weights P2: Sweep configs for TTT/threshold/gate tuning + Rust inference spec Feb 9, 2026
Copilot AI requested a review from thinksyncs February 9, 2026 01:18

Copilot AI left a comment


Pull request overview

Adds example sweep configurations (TTT/threshold/gate-weight tuning) and accompanying documentation, plus a Rust production-inference specification doc, to complete the P2 roadmap item around sweep exploration and production inference guidance.

Changes:

  • Added three new sweep JSON configs for TTT, score-threshold tuning, and gate-weight tuning.
  • Added documentation for running/analyzing sweeps and a Rust inference “spec template”.
  • Added a test file intended to validate sweep config structure; updated README and added a P2 security summary.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 8 comments.

| File | Description |
|------|-------------|
| tests/test_sweep_configs.py | Adds a config-validation “test” for sweep example JSON files. |
| docs/sweep_ttt_example.json | Adds a 64-run param grid for TTT tuning (method/reset/steps/lr). |
| docs/sweep_threshold_example.json | Adds an 8-run score-threshold sweep chaining export + COCO eval. |
| docs/sweep_gate_weights_example.json | Adds an 8-run sweep over gate-weight grid specs / metric target. |
| docs/sweep_examples.md | Adds a usage guide and analysis tips for the new sweep configs. |
| docs/rust_inference_template.md | Adds a Rust inference implementation guide/spec (not an implementation). |
| SECURITY_SUMMARY_P2.md | Adds a security review summary for the P2 sweep/doc additions. |
| README.md | Marks P2 as done and links to sweep examples + production inference cores. |


### Sweep Configurations
The sweep configs execute shell commands via `hpo_sweep.py`. Security notes:
- Commands are parameterized via config file (user controls all inputs)
- Environment variables are set explicitly in config (no inheriting untrusted env)

Copilot AI Feb 9, 2026


This security note is factually incorrect for the current implementation: tools/hpo_sweep.py starts from os.environ.copy() and then overlays values from the config, so it does inherit the caller’s environment in addition to the explicit env block. Please reword to reflect that behavior (or change the implementation if the intention is to run with a clean env).

Suggested change
- Environment variables are set explicitly in config (no inheriting untrusted env)
- Environment starts from the caller’s environment, with variables from the config explicitly overlaying it (callers should ensure their environment is trusted or run with a sanitized env)

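The inherit-then-overlay behavior the reviewer describes can be illustrated with a minimal sketch; `build_env` is a hypothetical stand-in for the relevant logic in `tools/hpo_sweep.py`, not a copy of it.

```python
import os


def build_env(config_env: dict) -> dict:
    """Start from the caller's environment, then overlay config values.

    Keys present in the config win, but every other variable from the
    caller's environment leaks through -- which is why the config's env
    block alone does not guarantee a clean environment.
    """
    env = os.environ.copy()   # inherits the full caller environment
    env.update(config_env)    # config values overlay/override
    return env
```

A caller wanting a sanitized environment would have to build `env` from scratch (e.g. `{}` plus an allowlist) instead of copying `os.environ`.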
"run_dir": "runs/sweep_threshold/{run_id}",
"metrics": {
"path": "{run_dir}/metrics.json",
"keys": ["map50", "map50_95", "mar_100"]

Copilot AI Feb 9, 2026


The metrics.keys paths don’t match the actual JSON structure written by tools/eval_coco.py. That tool writes metrics under metrics.* (e.g. metrics.map50_95, metrics.map50, metrics.ar100), so the current keys (map50, map50_95, mar_100) will all extract as null/None.

Suggested change
"keys": ["map50", "map50_95", "mar_100"]
"keys": ["metrics.map50", "metrics.map50_95", "metrics.ar100"]

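The dot-path lookup that `metrics.keys` entries rely on can be sketched as below; `extract` is illustrative, not the harness's actual function, but it shows why top-level keys come back empty when the eval tool nests its values.

```python
def extract(data: dict, dotted_key: str):
    """Walk a nested dict along a dot-separated path, e.g. 'metrics.map50'.

    Returns None when any segment is missing -- which is exactly why a
    top-level key like 'map50' extracts as None when eval output nests
    its values under a 'metrics' object.
    """
    node = data
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node
```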
Comment on lines +108 to +112
**Command breakdown**:
1. Export predictions with varying thresholds
2. Evaluate each with COCO mAP (`eval_coco.py`)
3. Extract `map50`, `map50_95`, `mar_100` from metrics JSON


Copilot AI Feb 9, 2026


This section describes extracting map50, map50_95, and mar_100 directly from the eval output JSON, but tools/eval_coco.py nests these under metrics.* and uses ar100 (not mar_100). Update the doc to match the actual output schema so readers can copy/paste the sweep config successfully.

Comment on lines +186 to +189
Each run produces a `gate_tuning_report.json` with:
- `best_weights`: `{det, tmp, unc}` — optimal weights found
- `best_score`: mAP achieved with those weights
- `grid_results`: full grid search results

Copilot AI Feb 9, 2026


The document claims gate_tuning_report.json contains best_weights and best_score, but tools/tune_gate_weights.py writes a metrics_report where the best row is under metrics.tuning.best (with det/tmp/unc and map50/map50_95). If you want best_weights in the report, it needs to be added by the tool; otherwise update this section to reflect the current report structure.

Suggested change
Each run produces a `gate_tuning_report.json` with:
- `best_weights`: `{det, tmp, unc}` — optimal weights found
- `best_score`: mAP achieved with those weights
- `grid_results`: full grid search results
Each run produces a `gate_tuning_report.json` metrics report with:
- `metrics.tuning.best.det`, `metrics.tuning.best.tmp`, `metrics.tuning.best.unc`: optimal gate weights found
- `metrics.tuning.best.map50`, `metrics.tuning.best.map50_95`: mAP scores achieved with those weights
- additional tuning rows under `metrics.tuning` that the sweep harness can aggregate into CSV/Markdown

Comment on lines +1 to +8
#!/usr/bin/env python3
"""
Test suite for sweep configuration examples.
Validates JSON structure and parameter combinations.
"""
import json
import sys
from pathlib import Path

Copilot AI Feb 9, 2026


This module won’t execute any checks under the repo’s CI (python -m unittest) because it defines no unittest.TestCase (or test_* functions). As a result, these sweep configs won’t actually be validated in CI. Convert this into a unittest.TestCase with test methods that assert validity for each config path (and drop the custom main() / sys.exit() entrypoint, or keep it only under if __name__ == "__main__").

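A minimal conversion along the lines the reviewer suggests might look like this. The required-key checks are assumptions based only on the config fields quoted in this PR (`param_grid`, `param_order`, `metrics`); the real validation would mirror whatever `hpo_sweep.py` actually consumes.

```python
import json
import unittest
from pathlib import Path

# Config files added by this PR.
CONFIG_PATHS = [
    Path("docs/sweep_ttt_example.json"),
    Path("docs/sweep_threshold_example.json"),
    Path("docs/sweep_gate_weights_example.json"),
]


class TestSweepConfigs(unittest.TestCase):
    """Discovered by `python -m unittest` because it subclasses TestCase."""

    def check_config(self, cfg: dict) -> None:
        # Fields assumed from the fragments quoted in this PR.
        self.assertIn("param_grid", cfg)
        self.assertIn("metrics", cfg)
        # Every param listed in param_order must exist in param_grid.
        for name in cfg.get("param_order", []):
            self.assertIn(name, cfg["param_grid"])

    def test_configs_are_valid(self):
        for path in CONFIG_PATHS:
            if not path.exists():  # keep the test runnable outside the repo
                continue
            with path.open() as fh:
                self.check_config(json.load(fh))
```

Because the checks live in `test_*` methods on a `TestCase`, CI's `python -m unittest` picks them up without any custom `main()` entrypoint.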
"grid_unc": ["0.0,0.25,0.5,0.75,1.0", "0.0,0.5,1.0"],
"metric": ["map50_95", "map50"]
},
"param_order": ["metric", "grid_tmp", "grid_unc"],

Copilot AI Feb 9, 2026


param_order omits grid_det even though it’s in param_grid. If someone changes grid_det to have multiple values later, different runs could collapse to the same run_id (affecting --resume behavior and output directories). Consider including grid_det in param_order for safety/consistency.

Suggested change
"param_order": ["metric", "grid_tmp", "grid_unc"],
"param_order": ["metric", "grid_det", "grid_tmp", "grid_unc"],

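The collision risk can be demonstrated with a simplified sketch of run-id derivation; the real harness may format ids differently, but the failure mode is the same whenever the id is built only from `param_order`.

```python
def run_id(params: dict, param_order: list) -> str:
    """Build a run id from only the params named in param_order.

    Any parameter missing from param_order is invisible to the id, so two
    runs differing only in that parameter collide on run_id -- and hence
    on run_dir and --resume bookkeeping.
    """
    return "_".join(f"{k}-{params[k]}" for k in param_order)


# Two hypothetical runs that differ only in grid_det:
a = {"metric": "map50", "grid_det": "0.0,1.0", "grid_tmp": "0.5", "grid_unc": "0.5"}
b = {"metric": "map50", "grid_det": "0.0,0.5,1.0", "grid_tmp": "0.5", "grid_unc": "0.5"}

incomplete = ["metric", "grid_tmp", "grid_unc"]            # omits grid_det
complete = ["metric", "grid_det", "grid_tmp", "grid_unc"]  # covers every param
```

Under `incomplete`, runs `a` and `b` collapse to one id; including `grid_det` keeps them distinct.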
"run_dir": "runs/sweep_gate_weights/{run_id}",
"metrics": {
"path": "{run_dir}/gate_tuning_report.json",
"keys": ["best_weights.det", "best_weights.tmp", "best_weights.unc", "best_score"]

Copilot AI Feb 9, 2026


The metrics.keys paths don’t match the structure of gate_tuning_report.json produced by tools/tune_gate_weights.py (it’s a metrics_report with fields under metrics.tuning.*). As written, these keys will extract as null/None. Update them to the correct dot-paths (e.g., pull metrics.tuning.best.det/tmp/unc and the selected metric value).

Suggested change
"keys": ["best_weights.det", "best_weights.tmp", "best_weights.unc", "best_score"]
"keys": [
"metrics.tuning.best.det",
"metrics.tuning.best.tmp",
"metrics.tuning.best.unc",
"metrics.tuning.best.score"
]

Comment on lines +242 to +246
| Sweep Type | Config File | Typical Runs | Outputs | Use Case |
|------------|-------------|--------------|---------|----------|
| TTT | `sweep_ttt_example.json` | 64 | `sweep_ttt.{jsonl,csv,md}` | Find best TTT hyperparams |
| Threshold | `sweep_threshold_example.json` | 8 | `sweep_threshold.{jsonl,csv,md}` | Find optimal score cutoff |
| Gate Weights | `sweep_gate_weights_example.json` | 8 | `sweep_gate_weights.{jsonl,csv,md}` | Tune inference-time score fusion |

Copilot AI Feb 9, 2026


The summary table is using || at the start of each row, which is not valid GitHub-flavored Markdown table syntax and renders incorrectly. Use single leading/trailing pipes (| ... |) like the other tables in the repo.
