diff --git a/.claude/CLAUDE.md b/.claude/CLAUDE.md index 405f6c260..a20132f5c 100644 --- a/.claude/CLAUDE.md +++ b/.claude/CLAUDE.md @@ -12,7 +12,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [run-pipeline](skills/run-pipeline/SKILL.md) -- Pick a Ready issue from the GitHub Project board, move it through In Progress -> issue-to-pr -> Review pool. One issue at a time; forever-loop handles iteration. - [issue-to-pr](skills/issue-to-pr/SKILL.md) -- Convert a GitHub issue into a PR with an implementation plan. Default rule: one item per PR. Exception: a `[Model]` issue that explicitly claims direct ILP solvability should implement the model and its direct ` -> ILP` rule together; `[Rule]` issues still require both models to exist on `main`. - [add-model](skills/add-model/SKILL.md) -- Add a new problem model. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. -- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Can be used standalone (brainstorms with user) or called from `issue-to-pr`. +- [add-rule](skills/add-rule/SKILL.md) -- Add a new reduction rule. Runs mathematical verification by default (via `/verify-reduction`); pass `--no-verify` to skip for trivial reductions. Can be used standalone or called from `issue-to-pr`. - [review-structural](skills/review-structural/SKILL.md) -- Project-specific structural completeness check: model/rule checklists, build, semantic correctness, issue compliance. Read-only, no code changes. Called by `review-pipeline`. - [review-quality](skills/review-quality/SKILL.md) -- Generic code quality review: DRY, KISS, cohesion/coupling, test quality, HCI. Read-only, no code changes. Called by `review-pipeline`. - [fix-pr](skills/fix-pr/SKILL.md) -- Resolve PR review comments, fix CI failures, and address codecov coverage gaps. Uses `gh api` for codecov (not local `cargo-llvm-cov`). @@ -29,6 +29,7 @@ These repo-local skills live under `.claude/skills/*/SKILL.md`. - [propose](skills/propose/SKILL.md) -- Interactive brainstorming to help domain experts propose a new model or rule. Asks one question at a time, uses mathematical language (no programming jargon), and files a GitHub issue. - [final-review](skills/final-review/SKILL.md) -- Interactive maintainer review for PRs in "Final review" column. Merges main, walks through agentic review bullets with human, then merge or hold. - [dev-setup](skills/dev-setup/SKILL.md) -- Interactive wizard to install and configure all development tools for new maintainers. +- [verify-reduction](skills/verify-reduction/SKILL.md) -- Standalone mathematical verification of a reduction rule: Typst proof, constructor Python (≥5000 checks), adversary Python (≥5000 independent checks). Reports verdict, no artifacts saved. Also called as a subroutine by `/add-rule` (default behavior). - [tutorial](skills/tutorial/SKILL.md) -- Interactive tutorial — walk through the pred CLI to explore, reduce, and solve NP-hard problems. No Rust internals. ## Codex Compatibility diff --git a/.claude/skills/add-rule/SKILL.md b/.claude/skills/add-rule/SKILL.md index 13da004ef..33a303af6 100644 --- a/.claude/skills/add-rule/SKILL.md +++ b/.claude/skills/add-rule/SKILL.md @@ -5,7 +5,16 @@ description: Use when adding a new reduction rule to the codebase, either from a # Add Rule -Step-by-step guide for adding a new reduction rule (A -> B) to the codebase. +Step-by-step guide for adding a new reduction rule (A -> B) to the codebase. By default, every rule goes through mathematical verification (via `/verify-reduction`) before implementation. Pass `--no-verify` to skip verification for trivial reductions. + +## Invocation + +``` +/add-rule # interactive, with verification (default) +/add-rule --no-verify # interactive, skip verification +``` + +When called from `/issue-to-pr`, the `--no-verify` flag is passed through if present. ## Step 0: Gather Required Information @@ -27,6 +36,26 @@ Before any implementation, collect all required information. If called from `iss If any item is missing, ask the user to provide it. Put a high standard on item 7 (concrete example): it must be in tutorial style with clear intuition and easy to understand. Do NOT proceed until the checklist is complete. +## Step 0.5: Type Compatibility Gate + +Check source/target `Value` types before any work: + +```bash +grep "type Value = " src/models/*/.rs src/models/*/.rs +``` + +**Compatible pairs for `ReduceTo` (witness-capable):** +- `Or`->`Or`, `Min`->`Min`, `Max`->`Max` (same type) +- `Or`->`Min`, `Or`->`Max` (feasibility embeds into optimization) + +**Incompatible — STOP if any of these:** +- `Min`->`Or` or `Max`->`Or` — optimization source has no threshold K; needs a decision-variant source model +- `Max`->`Min` or `Min`->`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper +- `Or`->`Sum` or `Min`->`Sum` — Sum is aggregate-only; needs `ReduceToAggregate` +- Any pair involving `And` or `Sum` on the target side + +If incompatible, STOP and comment on the issue explaining the type mismatch and options. Do NOT proceed. + ## Reference Implementations Read these first to understand the patterns: @@ -35,7 +64,19 @@ Read these first to understand the patterns: - **Paper entry:** search `docs/paper/reductions.typ` for `MinimumVertexCover` `MaximumIndependentSet` - **Traits:** `src/rules/traits.rs` (`ReduceTo`, `ReduceToAggregate`, `ReductionResult`, `AggregateReductionResult`) -## Step 1: Implement the reduction +## Step 1: Mathematical Verification (default, skip with `--no-verify`) + +**If `--no-verify` was passed, skip to Step 2.** + +Invoke the `/verify-reduction` skill to mathematically verify the reduction before writing Rust code. This runs the full verification pipeline: Typst proof, constructor Python script (>=5000 checks), adversary subagent (>=5000 independent checks), and cross-comparison. + +All verification artifacts are ephemeral — they exist only in conversation context and temp files. Nothing is committed to the repository. + +**If verification FAILS: STOP. Report to user. Do NOT proceed to implementation.** + +If verification passes, the verified Python `reduce()` and `extract_solution()` functions, along with the YES/NO instances, carry forward in conversation context to inform Steps 2-5. Use them as the canonical spec for the Rust implementation. + +## Step 2: Implement the reduction Create `src/rules/_.rs` (all lowercase, no underscores between words within a problem name): @@ -67,6 +108,7 @@ impl ReductionResult for ReductionXToY { fn target_problem(&self) -> &Self::Target { &self.target } fn extract_solution(&self, target_solution: &[usize]) -> Vec { // Map target solution back to source solution + // If Step 1 ran: translate the verified Python extract_solution() logic } } ``` @@ -78,7 +120,9 @@ impl ReductionResult for ReductionXToY { })] impl ReduceTo for SourceType { type Result = ReductionXToY; - fn reduce_to(&self) -> Self::Result { ... } + fn reduce_to(&self) -> Self::Result { + // If Step 1 ran: translate the verified Python reduce() logic + } } ``` @@ -86,13 +130,13 @@ Each primitive reduction is determined by the exact source/target variant pair. **Aggregate-only reductions:** when the rule preserves aggregate values but cannot recover a source witness from a target witness, implement `AggregateReductionResult` + `ReduceToAggregate` instead of `ReductionResult` + `ReduceTo`. Those edges are not auto-registered by `#[reduction]` yet; register them manually with `ReductionEntry { reduce_aggregate_fn: ..., capabilities: EdgeCapabilities::aggregate_only(), ... }`. See `src/unit_tests/rules/traits.rs` and `src/unit_tests/rules/graph.rs` for the reference pattern. -## Step 2: Register in mod.rs +## Step 3: Register in mod.rs Add to `src/rules/mod.rs`: - `mod _;` - If feature-gated (e.g., ILP): wrap with `#[cfg(feature = "ilp-solver")]` -## Step 3: Write unit tests +## Step 4: Write unit tests Create `src/unit_tests/rules/_.rs`: @@ -105,6 +149,8 @@ Create `src/unit_tests/rules/_.rs`: // 5. Verify: extracted solution is valid and optimal for source ``` +If Step 1 ran, use the verified YES/NO instances from conversation context to construct test cases. Include both a feasible (closed-loop) and infeasible (no witnesses) test. + Additional recommended tests: - Verify target problem structure (correct size, edges, constraints) - Edge cases (empty graph, single vertex, etc.) @@ -117,17 +163,19 @@ For aggregate-only reductions, replace the closed-loop witness test with value-c Link via `#[cfg(test)] #[path = "..."] mod tests;` at the bottom of the rule file. -## Step 4: Add canonical example to example_db +## Step 5: Add canonical example to example_db Add a builder function in `src/example_db/rule_builders.rs` that constructs a small, canonical instance for this reduction. Follow the existing patterns in that file. Register the builder in `build_rule_examples()`. -## Step 5: Document in paper (MANDATORY — DO NOT SKIP) +## Step 6: Document in paper (MANDATORY — DO NOT SKIP) **This step is NOT optional.** Every reduction rule MUST have a corresponding `reduction-rule` entry in the paper. Skipping documentation is a blocking error — the PR will be rejected in review. Do not proceed to Step 6 until the paper entry is written and `make paper` compiles. Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference example:** search for `reduction-rule("KColoring", "QUBO"` to see the gold-standard entry — use it as a template. For a minimal example, see MinimumVertexCover -> MaximumIndependentSet. -### 5a. Write theorem body (rule statement) +If Step 1 ran, adapt the verified Typst proof into the paper's macros. Do not rewrite the proof from scratch — reformat it. + +### 6a. Write theorem body (rule statement) ```typst #reduction-rule("Source", "Target", @@ -140,7 +188,7 @@ Write a `reduction-rule` entry in `docs/paper/reductions.typ`. **Reference examp Three parts: complexity with citation, construction summary, overhead hint. -### 5b. Write proof body +### 6b. Write proof body Use these subsections with italic labels: @@ -158,7 +206,7 @@ Use these subsections with italic labels: Must be self-contained (all notation defined) and reproducible. -### 5c. Write worked example (extra block) +### 6c. Write worked example (extra block) Step-by-step walkthrough with concrete numbers from JSON data. Required steps: 1. Show source instance (dimensions, structure, graph visualization if applicable) @@ -170,7 +218,7 @@ Use `graph-colors`, `g-node()`, `g-edge()` for graph visualization — see refer **Reproducibility:** The `extra:` block must start with a `pred-commands()` call showing the create/reduce/solve/evaluate pipeline. The source-side `pred create --example ...` spec must be derived from the loaded canonical example data via the helper pattern in `write-rule-in-paper`; do not hand-write a bare alias and assume the default variant matches. -### 5d. Build and verify +### 6d. Build and verify ```bash make paper # Must compile without errors @@ -178,7 +226,7 @@ make paper # Must compile without errors Checklist: notation self-contained, complexity cited, overhead consistent, example uses JSON data (not hardcoded), solution verified end-to-end, witness semantics respected, paper compiles. -## Step 6: Regenerate exports and verify +## Step 7: Regenerate exports and verify ```bash cargo run --example export_graph # Generate reduction_graph.json for docs/paper builds @@ -187,7 +235,7 @@ make regenerate-fixtures # Regenerate example_db/fixtures/examples.js make test clippy # Must pass ``` -`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 5 will reference missing fixture data. +`make regenerate-fixtures` is required so the paper can load the new rule's example data from `src/example_db/fixtures/examples.json`. Without it, the `reduction-rule` entry in Step 6 will reference missing fixture data. Structural and quality review is handled by the `review-pipeline` stage, not here. The run stage just needs to produce working code. @@ -229,3 +277,4 @@ Aggregate-only reductions currently have a narrower CLI surface: | Skipping Step 5 (paper documentation) | **Every rule MUST have a `reduction-rule` entry in the paper. This is mandatory, not optional. PRs without documentation will be rejected.** | | Source/target model not fully registered | Both problems must already have `declare_variants!`, aliases as needed, and CLI create support -- use `add-model` skill first | | Treating a direct-to-ILP rule as a toy stub | Direct ILP reductions need exact overhead metadata and strong semantic regression tests, just like other production ILP rules | +| Skipping verification for complex reductions | Verification is default for a reason — `--no-verify` is for trivial identity/complement reductions only | diff --git a/.claude/skills/issue-to-pr/SKILL.md b/.claude/skills/issue-to-pr/SKILL.md index 9962f4b8a..2573bedd5 100644 --- a/.claude/skills/issue-to-pr/SKILL.md +++ b/.claude/skills/issue-to-pr/SKILL.md @@ -9,7 +9,8 @@ Convert a GitHub issue into a PR: write a plan, create the PR, then execute the ## Invocation -- `/issue-to-pr 42` — create PR with plan, then execute +- `/issue-to-pr 42` — create PR with plan, then execute (for `[Rule]` issues, verification runs by default) +- `/issue-to-pr 42 --no-verify` — skip mathematical verification for `[Rule]` issues For Codex, open this `SKILL.md` directly and treat the slash-command forms above as aliases. The Makefile `run-issue` target already does this translation. @@ -37,6 +38,7 @@ Normalize to: - `ISSUE=` - `REPO=` (default `CodingThrust/problem-reductions`) - `EXECUTE=true|false` +- `NO_VERIFY=true|false` (default `false`; pass `--no-verify` to skip mathematical verification for `[Rule]` issues) ### 2. Fetch Issue + Preflight Guards @@ -91,7 +93,7 @@ The plan MUST reference the appropriate implementation skill and follow its step - **For ordinary `[Model]` issues:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 as the action pipeline - **For `[Model]` issues that explicitly claim direct ILP solving:** Follow [add-model](../add-model/SKILL.md) Steps 1-7 **and** [add-rule](../add-rule/SKILL.md) Steps 1-6 for the direct ` -> ILP` rule in the same plan / PR -- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-6 as the action pipeline +- **For `[Rule]` issues:** Follow [add-rule](../add-rule/SKILL.md) Steps 1-7 as the action pipeline. By default, `/add-rule` runs mathematical verification (Step 1) before implementation. If `--no-verify` was passed, include `--no-verify` when invoking `/add-rule` to skip verification. Include the concrete details from the issue (problem definition, reduction algorithm, example, etc.) mapped onto each step. diff --git a/.claude/skills/verify-reduction/SKILL.md b/.claude/skills/verify-reduction/SKILL.md new file mode 100644 index 000000000..76f1246f6 --- /dev/null +++ b/.claude/skills/verify-reduction/SKILL.md @@ -0,0 +1,252 @@ +--- +name: verify-reduction +description: Standalone mathematical verification of a reduction rule — generates Typst proof, constructor Python script (>=5000 checks), and adversary Python script (>=5000 independent checks). Reports verdict. No artifacts saved. +--- + +# Verify Reduction + +Mathematical verification of a reduction rule. Produces a Typst proof + dual Python verification scripts, iterating until all checks pass. Reports a VERIFIED/FAILED verdict. All artifacts are ephemeral — nothing is committed to the repository. + +Use standalone to check correctness before implementation, or as a subroutine of `/add-rule` (which calls this by default). + +## Invocation + +``` +/verify-reduction 868 +/verify-reduction SubsetSum Partition +``` + +## Step 0: Parse Input + +```bash +REPO=$(gh repo view --json nameWithOwner --jq .nameWithOwner) +ISSUE= +ISSUE_JSON=$(gh issue view "$ISSUE" --json title,body,number) +``` + +If invoked with problem names instead of an issue number, use the names directly. + +## Step 1: Read Issue, Study Models, Type Check + +```bash +gh issue view "$ISSUE" --json title,body +pred show --json +pred show --json +``` + +### Type compatibility gate — MANDATORY + +Check source/target `Value` types before any work: + +```bash +grep "type Value = " src/models/*/.rs src/models/*/.rs +``` + +**Compatible pairs for `ReduceTo` (witness-capable):** +- `Or`->`Or`, `Min`->`Min`, `Max`->`Max` (same type) +- `Or`->`Min`, `Or`->`Max` (feasibility embeds into optimization) + +**Incompatible — STOP if any of these:** +- `Min`->`Or` or `Max`->`Or` — optimization source has no threshold K; needs a decision-variant source model +- `Max`->`Min` or `Min`->`Max` — opposite optimization directions; needs `ReduceToAggregate` or a decision-variant wrapper +- `Or`->`Sum` or `Min`->`Sum` — Sum is aggregate-only; needs `ReduceToAggregate` +- Any pair involving `And` or `Sum` on the target side + +If incompatible, STOP and report the type mismatch and options. Do NOT proceed. + +### If compatible + +Extract: construction algorithm, correctness argument, overhead formulas, worked example, reference. Use WebSearch if the issue is incomplete. + +## Step 2: Write Typst Proof + +Write a standalone Typst proof (in a temp file, not committed). + +**Mandatory structure:** + +```typst +== Source $arrow.r$ Target +#theorem[...] +#proof[ + _Construction._ (numbered steps, every symbol defined before first use) + _Correctness._ + ($arrow.r.double$) ... (genuinely independent, NOT "the converse is similar") + ($arrow.l.double$) ... + _Solution extraction._ ... +] +*Overhead.* (table with target metric -> formula) +*Feasible example.* (YES instance, >=3 variables, fully worked with numbers) +*Infeasible example.* (NO instance, fully worked — show WHY no solution exists) +``` + +**Hard rules:** +- Zero instances of "clearly", "obviously", "it is easy to see", "straightforward" +- Zero scratch work ("Wait", "Hmm", "Actually", "Let me try") +- Two examples minimum, both with >=3 variables/vertices +- Every symbol defined before first use + +## Step 3: Write Constructor Python Script + +Write a Python verification script (temp file) with ALL 7 mandatory sections: + +| Section | What to verify | Notes | +|---------|---------------|-------| +| 1. Symbolic (sympy) | Overhead formulas symbolically for general n | "The overhead is trivial" is NOT an excuse to skip | +| 2. Exhaustive forward+backward | Source feasible <=> target feasible | n <= 5 minimum. ALL instances or >=300 sampled per (n,m) | +| 3. Solution extraction | Extract source solution from every feasible target witness | Most commonly skipped section. DO NOT SKIP | +| 4. Overhead formula | Build target, measure actual size, compare against formula | Catches off-by-one in construction | +| 5. Structural properties | Target well-formed, no degenerate cases | Gadget reductions: girth, connectivity, widget structure | +| 6. YES example | Reproduce exact Typst feasible example numbers | Every value must match | +| 7. NO example | Reproduce exact Typst infeasible example, verify both sides infeasible | Must verify WHY infeasible | + +### Minimum check counts — STRICTLY ENFORCED + +| Type | Minimum checks | Minimum n | +|------|---------------|-----------| +| Identity (same graph, different objective) | 10,000 | n <= 6 | +| Algebraic (padding, complement, case split) | 10,000 | n <= 5 | +| Gadget (widget, cycle construction) | 5,000 | n <= 5 | + +Every reduction gets at least 5,000 checks regardless of perceived simplicity. + +## Step 4: Run and Iterate + +```bash +python3 /tmp/verify__.py +``` + +### Iteration 1: Fix failures + +Run the script. Fix any failures. Re-run until 0 failures. + +### Iteration 2: Check count audit + +Print and fill this table honestly: + +``` +CHECK COUNT AUDIT: + Total checks: ___ (minimum: 5,000) + Forward direction: ___ instances (minimum: all n <= 5) + Backward direction: ___ instances (minimum: all n <= 5) + Solution extraction: ___ feasible instances tested + Overhead formula: ___ instances compared + Symbolic (sympy): ___ identities verified + YES example: verified? [yes/no] + NO example: verified? [yes/no] + Structural properties: ___ checks +``` + +If ANY line is below minimum, enhance the script and re-run. Do NOT proceed. + +### Iteration 3: Gap analysis + +List EVERY claim in the Typst proof and whether it's tested: + +``` +CLAIM TESTED BY +"Universe has 2n elements" Section 4: overhead +"Complementarity forces consistency" Section 3: extraction +"Forward: NAE-sat -> valid splitting" Section 2: exhaustive +... +``` + +If any claim has no test, add one. If untestable, document WHY. + +## Step 5: Adversary Verification + +Dispatch a subagent that reads ONLY the Typst proof (not the constructor script) and independently implements + tests the reduction. + +**Adversary requirements:** +- Own `reduce()` function from scratch +- Own `extract_solution()` function +- Own `is_feasible_source()` and `is_feasible_target()` validators +- Exhaustive forward + backward for n <= 5 +- `hypothesis` property-based testing (>=2 strategies) +- Reproduce both Typst examples (YES and NO) +- >=5,000 total checks +- Must NOT import from the constructor script + +**Typed adversary focus** (include in prompt): +- **Identity reductions:** exhaustive enumeration n <= 6, edge-case configs (all-zero, all-one, alternating) +- **Algebraic reductions:** case boundary conditions (e.g., S = 2T exactly, S = 2T +/- 1), per-case extraction +- **Gadget reductions:** widget structure invariants, traversal patterns, interior vertex isolation + +### Cross-comparison + +After both scripts pass, compare `reduce()` outputs on shared instances. Both must produce structurally identical targets and agree on feasibility for all tested instances. + +### Verdict table + +| Constructor | Adversary | Cross-compare | Verdict | Action | +|-------------|-----------|---------------|---------|--------| +| Pass | Pass | Agree | **VERIFIED** | Done (or proceed to add-rule Step 2) | +| Pass | Pass | Disagree | **Suspect** | Investigate — may be isomorphic or latent bug | +| Pass | Fail | -- | **Adversary bug** | Fix adversary or clarify Typst spec | +| Fail | Pass | -- | **Constructor bug** | Fix constructor, re-run from Step 4 | +| Fail | Fail | -- | **Proof bug** | Re-examine Typst proof, return to Step 2 | + +## Step 6: Self-Review Checklist + +Every item must be YES. If any is NO, go back and fix. + +### Typst proof +- [ ] Construction with numbered steps, symbols defined before use +- [ ] Correctness with independent => and <= paragraphs +- [ ] Solution extraction section present +- [ ] Overhead table with formulas +- [ ] YES example (>=3 variables, fully worked) +- [ ] NO example (fully worked, explains WHY infeasible) +- [ ] Zero hand-waving language +- [ ] Zero scratch work + +### Constructor Python +- [ ] 0 failures, >=5,000 total checks +- [ ] All 7 sections present and non-empty +- [ ] Exhaustive n <= 5 +- [ ] Extraction tested for every feasible instance +- [ ] Gap analysis: every Typst claim has a test + +### Adversary Python +- [ ] 0 failures, >=5,000 total checks +- [ ] Independent implementation (no imports from constructor) +- [ ] `hypothesis` PBT with >=2 strategies +- [ ] Reproduces both Typst examples + +### Cross-consistency +- [ ] Cross-comparison: 0 disagreements, 0 feasibility mismatches + +## Step 7: Report Verdict + +Report the final verdict to the user: + +``` +VERIFICATION RESULT: VERIFIED / FAILED + Source: + Target: + Constructor checks: + Adversary checks: + Cross-comparison: instances, 0 disagreements + Issue: # +``` + +If called as a subroutine of `/add-rule`, the verified Python `reduce()`, `extract_solution()`, and YES/NO instances remain in conversation context for use in the Rust implementation steps. No files are saved. + +If called standalone, the verdict is the final output. The user can inspect the proof and scripts interactively during the session. + +## Common Mistakes + +| Mistake | Consequence | +|---------|-------------| +| Proceeding past type gate with incompatible types | Wasted work — math may be correct but `ReduceTo` impl is impossible | +| Adversary imports from constructor script | Rejected — must be independent | +| No `hypothesis` PBT in adversary | Rejected | +| Section 1 (symbolic) empty | Rejected — "overhead is trivial" is not an excuse | +| Only YES example, no NO example | Rejected | +| n <= 3 or n <= 4 "because it's simple" | Rejected — minimum n <= 5 | +| No gap analysis | Rejected — perform before proceeding | +| Example has < 3 variables | Rejected — too degenerate | +| Either script has < 5,000 checks | Rejected — enhance testing | +| Extraction (Section 3) not tested | Rejected — most commonly skipped | +| Cross-comparison skipped | Rejected | +| Disagreements dismissed without investigation | Rejected | +| Saving artifacts to the repository | All files are ephemeral — use temp directory, nothing committed |