Skip to content

docs: verify reduction #841 — NAE 3-SAT → Set Splitting#983

Closed
isPANN wants to merge 5 commits intomainfrom
verify/nae3sat-setsplitting-841
Closed

docs: verify reduction #841 — NAE 3-SAT → Set Splitting#983
isPANN wants to merge 5 commits intomainfrom
verify/nae3sat-setsplitting-841

Conversation

@isPANN
Copy link
Copy Markdown
Collaborator

@isPANN isPANN commented Apr 1, 2026

Summary

  • Formal Typst proof of NAE 3-SAT → Set Splitting reduction (Lovász 1973 / Garey & Johnson SP4)
  • Constructor verification script: 41,107 checks, 0 failures (7 mandatory sections, exhaustive n=3, sampled n=4,5)
  • Adversary verification script: 19,832 checks, 0 failures (independent implementation, 3 hypothesis property-based strategies)
  • Cross-comparison: 900 instances, 0 disagreements, 0 feasibility mismatches

Verification Report

=== Verification Report: NAE 3-SAT → Set Splitting (#841) ===

Typst proof: docs/paper/proposed-reductions-841.typ
  - Construction: ✓ (3 steps: universe, complementarity, clause subsets)
  - Correctness: ✓ (⟹ + ⟸)
  - Extraction: ✓
  - Overhead: ✓ (universe_size = 2n, num_subsets = n + m)
  - YES example: ✓ (4 vars, 3 clauses, σ=(1,0,1,0))
  - NO example: ✓ (3 vars, 8 clauses = all sign patterns, every σ has all-true clause)

Constructor: verify_nae3sat_setsplitting.py
  - Checks: 41,107 passed, 0 failed
  - Sections: 1(sympy) 2(exhaustive) 3(extraction) 4(overhead) 5(structural) 6(YES) 7(NO)
  - Forward: exhaustive n=3, sampled n=4,5
  - Backward: exhaustive n=3, sampled n=4,5
  - Gap analysis: all claims covered

Adversary: adversary_nae3sat_setsplitting.py
  - Checks: 19,832 passed, 0 failed
  - Property-based: 3 hypothesis strategies (2,500 examples)
  - Forward: exhaustive n ≤ 5
  - Backward: exhaustive n ≤ 5
  - Bugs found: none

Cross-comparison:
  - Instances compared: 900
  - Structural agreement: 900/900
  - Feasibility agreement: 900/900

Bugs found: none
Iterations: 2 rounds (fixed NO example scratch work, fixed clause-matching index)

Verdict: VERIFIED

Test plan

  • Typst compiles without errors
  • Constructor script: python3 docs/paper/verify-reductions/verify_nae3sat_setsplitting.py → 0 failures
  • Adversary script: python3 docs/paper/verify-reductions/adversary_nae3sat_setsplitting.py → 0 failures
  • Cross-comparison: python3 docs/paper/verify-reductions/cross_compare_nae3sat_setsplitting.py → 0 disagreements

Closes #841

🤖 Generated with Claude Code

zazabap and others added 5 commits March 31, 2026 16:13
Covers 9 reductions: 2 NP-hardness chain extensions (#973, #198),
4 Tier 1a blocked issues (#379, #380, #888, #822), and
3 Tier 1b blocked issues (#892, #894, #890).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…uctions

New skill: /verify-reduction <issue-number>

End-to-end pipeline that takes a reduction rule issue and produces:
1. Typst proof (Construction/Correctness/Extraction/Overhead + YES/NO examples)
2. Python verification script (7 mandatory sections, ≥5000 checks, exhaustive n≤5)
3. Lean 4 lemmas (non-trivial structural proofs required)

Follows issue-to-pr conventions: creates worktree, works in isolation, submits PR.

Strict quality gates (zero tolerance):
- No "trivial" category — every reduction ≥5000 checks
- 7 mandatory Python sections including NO (infeasible) example
- Non-trivial Lean required (rfl/omega tautologies rejected)
- Zero hand-waving in Typst ("clearly", "obviously" → rejected)
- Mandatory gap analysis: every proof claim must have a test
- Self-review checklist with 20+ items across 4 categories

Developed and validated through PR #975 (800K+ checks, 3 bugs caught)
and tested on issues #868 (caught wrong example) and #841 (35K checks).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…kill

- Added frontmatter (name, description) matching other skills' convention
- Toned down aggressive language ("ZERO TOLERANCE", "THE HARSHEST STEP",
  "NON-NEGOTIABLE") to professional but firm language
- All quality gates unchanged — same strictness, better presentation

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces Lean-required gates with adversarial second agent:

- Step 5: Adversary agent independently implements reduce() and
  extract_solution() from theorem statement only (not constructor's script)
- Step 5c: Cross-comparison of both implementations on 1000 instances
- Lean downgraded from required to optional
- hypothesis property-based testing for n up to 50
- Quality gates: 2 independent scripts ≥5000 checks each + cross-comparison

Design rationale (docs/superpowers/specs/2026-04-01-adversarial-verification-design.md):
- Same agent writing proof + test is the #1 risk for AI verification
- Two independent implementations agreeing > one + trivial Lean
- Lean caught 0 bugs in PR #975; Python caught all 4

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Typst: Construction + Correctness (⟹ + ⟸) + Extraction + Overhead + YES/NO examples
Constructor: 41,107 checks, 0 failures (exhaustive n=3, sampled n=4,5, 7 sections)
Adversary: 19,832 checks, 0 failures (independent impl + 3 hypothesis strategies)
Cross-comparison: 900 instances, all agree, 0 feasibility mismatches

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov bot commented Apr 1, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 98.03%. Comparing base (423506c) to head (e489546).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #983   +/-   ##
=======================================
  Coverage   98.03%   98.03%           
=======================================
  Files         784      784           
  Lines       82310    82310           
=======================================
  Hits        80695    80695           
  Misses       1615     1615           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@zazabap zazabap closed this Apr 4, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Rule] NOT-ALL-EQUAL 3SAT to SET SPLITTING

2 participants