Skip to content

RFC: Strengthen /verify-reduction with algebraic/combinatorial and analysis techniques #1013

@zazabap

Description

@zazabap

Motivation

PR #992 verified 34 reductions with dual Python scripts (constructor + adversary, 443M+ checks). PR #1010 then rejected two of those (#368, #905) as "unsound" — but re-running PR #992's scripts confirms they pass. The problem: PR #1010 implemented a different construction, got it wrong, and blamed the reduction.

This exposes a gap in the current verification methodology: brute-force enumeration + random PBT can miss structural bugs, and two independent implementations can share blind spots.

This issue proposes adding algebraic, combinatorial, and analytic verification techniques to /verify-reduction to catch bug classes that the current approach cannot.

Current methodology (7 sections)

  1. Symbolic overhead (SymPy)
  2. Exhaustive forward/backward (n ≤ 5)
  3. Solution extraction
  4. Overhead formula spot-check
  5. Structural properties
  6. YES example
  7. NO example

Plus an independent adversary with hypothesis PBT.

Proposed enhancements

A. Algebraic/Combinatorial Techniques (discrete math)

Technique What it catches Applicable to
Invariant-based induction — verify a preserved quantity (e.g., independence number = m for satisfiable instances) symbolically for general n, not just small n Scaling bugs beyond n=5 Gadget reductions (12 rules), graph transforms (18 rules)
Solution counting — enumerate ALL solutions on both sides, verify counts match (bijection) or relate by known factor Shared blind spots between constructor/adversary; broken extraction that still passes ∃-checks Complement/negation (5), identity (8), algebraic (16)
Parity checks — verify odd/even solution count is preserved; extremely sensitive to off-by-one gadget errors Single misplaced edge in gadget construction NAE-SAT reductions, set splitting, any reduction with complementary solution pairs
Rank/dimension — for GF(2) / linear equation reductions, verify constraint matrix rank matches proof claims Accidentally linearly dependent rows that only matter at large n X3C→GF2 (#859), X3C→MinWeightLinEq (#860), any linear algebra reduction
Boundary instance generation — systematically test at exact feasibility thresholds (S=2T, minimum degree, single-clause) Off-by-one in case splits, edge cases in padding SubsetSum→Partition (#973), all padding/embedding reductions (4 rules)
Extremal graph testing — test on K_n, empty, cycle, star, Petersen where answers are known analytically, scale to n=50+ Construction bugs that only appear beyond brute-force range All 18 graph transformation reductions
Monotonicity — verify adding a constraint to source predictably changes target Coupling bugs between gadgets Gadget reductions, scheduling encodings (8 rules)

B. Real Analysis Techniques (novel, experimental)

Technique What it catches Applicable to
Continuity of reduction mapping — for weighted problems, perturb weights continuously and verify target changes continuously; discontinuities reveal bugs Construction bugs in weighted reductions that discrete testing misses MaxCut, TSP, weighted MIS, all weighted ILP reductions
IVT for objective value gaps — if source objective takes values {a,b}, target should take all corresponding values in [f(a),f(b)]; gaps indicate missing solution mappings Incomplete solution space coverage All Max/Min optimization reductions
Convergence of overhead ratios — verify overhead(n)/n converges as n→∞ using limit theorems Overhead formulas that are correct for small n but diverge All reductions with non-trivial overhead
Compactness arguments — LP relaxation of target ILP has solution by compactness; verify relaxation tightness ILP reductions where integrality gap matters 85 ILP reductions

C. SMT/SAT Bounded Verification

Encode "∀ instances of size ≤ n: source feasible ⟺ target feasible" as a Z3 formula and machine-check it. Stronger than enumeration because SMT reasons symbolically over all instances simultaneously.

What does NOT work

  • Lean 4 formal proofs: Tried, doesn't integrate well with this library's Python/Rust verification pipeline. Mathlib's real analysis coverage is extensive but the formalization overhead (~500 hrs per project) is prohibitive for 167 reductions.

Questions for discussion

  1. Which techniques have the best effort/impact ratio? The 85 ILP reductions are mechanical — should we focus algebraic techniques on the ~80 non-ILP reductions instead?

  2. Should analysis techniques (Section B) be pursued? Nobody is applying real analysis to reduction verification — it could be novel and publishable, but unclear how many reductions actually benefit. Most applicable to weighted/optimization reductions.

  3. Is SMT verification (Section C) worth the Z3 dependency? It's strictly stronger than brute force for bounded n, but adds complexity.

  4. Integration: Should new techniques be additional script sections (8-11), a separate third verification layer, or upgrades to existing sections?

  5. Priority: Given that the current pipeline already caught the [Rule] 3SAT to DIRECTED TWO-COMMODITY INTEGRAL FLOW #368/[Rule] 3-SATISFIABILITY to FEASIBLE REGISTER ASSIGNMENT #905 issue (PR docs: batch verify-reduction — 34 implementable reductions verified #992 was right, PR feat: add 11 medium-confidence reduction rules #1010 was wrong), is the main goal (a) preventing future false rejections, (b) catching real bugs at scale, or (c) building toward publishable verification methodology?

Reduction landscape (for context)

Category Count Most promising technique
ILP formulations 85 Compactness / LP relaxation
Graph transformation 18 Extremal graphs, invariants
Linear algebra / QUBO 16 Rank/dimension
Special structural 14 Case-by-case
Gadget construction 12 Parity, counting, monotonicity
Scheduling encoding 8 Boundary instances
Identity/relabeling 8 Counting (bijection proof)
Complement/negation 5 Parity, counting
Padding/embedding 4 Boundary instances

References

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    Status

    No status

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions