Skip to content

Guruswami-Sudan existence#299

Closed
ster-oc wants to merge 7 commits intoVerified-zkEVM:mainfrom
ster-oc:guruswami-sudan-exist
Closed

Guruswami-Sudan existence#299
ster-oc wants to merge 7 commits intoVerified-zkEVM:mainfrom
ster-oc:guruswami-sudan-exist

Conversation

@ster-oc
Copy link

@ster-oc ster-oc commented Jan 31, 2026

This PR proves the existence of a polynomial Q fulfilling the properties of Guruswami-Sudan deconding algorithm.

I believe that, as the Conditions structure is written, guruswami_sudan_for_proximity_gap_existence should assume m ≠ 0 and the case m = 0 should be treated differently with Q = 1 being the natural candidate.

The code is long but in the next few days I will work to golf it!
In a following PR, with the help of @Aristotle-Harmonic, I will also prove guruswami_sudan_for_proximity_gap_property.

Closes : #221

Co-authored-by: Aristotle (Harmonic) <aristotle-harmonic@harmonic.fun>
@github-actions
Copy link

github-actions bot commented Jan 31, 2026

🤖 Gemini PR Summary

This PR advances the library's coding theory capabilities by formalizing the existence proof of the interpolation polynomial $Q$ required for the Guruswami-Sudan list-decoding algorithm. It establishes the mathematical foundation for the interpolation step, handling weighted degree and multiplicity constraints.

Features

  • Guruswami-Sudan Existence Proof: Implemented the core existence theorem for the bivariate polynomial $Q$ that satisfies the required multiplicity at given points under specific weighted degree constraints.
  • Bivariate Polynomial Utilities: Added a shift function to ArkLib/Data/Polynomial/Bivariate.lean to allow the translation of bivariate polynomials by a point $(x, y)$, a necessary component for proving properties at specific coordinates.
  • Library Integration: Exposed the Guruswami-Sudan modules in the top-level ArkLib imports for public use.

Fixes

  • Edge Case Handling: Addressed the structure of the Conditions type to better handle cases where $m \neq 0$, suggesting $Q=1$ as the candidate for the $m=0$ case.
  • Type Consistency: Updated Proximity Gap lemmas (BCIKS20) to align with the renamed and refactored Conditions structure, ensuring compatibility across the library.

Refactoring

  • Modularization: Refactored the Guruswami-Sudan implementation by extracting auxiliary definitions into a dedicated Basic module.
  • Namespace & Type Improvements: Renamed and reorganized the Conditions structure to standardize how constraints are passed between the interpolation step and proximity gap proofs.

Documentation

  • Note: While no standalone documentation files were added, this PR includes formal proofs and comments regarding the interpolation step of the list-decoding algorithm.

Analysis of Changes

Metric Count
📝 Files Changed 5
Lines Added 673
Lines Removed 59

sorry Tracking

✅ **Removed:** 1 `sorry`(s)
  • lemma guruswami_sudan_for_proximity_gap_existence {k m : ℕ} {ωs : Fin n ↪ F} {f : Fin n → F} : in ArkLib/Data/CodingTheory/GuruswamiSudan/GuruswamiSudan.lean
✏️ **Affected:** 1 `sorry`(s) (line number changed)

🎨 **Style Guide Adherence**

The following lines violate the provided style guide:

ArkLib/Data/CodingTheory/GuruswamiSudan/Basic.lean

  • Line 1: "Module Docstrings: Each file should start with a /-! ... -/ block containing a title, summary, notation, and references."
  • Lines 30, 40, 164: "Each file that cites papers should have a ## References section in its docstring header" (Cites [BCIKS20]).
  • Line 31: noncomputable def proximity_gap_degree_bound : ℕ := — "Functions and Terms: lowerCamelCase (e.g., binarySearch, isPrime)."
  • Line 41: noncomputable def proximity_gap_johnson : ℕ := — "Functions and Terms: lowerCamelCase (e.g., binarySearch, isPrime)."
  • Line 54: def weigthBoundIndices (k D : ℕ) : Finset (ℕ × ℕ) := — "Every definition and major theorem should have a docstring."
  • Line 54: def weigthBoundIndices — "Spelling: Use American English for declaration names (e.g., analyze not analyse, color not colour)." (Misspelled "weight").
  • Line 58: def numVars (k D : ℕ) : ℕ := (weigthBoundIndices k D).card — "Every definition and major theorem should have a docstring."
  • Line 61: lemma card_weigthBoundIndices_eq_sum (D : ℕ) (hk : 1 < k) : — "Variable Conventions: h, h₁, ... : Assumptions/Hypotheses" (Uses hk).
  • Line 84: lemma numVars_eq_of_gt_one {D : ℕ} (hk : 1 < k) : — "Variable Conventions: h, h₁, ... : Assumptions/Hypotheses" (Uses hk).
  • Line 118: lemma numVars_lower_bound_tight {D : ℕ} (hk : 1 < k) : — "Variable Conventions: h, h₁, ... : Assumptions/Hypotheses" (Uses hk).
  • Line 119: 2 * (k - 1) * numVars k D ≥ D * (D + 2) := by — "Avoid (ge) and > (gt) in theorem statements unless necessary for argument ordering."
  • Line 135: def constraintIndices (m : ℕ) : Finset (ℕ × ℕ) := — "Every definition and major theorem should have a docstring."
  • Line 143: def numConstraints (n m : ℕ) : ℕ := n * (constraintIndices m).card — "Every definition and major theorem should have a docstring."
  • Line 175: lemma proximity_gap_degree_bound_sq_gt — "Avoid (ge) and > (gt) in theorem statements... standardize on (le) and < (lt)." (Lemma name uses gt).
  • Line 176: ((proximity_gap_degree_bound k n m : ℝ) + 1) ^ 2 > — "Avoid (ge) and > (gt) in theorem statements unless necessary for argument ordering."
  • Line 189: numVars_gt_numConstraints_of_gt_one — "Avoid (ge) and > (gt) in theorem statements... standardize on (le) and < (lt)." (Lemma name uses gt).
  • Line 189: (hn : n ≠ 0) (hk : 1 < k) (hm : 1 ≤ m) — "Variable Conventions: h, h₁, ... : Assumptions/Hypotheses" (Uses hn, hk, hm).
  • Line 190: numVars k (proximity_gap_degree_bound k n m) > numConstraints n m := by — "Avoid (ge) and > (gt) in theorem statements unless necessary for argument ordering."
  • Line 194: 2 * (k - 1) * numVars k D ≥ (D : ℝ) * (D + 2) := by — "Avoid (ge) and > (gt) in theorem statements unless necessary for argument ordering."
  • Line 203: norm_cast at *; — "Avoid using ; to separate tactics unless writing short, single-line tactic sequences."
  • Line 207: lemma numVars_gt_numConstraints — "Avoid (ge) and > (gt) in theorem statements... standardize on (le) and < (lt)." (Lemma name uses gt).
  • Line 208: numVars k (proximity_gap_degree_bound k n m) > numConstraints n m := by — "Avoid (ge) and > (gt) in theorem statements unless necessary for argument ordering."
  • Line 245: noncomputable def constraintMap (D : ℕ) : — "Every definition and major theorem should have a docstring."

ArkLib/Data/Polynomial/Bivariate.lean

  • Line 441: /-- Shift a bivariate polynomial by (x, y). -/ above noncomputable def shift — "Declaration Docstrings: Use /-- ... -/ above definitions." (The code uses -- instead of /-- for the docstring).

📄 **Per-File Summaries**
  • ArkLib.lean: Added the Guruswami-Sudan basic module to the top-level library imports.
  • ArkLib/Data/CodingTheory/GuruswamiSudan/Basic.lean: This file implements the interpolation step of the Guruswami-Sudan list-decoding algorithm by proving the existence and properties of a non-zero bivariate polynomial satisfying specific weighted degree and multiplicity constraints.
  • ArkLib/Data/CodingTheory/GuruswamiSudan/GuruswamiSudan.lean: This change refactors the Guruswami–Sudan decoder by moving auxiliary definitions to a separate module and updating the Conditions structure and existence theorem to use these shared definitions.
  • ArkLib/Data/CodingTheory/ProximityGap/BCIKS20.lean: Update the Guruswami-Sudan proximity gap lemmas to use the renamed Conditions type.
  • ArkLib/Data/Polynomial/Bivariate.lean: Added a shift function to translate a bivariate polynomial by a given point $(x, y)$.

Last updated: 2026-02-14 10:06 UTC.

@alexanderlhicks alexanderlhicks self-assigned this Feb 5, 2026
@alexanderlhicks
Copy link
Collaborator

Thanks for the PR!

  • Would it be possible to consider organising the supporting lemmas in this PR into other files so that GuruswamiSudan.lean can stay focused on the main result? This may also keep the file shorter once the sorry that remains in the main theorem is resolved.
  • Is there any progress towards the remaining sorry?

@ster-oc
Copy link
Author

ster-oc commented Feb 5, 2026

I am working at moving relevant theory into ./GuruswamiSudan/Basic.lean (not sure it works as a name, alternatives are Prelim.lean or Prereq.lean let me know).

For what concerns the second bullet point, in the last few days I was trying to prove the divisibility property of PR #300 and turns out that many forced constraints are not set in GuruswamiSudan.lean.
Examples include :

  • [Fintype F]: the field must be finite
  • n ≤ card F : the blocklength is at most the cardinality of the field (say q)
  • k + 1 ≤ n : if you trasmit less than the dimension of the code then you have no way to solve the system
  • 1 ≤ m : asking for m = 0 means to ask for "zeros with multiplicity 0" so the bivariate polynomial does not even have to vanish at the desired domain points and the Guruswami-Sudan decoding cannot happen (not with the usual Conditions).

I do believe that all this hypotheses have to be included at least in the main theorem as a sanity check for the decoding to be meaningful. In this way, the only sorry disappears because of the last item in the list. I am working on it to include all this and change as little as possible from the existing .lean

@alexanderlhicks
Copy link
Collaborator

Hello! Thanks for the PR and sorry for the delay in getting to this. I'll be merging the 4.26 PR soon, which might break a few things, and get back to all the other PRs including this one.

@alexanderlhicks
Copy link
Collaborator

/review

External:
https://cse.buffalo.edu/faculty/atri/courses/coding-theory/book/web-coding-book.pdf
https://eprint.iacr.org/2020/654.pdf
https://eprint.iacr.org/2025/2055.pdf

Internal:
ArkLib/Data/CodingTheory/
ArkLib/Data/Polynomial

Comments:
Please review for correctness with respect to ArkLib internals and the external reference papers.

@github-actions
Copy link

🤖 AI Review (with external context)

🤖 AI Review

Overall Summary:
The core Guruswami-Sudan formalization is incorrect due to a critical off-by-one error in the weighted degree definition (using k-1 instead of k) that invalidates the algorithm's root-finding guarantees. Additionally, the main theorem statements define the distance metric incorrectly and lack essential hypotheses regarding the interpolation polynomial and codeword degree. While the auxiliary polynomial utilities and API renames are valid, these fundamental logic errors must be resolved before merging.


📄 **Review for `ArkLib.lean`**

Verdict: Correct

The change is a straightforward addition of an import statement to the top-level library file.

Analysis:

  1. Implementation: The addition of import ArkLib.Data.CodingTheory.GuruswamiSudan.Basic creates a dependency on a new module. Given the surrounding imports (BerlekampWelch, GuruswamiSudan, InterleavedCode), this aligns with the mathematical content provided in the global context, which covers Reed-Solomon decoding algorithms including Guruswami-Sudan.
  2. Correctness: This file acts as an aggregator for the library. Adding the import allows users importing ArkLib to access the definitions and theorems in ArkLib.Data.CodingTheory.GuruswamiSudan.Basic. Since the prompt confirms the code compiles, the module exists and is compatible.

No misformalization issues are present in this file.

📄 **Review for `ArkLib/Data/CodingTheory/GuruswamiSudan/Basic.lean`**

Verdict: Incorrect

Critical Issues:

  1. Incorrect Weighted Degree Definition: The definition of weigthBoundIndices (and associated lemmas like natWeightedDegree_coeffsToPoly_le) uses k - 1 as the weight for the $Y$ variable:
    filter (fun x ↦ x.1 + (k - 1) * x.2 ≤ D)
    According to the variable documentation, k + 1 is the dimension of the code, which implies the message polynomials have degree at most k. In the Guruswami-Sudan algorithm for Reed-Solomon codes defined by evaluations of polynomials of degree $\le k$, the interpolation polynomial $Q(X, Y)$ requires a weighted degree bound with respect to weights $(1, k)$. Using k - 1 relaxes the constraint on $Q$ (allowing higher degrees in $Y$), which makes the existence proof easier (numVars is larger) but invalidates the root-finding step guarantees (as $\deg(Q(X, P(X)))$ may exceed the number of roots enforced).
    This error permeates through numVars calculations and polySol_weightedDegree_le.

Minor Issues:

  1. Typo in Name: weigthBoundIndices should be weightBoundIndices.
  2. numVars Logic for k=0: While the logic technically holds, if k=0, k-1 is 0 (due to natural number subtraction). This leads to i <= D and j being constrained only by the range (D+1) in the product, effectively j <= D. While likely harmless for existence proofs, it reflects the semantic error of using k-1 instead of k.

Corrected Code Snippets:

Update weightBoundIndices (and rename) to use weight k:

/-- Given a nonnegative integer `D`, it is the set of indices `(i,j)` such
    that `i + k * j ≤ D`. -/
def weightBoundIndices (k D : ℕ) : Finset (ℕ × ℕ) :=
  (range (D + 1)).product (range (D + 1)) |>.filter (fun x ↦ x.1 + k * x.2 ≤ D)

You will need to update card_weigthBoundIndices_eq_sum (dividing by k instead of k-1), numVars_eq_of_gt_one (condition 0 < k), and numVars_lower_bound_tight accordingly. The lower bound 2(k-1) * numVars >= D(D+2) might need adjustment to 2k * numVars >= ... or similar based on the new count. Ensure polySol_weightedDegree_le asserts weightedDegree ... 1 k ≤ ....

📄 **Review for `ArkLib/Data/CodingTheory/GuruswamiSudan/GuruswamiSudan.lean`**

The formalization in ArkLib/Data/CodingTheory/GuruswamiSudan/GuruswamiSudan.lean is Incorrect.

There are critical logic errors and missing hypotheses in the definitions and theorem statements.

Critical Issues:

  1. Off-by-One Error in Weighted Degree:
    In the Conditions structure, the field Q_deg is defined as:
    Q_deg : weightedDegree Q 1 (k - 1) ≤ D
    The variable documentation states Let k + 1 be the **dimension** of the code, which implies message polynomials have degree at most k. The Guruswami-Sudan algorithm requires the weight of the $Y$ variable to match the maximum degree of the message polynomials to ensure $\deg(Q(X, P(X))) \le D$. Therefore, the weight should be k, not k - 1.
    Reference: [BCI+20], Section 5.1: "The (1, k)-weighted degree of a polynomial..."

  2. Incorrect Definition of Distance:
    In guruswami_sudan_for_proximity_gap_property, the distance hypothesis is:
    Δ₀(f, (codewordToPoly p).eval ∘ f) ≤ ...
    This composes the polynomial evaluation with the received word f (values), rather than the evaluation domain ωs (points). It should be:
    Δ₀(f, (codewordToPoly p).eval ∘ ωs) ≤ ...

  3. Missing Hypotheses in Property Theorem:
    The theorem guruswami_sudan_for_proximity_gap_property asserts that Y - P(X) divides Q, but it takes Q as an arbitrary polynomial without requiring that Q actually satisfies the Conditions (i.e., that Q interpolates the data points).
    Additionally, the theorem allows p to be any element of code ωs n. If n is the blocklength, this type typically allows polynomials of degree up to $n-1$. The theorem must restrict p (or codewordToPoly p) to have degree $\le k$ for the divisibility property to hold under the given degree bound $D$.

Corrected Code Snippets:

For Conditions:

structure Conditions (D : ℕ) (ωs : Fin n ↪ F) (f : Fin n → F) (Q : F[X][Y]) where
  -- ...
  /-- (1, k)-weighted degree of the polynomial is bounded. -/
  Q_deg : weightedDegree Q 1 k ≤ D -- Changed k-1 to k
  -- ...

For guruswami_sudan_for_proximity_gap_property:

lemma guruswami_sudan_for_proximity_gap_property {k m : ℕ} {ωs : Fin n ↪ F}
  {f : Fin n → F}
  {Q : F[X][X]}
  {p : code ωs (k + 1)} -- Enforce correct dimension/degree
  (h_Q : Conditions k m (proximity_gap_degree_bound k n m) ωs f Q) -- Require Q to satisfy conditions
  (h_dist : Δ₀(f, (codewordToPoly p).eval ∘ ωs) ≤ proximity_gap_johnson (n := n) k m) -- Correct composition
  :
  ((X : F[X][X]) - C (codewordToPoly p)) ∣ Q := by sorry
📄 **Review for `ArkLib/Data/CodingTheory/ProximityGap/BCIKS20.lean`**

Analysis:

The diff provided for ArkLib/Data/CodingTheory/ProximityGap/BCIKS20.lean shows a simple renaming of Condition to Conditions in the statements of two lemmas: guruswami_sudan_for_proximity_gap_existence and guruswami_sudan_for_proximity_gap_property.

  1. Context: The file corresponds to the results from [BCI+20] (Ben-Sasson et al., "Proximity Gaps for Reed–Solomon Codes"). Specifically, these lemmas relate to the existence and properties of the interpolation polynomial $Q$ in the Guruswami-Sudan list decoding algorithm as adapted for proving proximity gaps.
  2. The Change: The renaming Condition -> Conditions suggests a refactoring where the constraints on the polynomial $Q$ (degree bounds, root multiplicities, non-zero status) are grouped under the name Conditions (plural). This is a common naming convention when multiple predicates are bundled together.
  3. Correctness:
    • Parameters: The parameter k + 1 is passed to Conditions. In the provided code snippet, the rate $\rho$ is calculated as ((k + 1 : ℚ) / n). In Reed-Solomon codes, if polynomials have degree $\le k$, the dimension is $k+1$. The [BCI+20] paper uses $k$ for the degree bound (and thus dimension $k+1$). The usage here (k as degree, k+1 as dimension) appears consistent with the rate calculation. Assuming Conditions expects the code dimension (or consistently handles the k vs k+1 distinction internally), the call is correct.
    • Existence: The existence lemma asserts ∃ Q, Conditions .... This aligns with the solvability of the interpolation linear system (Claim 5.4 in [BCI+20] / Lemma 3.1 in [BCHKS25]).
    • Property: The property lemma asserts that if Conditions are met and a polynomial p is close to w, then Y - p(X) divides Q. This aligns with the factorization step (Lemma 5.3 in [BCI+20]).

Conclusion:

The changes represent a consistent renaming to match a likely update in the definition of the Conditions predicate/structure. The mathematical parameters passed (k+1 for dimension/bound, m for multiplicity, etc.) align with the theoretical results in the referenced papers.

Verdict:

The formalization in this file is correct.

📄 **Review for `ArkLib/Data/Polynomial/Bivariate.lean`**

File: ArkLib/Data/Polynomial/Bivariate.lean

Verdict: Correct

Analysis:

  1. Logic and Implementation: The function shift is intended to compute $f(X+x, Y+y)$.

    • The term X + C (C y) inside f.comp (...) represents the polynomial $Y + y$ in the nested polynomial ring F[X][Y] (where X refers to the outer generator $Y$, and C (C y) lifts the scalar $y$ to the outer ring). f.comp substitutes $Y$ with $Y+y$.
    • The term (X + C x).compRingHom creates a ring homomorphism on F[X] that maps a polynomial $P(X)$ to $P(X+x)$.
    • map applies this homomorphism to the coefficients of the polynomial resulting from the first step. Since the coefficients are polynomials in $X$, this substitutes $X$ with $X+x$.
    • The combined effect is $f(X+x, Y+y)$. The logic is correct.
  2. Type Correctness:

    • The code leverages Lean's type inference to resolve X to either Polynomial.X (in F[X]) or Polynomial.X (in F[X][Y]) depending on the context (+ C x vs + C (C y)). This is valid usage of the Polynomial algebra hierarchy.
    • F is declared as a Type with a Field instance, which is appropriate.
    • noncomputable is correctly applied given the Field context.
  3. Dependencies: The code assumes the existence of a helper compRingHom (likely wrapping Polynomial.comp as a RingHom) and the notation F[X][Y]. Assuming these are consistent with the rest of ArkLib, the definitions are correct.

Conclusion:
The added code correctly formalizes the bivariate polynomial shift operation required by the context of the external documents (e.g., for Taylor expansions or Hasse derivatives in algebraic coding theory algorithms). No errors found.

@quangvdao
Copy link
Collaborator

Hi @ster-oc ! Thanks for doing the proofs for Guruswami-Sudan.

We have a PR that will bump ArkLib to the latest Lean version #313 - once that's merged can you update your branch and get checks fully passing? I will take a look after that

@ster-oc
Copy link
Author

ster-oc commented Feb 24, 2026

Hi @quangvdao, sure I will fix after the bump!
Notice that the most updated PR now is #300, which contains a complete proof

@quangvdao
Copy link
Collaborator

@ster-oc in that case can you close this one?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proof obligation for guruswami_sudan_for_proximity_gap_existence in ArkLib/Data/CodingTheory/GuruswamiSudan/GuruswamiSudan.lean

3 participants