Skip to content

proofs for lemma distInterleavedCodeToCodeLB, e_leq_dist_over_3 and probOfBadPts#297

Open
FawadHa1der wants to merge 3 commits intoVerified-zkEVM:mainfrom
FawadHa1der:InterleavedCodes
Open

proofs for lemma distInterleavedCodeToCodeLB, e_leq_dist_over_3 and probOfBadPts#297
FawadHa1der wants to merge 3 commits intoVerified-zkEVM:mainfrom
FawadHa1der:InterleavedCodes

Conversation

@FawadHa1der
Copy link

probOfBadPts added some extra assumptions and e_leq_dist_over_3 has been proved with a weaker and stronger assumption. I can delete the weaker one if required. Created with help of Codex and Claude.

@github-actions
Copy link

github-actions bot commented Jan 26, 2026

🤖 Gemini PR Summary

This Pull Request formalizes several critical mathematical components related to the AHIV22 paper, specifically focusing on the proximity gap for interleaved Reed-Solomon codes. The changes provide the necessary proofs to establish lower bounds on row-span distance and analyze the probability of failure points in proximity gaps.

Features

  • Formalization of AHIV22 Results: Implemented the core proximity gap results for interleaved Reed-Solomon codes within ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean.
  • New Mathematical Lemmas:
    • distInterleavedCodeToCodeLB: Proves the lower bound for the distance between an interleaved code and the target code.
    • probOfBadPts: Establishes the probability bounds for "bad points" within the proximity gap context, incorporating additional necessary assumptions for the proof.
    • e_leq_dist_over_3: Formalizes the error bound $e \leq d/3$. This has been provided with both "weaker" and "stronger" assumption variants to offer flexibility in how the theorem is applied.
  • Row-Span & Affine-Line Analysis: Added proofs for the row-span distance lower bound and the affine-line proximity gap, which are essential for verifying the soundness of these proximity proofs.

Refactoring

  • Assumption Management: The proof for e_leq_dist_over_3 currently contains two versions (weaker vs. stronger assumptions). These are preserved for review to determine which version best fits the project's long-term requirements.

Documentation

  • Note: While no standalone documentation files were added, the formalization itself serves as a machine-verifiable specification of the AHIV22 paper’s claims regarding interleaved Reed-Solomon codes.

Engineer’s Note: This PR was developed with the assistance of AI tools (Codex and Claude). The author has requested a review specifically on whether to retain both versions of the e_leq_dist_over_3 proof or to prune the weaker variant.


Analysis of Changes

Metric Count
📝 Files Changed 1
Lines Added 1988
Lines Removed 39

sorry Tracking

✅ **Removed:** 3 `sorry`(s)
  • lemma probOfBadPts {deg : ℕ} {α : ι ↪ F} {e : ℕ} {U_star : WordStack (A in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
  • lemma distInterleavedCodeToCodeLB in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean
  • lemma e_leq_dist_over_3_strong in ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean

🎨 **Style Guide Adherence**

Based on the provided style guide, the following lines in the code changes violate the specified guidelines:

  • Line 44: variable {F : Type} [Field F] [Finite F] [DecidableEq F] {κ : Type*} [Fintype κ] {ι : Type} [Fintype ι]
    • Violation: "Variable Conventions: α, β, γ, ... : Generic types" (The code uses F, κ, and ι as types).
  • Line 159: obtain ⟨D, hD_card, hD_zero⟩ := exists_common_support_of_wt_le (F := F) (ι := ι) (E := LinearMap.range err) (e := e) herr_wt hF
    • Violation: "Line Length: Keep lines under 100 characters."
  • Lines 113, 118, 122, 137, 143, 212, 222, 228, 230, 240, 247, 256, 262, 266, 273, 279, 285, 290, 294, 301, 311, 314, 319, 329, 334, 340, 350, 362, 381, 387, 396, 401, 411, 418, 437, 442, 452, 461, 484, 501, 511, 521, 528, 545, 554, 563, 569, 584, 601, 611, 621, 628, 645, 654, 663, 669, 684, 691, 698, 712, 725, 736, 743, 755, 765, 774, 781, 792, 803, 814, 825, 836, 847, 858, 869, 881, 891, 903, 911, 921, 931, 941, 951, 961, 971, 981, 991: (Multiple empty lines within proof blocks)
    • Violation: "Empty Lines: Avoid empty lines inside definitions or proofs."
  • Lines 52, 101, 106, 113, 151, 159, 163, 175, 185, 201, 225, 245, 252, 258, 269, 275, 281, 298, 303, 313, 316, 321, 331, 350, 362, 381, 387, 396, 401, 411, 418, 437, 442, 452, 461, 484, 501, 511, 521, 528, 545, 554, 563, 569, 584, 601, 611, 621, 628, 645, 654, 663, 669, 684, 691, 698, 712, 725, 736, 743, 755, 765, 774, 781, 792, 803, 814, 825, 836, 847, 858, 869, 881, 891, 903, 911, 921, 931, 941, 951, 961, 971, 981, 991: fun x => ...
    • Violation: "Functions: Prefer fun x ↦ ... over λ x, ...."
  • Line 200: lemma distInterleavedCodeToCodeLB
    • Violation: "Theorems and Proofs: snake_case (e.g., add_comm, list_reverse_id)."
    • Violation: "Acronyms: Treat as words (e.g., HtmlParser not HTMLParser)." (Regarding LB).
  • Line 206, 207, 208, 215, 219, etc.: hF, he, hU, h3e_lt_d, h3pos, h', h3ne, h3mul, h3mul', hmul, h2e_lt_d, h_contra, h_close, h_exists_codeword, hdec_mem, hdec_dist, hdec_add
    • Violation: "Variable Conventions: h, h₁, ... : Assumptions/Hypotheses" (The guide mandates specific generic letters/subscripts rather than descriptive camelCase).
  • Line 225, 411, 414, 461: ...)).1 / ...)).2
    • Violation: "Use manual dot notation for logical connectives and equality (e.g., And.intro, Eq.refl, Iff.mp)."
  • Line 346: def numberOfClosePts
    • Violation: "Acronyms: Treat as words (e.g., HtmlParser not HTMLParser)." (The abbreviation Pts should be treated as the word Points).
  • Line 367: lemma e_leq_dist_over_3
    • Violation: "Symbol Naming Dictionary: | | le |" (The code uses leq instead of le).
  • Line 377: lemma e_leq_dist_over_3_strong
    • Violation: "Symbol Naming Dictionary: | | le |"
  • Line 549: lemma dirClose_of_manyClosePts
    • Violation: "Theorems and Proofs: snake_case (e.g., add_comm, list_reverse_id)."
    • Violation: "Acronyms: Treat as words (e.g., HtmlParser not HTMLParser)."
  • Line 698: lemma probOfBadPts
    • Violation: "Theorems and Proofs: snake_case (e.g., add_comm, list_reverse_id)."

📄 **Per-File Summaries**
  • ArkLib/Data/CodingTheory/ProximityGap/AHIV22.lean: Formalizes key results from the AHIV22 paper regarding the proximity gap for interleaved Reed-Solomon codes, including proofs for the row-span distance lower bound and the affine-line proximity gap.

Last updated: 2026-02-20 19:10 UTC.

@FawadHa1der
Copy link
Author

I was trying to implement probOfBadPts but the AI suggested we prove the other distInterleavedCodeToCodeLB and e_leq_dist_over_3 first. Didn't mean to duplicate or interfere already in progress work. My apologies if that inadvertently happen. @erdkocak @DimitriosMitsios I know this is a collaborative effort and I should sensitive to someone elses efforts

@alexanderlhicks tagging you so that you can check the assumptions and the solution.

@erdkocak
Copy link
Contributor

No problem, thanks for the great work. It seems that AI used a different approach than following the paper via using the already proven results from DG25, which I believe shows as library gets bigger proving new stuff will be much easier.

@chung-thai-nguyen
Copy link
Collaborator

@FawadHa1der Thanks for the good work. At a quick glance, the code looks good and clean, the hF & hdeg are reasonable/implicit assumptions of RS code (& derived from he). Lemma 4.4 is using OR instead of XOR but seems like good enough to use in Lemma 4.5 and it's not used anywhere else. Existing lemmas & definitions are applied fluently which is a good point. Though we still need deeper review. As a side question, which subscription plans for Codex or Claude did you use for this PR, and did it require much time for you to produce this PR?

@FawadHa1der
Copy link
Author

FawadHa1der commented Jan 27, 2026

I have a little unusual setup. I had Claude (4.5 opus) and Codex 5.2 running at the same time. Though Codex has been performing much better I have to admit, I have to look into how to make make Claude better at lean. It did stop multiple times so may be but not completely sure I think it spent about 3-5 hours. These things are running in the background as I get up to speed with my Lean learnings

@chung-thai-nguyen
Copy link
Collaborator

I have a little unusual setup. I had Claude (4.5 opus) and Codex 5.2 running at the same time. Though Codex has been performing much better I have to admit, I have to look into how to make make Claude better at lean. It did stop multiple times so may be but not completely sure I think it spent about 3-5 hours. These things are running in the background as I get up to speed with my Lean learnings

That’s impressive tbh and it's a nice showcase of AI capabilities in FV, especially given that this was done alongside you ramping up on Lean in a short duration. Thanks for the insights and the PR.

@DimitriosMitsios
Copy link
Contributor

@FawadHa1der Did you get the usual monthly subscriptions on those models or the api charge? Have you tried any of the DeepSeek's models as well? They have lower api cost by more than 10x but I don't have any experience with their performance on FV. I could give them a try myself and report back!

@FawadHa1der
Copy link
Author

FawadHa1der commented Jan 30, 2026

@DimitriosMitsios what is FV? yes just the monthly subscription. They do get rate limited sometimes which is annoying. I ave not tried DeepSeek yet. Never heard of its math/Lean4 capabilities though

@chung-thai-nguyen thank you, I have been mindful of trying not to add assumptions to the statements because I suspect they have been designed to keep re-usability in mind. After an initial implementation by AI I go back try to reduce or take out the assumptions that were added, which is not always successful as you might notice if my next PR :)

@DimitriosMitsios
Copy link
Contributor

@FawadHa1der FV := Formal Verification, it was used in a previous message and I took it for granted, sorry!

@alexanderlhicks alexanderlhicks self-assigned this Feb 7, 2026
@alexanderlhicks
Copy link
Collaborator

Hello! Thanks for the PR and sorry for the delay in getting to this. I'll be merging the 4.26 PR soon, which might break a few things, and get back to all the other PRs including this one.

`(𝔽ᵐ)ⁿ`. Let `e` be a positive integer such that `e < d/3` and `|𝔽| ≥ e`.
Suppose `d(U⋆, L^⋈m) > e`. Then, there exists `v⋆ ∈ L⋆` such that `d(v⋆, L) > e`, where `L⋆` is the
row-span of `U⋆`. -/
private def vecSupport (u : ι → F) : Finset ι :=
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason for these to be marked as private? Some of these also look like they might or should already exist in a standard library.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so this can be done
private def vecSupport (u : ι → F) : Finset ι := by classical exact (Function.support u).toFinset
But we do need finset, other parts of the code are dependent on it.

private def vecSupport (u : ι → F) : Finset ι :=
Finset.filter (fun j => u j ≠ 0) Finset.univ

private lemma mem_vecSupport {u : ι → F} {j : ι} : j ∈ vecSupport (F := F) u ↔ u j ≠ 0 := by
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from them being private a few of these could do with some cleaning up regading assumptions as e.g. F does not need to be finite here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment as above

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants