[Example] Apply AI review to Erdos problem 7 by ryantuck · Pull Request #1 · ryantuck/formal-conjectures

ryantuck · 2026-03-09T17:23:27Z

Per discussion here google-deepmind#3422

Example of a way to conduct a further pass at tightening up the formalized conjectures prior to review.

Tell Claude to read REVIEW_MATH.md and produce a review file of what else is needed.
Tell Claude to make whatever fixes are necessary.

franzhusch · 2026-03-09T19:46:30Z

+3. Variants - Are all variants of the problem captured by the formalization?
+4. Readability - Could the code be made more readable?
+5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
+6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?


I think "faithful" is a good keyword here; "correct" could be misinterpreted as "Does the Lean compile?". So something along the lines of: "Is the formalization faithful to the natural-language statement from the source material?"

Fun lesson from all this: when I was experimenting with this method, I tried telling Claude Haiku to work on a batch of 80 problems with a single prompt, and it produced 80 Lean files with trivial implementations, ran a successful `lake build`, and called it a day. Amazing to watch the models figure out shortcuts and ignore otherwise important details. I'll certainly avoid suggesting that they maximize paperclips, haha.

Yep, that's why we need AI Safety and Human Reviews :D
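
To illustrate the gap between "compiles" and "faithful" discussed above, here is a minimal sketch in plain Lean 4 (no Mathlib). The names `IsPrime`, `toy_problem_trivial`, and `toy_problem_faithful` are hypothetical and do not come from this PR; the twin prime statement is only a stand-in, not Erdos problem 7.

```lean
-- Hypothetical toy definition, written in core Lean 4 so the sketch needs no imports.
def IsPrime (p : Nat) : Prop :=
  2 ≤ p ∧ ∀ d : Nat, d ∣ p → d = 1 ∨ d = p

-- Compiles without complaint, but says nothing about any mathematical problem.
theorem toy_problem_trivial : True := trivial

-- States an actual claim (there are infinitely many twin primes); the proof is left open.
theorem toy_problem_faithful :
    ∀ n : Nat, ∃ p : Nat, n < p ∧ IsPrime p ∧ IsPrime (p + 2) := by
  sorry
```

Both declarations get through a build (the second only with a `sorry` warning), which is why a successful `lake build` on its own says little about faithfulness.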

franzhusch · 2026-03-09T19:48:42Z

+3. Variants - Are all variants of the problem captured by the formalization?
+4. Readability - Could the code be made more readable?
+5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
+6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?


Another point one could test: if a statement is somewhat ambiguous, the model should note in the docstring what the ambiguity was and how it chose to resolve it. We might have to be wary that it doesn't get out of hand, but one could experiment with it.
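
A docstring that records such a choice could look roughly like the following sketch in plain Lean 4. The statement, the ambiguity, and the name `toy_four_squares` are invented for illustration and are not taken from erdosproblems.com or from this PR.

```lean
/--
Toy statement: "every sufficiently large natural number is a sum of four squares".

Ambiguity: the informal statement does not say whether `0` counts as a square.
Resolution chosen here: squares of all natural numbers, including `0`, are allowed.
-/
theorem toy_four_squares :
    ∃ N : Nat, ∀ n : Nat, N ≤ n →
      ∃ a b c d : Nat, n = a * a + b * b + c * c + d * d := by
  sorry
```

Keeping the note to two short lines (what was ambiguous, what was chosen) is one way to stop it from getting out of hand.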

franzhusch · 2026-03-09T19:50:05Z

+
+1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.
+2. Citations - Fetch data from https://www.erdosproblems.com/NUM to ensure any citations included in docstrings are documented as they exist on the website as opposed to shorthand references.
+3. Variants - Are all variants of the problem captured by the formalization?


"Are all variants of the problem also formalized?" might be better, since the model might misread the other sentence as asking for a single formalized statement that is general enough to encompass all variants.
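
As a rough sketch of the reading intended here, each variant gets its own statement next to the main one instead of being folded into a single general theorem. This is plain Lean 4 with hypothetical names (`ToyProblem`, `toy_problem`, `toy_problem.variants.mod_four`) and a toy problem, not the actual formalization under review.

```lean
namespace ToyProblem

-- Hypothetical toy definition so the sketch needs no imports.
def IsPrime (p : Nat) : Prop :=
  2 ≤ p ∧ ∀ d : Nat, d ∣ p → d = 1 ∨ d = p

/-- Main statement: there are infinitely many primes. -/
theorem toy_problem : ∀ n : Nat, ∃ p : Nat, n < p ∧ IsPrime p := by
  sorry

/-- Variant, formalized separately: infinitely many primes of the form `4k + 3`. -/
theorem toy_problem.variants.mod_four :
    ∀ n : Nat, ∃ p : Nat, n < p ∧ IsPrime p ∧ p % 4 = 3 := by
  sorry

end ToyProblem
```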

franzhusch · 2026-03-09T19:51:36Z

+
+The goal is to review a particular Erdos Problem formalization (problem number NUM) to answer the following questions. Produce a review document called ai-review/NUM.md.
+
+1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.


Maybe also add that if a definition is general enough and/or used across many problems, the model can also move that definition to the FormalConjecturesForMathlib folder for future use.

Agree with this suggestion, though I think I'll decouple it into a separate step like, "review all files, identify common implementations of definitions, and abstract them out"

franzhusch · 2026-03-09T19:57:51Z

@@ -0,0 +1,12 @@
+# Review Math


What about adding an additional step asking for sensible category test statements? Those might help increase confidence that the formalizations are correct. But this can also be done later and is not necessarily needed for now.

I had to google "lean category test statements"; it looks like there is some way within Lean to more rigorously confirm that the code was implemented faithfully? If so, agreed, it's a good idea and can be implemented independently. Will look into it further.

Oh sorry, I should have specified: what I meant by [category test] is that we have custom categories in Formal Conjectures. Here, for example, is a case where category test statements are used for the Dedekind number definitions.

They can increase reviewability by giving further evidence of the correctness/faithfulness of the mathematical definitions. Only where sensible, of course; not every definition can be tested sensibly.
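
A minimal sketch of the idea, assuming nothing beyond core Lean 4: a definition is accompanied by small statements that pin down known values, so a reviewer can see that it computes what it is meant to. The definition `triangular` is a hypothetical example. In the repository such statements would carry the project's test category; that attribute syntax is not shown in this thread, so it is left out here.

```lean
/-- Hypothetical toy definition: the `n`-th triangular number `0 + 1 + ... + n`. -/
def triangular : Nat → Nat
  | 0 => 0
  | n + 1 => triangular n + (n + 1)

-- Test-style statements: small known values that the definition must reproduce.
example : triangular 0 = 0 := by decide
example : triangular 1 = 1 := by decide
example : triangular 4 = 10 := by decide
```

If the definition were subtly off (say, summing only up to `n - 1`), at least one of these checks would fail to compile.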

franzhusch

I left some comments. The draft REVIEW_MATH.md is a good start, and we can also see that a lot was corrected in the file, showing that an additional stage is fruitful.

apply ai review to erdos 7

d9f875e

github-actions bot added the erdos-problems label Mar 9, 2026

franzhusch reviewed Mar 9, 2026

ryantuck mentioned this pull request Mar 10, 2026

feat(ErdosProblems): All 1179 Erdős conjectures formalized google-deepmind/formal-conjectures#3422

Open

ryantuck wants to merge 1 commit into erdos-all from ai-erdos-7
