Skip to content

[Example] Apply AI review to Erdos problem 7#1

Open
ryantuck wants to merge 1 commit intoerdos-allfrom
ai-erdos-7
Open

[Example] Apply AI review to Erdos problem 7#1
ryantuck wants to merge 1 commit intoerdos-allfrom
ai-erdos-7

Conversation

@ryantuck
Copy link
Copy Markdown
Owner

@ryantuck ryantuck commented Mar 9, 2026

Per discussion here google-deepmind#3422

Example of a way to conduct a further pass at tightening up the formalized conjectures prior to review.

  1. Tell claude to read REVIEW_MATH.md and produce a review file of what else is needed.
  2. Tell claude to make whatever fixes are necessary.

Comment thread ai-review/REVIEW_MATH.md
3. Variants - Are all variants of the problem captured by the formalization?
4. Readability - Could the code be made more readable?
5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a good keyword is faithful here, correct could be misinterpreted as "Does the Lean compile?". So something along the lines, "Is the formalization faithful to the Natural Language Statement from the source material?".

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fun lesson from all this - when I was experimenting with method, I had tried to tell Claude Haiku to work on a batch of 80 problems with a single prompt, and it produced 80 lean files with trivial implementations, ran a successful lake build, and called it a day. Amazing watching the models figure out shortcuts and ignore otherwise important details. I'll certainly avoid suggesting that they maximize paperclips, haha.

Copy link
Copy Markdown

@franzhusch franzhusch Mar 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, thats why we need AI Safety and Human Reviews :D

Comment thread ai-review/REVIEW_MATH.md
3. Variants - Are all variants of the problem captured by the formalization?
4. Readability - Could the code be made more readable?
5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Other point, which one could test is that he should write it in the docstring, if a statement had a bit of ambiguity and what way he chose to solve that ambiguity. We might have to be wary that it doesnt get out of hand, but one could experiment with it.

Comment thread ai-review/REVIEW_MATH.md

1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.
2. Citations - Fetch data from https://www.erdosproblems.com/NUM to ensure any citations included in docstrings are documented as they exist on the website as opposed to shorthand references.
3. Variants - Are all variants of the problem captured by the formalization?
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Are also all variants of the problem formalized", might be better as he might misunderstand the other sentence, in the sense that he should find one formalized statement which is general enough to encompass all variants or so.

Comment thread ai-review/REVIEW_MATH.md

The goal is to review a particular Erdos Problem formalization (problem number NUM) to answer the following questions. Produce a review document called ai-review/NUM.md.

1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe also add that if a definition is general enough and / or used across many problems, then he can also move that definition to the FormalConjecturesForMathlib Folder for future usage.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree with this suggestion, though I think I'll decouple it into a separate step like, "review all files, identify common implementations of definitions, and abstract them out"

Comment thread ai-review/REVIEW_MATH.md
@@ -0,0 +1,12 @@
# Review Math
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about adding a additional step asking for sensible category test statements? Those might help increasing the confidence that the formalization are correct. But this can also be done later and for now are not necessarily needed.

Copy link
Copy Markdown
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to google "lean category test statements", looks like there would be some way within lean to more rigorously confirm that the code was implemented faithfully? If so, agree, is a good idea and can be implemented independently, will look into further.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh sorry, I should have specified what I meant by [category test] is that we have a custom categories in Formal Conjectures. Here is for example a case, where category test statements are used for the Dedekind Number Definitions.

They can increase reviewability by giving further evidence of the correctness / faithfulness of the mathematicial definitions. If sensible of course, not every definition can be tested sensible.

Copy link
Copy Markdown

@franzhusch franzhusch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments. The draft REVIEW_MATH.md is a good first start, and we can also see that a lot was corrected for the file, showing that a additional stage is fruitful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants