[Example] Apply AI review to Erdos problem 7#1
Conversation
> 3. Variants - Are all variants of the problem captured by the formalization?
> 4. Readability - Could the code be made more readable?
> 5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
> 6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?
I think a good keyword here is "faithful"; "correct" could be misinterpreted as "Does the Lean compile?". So something along the lines of: "Is the formalization faithful to the natural-language statement from the source material?"
Fun lesson from all this: when I was experimenting with the method, I tried telling Claude Haiku to work on a batch of 80 problems with a single prompt, and it produced 80 Lean files with trivial implementations, ran a successful lake build, and called it a day. Amazing watching the models figure out shortcuts and ignore otherwise important details. I'll certainly avoid suggesting that they maximize paperclips, haha.
Yep, that's why we need AI safety and human reviews :D
> 3. Variants - Are all variants of the problem captured by the formalization?
> 4. Readability - Could the code be made more readable?
> 5. Formalizability - Is the problem as stated precise enough to be obviously formalizable? Provide an assessment of the ambiguity of the statement.
> 6. Correctness - Is the formalization as-implemented correct and complete from a mathematical perspective? Would an experienced mathematician identify any obvious flaws? Is any incompleteness or incorrectness attributable to ambiguity in the statement itself?
Another point one could test: the model should note in the docstring whenever a statement had some ambiguity, and which way it chose to resolve that ambiguity. We might have to be wary that this doesn't get out of hand, but one could experiment with it.
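As a sketch of what such a docstring note could look like (the problem statement, names, and resolution here are all hypothetical, made up for illustration):

```lean
/--
Erdős Problem NUM (hypothetical statement for illustration):
"Is every large enough integer the sum of two positive integers?"

**Ambiguity note**: the source does not say whether 0 may be used as a
summand; we resolve this by quantifying over positive integers only.
-/
@[category research open]
theorem erdos_NUM : ∀ᶠ (n : ℕ) in Filter.atTop, ∃ a b : ℕ+, n = (a : ℕ) + (b : ℕ) := by
  sorry
```

Keeping the note to a fixed heading like "Ambiguity note" would make it easy to grep for, which might help keep the practice from getting out of hand.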
> 1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.
> 2. Citations - Fetch data from https://www.erdosproblems.com/NUM to ensure any citations included in docstrings are documented as they exist on the website as opposed to shorthand references.
> 3. Variants - Are all variants of the problem captured by the formalization?
"Are all variants of the problem also formalized?" might be better, as the model might misread the current sentence to mean it should find one formalized statement general enough to encompass all the variants.
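One way to make the intent unambiguous is the separate-statement style, where each variant is its own named theorem rather than being folded into a single general statement. A hypothetical sketch (the statements are made up; only the `.variants.` naming pattern is assumed to follow the repo's convention):

```lean
/-- Hypothetical main statement: there are arbitrarily large primes. -/
@[category research solved]
theorem erdos_NUM (n : ℕ) : ∃ p, n < p ∧ p.Prime := by
  sorry

/-- Hypothetical variant: arbitrarily large twin primes, stated as its
own theorem so a reviewer can check each variant against the source. -/
@[category research open]
theorem erdos_NUM.variants.twin (n : ℕ) : ∃ p, n < p ∧ p.Prime ∧ (p + 2).Prime := by
  sorry
```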
> The goal is to review a particular Erdos Problem formalization (problem number NUM) to answer the following questions. Produce a review document called ai-review/NUM.md.
> 1. Code reuse - Can any code from the existing codebase be repurposed? Look in FormalConjecturesForMathlib to determine if an existing implementation would work just as well.
Maybe also add that if a definition is general enough and/or used across many problems, it can be moved to the FormalConjecturesForMathlib folder for future use.
Agree with this suggestion, though I think I'll decouple it into a separate step, like: "review all files, identify common implementations of definitions, and abstract them out".
> @@ -0,0 +1,12 @@
> # Review Math
What about adding an additional step asking for sensible category test statements? Those might help increase confidence that the formalizations are correct. But this can also be done later; it is not strictly needed for now.
I had to google "lean category test statements"; it looks like there is some way within Lean to more rigorously confirm that the code was implemented faithfully? If so, agreed: it is a good idea, can be implemented independently, and I will look into it further.
Oh sorry, I should have specified: by [category test] I mean the custom categories we have in Formal Conjectures. Here is an example of a case where category test statements are used for the Dedekind number definitions.
They can increase reviewability by giving further evidence of the correctness and faithfulness of the mathematical definitions. Where sensible, of course; not every definition can be tested sensibly.
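A minimal sketch of what such test statements could look like, assuming a decidable `dedekindNumber` definition is already in scope (the theorem names are made up; the values are the known small Dedekind numbers M(0) = 2, M(1) = 3, M(2) = 6):

```lean
/-- Sanity check pinning the definition down on a known small value. -/
@[category test]
theorem dedekindNumber_zero : dedekindNumber 0 = 2 := by decide

@[category test]
theorem dedekindNumber_one : dedekindNumber 1 = 3 := by decide

@[category test]
theorem dedekindNumber_two : dedekindNumber 2 = 6 := by decide
```

If a definition were off by even a small modeling choice (say, counting antichains instead of monotone Boolean functions), tests like these would fail immediately, which is exactly the extra evidence of faithfulness a reviewer wants.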
franzhusch left a comment:
I left some comments. The draft REVIEW_MATH.md is a good first start, and we can also see that a lot was corrected in the file, showing that an additional stage is fruitful.
Per discussion here google-deepmind#3422
Example of a way to conduct a further pass at tightening up the formalized conjectures prior to review.
REVIEW_MATH.md and produce a review file of what else is needed.