The necessity of introducing a calculator for more precise validation.

Has the author considered adding a calculator to verify results more accurately? We have found that GPT-4o is prone to errors when judging numerically equivalent results, which may also be related to the prompt's requirement for complete consistency. I understand that strict consistency keeps the evaluation authoritative and credible, but it could be more precise. For example, when the correct answer is $4+4\sqrt{3}$ (approximately 10.93), an answer of 10.9 is judged as incorrect.
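To make the suggestion concrete, here is a minimal sketch of the kind of calculator-style check I have in mind. It is not code from this repository: the function name `is_numerically_equivalent` and the 1% relative tolerance are assumptions chosen for illustration, and the right tolerance would depend on how much rounding the benchmark wants to accept.

```python
import math


def is_numerically_equivalent(reference: float, candidate: float,
                              rel_tol: float = 1e-2) -> bool:
    """Hypothetical helper: compare two numeric answers within a relative tolerance.

    The default rel_tol of 1e-2 (1%) is an assumed value, not one taken
    from the benchmark.
    """
    return math.isclose(reference, candidate, rel_tol=rel_tol)


# The example from this issue: the reference answer is 4 + 4*sqrt(3).
reference = 4 + 4 * math.sqrt(3)

print(is_numerically_equivalent(reference, 10.9))                # True
print(is_numerically_equivalent(reference, 10.9, rel_tol=1e-3))  # False
```

A check like this could run before, or alongside, the GPT-4o judgment, so that an answer rounded to one decimal place is not rejected only because its surface form differs from the reference.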

The necessity of introducing a calculator for more precise validation. #8

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

The necessity of introducing a calculator for more precise validation. #8

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions