The necessity of introducing a calculator for more precise validation. #8

@lrlbbzl

Has the author considered adding a calculator to verify results more accurately? We have found that GPT-4o easily makes errors when judging whether two results are numerically equivalent, which may also be related to the prompt's requirement for complete consistency. I understand that a strict, consistent evaluation standard keeps the results authoritative and credible, but it could be more precise. For example, when the correct answer is 4+4\sqrt{3} (approximately 10.93), an answer of 10.9 is judged as incorrect.
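
To make the suggestion concrete, here is a minimal sketch of one possible calculator-style check. It is not the repository's method: the function name `matches_at_stated_precision` is hypothetical, and it assumes the reference answer has already been evaluated to a float and that the candidate answer is a finite decimal string. The idea is to round the reference to the number of decimal places the candidate states and then compare exactly.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def matches_at_stated_precision(reference: float, candidate: str) -> bool:
    """Hypothetical helper: compare a candidate answer against the reference
    rounded to the candidate's own number of decimal places."""
    cand = Decimal(candidate)
    # Exponent of the candidate, e.g. -1 for "10.9", -2 for "10.93".
    exponent = cand.as_tuple().exponent
    # Smallest unit at that precision, e.g. Decimal("0.1") for exponent -1.
    quantum = Decimal(1).scaleb(exponent)
    # Round the reference to the same precision, using half-up rounding.
    ref = Decimal(repr(reference)).quantize(quantum, rounding=ROUND_HALF_UP)
    return ref == cand


# The example from this issue: the reference answer is 4 + 4*sqrt(3).
reference_value = 4 + 4 * math.sqrt(3)
print(matches_at_stated_precision(reference_value, "10.9"))
print(matches_at_stated_precision(reference_value, "10.8"))
```

One caveat of this sketch is that a very coarse answer such as "11" would also pass, so a real check might additionally require a minimum number of significant digits.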
