
about benchmark test pipeline #4

@Heimdall-Nss


Hi, thank you for your great work!
I'm very interested in your project and am trying to reproduce the results on the test set.
I noticed that the default run_geo.py seems to perform a real-time Google search to obtain reference web pages. However, according to my understanding of your paper, shouldn't the evaluation be based on the web content provided in the source field of test.jsonl?
Additionally, could you please clarify the intended usage of, and the difference between, the sources and summaries parameters of improve()?
Currently, it seems that only the summaries parameter is actually used for evaluation, and if it is not provided, the script performs a new search instead of using the fixed data in test.jsonl. Is this the expected behavior? A short sketch of what I expected is below.
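For reference, this is roughly how I expected the offline evaluation to work: feed the fixed web content from test.jsonl into improve() instead of triggering a live search. The import path, the query field name, and the exact signature of improve() are assumptions on my side and may not match your implementation:

```python
import json

# Assumed import path and signature; please adjust to the actual module layout.
from run_geo import improve

with open("test.jsonl") as f:
    for line in f:
        example = json.loads(line)
        query = example["query"]      # field name assumed
        sources = example["source"]   # fixed reference web pages from the dataset
        # Expectation: passing the pre-fetched sources avoids a new Google search.
        result = improve(query, sources=sources)
```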
Thank you very much for your help!
