
about benchmark test pipeline #4

@Heimdall-Nss


Hi, thank you for your great work!
I'm very interested in your project and am trying to reproduce the results on the test set.
I noticed that the default run_geo.py seems to perform a real-time Google search to obtain reference web pages. However, according to my understanding of your paper, shouldn't the evaluation be based on the web content provided in the source field of test.jsonl?
Additionally, could you please clarify the intended usage of, and the difference between, the sources and summaries parameters of improve()?
Currently, it seems that only the summaries parameter is actually used for evaluation, and if it is not provided, the script performs a new search instead of using the fixed data in test.jsonl. Is this the expected behavior? A short sketch of what I expected is below.
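For reference, this is roughly how I expected the offline evaluation to work: feed the fixed web content from test.jsonl into improve() instead of triggering a live search. The import path, the query field name, and the exact signature of improve() are assumptions on my side and may not match your implementation:

```python
import json

# Assumed import path and signature; please adjust to the actual module layout.
from run_geo import improve

with open("test.jsonl") as f:
    for line in f:
        example = json.loads(line)
        query = example["query"]      # field name assumed
        sources = example["source"]   # fixed reference web pages from the dataset
        # Expectation: passing the pre-fetched sources avoids a new Google search.
        result = improve(query, sources=sources)
```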
Thank you very much for your help!
