Unable to reproduce results of ASearcher-Web-QwQ on GAIA. #22

@ligang-cs

Thank you for your excellent work. I am having trouble reproducing the results of ASearcher-Web-QwQ on GAIA: I only reach 41.7% accuracy, which is 11.1 points below the 52.8% reported in the paper.
Here is my eval script:

MODEL_PATH=ASearcher-Web-QwQ_inclusionAI
DATA_NAMES=GAIA
AGENT_TYPE=asearcher-reasoning
PROMPT_TYPE=asearcher-reasoning
SEARCH_CLIENT_TYPE=async-web-search-access

python3 search_eval_async.py \
    --data_names ${DATA_NAMES} \
    --model_name_or_path ${MODEL_PATH} \
    --output_dir results \
    --data_dir ${DATA_DIR} \
    --prompt_type ${PROMPT_TYPE} \
    --agent-type ${AGENT_TYPE} \
    --search-client-type ${SEARCH_CLIENT_TYPE} \
    --tensor_parallel_size 4 \
    --temperature 0.6 \
    --parallel-mode seed \
    --seed 1 \
    --aggregate-only \
    --use-jina \
    --llm_as_judge \
    --use-openai True \
    --pass-at-k 1

Evaluation results:

[Image: evaluation results]

The JINA and SERPER APIs are configured correctly. Thank you for your help!
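
For anyone comparing setups, the API configuration mentioned above can be checked before launching the run. The following is only a minimal sketch: the helper `check_keys` and the variable names `JINA_API_KEY` and `SERPER_API_KEY` are hypothetical and may not match what the search client actually reads. It confirms that the variables are non-empty, not that the keys themselves are valid.

```shell
# check_keys: hypothetical helper, not part of the repository.
# Prints the name of every listed environment variable that is unset or empty
# and returns non-zero if any were missing.
check_keys() {
    missing=0
    for name in "$@"; do
        # Look up the variable whose name is stored in $name (portable indirect expansion).
        eval "value=\${${name}:-}"
        if [ -z "${value}" ]; then
            echo "missing: ${name}" >&2
            missing=1
        fi
    done
    return "${missing}"
}
```

It could be called as `check_keys JINA_API_KEY SERPER_API_KEY || exit 1` ahead of the `python3 search_eval_async.py` command, with the names swapped for whichever variables the setup really uses.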
