I cannot reproduce the results in the paper. #8

@MuchiBai

I followed the commands in the README to train on the AlfWorld dataset, but the test results were poor.

These are the training accuracy and training logs:

Image
alfworld_train_true.txt

These are the insights extracted from the training logs:

alfworld_insight.txt

And here are the test accuracy and test logs:

Image
alfworld_eval_true.txt

All the results are far from what was reported in the paper. I don't know where the problem lies. Maybe it's because I replaced the model with GPT-4o?
