I'm trying to reproduce the results of Table 2 in the paper. I ran the code in the repo on the TRANCOS dataset and got the following results after 1000 epochs:
validation_mae = 3.80
test_mae = 3.55
The paper reports a test MAE of 3.32, so my test result is about 0.23 MAE worse. Is a gap of that size reasonable given different seeds, different initialization, etc.?
Alternatively, is there a way to seed the code so that it reproduces 3.32 exactly?
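For reference, this is roughly the kind of seeding I have in mind. It is only a sketch: `set_seed` is a hypothetical helper and not something from the repo, and I am assuming the training code draws its randomness from Python's `random`, NumPy, and PyTorch. Even with all three seeded, some GPU operations can be nondeterministic, so this may not be enough for bit-exact results.

```python
import random


def set_seed(seed: int) -> None:
    """Seed the random number generators available in this environment.

    Hypothetical helper: which libraries the training code actually uses
    is an assumption, so NumPy and PyTorch are seeded only if installed.
    """
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch
        torch.manual_seed(seed)
    except ImportError:
        pass


# Re-seeding with the same value reproduces the same random draws.
set_seed(123)
first = [random.random() for _ in range(3)]
set_seed(123)
second = [random.random() for _ in range(3)]
```

With the same seed, `first` and `second` contain identical values.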
Thanks!
Edit: Looking at the output some more, I see that the best validation MAE was 3.36, reached after 902 epochs. Do the numbers in Table 2 report validation or test performance?
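To make the question concrete, here is a small sketch of the selection protocol I would expect if Table 2 reports test performance: pick the epoch with the lowest validation MAE and report the test MAE from that same epoch, rather than the values from the final epoch. The `select_by_validation` function, the record layout, and the numbers in `history` are all made up for illustration; they are not from my run or from the repo.

```python
def select_by_validation(history):
    """Return the per-epoch record with the lowest validation MAE."""
    return min(history, key=lambda record: record["val_mae"])


# Made-up placeholder values, for illustration only.
history = [
    {"epoch": 1, "val_mae": 5.0, "test_mae": 5.2},
    {"epoch": 2, "val_mae": 4.0, "test_mae": 4.3},
    {"epoch": 3, "val_mae": 4.5, "test_mae": 4.1},
]

best = select_by_validation(history)
print(f"best epoch: {best['epoch']}, test MAE at that epoch: {best['test_mae']}")
```

Note that in this toy data the lowest test MAE occurs at epoch 3, but the protocol reports epoch 2, since choosing the epoch by test MAE would leak test information into model selection.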