Potential fairness issue in ReXTime evaluation #18

@LiGuo12

Description

Thank you for your excellent work and for releasing the code.

I noticed a possible inconsistency in the evaluation protocol for ReXTime.

In Table 3 of your paper, the reported results appear to be evaluated on the validation set. However, the prior SOTA results cited in the same table, taken from the paper "ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos", are reported on the test set.

Since the validation and test sets may differ in distribution and difficulty, directly comparing results across these two splits may not be a fair or reliable comparison.

Could you please clarify the following points?

  1. Are your reported results in Table 3 indeed evaluated on the ReXTime validation set?
  2. Given that the ReXTime test set is not publicly available, how should readers interpret the comparison between your validation-set results and prior test-set results?
  3. Are there any other works that also report results on the ReXTime validation set for reference?

Thank you for your clarification.
