Potential fairness issue in ReXTime evaluation #18

@LiGuo12

Description

Thank you for your excellent work and for releasing the code.

I noticed a possible inconsistency in the evaluation protocol for ReXTime.

In Table 3 of your paper, the reported results appear to be evaluated on the validation set. However, the prior SOTA results cited in the same table, taken from the paper "ReXTime: A Benchmark Suite for Reasoning-Across-Time in Videos", are reported on the test set.

Since the validation and test sets may differ in distribution and difficulty, directly comparing results across these two splits may not be a fair or reliable comparison.

Could you please clarify the following points?

  1. Are your reported results in Table 3 indeed evaluated on the ReXTime validation set?
  2. Given that the ReXTime test set is not publicly available, how should readers interpret the comparison between your validation-set results and prior test-set results?
  3. Are there any other works that also report results on the ReXTime validation set for reference?

Thank you for your clarification.
