Hello SAM-3D!
Thank you for releasing this amazing work.
I would like to ask about the details of the Aria Digital Twin subset used for the evaluation reported in the paper.
In Appendix D.2, the paper says:
Aria Digital Twin (Pan et al., 2023): We sample a smaller set of 40 video frames, with around 30 objects per scene.
Could you share which videos (sequence indices) were used and how the 40 frames were selected from them? This would greatly help follow-up research. Thank you so much!