Hi, thank you very much for releasing MagicData.
I am currently trying to use MagicData for baseline empirical validation with pre-trained models. My goal is not to train a new model on the full dataset, but to sample a small subset that is still close to the original data distribution.
However, I found it difficult to access a small number of videos efficiently. From MagicData.csv, I can identify the target samples through fields such as video_path and videoid, but it seems that the actual videos are only provided through the split videos.zip.part_* files. As a result, extracting even a small subset seems to require downloading all the parts and reassembling a very large archive first.
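For context, the selection step itself is easy. A minimal sketch of it, assuming MagicData.csv has a header row containing the video_path and videoid columns (the function name, seed, and sample size are only illustrative, and plain uniform sampling is an assumption rather than a recommended protocol):

```python
import csv
import random


def sample_video_rows(csv_path, k, seed=0):
    """Return up to k metadata rows drawn uniformly at random, without replacement."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))  # each row is a dict keyed by the CSV header
    rng = random.Random(seed)  # fixed seed so the same subset can be reproduced
    # Cap k at the number of available rows so small files do not raise an error.
    return rng.sample(rows, min(k, len(rows)))
```

Calling `sample_video_rows("MagicData.csv", k=100)` and reading `row["video_path"]` from each result gives the list of files I would need. The difficulty is only in fetching those files without the full archive.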
I would like to ask:
Is there any lightweight way to obtain a small representative subset of MagicData for evaluation?
Do you have a smaller sample release, benchmark split, or recommended subset for reproduction / baseline testing?
I think such an option would be very helpful for users who want to perform small-scale validation or failure analysis without incurring the full download and storage cost.
Thank you for your time and help.