Is there a lightweight way to access a small subset of MagicData for baseline evaluation?

Hi, thank you very much for releasing MagicData.

I am currently trying to use MagicData for baseline empirical validation with pre-trained models. My goal is not to train a new model on the full dataset, but to sample a small subset that is still close to the original data distribution.

However, I found it difficult to access a small number of videos efficiently. From MagicData.csv, I can identify the target samples through fields such as video_path and videoid, but the actual videos seem to be provided only through the split videos.zip.part_* files. As a result, even extracting a small subset may require downloading every part and reassembling a very large archive first.
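
To make the use case concrete, selecting the target rows from the metadata is the easy part. The following is only a minimal sketch, assuming MagicData.csv has a header row that includes the videoid and video_path columns; the function names and the manifest file are hypothetical, and uniform random sampling is just one simple way to stay close to the original distribution.

```python
import csv
import random


def sample_video_rows(csv_path, k, seed=0):
    """Return up to k rows drawn uniformly at random, without replacement."""
    # Assumption: the CSV has a header row, so DictReader yields one dict per video.
    with open(csv_path, newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    return rng.sample(rows, min(k, len(rows)))


def write_manifest(rows, out_path):
    """Write only videoid and video_path of the sampled rows to a small CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=["videoid", "video_path"], extrasaction="ignore"
        )
        writer.writeheader()
        writer.writerows(rows)
```

A call such as write_manifest(sample_video_rows("MagicData.csv", 200), "subset.csv") would produce the list of videos I need. What is missing is a way to fetch just those files without the full archive.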

I would like to ask:

1. Is there any lightweight way to obtain a small representative subset of MagicData for evaluation?
2. Do you have a smaller sample release, benchmark split, or recommended subset for reproduction / baseline testing?

I think such an option would be very helpful for users who want to perform small-scale validation or failure analysis without paying the full storage cost.

Thank you for your time and help.
