Is there a lightweight way to access a small subset of MagicData for baseline evaluation? #14

@lvxiangyi

Hi, thank you very much for releasing MagicData.

I am currently trying to use MagicData for baseline empirical validation with pre-trained models. My goal is not to train a new model on the full dataset, but to sample a small subset that is still close to the original data distribution.

However, I found it difficult to access a small number of videos efficiently. From MagicData.csv, I can identify the target samples through fields such as video_path and videoid, but the actual videos seem to be provided only through the split videos.zip.part_* files. As a result, extracting even a small subset appears to require downloading all the parts and reassembling a very large archive first.
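
To make the selection step concrete, here is a minimal sketch of drawing a uniform random subset of rows from the CSV. It assumes MagicData.csv has a header row containing the video_path and videoid columns mentioned above; the function name `sample_rows`, the sample size, the seed, and the demo values are all hypothetical and only for illustration.

```python
import csv
import io
import random


def sample_rows(csv_file, k, seed=0):
    """Return up to k rows drawn uniformly at random from a CSV with a header row."""
    rows = list(csv.DictReader(csv_file))
    rng = random.Random(seed)   # fixed seed keeps the subset reproducible
    k = min(k, len(rows))       # never ask for more rows than exist
    return rng.sample(rows, k)


# Tiny in-memory stand-in for MagicData.csv (made-up values, not real entries).
demo_csv = io.StringIO(
    "videoid,video_path\n"
    "a1,videos/a1.mp4\n"
    "b2,videos/b2.mp4\n"
    "c3,videos/c3.mp4\n"
    "d4,videos/d4.mp4\n"
)
subset = sample_rows(demo_csv, k=2)
wanted_paths = [row["video_path"] for row in subset]
```

With the real file, the same function could be called on `open("MagicData.csv", newline="")`. This only yields the list of wanted paths; fetching the corresponding videos still depends on the split archive, which is the part I am asking about.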

I would like to ask:

1. Is there any lightweight way to obtain a small representative subset of MagicData for evaluation?
2. Do you have a smaller sample release, benchmark split, or recommended subset for reproduction / baseline testing?

I think such an option would be very helpful for users who want to perform small-scale validation or failure analysis without paying the storage cost of the full dataset.

Thank you for your time and help.
