Allow datasets to be loaded from the Hugging Face Hub #95

@lewtun

Hello, currently all the logic to load datasets is hard-coded in pipelinerl/domains/math/load_datasets.py. It would be nice if the train and test dataset names in the config could point directly to datasets on the Hugging Face Hub, like this:

train_dataset_names:
    - {ORG}/{TRAIN_DATASET_NAME}
test_dataset_names:
    - {ORG}/{TEST_DATASET_NAME}

As long as the datasets are already preprocessed into the expected format, we could skip the hard-coded logic in load_datasets.py altogether and make the framework a lot more flexible and user-friendly :)
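To make the idea concrete, here is a rough sketch of what the dispatch could look like. It is not how load_datasets.py is written today: `load_named_datasets`, `load_from_hub`, the `builtin` registry and the `HUB_ID` pattern are all hypothetical names made up for illustration. The only real library call is `datasets.load_dataset(name, split=...)`. The sketch assumes that any name of the form `org/name` that is not a known built-in dataset should be fetched from the Hub, and that the rows already have the expected format.

```python
import re
from typing import Callable, Dict, List, Optional

# Hypothetical check for a Hub-style identifier such as "my-org/my-dataset".
HUB_ID = re.compile(r"^[\w.\-]+/[\w.\-]+$")

Rows = List[dict]
Loader = Callable[[str, str], Rows]


def load_from_hub(name: str, split: str) -> Rows:
    """Fetch one split from the Hugging Face Hub as a list of dicts."""
    # Imported lazily so the dispatch logic works without `datasets` installed.
    from datasets import load_dataset

    return list(load_dataset(name, split=split))


def load_named_datasets(
    names: List[str],
    split: str,
    builtin: Optional[Dict[str, Loader]] = None,
    hub_loader: Loader = load_from_hub,
) -> Rows:
    """Load every dataset in `names`, preferring built-in loaders over the Hub."""
    builtin = builtin or {}
    rows: Rows = []
    for name in names:
        if name in builtin:
            # Existing hard-coded logic keeps working for known names.
            rows.extend(builtin[name](name, split))
        elif HUB_ID.match(name):
            # Anything shaped like "org/name" is assumed to live on the Hub.
            rows.extend(hub_loader(name, split))
        else:
            raise ValueError(f"Unknown dataset name: {name!r}")
    return rows
```

With something like this, `train_dataset_names` would be passed with `split="train"` and `test_dataset_names` with `split="test"`. The split names are also an assumption, since a Hub dataset may expose different ones.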
