Allow datasets to be loaded from the Hugging Face Hub

Hello, currently all the logic for loading datasets is hard-coded in `pipelinerl/domains/math/load_datasets.py`. It would be nice if the train and test dataset names could point directly to datasets on the Hugging Face Hub, like this:

```yaml
train_dataset_names:
    - {ORG}/{TRAIN_DATASET_NAME}
test_dataset_names:
    - {ORG}/{TEST_DATASET_NAME}
```

As long as the datasets are already preprocessed into the expected format, we could skip the hard-coded logic in `load_datasets.py` altogether, which would make the framework a lot more flexible / user-friendly :)
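To make the request concrete, here is a rough sketch of how the dispatch could look. Everything in it is an assumption for illustration: the helper names (`is_hub_dataset`, `load_from_hub`, `load_named_datasets`), the "one slash means Hub ID" heuristic, and the `(name, split)` loader signature are hypothetical and not taken from the current `load_datasets.py`. The only real library call is `datasets.load_dataset(name, split=...)`.

```python
from typing import Callable, Dict, Iterable, List


def is_hub_dataset(name: str) -> bool:
    """Heuristic (assumption): Hub IDs look like 'org/name'; built-in names have no slash."""
    return name.count("/") == 1 and not name.startswith("/") and not name.endswith("/")


def load_from_hub(name: str, split: str) -> List[Dict]:
    """Load one split of a Hub dataset that is already in the expected format."""
    # Imported lazily so built-in dataset names still work without `datasets` installed.
    from datasets import load_dataset

    return list(load_dataset(name, split=split))


def load_named_datasets(
    names: Iterable[str],
    split: str,
    builtin_loader: Callable[[str, str], List[Dict]],
    hub_loader: Callable[[str, str], List[Dict]] = load_from_hub,
) -> List[Dict]:
    """Concatenate samples, sending Hub IDs to hub_loader and other names to builtin_loader."""
    samples: List[Dict] = []
    for name in names:
        loader = hub_loader if is_hub_dataset(name) else builtin_loader
        samples.extend(loader(name, split))
    return samples
```

Here `builtin_loader` stands in for whatever the existing hard-coded logic does, so current dataset names would keep working, while any `{ORG}/{DATASET_NAME}` entry in the config would be fetched from the Hub as-is. Which split each config key maps to (e.g. `train` vs. `test`) is left open.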
