[BUG] Fix data loader #2810

Open
wants to merge 3 commits into
base: main
@TonyBagnall TonyBagnall commented May 16, 2025

Fixes #2755

The data loader covers a lot of edge cases and I'm sure it could be completely redesigned with some benefit, but that is not fun. It follows the standard format of /_TRAIN.ts and /_TEST.ts

The problem comes when trying to load local data with

from aeon.datasets import load_classification
X, y = load_classification(name="FOO", extract_path="C:\\Temp\\")

when C:\Temp\Foo exists but does not contain what we are looking for, it gets deleted. Note that it is fine to load a dataset with `load_from_ts_file`; the problem only occurs when using `load_classification`, because of some assumptions and too many alternatives. The basic logic is to look for the requested dataset in a directory at location <extract_path>. It calls `get_downloaded_tsc_tsr_datasets`, which returns a list of valid directories. To be valid, a directory must contain BOTH the _train and _test ts files. If they are not there, it is considered incorrect.
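
To illustrate the validity rule described above, here is a minimal sketch, not the actual aeon implementation. The helper names `has_train_and_test` and `list_valid_datasets` are hypothetical, and the file layout `<extract_path>/<name>/<name>_TRAIN.ts` is an assumption made for this example.

```python
import os
import tempfile


def has_train_and_test(extract_path, name):
    """Hypothetical check: True only if both split files exist for `name`.

    Assumes the layout <extract_path>/<name>/<name>_TRAIN.ts and
    <extract_path>/<name>/<name>_TEST.ts.
    """
    dataset_dir = os.path.join(extract_path, name)
    train = os.path.join(dataset_dir, f"{name}_TRAIN.ts")
    test = os.path.join(dataset_dir, f"{name}_TEST.ts")
    return os.path.isfile(train) and os.path.isfile(test)


def list_valid_datasets(extract_path):
    """Hypothetical stand-in for the scan that returns valid directories."""
    return sorted(
        entry
        for entry in os.listdir(extract_path)
        if os.path.isdir(os.path.join(extract_path, entry))
        and has_train_and_test(extract_path, entry)
    )
```

Under this rule, a directory holding only one of the two files, or unrelated files, is not reported as a valid dataset, which is what sends the loader down the download path.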

The problem here arises when there is a local directory that does not contain the train and test files. The function then tries to download the zip from tsc.com or zenodo using `_download_and_extract`. In that case it creates the directory, if not already present, to put the zip in and then attempts to unzip there, since for legacy reasons the zips do not internally contain a directory.

Anyway, long story short, this now only deletes the directory if it was not already present and had thus been created during the attempted download.
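
The cleanup rule can be sketched roughly as follows. This is an illustration of the idea only, not the code in this PR: `download_into` and its `fetch` argument are hypothetical names, and `fetch` stands in for whatever downloads and unzips into the target directory.

```python
import os
import shutil
import tempfile


def download_into(extract_path, name, fetch):
    """Hypothetical sketch: fetch `name` into extract_path/name, cleaning up safely.

    `fetch` is a caller-supplied callable (an assumption for this example) that
    downloads and unzips into the target directory and raises on failure.
    """
    target = os.path.join(extract_path, name)
    # Record whether the directory existed before this call touched it.
    already_present = os.path.isdir(target)
    if not already_present:
        os.makedirs(target)
    try:
        fetch(target)
    except Exception:
        # Only remove what this call created; leave existing directories alone.
        if not already_present:
            shutil.rmtree(target)
        raise
    return target
```

With this shape, a failed download into an existing C:\Temp\Foo leaves its contents in place, while a directory created only for the download attempt is removed again.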

@TonyBagnall TonyBagnall added the datasets Datasets and data loaders label May 16, 2025
@TonyBagnall TonyBagnall changed the title [ENH] Fix data loader [BUG] Fix data loader May 16, 2025
@aeon-actions-bot aeon-actions-bot bot added the bug Something isn't working label May 16, 2025
@aeon-actions-bot

Thank you for contributing to aeon

I have added the following labels to this PR based on the title: [ bug ].
I would have added the following labels to this PR based on the changes made: [ datasets ], however some package labels are already present.

The Checks tab will show the status of our automated tests. You can click on individual test runs in the tab or "Details" in the panel below to see more information if there is a failure.

If our pre-commit code quality check fails, any trivial fixes will automatically be pushed to your PR unless it is a draft.

Don't hesitate to ask questions on the aeon Slack channel if you have any.

PR CI actions

These checkboxes will add labels to enable/disable CI functionality for this PR. This may not take effect immediately, and a new commit may be required to run the new configuration.

  • Run pre-commit checks for all files
  • Run mypy typecheck tests
  • Run all pytest tests and configurations
  • Run all notebook example tests
  • Run numba-disabled codecov tests
  • Stop automatic pre-commit fixes (always disabled for drafts)
  • Disable numba cache loading
  • Push an empty commit to re-run CI checks

Labels
  • bug: Something isn't working
  • datasets: Datasets and data loaders
  • full pytest actions: Run the full pytest suite on a PR
Development

Successfully merging this pull request may close these issues.

[BUG] load_classification incorrectly deletes files and folders