[BUG] load_classification incorrectly deletes files and folders #2755

Open
TonyBagnall opened this issue Apr 18, 2025 · 0 comments · May be fixed by #2810
Labels: bug (Something isn't working), datasets (Datasets and data loaders)

TonyBagnall commented Apr 18, 2025

Describe the bug

When a path is given to load data and that data is incorrectly formatted, load_classification deletes the file and its directory. It should not do this!
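
One way to avoid this kind of data loss is to clean up only what the loader itself created. The sketch below is a minimal illustration using the standard library, not aeon's actual implementation: `load_with_safe_cleanup`, `parse`, and `download` are hypothetical names, and the two callables stand in for whatever parsing and downloading the loader really does.

```python
import shutil
from pathlib import Path


def load_with_safe_cleanup(name, extract_path, parse, download):
    """Hypothetical sketch: remove a directory on failure only if this call created it.

    `parse(target)` and `download(name, target)` are assumed callables,
    not part of aeon's API.
    """
    target = Path(extract_path) / name
    existed_before = target.exists()
    if not existed_before:
        # Only fetch when the user has not already supplied the data.
        download(name, target)
    try:
        return parse(target)
    except Exception:
        # Clean up a failed download, but never touch pre-existing user data.
        if not existed_before and target.exists():
            shutil.rmtree(target)
        raise  # surface the original parsing error unchanged
```

With this pattern, a badly formatted file in a user-supplied directory would produce the parser's own error and leave the directory as it was.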

Steps/Code to reproduce the bug

from aeon.datasets import load_classification
X, y = load_classification("GunPoint", extract_path="C:\\Temp\\")

This works and loads the data as expected from the directory C:\Temp\GunPoint. However, if a dataset is incorrectly formatted, load_classification deletes the directory.

from aeon.datasets import load_classification
X, y = load_classification("NBack_combined", extract_path="C:\\Temp\\")

This raises the following error after deleting the directory:
File "C:\Code\aeon\aeon\datasets\_data_loaders.py", line 1382, in load_classification
raise ValueError(error_str)
ValueError: Invalid dataset name =Test that is not available on extract path =C:\Temp. Nor is it available on https://timeseriesclassification.com/ or zenodo.

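For this file, the error is that one channel has a different length from the others, but I've not tested it with other errors. Loading the same file directly with load_from_ts_file, as below, raises the correct error and deletes nothing: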

from aeon.datasets import load_classification, load_from_ts_file
X, y = load_from_ts_file("C:\\Temp\\NBack_combined")
print(X.shape)

Expected results

load_classification should behave the same as load_from_ts_file: raise the formatting error and leave the file and directory in place.
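
The expected behaviour could be checked with a small helper that compares the directory contents before and after a failing load. This is only a sketch: `snapshot` and `files_survive_failed_load` are hypothetical names, and `loader` is any callable taking a directory path, so the helper itself does not depend on aeon.

```python
from pathlib import Path


def snapshot(directory):
    """Return the set of relative file paths currently under `directory`."""
    root = Path(directory)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def files_survive_failed_load(loader, directory):
    """Call `loader(directory)` and report whether every file is still present.

    The loader is allowed to raise OSError or ValueError; the point is
    that a failed load must not remove anything from disk.
    """
    before = snapshot(directory)
    try:
        loader(directory)
    except (OSError, ValueError):
        pass  # a load error is acceptable; deleting data is not
    return Path(directory).exists() and snapshot(directory) == before
```

Assuming the call signature used in this report, the loader for the failing case might be `lambda d: load_classification("NBack_combined", extract_path=d)`, and a regression test would assert that the helper returns True.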

Actual results

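load_from_ts_file throws the (correct) error below, whereas load_classification deletes the directory and raises the misleading ValueError shown above.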

 File "C:\Code\aeon\aeon\datasets\_data_loaders.py", line 282, in load_from_ts_file
    data, y, meta_data = _load_data(file, meta_data)
  File "C:\Code\aeon\aeon\datasets\_data_loaders.py", line 215, in _load_data
    raise OSError(
OSError: channel 0 in case 11 has a different number of observations to the other channels. Saw 135 in the first channel but 136 in the channel 0. The meta data specifies equal length == True. But even if series length are unequal, all channels for a single case must be the same length

Versions

No response

@TonyBagnall TonyBagnall added bug Something isn't working datasets Datasets and data loaders labels Apr 18, 2025
@TonyBagnall TonyBagnall self-assigned this Apr 18, 2025
@TonyBagnall TonyBagnall linked a pull request May 16, 2025 that will close this issue