- python 3.8.0
- torch 2.0.1
- torchvision 0.15.2
- numpy 1.24.4
- scikit-learn 1.3.0
All datasets (ECG, Seismic, SALD, DEEP and METR-LA) are publicly available online; please refer to the links provided to access them. Process each dataset into an array of time series in .npy format and place it in the data/ directory. Because the full datasets are large, a small sampled dataset is provided in the data/ directory as an example.
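As a minimal sketch of the dataset preparation step, the snippet below stores a collection of equal-length time series as a single .npy file. The 2-D layout (one row per series), the float32 dtype, the helper name `save_dataset` and the file name used in the test are assumptions made for illustration; check the sampled dataset in data/ for the layout the scripts actually expect.

```python
from pathlib import Path

import numpy as np


def save_dataset(series, name, out_dir="data"):
    """Save equal-length time series as <out_dir>/<name>.npy (hypothetical helper)."""
    # Assumption: one row per time series, one column per time step.
    arr = np.asarray(series, dtype=np.float32)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D array: (num_series, series_length)")
    out_path = Path(out_dir) / f"{name}.npy"
    out_path.parent.mkdir(parents=True, exist_ok=True)
    np.save(out_path, arr)
    return out_path
```

The saved file can be read back with `np.load(path)` to confirm its shape before running the other scripts.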
This repo uses SEAnet to generate time series embeddings; please refer to the link provided to generate them. Process the embeddings into an array of time series embeddings in .npy format (as with the datasets) and place them in the emb/ directory.
- data/: A directory that stores the time series for each dataset.
- dist/: A directory that stores the distance matrix for each dataset.
- emb/: A directory that stores the time series embeddings for each dataset.
- models/: A directory that stores trained models.
- simset/: A directory that stores precomputed similar sets.
- compute_dist.py: A file that computes the distance matrix for a dataset.
- max_dist.json: A file that stores the maximum pairwise distance for each dataset.
- compute_simset.py: A file that computes the similar sets for a dataset.
- greedy.py: The implementation of the PreGreedy, PreGreedyET, Greedy and GreedyET representative time series selection methods. This file reads the input dataset, selection method, normalized distance threshold and coverage threshold, and outputs the selected representative time series.
- mlp.py: A file that defines the neural network.
- mlgreedyet_train.py: A file that generates the training data for training the neural network.
- mlgreedyet_test.py: The implementation of the MLGreedyET representative time series selection method. This file reads the input dataset, normalized distance threshold and coverage threshold, and outputs the selected representative time series.
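To illustrate how a normalized distance threshold and a coverage threshold can interact, here is a rough sketch of a generic greedy set-cover style selection. It is not the code in greedy.py: the Euclidean distance, the normalization by the maximum pairwise distance, the stopping rule and the function names `normalized_distances` and `greedy_select` are all assumptions made for this example.

```python
import numpy as np


def normalized_distances(series):
    """Pairwise Euclidean distances scaled to [0, 1] (assumed distance measure)."""
    diff = series[:, None, :] - series[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    max_dist = dist.max()
    return dist / max_dist if max_dist > 0 else dist


def greedy_select(norm_dist, tau, beta):
    """Pick representatives until at least a fraction beta of series is covered.

    Assumption: series j is covered by representative i when
    norm_dist[i, j] <= tau, so every series covers itself.
    """
    n = norm_dist.shape[0]
    similar = norm_dist <= tau              # row i is the similar set of series i
    covered = np.zeros(n, dtype=bool)
    selected = []
    target = int(np.ceil(beta * n))         # number of series that must be covered
    while covered.sum() < target:
        # Marginal gain: how many uncovered series each candidate would add.
        gains = (similar & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        selected.append(best)
        covered |= similar[best]
    return selected


toy = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 10.0], [10.1, 10.0]])
representatives = greedy_select(normalized_distances(toy), tau=0.1, beta=1.0)
```

On this toy input the two tight clusters each contribute one representative. Precomputing the boolean `similar` matrix once and reusing it across runs is the kind of saving that a precomputation step could provide, although how the repo's precomputed variants actually work is not shown here.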
For representative time series selection without similar set precomputation (Greedy and GreedyET), follow steps 1 and 3. For representative time series selection with similar set precomputation (PreGreedy and PreGreedyET), follow steps 1, 2 and 3. Note that steps 1 and 2 need to be run only once for each dataset.
1. Compute distance matrix:

    python compute_dist.py --dataset <dataset_filename>

2. Compute similar set:

    python compute_simset.py --dataset <dataset_filename> --tau <normalized_distance_threshold>

3. Select representative time series:

    python greedy.py --dataset <dataset_filename> --method <selection_method> --tau <normalized_distance_threshold> --beta <coverage_threshold>
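The steps above can be chained from a small driver script. The sketch below only assembles and runs the command lines listed in this README; the helper names `build_commands` and `run_pipeline`, the `use_simset` and `first_run` switches, and the example values passed for the dataset file name and `--method` are assumptions, since the accepted values are not listed here.

```python
import subprocess
import sys


def build_commands(dataset, method, tau, beta, use_simset, first_run=True):
    """Return the command lines for steps 1-3 as argument lists (hypothetical helper)."""
    commands = []
    if first_run:
        # Step 1: the distance matrix is needed by every method, once per dataset.
        commands.append([sys.executable, "compute_dist.py", "--dataset", dataset])
        if use_simset:
            # Step 2: only for the variants that precompute similar sets.
            commands.append([sys.executable, "compute_simset.py",
                             "--dataset", dataset, "--tau", str(tau)])
    # Step 3: the selection itself.
    commands.append([sys.executable, "greedy.py", "--dataset", dataset,
                     "--method", method, "--tau", str(tau), "--beta", str(beta)])
    return commands


def run_pipeline(commands):
    """Run each command in order and stop at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, `run_pipeline(build_commands("example.npy", "GreedyET", 0.1, 0.9, use_simset=False))` would run steps 1 and 3, assuming `example.npy` and `GreedyET` are valid argument values for the scripts.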
For representative time series selection using the learning-based approach (MLGreedyET), follow the steps below.
1. Train the model:

    python mlgreedyet_train.py --dataset <dataset_filename> --tau <training_normalized_distance_threshold>

2. Select representative time series:

    python mlgreedyet_test.py --dataset <dataset_filename> --tau <normalized_distance_threshold> --beta <coverage_threshold>