A reproducible machine learning pipeline for time series classification on the UCR Time Series Archive.

```
├── LICENSE <- Open-source license if one is chosen
├── Makefile <- Makefile with convenience commands like `make data` or `make train`
├── README.md <- The top-level README for developers using this project.
├── data
│ ├── external <- Data from third-party sources: UCRArchive_2018.zip and its metadata file
│ ├── interim <- Intermediate data that has been transformed.
│ ├── processed <- The final, canonical data sets for modeling.
│ │ ├── raw_splits <- Fixed labels and dropped NaNs in the original train/test splits from the unzipped folders
│ │ ├── kan <- Train/validation/test split
│ │ └── mlp <- Train/test split
│ ├── results <- Results produced after training and evaluation.
│ └── raw <- Original, immutable data dump.
│
├── docs <- A default mkdocs project; see www.mkdocs.org for details
│
├── models <- Trained and serialized models
│ ├── inceptiontime <- Saved InceptionTime models
│ ├── kan <- Saved Kolmogorov–Arnold Network models
│ └── mlp <- Saved MLP models
│
├── pyproject.toml <- Project configuration file with package metadata for
│ ucr_benchmark_template and configuration for tools like black
│
├── references <- Data dictionaries, manuals, and all other explanatory materials.
│
├── reports <- Generated analysis as HTML, PDF, LaTeX, etc.
│ └── figures <- Generated graphics and figures to be used in reporting
│
├── requirements.txt <- The requirements file for reproducing the analysis environment, e.g.
│ generated with `pip freeze > requirements.txt`
│
├── setup.cfg <- Configuration file for flake8
│
├── dvc.yaml <- DVC pipeline definition file
│
├── params.yaml <- Parameters for preprocessing and training
│
└── ucr_benchmark_template <- Source code for use in this project.
│
├── __init__.py <- Makes ucr_benchmark_template a Python module
│
├── __main__.py <- Optional main execution entry point
│
├── config.py <- Shared configuration and paths
│
├── dataset.py <- Functions to load or prepare datasets
│
├── analyze_results.py <- Analyze and summarize results across models
│
├── save_results.py <- Save metrics and metadata into CSV logs
│
├── preprocessing
│ │
│ ├── preprocessors
│ │ ├── __init__.py
│ │ ├── mlp.py <- Preprocesses data for MLP (train/test split)
│ │ └── kan.py <- Preprocesses data for KAN (train/val/test split)
│ │
│ ├── utils
│ │ └── fix_labels.py <- Normalize class labels to start from 0
│ │
│ └── preprocess.py <- Pipeline to preprocess selected datasets
│
├── modeling
│ ├── __init__.py
│ ├── train.py <- Train, predict, evaluate MLP
│ ├── train_kan.py <- Train, predict, evaluate KAN
│ └── train_inceptiontime.py <- Train, predict, evaluate InceptionTime
│
└── plots.py <- Code to create visualizations
```

This section explains how to plug a new training script (like `train_kan.py`) into the DVC pipeline.
Install the dependencies first:

```bash
# For exact reproducibility (recommended)
pip install -r requirements-lock.txt

# For development (if you need newer versions)
pip install -r requirements.txt
```
Place `UCRArchive_2018.zip` at `data/external/UCRArchive_2018.zip`.
Put your training code under `ucr_benchmark_template/modeling/train_<yourmodel>.py`.
The script should:

- Load datasets from `data/processed/<type>/`
- Accept parameters from `params.yaml`
- Train the model for all hyperparameter combinations
- Save results by calling `save_run_results` with a dictionary containing the model name, dataset, evaluation metrics, training time, and all hyperparameters
- Save the trained model to `models/<yourmodel>/`
For reference, `train_kan.py` follows this pattern (a minimal skeleton is sketched below). It:

- Loads `data/processed/kan`
- Builds the KAN with `make_model(...)`
- Evaluates using accuracy, F1, precision, and recall
- Stores the model in `models/kan/`
- Saves results in `data/results/all_results.csv`
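
The sketch below shows one way such a script could be laid out. It is a minimal illustration, not the project's actual code: `load_split`, `make_model`, and `save_run_results` are placeholders for the real helpers, and their signatures here are assumptions.

```python
# Minimal sketch of ucr_benchmark_template/modeling/train_<yourmodel>.py.
# load_split, make_model and save_run_results stand in for the project's real
# helpers; their names and signatures here are assumptions, not the actual API.
import time
from pathlib import Path

from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

PROCESSED = Path("data/processed/<type>")   # e.g. data/processed/kan
MODELS = Path("models/<yourmodel>")


def run_one(dataset_dir: Path, hyperparams: dict) -> None:
    """Train and evaluate a single dataset / hyperparameter combination."""
    X_train, y_train, X_test, y_test = load_split(dataset_dir)   # assumed helper

    model = make_model(**hyperparams)                            # assumed helper
    start = time.time()
    model.fit(X_train, y_train)
    training_time = time.time() - start

    y_pred = model.predict(X_test)
    save_run_results({                                           # assumed helper
        "model": "<yourmodel>",
        "dataset": dataset_dir.name,
        "accuracy": accuracy_score(y_test, y_pred),
        "f1": f1_score(y_test, y_pred, average="macro"),
        "precision": precision_score(y_test, y_pred, average="macro"),
        "recall": recall_score(y_test, y_pred, average="macro"),
        "training_time": training_time,
        **hyperparams,
    })

    # Persist the trained model (torch.save, joblib.dump, ... depending on framework)
    MODELS.mkdir(parents=True, exist_ok=True)
```

The existing scripts in `ucr_benchmark_template/modeling/` are the authoritative reference for the actual helper names and saving logic.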
Add a block for your model's hyperparameters to `params.yaml`; the existing `train_kan` block looks like this:

```yaml
train_kan:
  depths: [2]
  hidden_layers: [10]
  learning_rates: [0.001]
  epochs: [500]
  batch: 16
  grid: [5]
  spline_order: 3
```
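
The list-valued entries define a grid of runs. As a rough, non-authoritative illustration of how a script might expand them into individual hyperparameter combinations (the project's own expansion logic may differ):

```python
# Illustrative only: expand the list-valued train_kan parameters from params.yaml
# into one dictionary per run. The project's scripts may implement this differently.
from itertools import product

import yaml

with open("params.yaml") as f:
    cfg = yaml.safe_load(f)["train_kan"]

grid_keys = ["depths", "hidden_layers", "learning_rates", "epochs", "grid"]
combos = [
    dict(zip(grid_keys, values), batch=cfg["batch"], spline_order=cfg["spline_order"])
    for values in product(*(cfg[k] for k in grid_keys))
]
print(f"{len(combos)} hyperparameter combination(s)")  # 1 with the defaults above
```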
Then register a stage for your script under `stages:` in `dvc.yaml`; the existing `train_kan` stage looks like this:

```yaml
train_kan:
  cmd: python -m ucr_benchmark_template.modeling.train_kan
  deps:
    - data/processed/
    - ucr_benchmark_template/modeling/train_kan.py
    - params.yaml
  outs:
    - models/kan/
  params:
    - train_kan.depths
    - train_kan.hidden_layers
    - train_kan.learning_rates
    - train_kan.epochs
    - train_kan.batch
    - train_kan.grid
    - train_kan.spline_order
```
Once your models are integrated, here's how to run and track experiments.
```bash
# Run all stages from data preprocessing to results analysis
dvc repro

# Check what stages will run
dvc status

# Run a specific training stage
dvc repro train_mlp
dvc repro train_kan
dvc repro train_inceptiontime

# Run preprocessing only
dvc repro preprocess_data

# Run results analysis
dvc repro process_results
```
```bash
# Run an experiment with a custom name
dvc exp run -n "baseline_experiment"

# Run a specific stage as an experiment
dvc exp run -n "mlp_test" train_mlp

# Modify MLP parameters
dvc exp run -n "mlp_lr_01" --set-param train_mlp.learning_rates="[0.01]"

# Modify KAN architecture
dvc exp run -n "kan_depth_3" --set-param train_kan.depths="[3]"

# Change datasets to test on
dvc exp run -n "small_test" --set-param preprocess.datasets="[Coffee, Beef, OliveOil]"

# Train only specific models
dvc exp run -n "mlp_only" --set-param preprocess.models="[mlp]"

# Queue several experiments
dvc exp run --queue -n "mlp_lr_001" --set-param train_mlp.learning_rates="[0.001]"
dvc exp run --queue -n "mlp_lr_01" --set-param train_mlp.learning_rates="[0.01]"
dvc exp run --queue -n "mlp_lr_1" --set-param train_mlp.learning_rates="[0.1]"

# Run all queued experiments
dvc exp run --run-all
```
```bash
# List all experiments
dvc exp show

# Show only changed parameters and metrics
dvc exp show --only-changed

# Show specific experiments
dvc exp show baseline_experiment mlp_lr_01
```
The pipeline stages:

- `unzip_data`: unzips `UCRArchive_2018.zip` to `data/raw/` (skipped if already extracted)
- `preprocess_data`: creates
  - `data/processed/raw_splits`: fixed labels (classes unified to start from 0) from the raw UCR data
  - `data/processed/mlp`: train/test splits
  - `data/processed/kan`: train/val/test splits
- `train_*`: model training stages consume the processed data and store results/models
- `process_results`: analyzes all results and creates summary files
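
The combined results log can also be inspected directly with pandas. The snippet below is a quick, illustrative summary; the column names (`model`, `dataset`, `accuracy`) are assumptions about what the training scripts record.

```python
# Illustrative peek at data/results/all_results.csv; column names are assumed.
import pandas as pd

results = pd.read_csv("data/results/all_results.csv")

# Best accuracy per model/dataset pair, pivoted to one column per model
summary = (
    results.groupby(["model", "dataset"])["accuracy"]
    .max()
    .unstack("model")
)
print(summary.round(3))
```
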
| Purpose | Path |
|---|---|
| Raw UCR data | `data/external/UCRArchive_2018.zip` |
| Preprocessed data | `data/processed/` |
| Saved models | `models/<model_name>/` |
| Run results | `data/results/all_results.csv` |
| Parameters | `params.yaml` |
| Training code | `ucr_benchmark_template/modeling/` |