Main Recent Update:
- [Feb. 3, 2025] Paper accepted to ACM SIGMOD! The full paper can be found at paper link.
If you find our work helpful, please consider citing and giving a star :-)
"SPARTAN: Data-Adaptive Symbolic Time-Series Approximation" ACM SIGMOD 2025.
@inproceedings{yang2025spartan,
title={SPARTAN: Data-Adaptive Symbolic Time-Series Approximation},
author={Yang, Fan and Paparrizos, John},
booktitle={Proceedings of the 2025 ACM SIGMOD international conference on management of data},
year={2025}
}
- 📄 Overview
- ⚙️ Get Started
- 🏄♂️ Dive into Symbolic Representation Benchmark Study
- ✉️ Contact
- 🎉 Acknowledgement
Symbolic approximations are dimensionality reduction techniques that convert time series into sequences of discrete symbols, enhancing interpretability while reducing computational and storage costs. To construct symbolic representations, first numeric representations approximate and capture properties of raw time series, followed by a discretization step that converts these numeric dimensions into symbols. Despite decades of development, existing approaches have several key limitations that often result in unsatisfactory performance: they (i) rely on data-agnostic numeric approximations, disregarding intrinsic properties of the time series; (ii) decompose dimensions into equal-sized subspaces, assuming independence among dimensions; and (iii) allocate a uniform encoding budget for discretizing each dimension or subspace, assuming balanced importance. To address these shortcomings, we propose SPARTAN, a novel data-adaptive symbolic approximation method that intelligently allocates the encoding budget according to the importance of the constructed uncorrelated dimensions. Specifically, SPARTAN (i) leverages intrinsic dimensionality reduction properties to derive non-overlapping, uncorrelated latent dimensions; (ii) adaptively distributes the budget based on the importance of each dimension by solving a constrained optimization problem; and (iii) prevents false dismissals in similarity search by ensuring a lower bound on the true distance in the original space. To demonstrate SPARTAN’s robustness, we conduct the most comprehensive study to date, comparing SPARTAN with seven state-of-the-art symbolic methods across four tasks: classification, clustering, indexing, and anomaly detection. Rigorous statistical analysis across hundreds of datasets shows that SPARTAN outperforms competing methods significantly on all tasks in terms of downstream accuracy, given the same budget. Notably, SPARTAN achieves up to a 2x speedup compared to the most accurate rival. Overall, SPARTAN effectively improves the symbolic representation quality without storage or runtime overheads, paving the way for future advancements.
To install SPARTAN you will need to follow
Step 1: Install the required dependencies
# create your environment, and then
pip install -r requirements.txt
Step 2: Preparing the datasets for time-series analytical tasks:
- Download the UCR Archive for classification, clustering, and the tightness of lower bound (TLB) tasks.
- Download the TSB-UAD Archive for time-series anomaly detection.
import numpy as np
from TSB_Symbolic.symbolic import SPARTAN
# random data
data = np.random.rand(100,50)
# initialize SPARTAN with alphabet_size=4 and word_length=8
model = SPARTAN(
alphabet_size=4,
word_length = 8
)
# fit and transform the data
spartan_repr = model.fit_transform(data) # shape: (100, 8)
print("SPARTAN Transformed Data: \n", spartan_repr[:5])
# map it to readable symbols (using 'abcd' as an example)
mapping = {0: 'a', 1: 'b', 2: 'c', 3: 'd'}
symbol_map = np.vectorize(lambda x: mapping[int(x)])
spartan_repr = symbol_map(spartan_repr)
print("Map to readable symbols: \n", spartan_repr[:5])
Here, we present code examples for four downstream tasks. Config examples can be find in benchmark/configs
.
To test SPARTAN classification accuracy on a single dataset:
python -m benchmark.eval_classfication --classifier spartan --data /path/to/your/data --problem DatasetName --config ./path/to/model_params/
To test SPARTAN accuracy on all UCR datasets:
python main.py --eval_task classification --classifier spartan --data /path/to/your/data --config ./path/to/model_params/
To test SPARTAN clustering performance on a single dataset:
python -m benchmark.eval_clustering --classifier spartan --data /path/to/your/data --problem DatasetName --config ./path/to/model_params/
To test SPARTAN performance on all UCR datasets:
python main.py --eval_task clustering --classifier spartan --data /path/to/your/data --config ./path/to/model_params/
To test SPARTAN anomaly detection performance on a single time series:
python -m benchmark.eval_anomaly --data /path/to/your/data --classifier spartan --problem FileName --config ./path/to/model_params/
To test SPARTAN performance on all time series:
python main.py --eval_task anomaly --classifier spartan --data /path/to/your/data --config ./path/to/model_params/
To test the tlb performance (SPARTAN, SAX, and SFA) on a single UCR dataset:
python -m benchmark.eval_tlb --data /path/to/your/data --problem DatasetName -x DatasetID --alpha_max MaxAlphabetSize --alpha_min MinAlphabetSize --wordlen_max MaxWordLen --wordlen_min MinWordLen
To test the tlb performance (SPARTAN, SAX, and SFA) on all UCR datasets:
python main.py --eval_task tlb --data /path/to/your/data --alpha_max MaxAlphabetSize --alpha_min MinAlphabetSize --wordlen_max MaxWordLen --wordlen_min MinWordLen
If you have any questions or suggestions, feel free to contact:
- Fan Yang (yang.7007@osu.edu)
- John Paparrizos (paparrizos.1@osu.edu)
Or describe it in Issues.
We would like to acknowledge Ryan DeMilt (demilt.4@osu.edu) for the valuable contributions to this work. Also appreciate the following github repos a lot for their valuable code base: