Create and evaluate synthetic time series datasets effortlessly
TSGM is an open-source framework for synthetic time series dataset generation and evaluation.
The framework can be used for creating synthetic datasets (see 🔨 Generators), augmenting time series data (see 🎨 Augmentations), evaluating synthetic data with respect to consistency, privacy, downstream performance, and more (see 📈 Metrics), and using common time series datasets (TSGM provides easy access to more than 140 datasets, see 💾 Datasets).
We provide:
- Documentation with a complete overview of the implemented methods,
- Tutorials that describe practical use-cases of the framework.
TSGM now supports Keras 3 with multiple backend options. Choose one of the following installation methods:
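# Install with the TensorFlow backend
pip install tsgm[tensorflow]
# Install with the PyTorch backend
pip install tsgm[torch]
# Install with the JAX backend
pip install tsgm[jax]
# Install with all backends
pip install tsgm[all]
# Install without a backend
pip install tsgm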
# Then install your preferred backend:
# For TensorFlow: pip install tensorflow tensorflow-probability
# For PyTorch: pip install torch torchvision
# For JAX: pip install jax jaxlib
Set your preferred Keras backend using the environment variable:
# For TensorFlow backend
export KERAS_BACKEND=tensorflow
# For PyTorch backend
export KERAS_BACKEND=torch
# For JAX backend
export KERAS_BACKEND=jax
To install tsgm on Apple M1 and M2 chips:
# Install with TensorFlow backend
pip install tsgm[tensorflow]
# Or install with PyTorch backend
pip install tsgm[torch]
# Or install with JAX backend (excellent performance on M1/M2)
pip install tsgm[jax]
Note for PyTorch users on M1/M2 chips: Some operations may need CPU fallback on MPS devices. If you encounter MPS-related errors, set the environment variable:
export PYTORCH_ENABLE_MPS_FALLBACK=1
Note for JAX users: JAX provides excellent performance on M1/M2 chips and supports GPU acceleration. For optimal performance, consider installing JAX with Metal support:
pip install -U "jax[metal]"
import keras
import tsgm
# ... Define hyperparameters ...
# dataset is a tensor of shape n_samples x seq_len x feature_dim
# Zoo contains several prebuilt architectures: we choose a conditional GAN architecture
architecture = tsgm.models.architectures.zoo["cgan_base_c4_l1"](
    seq_len=seq_len, feat_dim=feature_dim,
    latent_dim=latent_dim, output_dim=0)
discriminator, generator = architecture.discriminator, architecture.generator
# Initialize GAN object with selected discriminator and generator
gan = tsgm.models.cgan.GAN(
    discriminator=discriminator, generator=generator, latent_dim=latent_dim
)
gan.compile(
    d_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    g_optimizer=keras.optimizers.Adam(learning_rate=0.0003),
    loss_fn=keras.losses.BinaryCrossentropy(from_logits=True),
)
gan.fit(dataset, epochs=N_EPOCHS)
# Generate 100 synthetic samples
result = gan.generate(100)
- Introductory Tutorial: Getting started with TSGM
- Tutorial: Datasets in TSGM
- Tutorial: Time Series Augmentations
- Tutorial: Time Series Generation with VAEs
- Tutorial: Model Selection
- Tutorial: Multiple GPUs or TPU with TSGM
For more examples, see our tutorials.
TSGM provides a number of time series augmentations.
| Augmentation | Class in TSGM | Reference |
|---|---|---|
| Gaussian Noise / Jittering | tsgm.augmentations.GaussianNoise | - |
| Slice-And-Shuffle | tsgm.augmentations.SliceAndShuffle | - |
| Shuffle Features | tsgm.augmentations.Shuffle | - |
| Magnitude Warping | tsgm.augmentations.MagnitudeWarping | Data Augmentation of Wearable Sensor Data for Parkinson’s Disease Monitoring using Convolutional Neural Networks |
| Window Warping | tsgm.augmentations.WindowWarping | Data Augmentation for Time Series Classification using Convolutional Neural Networks |
| DTW Barycentric Averaging | tsgm.augmentations.DTWBarycentricAveraging | A global averaging method for dynamic time warping, with applications to clustering |
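To make the first two augmentations in the table concrete, here is a minimal, dependency-free sketch of what jittering and slice-and-shuffle do to a single series. The functions `jitter` and `slice_and_shuffle` are hypothetical helpers written for illustration only; they are not TSGM's implementation, and the classes in `tsgm.augmentations` may differ in parameters and behavior.

```python
import random


def jitter(series, sigma=0.1, rng=None):
    """Return a copy of `series` with i.i.d. Gaussian noise added to every value."""
    if rng is None:
        rng = random.Random()
    return [[value + rng.gauss(0.0, sigma) for value in step] for step in series]


def slice_and_shuffle(series, n_segments=4, rng=None):
    """Cut `series` into contiguous segments along time and shuffle their order."""
    if rng is None:
        rng = random.Random()
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and the series length")
    # Pick n_segments - 1 distinct interior cut points, then slice between them.
    cuts = sorted(rng.sample(range(1, n), n_segments - 1))
    bounds = [0] + cuts + [n]
    segments = [series[a:b] for a, b in zip(bounds, bounds[1:])]
    rng.shuffle(segments)
    return [step for segment in segments for step in segment]


# One univariate series with 8 time steps, stored as seq_len x feature_dim.
series = [[float(t)] for t in range(8)]
rng = random.Random(0)
noisy = jitter(series, sigma=0.05, rng=rng)
shuffled = slice_and_shuffle(series, n_segments=4, rng=rng)
```

Jittering keeps the temporal order and perturbs values, while slice-and-shuffle keeps the values and perturbs the order, which is why the two are often combined.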
TSGM implements several generative models for synthetic time series data.
| Method | Link to docs | Type | Notes |
|---|---|---|---|
| Structural Time Series | sts.STS | Data-driven | Great for modeling time series when prior knowledge is available (e.g., trend or seasonality). |
| GAN | GAN | Data-driven | A generic implementation of GAN for time series generation. It can be customized with architectures for generators and discriminators. |
| WaveGAN | GAN | Data-driven | WaveGAN is the model for audio synthesis proposed in Adversarial Audio Synthesis. To use WaveGAN, set use_wgan=True when initializing the GAN class and use the zoo["wavegan"] architecture from the model zoo. |
| ConditionalGAN | ConditionalGAN | Data-driven | A generic implementation of conditional GAN. It supports both scalar and temporal conditioning. |
| BetaVAE | BetaVAE | Data-driven | A generic implementation of Beta VAE for TS. The loss function is customized to work well with multi-dimensional time series. |
| cBetaVAE | cBetaVAE | Data-driven | Conditional version of BetaVAE. It supports temporal and scalar conditioning. |
| TimeGAN | TimeGAN | Data-driven | TSGM implementation of TimeGAN from paper |
| SineConstSimulator | SineConstSimulator | Simulator-based | Simulator-based synthetic signal that switches between constant and periodic functions. |
| Lotka Volterra | LotkaVolterraSimulator | Simulator-based | Simulator-based synthetic signal generated from the Lotka-Volterra (predator-prey) equations. |
| PdM Simulator | PdMSimulator | Simulator-based | Simulator of predictive maintenance with multiple pieces of equipment from paper |
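The simulator-based entries in the table generate data from an explicit rule rather than from a trained model. The sketch below illustrates the idea with a toy generator in which each sample is either a constant signal or a sinusoid. The function `simulate_sine_const` and its parameters are assumptions made for illustration; TSGM's `SineConstSimulator` may define the signal and its API differently.

```python
import math
import random


def simulate_sine_const(n_samples, seq_len, rng=None):
    """Generate univariate series that are constant (label 0) or sinusoidal (label 1)."""
    if rng is None:
        rng = random.Random()
    data, labels = [], []
    for _ in range(n_samples):
        label = rng.randint(0, 1)
        if label == 0:
            # Constant signal at a random level.
            level = rng.uniform(-1.0, 1.0)
            series = [level] * seq_len
        else:
            # Sinusoid with a random frequency (in cycles per window) and phase.
            frequency = rng.uniform(1.0, 3.0)
            phase = rng.uniform(0.0, 2.0 * math.pi)
            series = [
                math.sin(2.0 * math.pi * frequency * t / seq_len + phase)
                for t in range(seq_len)
            ]
        data.append(series)
        labels.append(label)
    return data, labels


data, labels = simulate_sine_const(n_samples=6, seq_len=32, rng=random.Random(42))
```

Because the labels are known by construction, a dataset like this is convenient for checking conditional generators and downstream classifiers.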
TSGM implements many metrics for synthetic time series evaluation. Check Section 3 from our paper for more detail on the evaluation of synthetic time series.
| Metric | Link to docs | Type | Notes |
|---|---|---|---|
| Distance in the space of summary statistics | tsgm.metrics.DistanceMetric | Distance | Calculates a set of summary statistics in the original and synthetic data, and measures the distance between those. |
| Maximum Mean Discrepancy (MMD) | tsgm.metrics.MMDMetric | Distance | This metric calculates MMD between real and synthetic samples. |
| Discriminative Score | tsgm.metrics.DiscriminativeMetric | Distance | The DiscriminativeMetric measures the discriminative performance of a model in distinguishing between synthetic and real datasets. |
| Demographic Parity Score | tsgm.metrics.DemographicParityMetric | Fairness | This metric assesses the difference in the distributions of a target variable among different groups in two datasets. Refer to this paper to learn more. |
| Predictive Parity Score | tsgm.metrics.PredictiveParityMetric | Fairness | This metric assesses the discrepancy in the predictive performance of a model among different groups in two datasets. Refer to this paper to learn more. |
| Privacy Membership Inference Attack Score | tsgm.metrics.PrivacyMembershipInferenceMetric | Privacy | The metric measures the possibility of membership inference attacks. |
| Spectral Entropy | tsgm.metrics.EntropyMetric | Diversity | Calculates the spectral entropy of a dataset or tensor as a sum of individual entropies. |
| Shannon Entropy | tsgm.metrics.ShannonEntropyMetric | Diversity | Shannon Entropy calculated over the labels of a dataset. |
| Pairwise Distance | tsgm.metrics.PairwiseDistanceMetric | Diversity | Measures pairwise distances in a set of time series. |
| Downstream Effectiveness | tsgm.metrics.DownstreamPerformanceMetric | Downstream Effectiveness | The downstream performance metric evaluates the performance of a model on a downstream task. It returns performance gains achieved with the addition of synthetic data. |
| Qualitative Evaluation | tsgm.utils.visualization | Qualitative | Various tools for visual assessment of a generated dataset. |
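As an illustration of the distance-type metrics above, the following sketch computes a biased estimate of squared MMD with a Gaussian (RBF) kernel on flattened series. It is a standalone example and not the code behind `tsgm.metrics.MMDMetric`; the helper names and the fixed `bandwidth` are assumptions chosen for the example.

```python
import math
import random


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian kernel between two flattened series of equal length."""
    sq_dist = sum((u - v) ** 2 for u, v in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y) with x in xs and y in ys."""
    total = sum(rbf_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (
        mean_kernel(xs, xs, bandwidth)
        + mean_kernel(ys, ys, bandwidth)
        - 2.0 * mean_kernel(xs, ys, bandwidth)
    )


rng = random.Random(0)


def sample(mean, n=30, seq_len=5):
    """Draw n toy series of length seq_len with i.i.d. Gaussian values."""
    return [[rng.gauss(mean, 1.0) for _ in range(seq_len)] for _ in range(n)]


real = sample(0.0)
synthetic_close = sample(0.0)  # same distribution as `real`
synthetic_far = sample(3.0)    # shifted distribution

print("close:", mmd_squared(real, synthetic_close))
print("far:  ", mmd_squared(real, synthetic_far))
```

A value near zero suggests that the two samples are hard to tell apart under the chosen kernel; the shifted sample should score noticeably higher than the matching one.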
| Dataset | API | Description |
|---|---|---|
| UCR Dataset | tsgm.utils.UCRDataManager | https://www.cs.ucr.edu/%7Eeamonn/time_series_data_2018/ |
| Mauna Loa | tsgm.utils.get_mauna_loa() | https://gml.noaa.gov/ccgg/trends/data.html |
| EEG & Eye state | tsgm.utils.get_eeg() | https://archive.ics.uci.edu/ml/datasets/EEG+Eye+State |
| Power consumption dataset | tsgm.utils.get_power_consumption() | https://archive.ics.uci.edu/ml/datasets/individual+household+electric+power+consumption |
| Stock data | tsgm.utils.get_stock_data(ticker_name) | Gets historical stock data from YFinance |
| COVID-19 over the US | tsgm.utils.get_covid_19() | Covid-19 distribution over the US |
| Energy Data (UCI) | tsgm.utils.get_energy_data() | https://archive.ics.uci.edu/ml/datasets/Appliances+energy+prediction |
| MNIST as time series | tsgm.utils.get_mnist_data() | https://en.wikipedia.org/wiki/MNIST_database |
| Samples from GPs | tsgm.utils.get_gp_samples_data() | https://en.wikipedia.org/wiki/Gaussian_process |
| Physionet 2012 | tsgm.utils.get_physionet2012() | https://archive.physionet.org/pn3/challenge/2012/ |
| Synchronized Brainwave Dataset | tsgm.utils.get_synchronized_brainwave_dataset() | https://www.kaggle.com/datasets/berkeley-biosense/synchronized-brainwave-dataset |
TSGM provides an API for convenient access to many time series datasets (currently more than 140). The comprehensive list of datasets is available in the documentation.
We appreciate all contributions. To learn more, please check CONTRIBUTING.md.
git clone https://github.com/AlexanderVNikitin/tsgm
cd tsgm
pip install -e .
Run tests:
python -m pytest
To check static typing:
mypy
We provide two CLIs for convenient synthetic data generation:
- tsgm-gd generates data by a stored sample,
- tsgm-eval evaluates the generated time series.
Use tsgm-gd --help or tsgm-eval --help for documentation.
If you find this repo useful, please consider citing our paper:
@article{
nikitin2023tsgm,
title={TSGM: A Flexible Framework for Generative Modeling of Synthetic Time Series},
author={Nikitin, Alexander and Iannucci, Letizia and Kaski, Samuel},
journal={arXiv preprint arXiv:2305.11567},
year={2023}
}

