An end-to-end transformer pipeline for imagined speech electroencephalogram classification.
SincTran combines learnable sinc bandpass filter banks (one per physiological EEG band) with a depthwise-separable CNN spatial/temporal encoder and a Transformer with a CLS token. Band contributions are weighted by a learned attention mechanism before spatial mixing, and the Transformer output at the CLS position is passed to an MLP classification head.
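Learnable sinc filters of this kind (popularised by SincNet) are parameterised only by a low and a high cut-off frequency, from which a windowed-sinc bandpass kernel is constructed on the fly. A minimal NumPy sketch of that construction — the function name and defaults below are illustrative, not the repository's API:

```python
import numpy as np

def sinc_bandpass_kernel(low_hz, high_hz, kernel_len=129, fs=256.0):
    """Windowed-sinc bandpass: difference of two ideal low-pass kernels.

    In SincTran the two cut-offs would be learnable parameters, one pair
    per physiological EEG band (delta, theta, alpha, beta, gamma).
    """
    n = np.arange(kernel_len) - (kernel_len - 1) / 2   # centred sample indices
    t = n / fs

    def lowpass(fc):
        # np.sinc(x) = sin(pi x) / (pi x), so this is the ideal LP impulse response
        return 2.0 * fc / fs * np.sinc(2.0 * fc * t)

    band = lowpass(high_hz) - lowpass(low_hz)
    band *= np.hamming(kernel_len)                     # taper to reduce ripple
    return band / np.abs(band).sum()                   # bound the filter gain

# Example: an alpha-band (8-12 Hz) filter at a 256 Hz sampling rate
kernel = sinc_bandpass_kernel(8.0, 12.0)
```

Because the kernel is a closed-form function of two scalars, gradients flow to the cut-off frequencies during training, letting each band adapt its passband to the data.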
SINCTRAN/
├── Code/
│ ├── core_model.py # SincTran architecture + SincBandFilter + TransparentEncoderLayer
│ ├── core_dataset.py # EEG preprocessing, ZCA whitening, stratified k-fold data loaders
│ ├── core_loaders.py # Dataset loaders for ASU and BCI Competition datasets
│ ├── core_loss.py # Cross-entropy loss wrapper
│ ├── core_train.py # Training loop, evaluation, metric logging, confusion matrices
│ ├── core_utils.py # Reproducibility, timing, FLOPs, metadata serialisation
│ │
│ ├── main_asu.py # SincTran training pipeline — ASU dataset
│ ├── main_bci.py # SincTran training pipeline — BCI Competition Track 3
│ ├── main_driver.py # Entry point for main experiment runs
│ │
│ ├── sota_asu.py # SOTA baseline training pipeline — ASU dataset
│ ├── sota_bci.py # SOTA baseline training pipeline — BCI Competition Track 3
│ ├── sota_driver.py # Entry point for SOTA benchmark runs
│ │
│ ├── int_module.py # Interpretation utilities: band attention, attention rollout, GradCAM, UMAP
│ ├── int_asu.py # Interpretation pipeline — ASU dataset
│ ├── int_bci.py # Interpretation pipeline — BCI Competition Track 3
│ └── int_driver.py # Entry point for interpretation runs
│
├── CITATION.cff
├── requirements.txt
└── README.md

All scripts live under Code/ and are intentionally flat within it: every import resolves within the same directory, with no relative-path dependencies. Run all driver scripts from inside Code/.
Two imagined-speech EEG benchmarks are supported.
ASU Speech Imagery Dataset
Four tasks (n1–n4) covering long words, short words, vowels, and mixed word pairs. Raw .mat files are expected under a single root directory; core_loaders.py resolves the subfolder layout automatically.
BCI Competition 2020 Track 3
Five-class imagined-speech dataset (15 subjects). Expects the standard competition folder layout with a Track3_clean.csv label file for the held-out test partition.
Python 3.10 or later is recommended.
```bash
pip install -r requirements.txt
```

All imports use only the standard library plus the packages listed in requirements.txt; no package installation beyond this step is required.
Each experiment group has a dedicated driver script. Configure the run by editing the RUN_CFG dict at the bottom of the relevant driver, then execute it from inside the Code/ directory.
```bash
cd Code
python main_driver.py
```

Set DATA_PATH and RESULTS_ROOT at the top of main_asu.py or main_bci.py before running.
```bash
python sota_driver.py
```

Supported baselines: EEGNet, MSVTNet, ShallowFBCSPNet, CTNet, EEGConformer, ATCNet, EEGNeX (all via braindecode).
```bash
python int_driver.py
```

Requires trained weights saved by a main-experiment run. Set WEIGHTS_ROOT and RESULTS_ROOT in int_asu.py / int_bci.py.
Interpretation outputs per subject/fold:
| Output file | Description |
|---|---|
| `band_attn_norm_*.svg` | Band-attention weights (L1-normalised), per class |
| `band_attn_lin_*.svg` | Raw linear band-attention scores, per class |
| `cls_umap_*.svg` | UMAP of CLS-token embeddings |
| `mlp_umap_*.svg` | UMAP of MLP first-layer activations |
| `attn_rollout_*.svg` | CLS-token attention rollout heatmap (Abnar & Zuidema 2020) |
| `gradcam_*.svg` | Spatial Grad-CAM heatmap |
Fold-averaged versions of the rollout and Grad-CAM figures are written automatically after the last fold.
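The attention-rollout figures follow the Abnar & Zuidema (2020) recipe: per-layer attention maps are averaged over heads, mixed with the identity to account for the residual connection, and multiplied through the layers; the CLS row of the product gives each input token's contribution. A hedged NumPy sketch — this is not the code in int_module.py, just the underlying computation:

```python
import numpy as np

def attention_rollout(attn_per_layer):
    """attn_per_layer: list of (heads, tokens, tokens) row-stochastic maps."""
    tokens = attn_per_layer[0].shape[-1]
    rollout = np.eye(tokens)
    for attn in attn_per_layer:
        a = attn.mean(axis=0)                      # average over heads
        a = 0.5 * a + 0.5 * np.eye(tokens)         # add the residual path
        a /= a.sum(axis=-1, keepdims=True)         # renormalise rows
        rollout = a @ rollout                      # compose through layers
    return rollout[0]                              # CLS-token row (index 0)

# Toy example: 3 layers, 4 heads, 6 tokens of random softmax attention
rng = np.random.default_rng(0)
layers = [np.exp(rng.normal(size=(4, 6, 6))) for _ in range(3)]
layers = [a / a.sum(axis=-1, keepdims=True) for a in layers]
cls_scores = attention_rollout(layers)
```

Because every mixed map stays row-stochastic, the rolled-out CLS scores form a distribution over input tokens, which is what the heatmaps visualise.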
Each pipeline script (main_asu.py, main_bci.py, sota_asu.py, etc.) contains three self-contained config dicts — DATA_CFG, MODEL_CFG, and TRAIN_CFG — alongside a DATA_PATH / RESULTS_ROOT header block. No external config files are required; all hyperparameters are co-located with the pipeline that uses them.
Driver scripts expose a RUN_CFG dict for selecting dataset, task, subjects, fold count, and epoch budget without touching the pipeline files.
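As an illustration of the shape such a dict might take — the key names below are inferred from the options listed above, not copied from the drivers; the authoritative version sits at the bottom of each driver script:

```python
# Hypothetical RUN_CFG; check the bottom of main_driver.py / sota_driver.py
# / int_driver.py for the actual keys and accepted values.
RUN_CFG = {
    "dataset": "asu",     # "asu" or "bci"
    "task": "n1",         # ASU task id (n1-n4); not used for BCI
    "subjects": "all",    # or an explicit list of subject ids
    "n_folds": 5,         # stratified k-fold count
    "epochs": 100,        # epoch budget per fold
}
```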
core_utils.set_all_seeds fixes seeds for Python, NumPy, and PyTorch (including cudnn.deterministic). All pipelines call this before data loading. The default seed across all experiments is 37.
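A minimal sketch of what such a seeding helper typically does — the repository's actual core_utils.set_all_seeds may differ in detail, and the torch block is guarded here so the sketch also runs without PyTorch installed:

```python
import os
import random
import numpy as np

def set_all_seeds(seed: int = 37) -> None:
    """Seed every RNG the pipelines touch (sketch, not the repo's code)."""
    random.seed(seed)
    np.random.seed(seed)
    # Only effective if exported before the interpreter starts:
    os.environ["PYTHONHASHSEED"] = str(seed)
    try:
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True   # reproducible conv kernels
        torch.backends.cudnn.benchmark = False      # disable autotuner
    except ImportError:
        pass  # CPU-only sketch still works without PyTorch

set_all_seeds(37)  # the default seed used across all experiments
```

Calling this before any data loading, as the pipelines do, ensures fold splits, weight initialisation, and shuffling are reproducible run to run.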
Normalisation and ZCA whitening statistics are always fitted on the training split only and applied to validation and test splits, with statistics recomputed per fold.
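Fitting the whitening on the training split alone prevents statistics from the validation and test trials leaking into the normalisation. A hedged NumPy sketch of per-fold ZCA — the helper names are illustrative, not the repository's API:

```python
import numpy as np

def fit_zca(X_train, eps=1e-5):
    """Fit mean and ZCA whitening matrix on the training split only."""
    mean = X_train.mean(axis=0)
    Xc = X_train - mean
    cov = Xc.T @ Xc / Xc.shape[0]
    U, S, _ = np.linalg.svd(cov)
    W = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T   # symmetric whitening
    return mean, W

def apply_zca(X, mean, W):
    """Apply train-split statistics to any split (train, val, or test)."""
    return (X - mean) @ W

rng = np.random.default_rng(37)
X_train = rng.normal(size=(500, 16))
X_test = rng.normal(size=(100, 16))
mean, W = fit_zca(X_train)            # statistics from the training split...
Z_train = apply_zca(X_train, mean, W)
Z_test = apply_zca(X_test, mean, W)   # ...reused, never refitted, on test
```

Recomputing `mean` and `W` for each fold, as the pipelines do, keeps every fold's evaluation free of train/test contamination.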
If you use SincTran in your research, please cite:
```bibtex
@misc{sinctran2026,
  author = {Arun Balasubramanian and Santhoshkumar Peddi and Debasis Samanta},
  title  = {SincTran: An End-to-End Transformer Pipeline for Imagined Speech EEG Classification},
  year   = {2026},
  doi    = {10.2139/ssrn.6641648},
  url    = {https://github.com/ArunBsmn/SincTran},
  note   = {GitHub repository}
}
```