XAND-ECG — Lead-wise ECG Classification

Lead-wise 1D CNN · ~3M Params · 100 Hz · 4 Diseases · XAI Validated

12-lead ECG classification with quantitative explainability validation against clinical fiducial points.
Designed for research-grade experiments under distribution shift. Not intended for clinical use.

XAND-ECG trains per-disease binary classifiers on PTB-XL using a lead-wise 1D CNN, covering 4 diagnostic families with a strict pure-label policy. Attribution maps are validated against PTB-XL+ fiducial-derived clinical masks.

📊 Results & Evaluation Protocol

MI — Myocardial Infarction | STTC — ST/T Changes | CD — Conduction Disturbance | HYP — Hypertrophy

Reported metrics:

Val AUC — Best validation AUC during training (checkpoint selection criterion)
Test† — Test AUC at the epoch where validation AUC was best → the selected model

Disease	Val AUC	Test†
MI — Myocardial Infarction	0.9723	0.9700
STTC — ST/T Changes	0.9324	0.9332
CD — Conduction Disturbance	0.9292	0.9175
HYP — Hypertrophy	0.8710	0.8846

† = test metric at the epoch of best validation AUC (selected model)
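The selection rule behind the † column can be sketched as follows. This is a minimal illustration rather than the project's training loop: the `history` structure and the name `select_checkpoint` are hypothetical, and the AUC values in the usage lines are made up for demonstration.

```python
from typing import Dict, List, Tuple


def select_checkpoint(history: List[Dict[str, float]]) -> Tuple[int, float, float]:
    """Pick the epoch with the highest validation AUC.

    `history` is a hypothetical per-epoch log with keys "val_auc" and
    "test_auc". Test AUC is only read at the selected epoch, so it never
    influences which checkpoint is chosen.
    """
    if not history:
        raise ValueError("history must contain at least one epoch")
    # max() returns the first maximum, so ties go to the earlier epoch.
    best_epoch = max(range(len(history)), key=lambda i: history[i]["val_auc"])
    best = history[best_epoch]
    return best_epoch, best["val_auc"], best["test_auc"]


# Illustrative numbers only (not taken from the reported runs).
history = [
    {"val_auc": 0.91, "test_auc": 0.90},
    {"val_auc": 0.95, "test_auc": 0.93},
    {"val_auc": 0.94, "test_auc": 0.96},
]
epoch, val_auc, test_auc = select_checkpoint(history)
print(epoch, val_auc, test_auc)
```

In this toy log the last epoch has the highest test AUC, but it is not selected because its validation AUC is lower. Reporting Test† this way avoids picking a checkpoint by looking at the test set.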

Pure labels only: conf = 100 → positive, label absent → negative, conf = 0–99 → excluded
All splits are patient-level (no patient appears in more than one split)
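The two rules above can be sketched in a few lines. This is an illustration under stated assumptions: `family_conf` is a hypothetical mapping from diagnostic family to annotation confidence for one record (how the confidences of individual SCP codes are aggregated per family is not specified here), and both function names are made up for this example.

```python
from typing import Dict, Iterable, Optional


def pure_label(family_conf: Dict[str, float], family: str) -> Optional[int]:
    """Return 1 (positive), 0 (negative), or None (excluded) for one record."""
    if family not in family_conf:
        return 0  # label absent -> negative
    if family_conf[family] == 100:
        return 1  # fully confident annotation -> positive
    return None  # confidence 0-99 -> excluded for this disease


def splits_are_patient_disjoint(splits: Dict[str, Iterable[int]]) -> bool:
    """Check that no patient ID appears in more than one split."""
    seen = set()
    for patient_ids in splits.values():
        ids = set(patient_ids)
        if seen & ids:
            return False
        seen |= ids
    return True


# Hypothetical record: confident MI, uncertain STTC, no CD annotation.
record = {"MI": 100.0, "STTC": 50.0}
print(pure_label(record, "MI"), pure_label(record, "STTC"), pure_label(record, "CD"))
print(splits_are_patient_disjoint({"train": [1, 2], "val": [3], "test": [4]}))
```

Because each disease has its own binary classifier, the same record can be a positive for one disease, a negative for another, and excluded for a third.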

External Validation

The selected PTB-XL checkpoints were evaluated zero-shot on two external ECG datasets with no fine-tuning:

Chapman-Shaoxing — 45K ECGs, Shaoxing + Ningbo hospitals, China
Georgia — 10K ECGs, Emory University, Atlanta, USA

Disease	PTB-XL Test	Chapman	Georgia
MI	0.9700	0.9527	—*
STTC	0.9332	0.8669	0.8336
CD	0.9175	0.8533	0.8609
HYP	0.8846	0.7660	0.6992

* Georgia MI excluded: insufficient positive samples (n=7). SNOMED CT mapping was conservative and defined a priori.

Morphology-based conditions (MI, CD) show the strongest transfer; voltage-dependent HYP shows the largest drop — consistent with known limitations of voltage-based criteria under device and population shift. See FINDINGS.md for full degradation analysis.
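The size of each drop can be read directly off the table. The short script below only restates that arithmetic; the AUC values are copied from the table above, while the dictionary layout and the name `auc_drops` are choices made for this example.

```python
from typing import Dict, Optional, Tuple

# AUC values copied from the external validation table.
# None marks the excluded Georgia MI cell.
auc: Dict[str, Dict[str, Optional[float]]] = {
    "MI":   {"PTB-XL": 0.9700, "Chapman": 0.9527, "Georgia": None},
    "STTC": {"PTB-XL": 0.9332, "Chapman": 0.8669, "Georgia": 0.8336},
    "CD":   {"PTB-XL": 0.9175, "Chapman": 0.8533, "Georgia": 0.8609},
    "HYP":  {"PTB-XL": 0.8846, "Chapman": 0.7660, "Georgia": 0.6992},
}


def auc_drops(table) -> Dict[Tuple[str, str], float]:
    """Absolute AUC drop from the PTB-XL test set to each external dataset."""
    drops = {}
    for disease, row in table.items():
        for dataset in ("Chapman", "Georgia"):
            if row[dataset] is not None:
                drops[(disease, dataset)] = round(row["PTB-XL"] - row[dataset], 4)
    return drops


drops = auc_drops(auc)
# Print from smallest to largest drop.
for (disease, dataset), drop in sorted(drops.items(), key=lambda kv: kv[1]):
    print(f"{disease:5s} {dataset:8s} {drop:.4f}")
```

MI on Chapman shows the smallest drop and HYP on Georgia the largest.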

🔍 Explainability Maps

Visualizations follow the standard clinical 12-lead layout (3×4 grid + rhythm strip), with attribution and clinical reference integrated directly into each lead:

Element	What it shows
CLIN strip	Blue band marking the fiducial-derived clinical region from PTB-XL+. Approximate academic ground truth for visual comparison
ATTR strip	Attribution heatmap below the trace, drawn with the same color scale as the trace. Shows where the model concentrated attention along the full 10-second window
ECG trace	Raw signal in mV. Trace color reflects attribution intensity: white → yellow → orange → red

Each lead is displayed with its own vertical scale — standard practice in digital ECG viewers when amplitudes differ substantially across leads.

Same ECG, Different Questions

The same ECG produces different attribution maps depending on which pathology head is queried. Each map answers a different clinical question.

⚠️ Interpreting attribution maps

Each heatmap answers one question: where does the model look to decide about this specific condition?

The map is not a comprehensive delineation of all abnormal regions.

A binary classifier may attend to the minimum sufficient evidence for the decision, not the full pathological extent.

Example attribution maps (figures): Myocardial Infarction, Conduction Disturbance, Hypertrophy, ST/T Changes.

Note: The visual examples above are research visualizations derived from publicly available ECG records. The original datasets are not redistributed and remain subject to their respective licenses and citation requirements.

📐 Quantitative XAI Validation

Attribution maps were evaluated against PTB-XL+ fiducial-derived clinical masks on truly positive ECGs using Integrated Gradients (primary method).

Disease	n	Pointing Game	CAS
MI	245	0.9796	0.7519
STTC	367	—**	0.4034
CD	339	0.9676	0.5396
HYP	115	0.9652	0.7161

** The STTC Pointing Game score (0.11) is not informative for this condition. The model anchors its attribution at the R-peak of V6 (~65 ms before ST/T onset) and reads the R→ST transition within its receptive field, so the attribution peak falls just before the ST/T mask rather than inside it. This mechanism is consistent with the LVH strain pattern and was judged clinically plausible in the clinical review.
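A minimal sketch of a Pointing Game check helps show why a peak just outside the mask scores as a miss. It assumes the common definition (a hit when the attribution maximum falls inside the reference mask); the exact variant used in this project is documented in FINDINGS.md, and the `tolerance` parameter, the use of absolute attribution, and the toy signal below are assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple


def pointing_game_hit(attribution: Sequence[float], mask: Sequence[bool],
                      tolerance: int = 0) -> bool:
    """True if the peak of |attribution| lies inside the mask.

    `tolerance` (in samples) is a hypothetical extension that also accepts
    peaks within that distance of a masked sample.
    """
    if not attribution or len(attribution) != len(mask):
        raise ValueError("attribution and mask must be non-empty and equal length")
    peak = max(range(len(attribution)), key=lambda i: abs(attribution[i]))
    lo = max(0, peak - tolerance)
    hi = min(len(mask), peak + tolerance + 1)
    return any(mask[lo:hi])


def pointing_game_score(pairs: Iterable[Tuple[Sequence[float], Sequence[bool]]]) -> float:
    """Fraction of (attribution, mask) pairs that count as hits."""
    hits = [pointing_game_hit(a, m) for a, m in pairs]
    return sum(hits) / len(hits)


# Toy 1-second window at 100 Hz: attribution peaks at the R-peak (sample 40),
# while the ST/T mask starts about 65 ms later (sample 47).
attribution = [0.0] * 100
attribution[40] = 1.0
mask = [47 <= i < 80 for i in range(100)]
print(pointing_game_hit(attribution, mask))               # False
print(pointing_game_hit(attribution, mask, tolerance=7))  # True
```

With a strict mask the toy example is a miss even though the peak sits only a few samples before the ST/T region, which mirrors the STTC situation described in the footnote.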

IG and GradSHAP produce near-identical attribution maps (cosine similarity ~0.90–0.93). See FINDINGS.md for cross-method analysis, surgical perturbation, and full validation details, and the Clinical Review for clinical interpretation.
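Cosine similarity between two attribution maps can be computed as sketched below. The flattening of a leads × samples map into one vector and the toy values are assumptions for this example; how the project aggregates similarity across ECGs is described in FINDINGS.md.

```python
import math
from typing import List, Sequence


def flatten(attr_map: Sequence[Sequence[float]]) -> List[float]:
    """Flatten a leads x samples attribution map into a single vector."""
    return [value for lead in attr_map for value in lead]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for all-zero maps
    return dot / (norm_a * norm_b)


# Toy 2-lead x 3-sample maps standing in for IG and GradSHAP outputs.
ig_map = [[0.1, 0.9, 0.2], [0.0, 0.4, 0.1]]
gradshap_map = [[0.2, 0.8, 0.2], [0.0, 0.5, 0.1]]
print(cosine_similarity(flatten(ig_map), flatten(gradshap_map)))
```

Cosine similarity ignores overall scale, so two methods that highlight the same regions with different magnitudes still score close to 1.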

🧠 Methodology & Model Design

Per-disease binary classifiers — one independent model per pathology, with no multilabel compromises. Each model is optimized for its own class balance and convergence dynamics
Lead-wise encoder with shared weights — each of the 12 leads processed independently through a shared 1D CNN encoder (4 ResBlocks, kernel=7), then concatenated (12 × 256 = 3072-dim). Per-lead attribution comes for free — no post-hoc decomposition needed. Explicit cross-lead mechanisms (attention, multiscale kernels) were tested and rejected: all degraded HYP and CD while adding parameters. The FC layers after concatenation capture inter-lead relationships implicitly
100 Hz downsampling — PTB-XL recordings are available at 500 Hz, but 100 Hz retains sufficient diagnostic information for the target conditions. In this architecture, the higher sampling rate increased parameters ~4× without improving results for any pathology: the bottleneck is sample count (12K ECGs), not temporal resolution
Per-sample global z-score normalization — a single μ/σ computed over all 12×1000 samples per ECG. This was the key change that restored global voltage structure relevant for HYP diagnosis and drove HYP AUC from ~0.78 to ~0.88. A subsequent per-lead normalization is kept as a deliberate trade-off: it improves CD (+0.018 AUC, morphology-based) at a minor cost to HYP (−0.004 AUC).
Pure label training — only explicit positives (conf = 100) and negatives (label absent); uncertain samples (conf = 0–99) excluded entirely. Uncertain labels would weaken XAI validation: attribution maps must point to real pathology, not annotator disagreement. Focal Loss (γ=2.0) handles class imbalance across 8–24% positive rates
Multitask regularization — auxiliary heads for Device (0.03) and Sex (0.01). Empirically, Device acted as a regularizer: it encouraged device-invariant representations, stabilizing HYP training (the worst validation oscillation dropped from −1003 bp to −272 bp). Sex provides an orthogonal physiological signal (QRS duration and voltage amplitudes differ by sex), benefiting MI and STTC
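The effect of per-sample global z-score normalization described in the list above can be illustrated with a small sketch. It uses plain Python lists instead of the project's tensor code, and the function names are hypothetical. The per-lead function is included only as a contrast and is not claimed to be the project's subsequent per-lead step.

```python
import statistics
from typing import List

Ecg = List[List[float]]  # leads x samples


def global_zscore(ecg: Ecg, eps: float = 1e-8) -> Ecg:
    """Normalize with a single mean/std computed over all leads and samples."""
    flat = [v for lead in ecg for v in lead]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    return [[(v - mu) / (sigma + eps) for v in lead] for lead in ecg]


def per_lead_zscore(ecg: Ecg, eps: float = 1e-8) -> Ecg:
    """Contrast case: each lead gets its own mean/std."""
    out = []
    for lead in ecg:
        mu = statistics.fmean(lead)
        sigma = statistics.pstdev(lead)
        out.append([(v - mu) / (sigma + eps) for v in lead])
    return out


# Toy 2-lead ECG: same shape, but the second lead has twice the voltage.
ecg = [[0.0, 1.0, 0.0, -1.0], [0.0, 2.0, 0.0, -2.0]]
g = global_zscore(ecg)
p = per_lead_zscore(ecg)
print(max(g[1]) / max(g[0]))  # amplitude ratio between leads is preserved
print(max(p[1]) / max(p[0]))  # amplitude ratio collapses to about 1
```

Global normalization keeps the relative voltage between leads, which is the information voltage-based HYP criteria rely on. A pure per-lead z-score removes it.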

See FINDINGS.md for architecture ablations, normalization experiments, and auxiliary task analysis.
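One way the auxiliary heads could enter the training objective is as a weighted sum of losses. The sketch below assumes that the values in parentheses after Device and Sex (0.03 and 0.01) are loss weights, which this README does not state explicitly. The function name and the loss values in the usage line are hypothetical.

```python
def total_loss(main_loss: float, device_loss: float, sex_loss: float,
               w_device: float = 0.03, w_sex: float = 0.01) -> float:
    """Weighted multitask objective (assumed form, see note above).

    The small weights keep the disease loss dominant while still passing
    a weak gradient signal from the auxiliary heads to the shared encoder.
    """
    return main_loss + w_device * device_loss + w_sex * sex_loss


# Illustrative per-batch loss values only.
print(total_loss(main_loss=0.20, device_loss=1.50, sex_loss=0.69))
```

With these toy values the auxiliary terms add about 0.05 to a main loss of 0.20, so they nudge the shared representation without dominating the update.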

⚙️ Training, Hardware

Core Training Configuration

Component	Choice
Architecture	Lead-wise 1D CNN (shared encoder)
Parameters	~3.0M
Input	12 leads × 1000 samples (100 Hz × 10s)
Loss	Focal Loss (γ = 2.0)
Optimizer	AdamW + Cosine decay
Aux tasks	Device (0.03), Sex (0.01)
Label policy	Pure labels only (conf=100 / absent)
Dataset	PTB-XL + PTB-XL+
Tracking	MLflow
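For reference, the binary focal loss with γ = 2.0 can be written for a single prediction as below. This is a scalar sketch of the formula from the Focal Loss paper without the optional α balancing term; whether the project uses α, and how it reduces the loss over a batch, are not stated here.

```python
import math


def binary_focal_loss(p: float, y: int, gamma: float = 2.0, eps: float = 1e-12) -> float:
    """Focal loss for one example: -(1 - p_t)^gamma * log(p_t).

    p is the predicted probability of the positive class, y is 0 or 1.
    With gamma = 0 this reduces to ordinary binary cross-entropy.
    """
    p_t = p if y == 1 else 1.0 - p
    p_t = min(max(p_t, eps), 1.0)  # clamp to avoid log(0)
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


easy = binary_focal_loss(0.9, 1)  # confident, correct prediction
hard = binary_focal_loss(0.1, 1)  # confident, wrong prediction
bce_easy = binary_focal_loss(0.9, 1, gamma=0.0)
print(easy, hard, bce_easy)
```

The modulating factor shrinks the loss of the easy example by a factor of 100 relative to cross-entropy while leaving the hard example nearly untouched, which keeps the abundant easy negatives from dominating training under class imbalance.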

Hardware Used

GPU	Role
GTX 1650 4GB	Primary training & reported results (seed=42)
RTX 4060 Ti 16GB	Development & experimentation

Peak VRAM usage is ~1.9 GB per disease model.

🚀 Quick Start

git clone https://github.com/XOREngine/xand-ecg.git
cd xand-ecg
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

See QUICKSTART.md for data setup, training, XAI evaluation, and visualization commands.

📂 Documentation

Document	Description
`QUICKSTART.md`	Data setup, training, evaluation, and visualization commands
`docs/FINDINGS.md`	Full experimental record — architecture ablations, normalization, XAI validation, lead robustness, surgical perturbation, external generalization
`docs/XAND-ECG_v0.1___ClinicalReview.pdf`	Clinical review document — per-pathology assessment of learned mechanisms by cardiology

📚 References

Datasets

PTB-XL — https://physionet.org/content/ptb-xl/
PTB-XL+ (fiducial points & features) — https://physionet.org/content/ptb-xl-plus/
Chapman-Shaoxing (arrhythmia database) — https://physionet.org/content/ecg-arrhythmia/
Georgia G12EC (CinC Challenge 2020) — https://physionet.org/content/challenge-2020/

External Benchmarks & Related Work

PTB-XL: A Large Publicly Available Electrocardiography Dataset — https://doi.org/10.1038/s41597-020-0495-6
PTB-XL+: A Comprehensive Electrocardiographic Feature Dataset — https://doi.org/10.1038/s41597-023-02153-8
A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients — https://doi.org/10.1038/s41597-020-0386-x
PhysioNet/CinC Challenge 2020: Classification of 12-lead ECGs — https://doi.org/10.1088/1361-6579/abc960
PTB-XL Benchmark: Deep Learning for ECG Analysis — https://doi.org/10.1109/JBHI.2020.3022989
PhysioNet Resource — https://doi.org/10.1161/01.CIR.101.23.e215

Optimization & Methods

Integrated Gradients — https://arxiv.org/abs/1703.01365
SHAP / GradSHAP — https://arxiv.org/abs/1705.07874 · https://doi.org/10.1038/s42256-021-00343-w
Saliency Maps — https://arxiv.org/abs/1312.6034
Captum (attribution library) — https://arxiv.org/abs/2009.07896
Focal Loss — https://arxiv.org/abs/1708.02002
Decoupled Weight Decay (AdamW) — https://arxiv.org/abs/1711.05101
ResNet (residual blocks used in 1D encoder) — https://arxiv.org/abs/1512.03385

👥 Contributors

Development:

José Artusa (@WallyByte) — Project design and implementation.

Clinical Review:

Belén Biscotti — Conducted the Clinical Review, assessing the clinical coherence of the attribution maps and findings.

📬 Contact

For questions, please reach out at info@xorengine.com

XAND-ECG — Lead-wise ECG Classification.
💪 If you extend it, audit it, or break it and improve it — go for it.
