12-lead ECG classification with quantitative explainability validation against clinical fiducial points.
Designed for research-grade experiments under distribution shift. Not intended for clinical use.
XAND-ECG trains per-disease binary classifiers on PTB-XL using a lead-wise 1D CNN, covering 4 diagnostic families with a strict pure-label policy. Attribution maps are validated against PTB-XL+ fiducial-derived clinical masks.
MI — Myocardial Infarction | STTC — ST/T Changes | CD — Conduction Disturbance | HYP — Hypertrophy
Reported metrics:
- Val AUC — Best validation AUC during training (checkpoint selection criterion)
- Test† — Test AUC at the epoch where validation AUC was best → the selected model
| Disease | Val AUC | Test† |
|---|---|---|
| MI — Myocardial Infarction | 0.9723 | 0.9700 |
| STTC — ST/T Changes | 0.9324 | 0.9332 |
| CD — Conduction Disturbance | 0.9292 | 0.9175 |
| HYP — Hypertrophy | 0.8710 | 0.8846 |
† = test metric at the epoch of best validation AUC (selected model)
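The selection rule behind Test† can be sketched as below. This is a minimal illustration, not the project's training loop: `select_checkpoint` is a hypothetical helper, and the per-epoch AUC values are made up for the example.

```python
def select_checkpoint(val_auc, test_auc):
    """Return (epoch, val, test) at the epoch with the highest validation AUC.

    The test AUC is only read off at the selected epoch; it never
    influences which epoch is chosen.
    """
    if len(val_auc) != len(test_auc) or not val_auc:
        raise ValueError("need two non-empty AUC lists of equal length")
    # max() returns the first epoch in case of ties.
    best_epoch = max(range(len(val_auc)), key=lambda e: val_auc[e])
    return best_epoch, val_auc[best_epoch], test_auc[best_epoch]


# Illustrative per-epoch curves (hypothetical numbers, not project results).
val_curve = [0.81, 0.86, 0.87, 0.85]
test_curve = [0.80, 0.85, 0.88, 0.89]

epoch, val, test = select_checkpoint(val_curve, test_curve)
print(epoch, val, test)
```

In this toy run the last epoch has a higher test AUC, but it is not reported because validation AUC peaked one epoch earlier.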
- Pure labels only: `conf = 100` → positive, label absent → negative, `conf = 0–99` → excluded
- All splits are patient-level (no patient appears in more than one split)
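The pure-label rule can be sketched as a three-way mapping. This is a sketch under stated assumptions: `pure_label` and `MI_CODES` are hypothetical names, the family membership shown is illustrative only, and taking the highest confidence among a record's family codes is an assumption about how multiple codes are combined.

```python
from typing import Dict, Optional, Set


def pure_label(scp_codes: Dict[str, float], family: Set[str]) -> Optional[int]:
    """Map one record's annotations to 1 (positive), 0 (negative) or None (excluded).

    scp_codes: statement code -> annotator confidence (0-100).
    family:    codes belonging to the diagnostic family being trained.
    Assumption: a record is judged by the highest confidence among its family codes.
    """
    confidences = [conf for code, conf in scp_codes.items() if code in family]
    if not confidences:
        return 0      # label absent -> negative
    if max(confidences) == 100:
        return 1      # conf = 100 -> positive
    return None       # conf = 0-99 -> excluded


# Hypothetical family definition, for illustration only.
MI_CODES = {"IMI", "AMI"}

print(pure_label({"IMI": 100.0, "SR": 0.0}, MI_CODES))
print(pure_label({"NORM": 100.0, "SR": 0.0}, MI_CODES))
print(pure_label({"AMI": 50.0}, MI_CODES))
```

Returning `None` rather than a third integer makes it hard to accidentally treat an excluded record as a negative.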
The selected PTB-XL checkpoints were evaluated zero-shot on two external ECG datasets with no fine-tuning:
- Chapman-Shaoxing — 45K ECGs, Shaoxing + Ningbo hospitals, China
- Georgia — 10K ECGs, Emory University, Atlanta, USA
| Disease | PTB-XL Test | Chapman | Georgia |
|---|---|---|---|
| MI | 0.9700 | 0.9527 | —* |
| STTC | 0.9332 | 0.8669 | 0.8336 |
| CD | 0.9175 | 0.8533 | 0.8609 |
| HYP | 0.8846 | 0.7660 | 0.6992 |
* Georgia MI excluded: insufficient positive samples (n=7). SNOMED CT mapping was conservative and defined a priori.
Morphology-based conditions (MI, CD) show the strongest transfer; voltage-dependent HYP shows the largest drop — consistent with known limitations of voltage-based criteria under device and population shift. See FINDINGS.md for full degradation analysis.
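The size of each drop follows directly from the table. The snippet below only redoes that arithmetic; the dictionary layout and the `auc_drop` helper are illustrative, and the AUC values are copied from the table above.

```python
# AUCs copied from the table above; None marks the excluded Georgia MI cell.
auc = {
    "MI":   {"ptbxl": 0.9700, "chapman": 0.9527, "georgia": None},
    "STTC": {"ptbxl": 0.9332, "chapman": 0.8669, "georgia": 0.8336},
    "CD":   {"ptbxl": 0.9175, "chapman": 0.8533, "georgia": 0.8609},
    "HYP":  {"ptbxl": 0.8846, "chapman": 0.7660, "georgia": 0.6992},
}


def auc_drop(row, dataset):
    """AUC lost when moving from the PTB-XL test set to an external dataset."""
    external = row[dataset]
    return None if external is None else round(row["ptbxl"] - external, 4)


for disease, row in auc.items():
    print(disease, auc_drop(row, "chapman"), auc_drop(row, "georgia"))

# Disease with the largest drop on each external dataset.
for dataset in ("chapman", "georgia"):
    drops = {d: auc_drop(r, dataset) for d, r in auc.items()}
    worst = max((d for d in drops if drops[d] is not None), key=lambda d: drops[d])
    print(dataset, worst, drops[worst])
```

HYP loses 0.1186 AUC on Chapman and 0.1854 on Georgia, the largest drop on both datasets, while MI loses 0.0173 on Chapman.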
Visualizations follow the standard clinical 12-lead layout (3×4 grid + rhythm strip), with attribution and clinical reference integrated directly into each lead:
| Element | What it shows |
|---|---|
| CLIN strip | Blue band marking the fiducial-derived clinical region from PTB-XL+. Approximate academic ground truth for visual comparison |
| ATTR strip | Attribution heatmap below the trace — same color scale. Shows where the model concentrated attention along the full 10-second window |
| ECG trace | Raw signal in mV. Trace color reflects attribution intensity: white → yellow → orange → red |
Each lead is displayed with its own vertical scale — standard practice in digital ECG viewers when amplitudes differ substantially across leads.
The same ECG produces different attribution maps depending on which pathology head is queried. Each map answers a different clinical question.
⚠️ Interpreting attribution maps
- Each heatmap answers one question: where does the model look to decide about this specific condition?
- The map is not a comprehensive delineation of all abnormal regions.
- A binary classifier may attend to the minimum sufficient evidence for the decision, not the full pathological extent.
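To make the per-head point concrete, here is a toy Integrated Gradients computation in plain Python. It is not the project's attribution code: the two "heads" are hypothetical logistic models with hand-picked weights, the signal has four samples, and the baseline is assumed to be all zeros.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def integrated_gradients(weights, x, baseline, steps=200):
    """Integrated Gradients for the toy head f(x) = sigmoid(w . x).

    Averages the analytic gradient along the straight path from baseline
    to x (midpoint Riemann sum), then scales by (x - baseline).
    """
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        s = sigmoid(sum(w * p for w, p in zip(weights, point)))
        for i, w in enumerate(weights):
            avg_grad[i] += s * (1.0 - s) * w / steps
    return [(xi - b) * g for xi, b, g in zip(x, baseline, avg_grad)]


# One toy signal and two hypothetical heads with different weights.
signal = [0.2, 1.5, -0.4, 0.9]
baseline = [0.0, 0.0, 0.0, 0.0]
head_a = [0.1, 2.0, 0.0, 0.1]   # relies mostly on sample 1
head_b = [0.1, 0.0, 0.2, 1.8]   # relies mostly on sample 3

attr_a = integrated_gradients(head_a, signal, baseline)
attr_b = integrated_gradients(head_b, signal, baseline)
print([round(a, 3) for a in attr_a])
print([round(a, 3) for a in attr_b])
```

The same input yields attribution concentrated on sample 1 for the first head and on sample 3 for the second, because each head is asked a different question. The attributions of each head also sum to approximately the difference between its output on the signal and on the baseline, which is the completeness property of Integrated Gradients.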
Note: The visual examples above are research visualizations derived from publicly available ECG records. The original datasets are not redistributed and remain subject to their respective licenses and citation requirements.
Attribution maps were evaluated against PTB-XL+ fiducial-derived clinical masks on truly positive ECGs using Integrated Gradients (primary method).
| Disease | n | Pointing Game | CAS |
|---|---|---|---|
| MI | 245 | 0.9796 | 0.7519 |
| STTC | 367 | —** | 0.4034 |
| CD | 339 | 0.9676 | 0.5396 |
| HYP | 115 | 0.9652 | 0.7161 |
** STTC Pointing Game (0.11) is not informative for this condition: the model anchors attribution at the R-peak of V6 (~65 ms before ST/T onset), reading the R→ST transition within its receptive field. This is a clinically coherent mechanism consistent with the LVH strain pattern, and the clinical review judged the interpretation plausible.
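For readers unfamiliar with the Pointing Game, the usual definition counts a hit when the strongest attribution sample lies inside the reference mask. The sketch below follows that common definition with plain nested lists; the project's exact variant (for example any tolerance window, or whether absolute values are used) is documented in FINDINGS.md and may differ, and all names and values here are hypothetical.

```python
def peak_location(attribution):
    """Return (lead, sample) of the largest absolute attribution value."""
    return max(
        ((lead, t) for lead, row in enumerate(attribution) for t in range(len(row))),
        key=lambda pos: abs(attribution[pos[0]][pos[1]]),
    )


def pointing_game(attributions, masks):
    """Fraction of ECGs whose attribution peak falls inside the clinical mask."""
    hits = 0
    for attribution, mask in zip(attributions, masks):
        lead, t = peak_location(attribution)
        hits += bool(mask[lead][t])
    return hits / len(attributions)


# Two toy ECGs, 2 leads x 5 samples each (hypothetical values).
attr_hit = [[0.0, 0.1, 0.9, 0.1, 0.0], [0.0, 0.0, 0.2, 0.0, 0.0]]
mask_hit = [[False, True, True, True, False], [False] * 5]

# Peak sits at the start of lead 1, while the mask covers its end.
attr_miss = [[0.0, 0.0, 0.1, 0.0, 0.0], [0.8, 0.1, 0.0, 0.2, 0.1]]
mask_miss = [[False] * 5, [False, False, False, True, True]]

score = pointing_game([attr_hit, attr_miss], [mask_hit, mask_miss])
print(score)
```

The second toy ECG shows why a low score is not automatically a failure: a peak that falls just before the mask counts as a miss even when it sits next to the reference region, which is the situation described for STTC.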
IG and GradSHAP produce near-identical attribution maps (cosine similarity ~0.90–0.93). Cross-method analysis, surgical perturbation, and full validation details are in FINDINGS.md; the clinical interpretation is in the Clinical Review.
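Cosine similarity between two attribution maps can be computed by flattening each map into one vector. This is a minimal sketch: whether the project compares raw or absolute attributions, and whether it averages per lead or per ECG, is not stated here, so the flattening over all leads is an assumption and the toy maps are made up.

```python
import math


def cosine_similarity(map_a, map_b):
    """Cosine similarity of two attribution maps given as lists of leads."""
    flat_a = [v for lead in map_a for v in lead]
    flat_b = [v for lead in map_b for v in lead]
    if len(flat_a) != len(flat_b):
        raise ValueError("attribution maps must have the same shape")
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm = math.sqrt(sum(a * a for a in flat_a)) * math.sqrt(sum(b * b for b in flat_b))
    if norm == 0.0:
        raise ValueError("cosine similarity is undefined for an all-zero map")
    return dot / norm


# Toy maps from two hypothetical attribution methods (2 leads x 3 samples).
ig_map = [[0.0, 0.8, 0.1], [0.0, 0.3, 0.0]]
gradshap_map = [[0.1, 0.7, 0.1], [0.0, 0.2, 0.1]]

print(round(cosine_similarity(ig_map, gradshap_map), 3))
```

Because cosine similarity ignores overall scale, two methods that agree on where attribution is concentrated score close to 1 even if their magnitudes differ.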
- Per-disease binary classifiers — independent model per pathology, no multilabel compromises. Each optimized for its own class balance and convergence dynamics
- Lead-wise encoder with shared weights — each of the 12 leads processed independently through a shared 1D CNN encoder (4 ResBlocks, kernel=7), then concatenated (12 × 256 = 3072-dim). Per-lead attribution comes for free — no post-hoc decomposition needed. Explicit cross-lead mechanisms (attention, multiscale kernels) were tested and rejected: all degraded HYP and CD while adding parameters. The FC layers after concatenation capture inter-lead relationships implicitly
- 100 Hz downsampling — PTB-XL provides 500 Hz, but 100 Hz retains sufficient diagnostic information for the target conditions. In this architecture, higher resolution increased parameters ~4× without improving any pathology — the bottleneck is sample count (12K ECGs), not temporal resolution
- Per-sample global z-score normalization — a single μ/σ computed over all 12×1000 samples per ECG. This was the key change that restored global voltage structure relevant for HYP diagnosis and drove HYP AUC from ~0.78 to ~0.88. A subsequent per-lead normalization is kept as a deliberate trade-off: it improves CD (+0.018 AUC, morphology-based) at a minor cost to HYP (−0.004 AUC).
- Pure label training: only explicit positives (`conf = 100`) and negatives (label absent); uncertain samples (`conf = 0–99`) excluded entirely. Uncertain labels would weaken XAI validation: attribution maps must point to real pathology, not annotator disagreement. Focal Loss (γ=2.0) handles class imbalance across 8–24% positive rates
- Multitask regularization: auxiliary heads for Device (0.03) and Sex (0.01). Empirically, Device acted as a regularizer: it encouraged device-invariant representations, stabilizing HYP training (worst validation oscillation drops from −1003bp to −272bp). Sex provides orthogonal physiological signal (QRS duration, voltage amplitudes differ by sex), benefiting MI and STTC
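The global z-score step from the normalization bullet can be sketched as below, using plain lists instead of tensors. `global_zscore` and the `eps` guard are hypothetical, the toy ECG has two leads rather than twelve, and the subsequent per-lead step is not shown because its exact form is not specified here.

```python
import statistics


def global_zscore(ecg, eps=1e-8):
    """Normalize one ECG with a single mean/std over all leads and samples."""
    flat = [v for lead in ecg for v in lead]
    mu = statistics.fmean(flat)
    sigma = statistics.pstdev(flat)
    # eps guards against division by zero on a flat-line recording.
    return [[(v - mu) / (sigma + eps) for v in lead] for lead in ecg]


# Two toy leads: the second has 4x the amplitude of the first.
ecg = [[-0.5, 0.5, -0.5, 0.5], [-2.0, 2.0, -2.0, 2.0]]
normalized = global_zscore(ecg)

# The amplitude ratio between leads survives normalization.
ratio = max(normalized[1]) / max(normalized[0])
print(round(ratio, 6))
```

Because every lead is divided by the same σ, relative amplitudes between leads are preserved, which is the global voltage structure the bullet describes as relevant for HYP. A z-score computed separately per lead would instead rescale both toy leads to the same amplitude.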
Architecture ablations, normalization experiments, and auxiliary task analysis are documented in FINDINGS.md.
| Component | Choice |
|---|---|
| Architecture | Lead-wise 1D CNN (shared encoder) |
| Parameters | ~3.0M |
| Input | 12 leads × 1000 samples (100 Hz × 10s) |
| Loss | Focal Loss (γ = 2.0) |
| Optimizer | AdamW + Cosine decay |
| Aux tasks | Device (0.03), Sex (0.01) |
| Label policy | Pure labels only (conf=100 / absent) |
| Dataset | PTB-XL + PTB-XL+ |
| Tracking | MLflow |
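The loss rows of the table can be illustrated with a scalar sketch. Several things here are assumptions: the focal loss is written without the optional α class-balancing term, the values 0.03 and 0.01 are read as loss weights for the auxiliary heads, and the auxiliary losses are passed in as plain numbers because their form is not specified above. `binary_focal_loss` and `total_loss` are hypothetical names.

```python
import math


def binary_focal_loss(p, y, gamma=2.0, eps=1e-7):
    """Focal loss for one prediction: -(1 - p_t)^gamma * log(p_t).

    p is the predicted probability of the positive class, y is 0 or 1.
    """
    p = min(max(p, eps), 1.0 - eps)      # keep log() finite
    p_t = p if y == 1 else 1.0 - p       # probability assigned to the true class
    return -((1.0 - p_t) ** gamma) * math.log(p_t)


def total_loss(main, device, sex, w_device=0.03, w_sex=0.01):
    """Weighted sum of the disease loss and the two auxiliary losses."""
    return main + w_device * device + w_sex * sex


easy = binary_focal_loss(0.9, 1)   # confident and correct
hard = binary_focal_loss(0.1, 1)   # confident and wrong
print(easy < hard)

print(total_loss(main=hard, device=0.7, sex=0.6))
```

With γ = 2.0 the well-classified example contributes very little compared with the misclassified one, which is how focal loss shifts training effort toward hard cases under class imbalance. Setting γ = 0 reduces the expression to ordinary binary cross-entropy.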
| GPU | Role |
|---|---|
| GTX 1650 4GB | Primary training & reported results (seed=42) |
| RTX 4060 Ti 16GB | Development & experimentation |
Peak VRAM usage ~1.9 GB per disease.
```bash
git clone https://github.com/XOREngine/xand-ecg.git
cd xand-ecg
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```

See QUICKSTART.md for data setup, training, XAI evaluation, and visualization commands.
| Document | Description |
|---|---|
| `QUICKSTART.md` | Data setup, training, evaluation, and visualization commands |
| `docs/FINDINGS.md` | Full experimental record: architecture ablations, normalization, XAI validation, lead robustness, surgical perturbation, external generalization |
| `docs/XAND-ECG_v0.1___ClinicalReview.pdf` | Clinical review document: per-pathology assessment of learned mechanisms by cardiology |
- PTB-XL — https://physionet.org/content/ptb-xl/
- PTB-XL+ (fiducial points & features) — https://physionet.org/content/ptb-xl-plus/
- Chapman-Shaoxing (arrhythmia database) — https://physionet.org/content/ecg-arrhythmia/
- Georgia G12EC (CinC Challenge 2020) — https://physionet.org/content/challenge-2020/
- PTB-XL: A Large Publicly Available Electrocardiography Dataset — https://doi.org/10.1038/s41597-020-0495-6
- PTB-XL+: A Comprehensive Electrocardiographic Feature Dataset — https://doi.org/10.1038/s41597-023-02153-8
- A 12-lead electrocardiogram database for arrhythmia research covering more than 10,000 patients — https://doi.org/10.1038/s41597-020-0386-x
- PhysioNet/CinC Challenge 2020: Classification of 12-lead ECGs — https://doi.org/10.1088/1361-6579/abc960
- PTB-XL Benchmark: Deep Learning for ECG Analysis — https://doi.org/10.1109/JBHI.2020.3022989
- PhysioNet Resource — https://doi.org/10.1161/01.CIR.101.23.e215
- Integrated Gradients — https://arxiv.org/abs/1703.01365
- SHAP / GradSHAP — https://arxiv.org/abs/1705.07874 · https://doi.org/10.1038/s42256-021-00343-w
- Saliency Maps — https://arxiv.org/abs/1312.6034
- Captum (attribution library) — https://arxiv.org/abs/2009.07896
- Focal Loss — https://arxiv.org/abs/1708.02002
- Decoupled Weight Decay (AdamW) — https://arxiv.org/abs/1711.05101
- ResNet (residual blocks used in 1D encoder) — https://arxiv.org/abs/1512.03385
Development:
- José Artusa (@WallyByte) — Project design and implementation.
Clinical Review:
- Belén Biscotti (LinkedIn) — Conducted the Clinical Review and reviewed the clinical coherence of the attribution maps and findings.
For questions, please reach out at info@xorengine.com
XAND-ECG — Lead-wise ECG Classification.
💪 If you extend it, audit it, or break it and improve it — go for it.
© 2026 XOREngine · Open Source Commitment



