A comprehensive toolkit for preprocessing MALDI-TOF mass spectrometry data for antimicrobial resistance (AMR) prediction
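```bash
pip install maldiamrkit
```

To enable batch effect correction and UMAP plots, install the optional `batch` extra:

```bash
pip install "maldiamrkit[batch]"
```

This installs `combatlearn` for ComBat-based batch effect correction and `umap-learn` for UMAP exploratory plots.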
To install from source with the development dependencies:

```bash
git clone https://github.com/EttoreRocchi/MaldiAMRKit.git
cd MaldiAMRKit
pip install -e ".[dev]"
```

- Composable Pipeline: Build a custom `PreprocessingPipeline` from individual transformers (smoothing, baseline correction, normalization, trimming), serializable to JSON/YAML
- Multiple Binning Strategies: Uniform, proportional, adaptive, and custom bin edges
- Quality Metrics: SNR estimation, comprehensive quality reports, and alignment assessment
- Replicate Merging: Mean/median/weighted merging with correlation-based outlier detection
- Spectral Alignment: Shift, linear, piecewise, and DTW warping for both binned and raw full-resolution spectra
- Peak Detection: Local maxima and persistent homology methods
- AMR Metrics: VME, ME, sensitivity, specificity, categorical agreement, and `amr_classification_report`, following EUCAST/CLSI conventions
- Label Encoding: `LabelEncoder` for mapping R/I/S to binary labels, with configurable handling of intermediate (I) results
- Stratified Splitting: Species-drug stratified and case-based (patient-grouped) splitting to prevent data leakage
- `DifferentialAnalysis`: Per-bin statistical testing (Mann-Whitney U, Welch's t-test) between resistant and susceptible groups, with multiple-testing correction, log2 fold change, and Cohen's d effect size
- Peak Selection: `top_peaks()` by adjusted p-value, `significant_peaks()` with fold-change and p-value thresholds, and `compare_drugs()` for multi-drug boolean significance matrices
- AMR-Aware Plots: `plot_volcano()`, `plot_manhattan()` along the m/z axis, and `plot_drug_comparison()` with a binary heatmap or UpSet-style intersection view
- `DriftMonitor`: Anchors a baseline on early timestamps (default: first 20%) and tracks temporal drift via three complementary views: reference similarity of per-window median spectra, PCA centroid trajectory in a baseline-fitted PCA space, and Jaccard stability of top-k differential peaks over time
- Trajectory Plots: `plot_reference_drift`, `plot_pca_drift`, `plot_peak_stability`, `plot_effect_size_drift`
- Dataset Building & Loading: `DatasetBuilder` and `DatasetLoader` with pluggable layout adapters (`FlatLayout`, `BrukerTreeLayout`, `DRIAMSLayout`, `MARISMaLayout`)
- Bruker Format Support: Read Bruker flexAnalysis binary data (fid/1r + acqus) natively by passing a directory to `read_spectrum()`
- MIC Parsing: `parse_mic_column()` for parsing MIC strings with qualifiers and European decimal separators
- Composable Filters: `SpeciesFilter`, `DrugFilter`, `QualityFilter`, and `MetadataFilter`, combinable with the `&`, `|`, and `~` operators
- Spectrum Export: Save spectra to CSV or TXT via `MaldiSpectrum.save()` and `MaldiSet.save_spectra()`
- Exploratory Plots: PCA, t-SNE, and UMAP scatter plots colored by species, resistance phenotype, or any metadata column
- Batch Effect Correction: Multi-site/multi-instrument correction via `combatlearn` (`pip install maldiamrkit[batch]`)
- CLI: `maldiamrkit preprocess`, `maldiamrkit quality`, and `maldiamrkit build` for batch processing
- Parallel Processing: Multi-core support via the `n_jobs` parameter
- ML-Ready: Direct integration with scikit-learn pipelines
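The AMR Metrics bullet above refers to the error categories used in EUCAST/CLSI-style evaluation. As a rough plain-numpy sketch of the commonly used definitions (not MaldiAMRKit's `amr_classification_report`; the helper name `amr_error_rates` is hypothetical), assuming binary labels with 1 = resistant and 0 = susceptible: a very major error (VME) is a resistant isolate predicted susceptible, counted against all resistant isolates, and a major error (ME) is a susceptible isolate predicted resistant, counted against all susceptible isolates.

```python
import numpy as np


def amr_error_rates(y_true, y_pred):
    """Hypothetical helper: VME/ME-style metrics for binary AMR labels.

    Assumes 1 = resistant (R) and 0 = susceptible (S); intermediate (I)
    results must already have been mapped to one of the two classes.
    """
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    resistant = y_true == 1
    susceptible = y_true == 0

    # Very major error: a truly resistant isolate predicted susceptible.
    vme = np.sum(resistant & (y_pred == 0)) / np.sum(resistant)
    # Major error: a truly susceptible isolate predicted resistant.
    me = np.sum(susceptible & (y_pred == 1)) / np.sum(susceptible)
    return {
        "VME": vme,
        "ME": me,
        "sensitivity": 1.0 - vme,  # recall on the resistant class
        "specificity": 1.0 - me,  # recall on the susceptible class
        "categorical_agreement": np.mean(y_true == y_pred),
    }


# Toy example: 4 resistant and 6 susceptible isolates.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
metrics = amr_error_rates(y_true, y_pred)
print(metrics["VME"], metrics["ME"])  # VME = 1/4, ME = 1/6
```

With binary predictions, sensitivity and specificity are simply 1 - VME and 1 - ME, and categorical agreement reduces to accuracy. The library's handling of intermediate (I) results and its exact denominators may differ from this sketch.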
Full documentation is available at maldiamrkit.readthedocs.io.
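The `DifferentialAnalysis` feature listed above combines several standard statistics. As an illustration only (this is not MaldiAMRKit's implementation, and the helper names `per_bin_tests` and `benjamini_hochberg` are hypothetical), the sketch below runs a two-sided Mann-Whitney U test per bin, applies Benjamini-Hochberg correction (an assumed choice of multiple-testing method), and computes the log2 fold change of mean intensities and Cohen's d with a pooled standard deviation.

```python
import numpy as np
from scipy.stats import mannwhitneyu


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values (FDR control)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, moving from the largest p-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.clip(ranked, 0.0, 1.0)
    return adjusted


def per_bin_tests(X_r, X_s, eps=1e-9):
    """Compare resistant (X_r) and susceptible (X_s) spectra bin by bin.

    Both inputs are (n_samples, n_bins) arrays of binned intensities.
    """
    pvals = np.array([
        mannwhitneyu(X_r[:, j], X_s[:, j], alternative="two-sided").pvalue
        for j in range(X_r.shape[1])
    ])
    mean_r, mean_s = X_r.mean(axis=0), X_s.mean(axis=0)
    # eps avoids log(0) for empty bins (an assumption of this sketch).
    log2_fc = np.log2((mean_r + eps) / (mean_s + eps))
    # Cohen's d using the pooled standard deviation of both groups.
    n_r, n_s = len(X_r), len(X_s)
    pooled_sd = np.sqrt(
        ((n_r - 1) * X_r.var(axis=0, ddof=1) + (n_s - 1) * X_s.var(axis=0, ddof=1))
        / (n_r + n_s - 2)
    )
    cohens_d = (mean_r - mean_s) / np.where(pooled_sd > 0, pooled_sd, np.nan)
    return {
        "p_value": pvals,
        "p_adj": benjamini_hochberg(pvals),
        "log2_fc": log2_fc,
        "cohens_d": cohens_d,
    }


# Toy data: only bin 0 truly differs between the two groups.
rng = np.random.default_rng(0)
X_s = rng.normal(5.0, 1.0, size=(30, 3))  # 30 susceptible spectra, 3 bins
X_r = rng.normal(5.0, 1.0, size=(30, 3))  # 30 resistant spectra
X_r[:, 0] += 3.0
res = per_bin_tests(X_r, X_s)
print(res["p_adj"].round(4), res["cohens_d"].round(2))
```

Swapping `mannwhitneyu` for `scipy.stats.ttest_ind(..., equal_var=False)` would give Welch's t-test instead, and recent SciPy versions also ship `scipy.stats.false_discovery_control` as a ready-made Benjamini-Hochberg adjustment.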
```python
from maldiamrkit import MaldiSpectrum
from maldiamrkit.visualization import plot_spectrum

# Load spectrum from file
spec = MaldiSpectrum("data/spectrum.txt")

# Preprocess: smoothing, baseline removal, normalization
spec.preprocess()

# Optional: bin to reduce dimensions
spec.bin(bin_width=3)  # 3 Da bins

# Visualize
plot_spectrum(spec, binned=True)
```

```python
from maldiamrkit import MaldiSet

# Load multiple spectra with metadata
data = MaldiSet.from_directory(
    spectra_dir="data/spectra/",
    meta_file="data/metadata.csv",
    aggregate_by=dict(antibiotics="Drug", species="Escherichia coli"),
    bin_width=3,
)

# Access features and labels
X = data.X  # Feature matrix
y = data.get_y_single("Drug")  # Target labels
```

```python
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

from maldiamrkit.alignment import Warping
from maldiamrkit.detection import MaldiPeakDetector

# Create ML pipeline (X and y come from the previous example)
pipe = Pipeline([
    ("peaks", MaldiPeakDetector(binary=False, prominence=0.05)),
    ("warp", Warping(method="shift")),
    ("scaler", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=42)),
])

# Cross-validation
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"CV Accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```

For more examples covering alignment, filtering, evaluation, CLI usage, and more, see the Quickstart Guide and API Reference.
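The `spec.bin(bin_width=3)` call in the first example reduces each spectrum to one value per 3 Da window. Conceptually, uniform binning works like the numpy sketch below; the function name `uniform_bin`, the 2,000-20,000 Da range, and the use of summed intensities are assumptions for illustration, and MaldiAMRKit's own binning (including its proportional and adaptive strategies) may differ.

```python
import numpy as np


def uniform_bin(mz, intensity, bin_width=3.0, mz_min=2000.0, mz_max=20000.0):
    """Sum intensities into fixed-width m/z bins (hypothetical helper).

    The 2000-20000 Da range and the use of summed intensities are
    assumptions of this sketch, not MaldiAMRKit defaults.
    """
    n_bins = int(round((mz_max - mz_min) / bin_width))
    edges = np.linspace(mz_min, mz_min + n_bins * bin_width, n_bins + 1)
    # np.histogram with weights sums intensities per bin; points outside
    # [mz_min, mz_max] are dropped, and the last bin includes its right edge.
    binned, _ = np.histogram(mz, bins=edges, weights=intensity)
    return edges[:-1], binned


# Toy spectrum: three points inside the range, one outside it.
mz = np.array([2000.5, 2001.0, 2004.0, 25000.0])
intensity = np.array([1.0, 2.0, 3.0, 5.0])
starts, counts = uniform_bin(mz, intensity)
print(len(counts), counts[:2])  # 6000 bins; the first two each hold 3.0
```

Because every spectrum is mapped onto the same bin edges, the binned outputs line up column by column, which is what lets them be stacked into a single feature matrix for scikit-learn.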
For more detailed examples, see the notebooks:
- Quick Start - Loading, preprocessing, binning, and quality assessment
- Peak Detection - Local maxima and persistent homology methods
- Alignment - Warping methods and alignment quality
- Evaluation - AMR metrics, label encoding, and stratified splitting
- Exploration - PCA, t-SNE, UMAP visualizations and batch correction
- Differential Analysis - R vs. S peak testing, volcano/Manhattan plots, and multi-drug comparison
- Drift Monitoring - Baseline-anchored drift detection: reference similarity, PCA trajectory, peak stability, and effect-size drift
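The peak-stability view in the Drift Monitoring notebook tracks how consistently the most discriminative bins stay on top over time. A minimal sketch of a Jaccard-based stability score, assuming per-bin effect scores (for example, Cohen's d) have already been computed for the baseline and for each later time window; the helper names and the ranking by absolute score are hypothetical, and `DriftMonitor`'s actual procedure may differ.

```python
import numpy as np


def top_k_bins(scores, k):
    """Indices of the k bins with the largest absolute score."""
    return set(np.argsort(np.abs(scores))[::-1][:k].tolist())


def jaccard(a, b):
    """Size of the intersection over size of the union (1.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def peak_stability(baseline_scores, window_scores, k=3):
    """Jaccard similarity of each window's top-k bins to the baseline's."""
    reference = top_k_bins(baseline_scores, k)
    return [jaccard(reference, top_k_bins(s, k)) for s in window_scores]


# Baseline top-3 bins are {0, 1, 2}; later windows drift away from them.
baseline = [5.0, 4.0, 3.0, 0.1, 0.2]
windows = [
    [5.0, 4.0, 3.0, 0.0, 0.0],  # same top-3 bins
    [0.1, 4.0, 3.0, 5.0, 0.0],  # bin 0 replaced by bin 3
    [0.0, 0.0, 1.0, 2.0, 3.0],  # only bin 2 still in the top 3
]
print(peak_stability(baseline, windows, k=3))
```

A value of 1.0 means the window's top-k bins match the baseline exactly, while values drifting toward 0 signal that the discriminative peaks are changing over time.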
Pull requests, bug reports, and feature ideas are welcome. See the Contributing Guide for how to get started.
If you use MaldiAMRKit in your research, please cite:
Rocchi, E., Nicitra, E., Calvo, M. et al. Combining mass spectrometry and machine learning models for predicting Klebsiella pneumoniae antimicrobial resistance: a multicenter experience from clinical isolates in Italy. BMC Microbiol (2026). doi:10.1186/s12866-025-04657-2
See the full publications list for more papers using MaldiAMRKit.
This project is licensed under the MIT License. See the LICENSE file for details.
This toolkit is inspired by:
Weis, C., Cuénod, A., Rieck, B., et al. (2022). Direct antimicrobial resistance prediction from clinical MALDI-TOF mass spectra using machine learning. Nature Medicine, 28, 164-174. https://doi.org/10.1038/s41591-021-01619-9
