A command-line tool for detecting backdoors in convolutional neural network (CNN) models.
Note — This is a Master 2 thesis project (Embedded Systems). It has been validated on CIFAR-10 models, but it is a study/research tool, not production software. Use it as a starting point or reference, not as a security guarantee.
ML-Backdoor-CLI scans a trained Keras model and reports whether it likely contains a backdoor — a hidden behavior injected during training that activates when a specific trigger pattern is present in the input.
Two detection methods are implemented:
**Neural Cleanse**: for each output class, reverse-engineers the smallest trigger (mask + pattern) that forces all inputs to classify as that class, then applies MAD (Median Absolute Deviation) outlier detection across the trigger norms: a class whose trigger is anomalously small is a likely backdoor target.
- anomaly_index > 2.0 = backdoor suspected
- No labeled dataset required
- Based on: Wang et al., "Neural Cleanse", IEEE S&P 2019
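The MAD outlier step can be sketched as follows. This is an illustrative snippet, not the project's actual code: the function name and the example norms are made up, and the 1.4826 consistency constant is the standard choice for normally distributed data used in the Neural Cleanse paper.

```python
import numpy as np

def mad_anomaly_indices(trigger_norms):
    """Anomaly index per class: absolute deviation of each trigger norm
    from the median, scaled by the (consistency-adjusted) MAD."""
    norms = np.asarray(trigger_norms, dtype=float)
    median = np.median(norms)
    # 1.4826 makes MAD a consistent estimator of the std. dev. for normal data
    mad = 1.4826 * np.median(np.abs(norms - median))
    return np.abs(norms - median) / mad

# Hypothetical L1 norms of the reverse-engineered triggers for 10 classes;
# class 3 needs a much smaller trigger, which is the backdoor signature.
norms = [95.0, 102.0, 98.0, 17.0, 101.0, 99.0, 97.0, 100.0, 103.0, 96.0]
idx = mad_anomaly_indices(norms)
suspects = [c for c, a in enumerate(idx)
            if a > 2.0 and norms[c] < np.median(norms)]  # -> [3]
```

Only classes whose trigger norm is *below* the median are flagged: an anomalously small trigger means few pixels suffice to hijack the prediction.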
**Activation Clustering**: extracts activations from the penultimate layer, reduces dimensionality with PCA, then applies K-means (k=2) per class. Poisoned samples form a small minority cluster separable from the clean samples.
- Requires a labeled dataset
- Based on: Chen et al., "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering", AAAI SafeAI Workshop 2019
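The per-class clustering test can be sketched like this. It is a minimal illustration using scikit-learn (already a dependency), not the tool's actual implementation; the function name, the 3-component PCA, and the synthetic activations are assumptions for the example.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

def minority_cluster_fraction(activations, n_components=3, seed=0):
    """Project penultimate-layer activations onto a few PCA components,
    split them with K-means (k=2), and return the relative size of the
    smaller cluster. A well-separated small minority suggests poisoning."""
    reduced = PCA(n_components=n_components).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(reduced)
    return np.bincount(labels, minlength=2).min() / len(labels)

# Synthetic stand-in for one class: 90 "clean" activation vectors plus
# 10 shifted "poisoned" ones (16-dimensional, clearly separated).
rng = np.random.default_rng(0)
acts = np.vstack([rng.normal(0, 1, (90, 16)), rng.normal(6, 1, (10, 16))])
frac = minority_cluster_fraction(acts)  # close to 0.10 here
```

In practice the decision also weighs how separable the two clusters are, not just the size ratio; a clean class tends to split into two clusters of comparable size.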
- Python 3.10+
- TensorFlow >= 2.10
- NumPy, scikit-learn, h5py
pip install -r requirements.txt

# Neural Cleanse — no labeled data needed (generates random images if no --data)
python scanner.py model.h5
python scanner.py model.h5 --mode neural-cleanse --data data/
# Activation Clustering — requires labeled data
python scanner.py model.h5 --mode activation-clustering --data data/
# Auto-detect: uses activation-clustering if --data is provided, neural-cleanse otherwise
python scanner.py model.h5 --data data/

Options:
- --quick — fewer optimization steps (faster, less accurate)
- --verbose — detailed per-step logs
- --output json — machine-readable JSON output
scanner.py # CLI entry point
detectors/
neural_cleanse.py # Neural Cleanse implementation (TF2 / GradientTape)
activation_cluster.py # Activation Clustering implementation
utils/
model_loader.py # Keras .h5 model loading
report.py # Console and JSON formatting
tests/
test_detectors.py # Unit + integration tests
data/ # Test data (not on main — see dev/study branch)
models/ # Model files (not on main — see dev/study branch)
Binary files (.h5, .keras, .npy) are not included on main to keep the
repository lightweight. They are available on the
dev/study branch for experimentation.
Validated on CIFAR-10 models (32x32x3, 10 classes, .h5 format):
| Model | Anomaly Index | Result |
|---|---|---|
| backdoor_model_best.h5 | 2.12 | Backdoor detected (class 0) |
| baseline_clean_best(1).h5 | 1.35 | Clean |
# Unit tests (fast, no model files needed)
pytest tests/
# Integration tests (needs model + data files from dev/study branch)
pytest tests/ -v

This project was developed as part of a Master 2 thesis. No license specified.