ML-Backdoor-CLI

A command-line tool for detecting backdoors in convolutional neural network (CNN) models.

Note — This is a Master 2 thesis project (Embedded Systems). It has been validated on CIFAR-10 models, but it is a study/research tool, not production software. Use it as a starting point or reference, not as a security guarantee.

What It Does

ML-Backdoor-CLI scans a trained Keras model and reports whether it likely contains a backdoor — a hidden behavior injected during training that activates when a specific trigger pattern is present in the input.

Two detection methods are implemented:

Neural Cleanse

For each output class, reverse-engineers the smallest trigger (mask + pattern) that forces all inputs to classify as that class. Then applies MAD (Median Absolute Deviation) outlier detection across the trigger norms: a class whose trigger is anomalously small indicates a backdoor target.

  • anomaly_index > 2.0 flags a suspected backdoor target
  • No labeled dataset required
  • Based on: Wang et al., "Neural Cleanse", IEEE S&P 2019
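The MAD step above can be sketched in a few lines of NumPy. This is illustrative only: the trigger norms are made-up values, and the helper name anomaly_index is an assumption, not the tool's actual API.

```python
import numpy as np

def anomaly_index(trigger_norms):
    """MAD-based anomaly index over per-class trigger L1 norms."""
    norms = np.asarray(trigger_norms, dtype=float)
    median = np.median(norms)
    # 1.4826 makes MAD a consistent estimator of the standard deviation
    # under a normal distribution, as in the Neural Cleanse paper.
    mad = 1.4826 * np.median(np.abs(norms - median))
    return np.abs(norms - median) / mad

# Hypothetical L1 norms for a 10-class model; class 2 needs an
# anomalously small trigger, so it is the suspected backdoor target.
norms = [42.0, 39.5, 8.1, 41.2, 40.8, 38.9, 43.3, 40.1, 39.7, 42.6]
idx = anomaly_index(norms)
suspects = [c for c in range(len(norms))
            if idx[c] > 2.0 and norms[c] < np.median(norms)]  # [2]
```

Only classes whose triggers are anomalously *small* are flagged, since a backdoor trigger is by construction much smaller than the perturbation needed to misclassify into a clean class.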

Activation Clustering

Extracts activations from the penultimate layer, reduces dimensionality with PCA, then applies K-means (k=2) per class. Poisoned samples form a small minority cluster separable from clean samples.

  • Requires a labeled dataset
  • Based on: Chen et al., "Detecting Backdoor Attacks on Deep Neural Networks by Activation Clustering", AAAI SafeAI Workshop 2019
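A minimal sketch of the per-class clustering step, assuming scikit-learn is available. The synthetic activations and the helper name minority_fraction are illustrative, not the tool's actual code.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA

def minority_fraction(activations, n_components=10, seed=0):
    """Project one class's penultimate-layer activations with PCA,
    split them with K-means (k=2), and return the relative size of
    the smaller cluster; a very small minority cluster is suspicious."""
    k = min(n_components, activations.shape[1])
    reduced = PCA(n_components=k).fit_transform(activations)
    labels = KMeans(n_clusters=2, n_init=10, random_state=seed).fit_predict(reduced)
    sizes = np.bincount(labels, minlength=2)
    return sizes.min() / sizes.sum()

# Synthetic example: 190 "clean" activations plus 10 "poisoned" ones
# shifted far from the clean cluster.
rng = np.random.default_rng(0)
clean = rng.normal(0.0, 1.0, size=(190, 64))
poisoned = rng.normal(6.0, 1.0, size=(10, 64))
frac = minority_fraction(np.vstack([clean, poisoned]))  # ~0.05
```

In practice a threshold on this fraction (or a silhouette-style score) decides whether the minority cluster is small enough to indicate poisoning.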

Requirements

  • Python 3.10+
  • TensorFlow >= 2.10
  • NumPy, scikit-learn, h5py

pip install -r requirements.txt

Usage

# Neural Cleanse — no labeled data needed (generates random images if no --data)
python scanner.py model.h5
python scanner.py model.h5 --mode neural-cleanse --data data/

# Activation Clustering — requires labeled data
python scanner.py model.h5 --mode activation-clustering --data data/

# Auto-detect: uses activation-clustering if --data is provided, neural-cleanse otherwise
python scanner.py model.h5 --data data/
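The auto-detect rule above can be sketched with argparse. The flag names follow the commands shown; the helper name pick_mode is hypothetical, not necessarily how scanner.py is structured.

```python
import argparse

def pick_mode(args):
    # An explicit --mode wins; otherwise choose by the presence of
    # --data, matching the auto-detect rule described above.
    if args.mode:
        return args.mode
    return "activation-clustering" if args.data else "neural-cleanse"

parser = argparse.ArgumentParser(prog="scanner.py")
parser.add_argument("model", help="path to the Keras .h5 model to scan")
parser.add_argument("--mode", choices=["neural-cleanse", "activation-clustering"])
parser.add_argument("--data", help="directory with (optionally labeled) samples")

args = parser.parse_args(["model.h5", "--data", "data/"])
mode = pick_mode(args)  # "activation-clustering"
```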

Options:

  • --quick — fewer optimization steps (faster, less accurate)
  • --verbose — detailed per-step logs
  • --output json — machine-readable JSON output

Project Structure

scanner.py                  # CLI entry point
detectors/
    neural_cleanse.py       # Neural Cleanse implementation (TF2 / GradientTape)
    activation_cluster.py   # Activation Clustering implementation
utils/
    model_loader.py         # Keras .h5 model loading
    report.py               # Console and JSON formatting
tests/
    test_detectors.py       # Unit + integration tests
data/                       # Test data (not on main — see dev/study branch)
models/                     # Model files (not on main — see dev/study branch)

Data and Models

Binary files (.h5, .keras, .npy) are not included on main to keep the repository lightweight. They are available on the dev/study branch for experimentation.

Validated on CIFAR-10 models (32x32x3, 10 classes, .h5 format):

Model                       Anomaly Index   Result
backdoor_model_best.h5      2.12            Backdoor detected (class 0)
baseline_clean_best(1).h5   1.35            Clean

Tests

# Unit tests (fast, no model files needed)
pytest tests/

# Integration tests (need model + data files from the dev/study branch)
pytest tests/ -v

License

This project was developed as part of a Master 2 thesis. No license specified.
