Quantum Kernel Feature Mapping for ICS Anomaly Detection

A hardware-agnostic Quantum Support Vector Machine (QSVM) framework for intrusion detection in Industrial Control Systems. This implementation uses an 8-qubit ZZFeatureMap kernel and has been validated on both the SWaT (Secure Water Treatment) and HAI (Hardware-in-the-Loop Augmented ICS) benchmark datasets.

Overview

Modern Industrial Control Systems face sophisticated cyber-physical attacks that exploit nonlinear correlations between process variables. Traditional linear classifiers often fail to capture these complex relationships. This project implements a quantum machine learning approach that embeds sensor data into a high-dimensional Hilbert space where attack patterns become linearly separable.

The core idea is straightforward: SCADA sensors in critical infrastructure exhibit correlated behaviors during attacks that classical kernels struggle to capture. By encoding these sensor readings into quantum states and measuring their fidelity, we can expose hidden patterns that make attacks detectable.
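In symbols, the fidelity measurement described above is usually written as the kernel below. The notation is ours rather than the repository's: U(x) stands for the feature-map circuit applied to a sensor vector x, and n is the number of qubits (8 here).

```latex
k(\mathbf{x}, \mathbf{y}) = \left| \langle 0^{\otimes n} | \, U^{\dagger}(\mathbf{x}) \, U(\mathbf{y}) \, | 0^{\otimes n} \rangle \right|^{2}
```

A value near 1 means the two sensor readings map to nearly identical quantum states; a value near 0 means the states are close to orthogonal.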

Key characteristics:

8-qubit ZZFeatureMap with linear entanglement (2 repetitions)
Cross-testbed validation on water treatment and thermal power systems
Hardware feasibility confirmed on IBM's 156-qubit ibm_fez processor
Statistically robust results via 5-seed cross-validation

Results Summary

Performance metrics (Mean and Std over 5 random seeds):

Dataset	F1-Score	AUC-ROC	Accuracy
SWaT	0.9002 (+/- 0.021)	0.9912 (+/- 0.004)	0.9744 (+/- 0.006)
HAI 22.04	0.3536 (+/- 0.052)	0.8309 (+/- 0.050)	0.9402 (+/- 0.006)

On the challenging HAI dataset, the quantum kernel improves AUC-ROC by 10.8% over a classical RBF-SVM.

Hardware Validation

Circuits were successfully executed on IBM Quantum hardware:

Metric	SWaT	HAI
Backend	ibm_fez (156 qubits)	ibm_fez (156 qubits)
Physical Depth	74	76
CZ Gate Count	28	28
Job ID	d5l9htjh36vs73bgsi3g	d5l9huk8d8hc73cfb0pg

Installation

Clone the repository and install dependencies:

git clone https://github.com/Ali-Badami/Quantum-IDS.git
cd Quantum-IDS
pip install -r requirements.txt

For GPU-accelerated simulation (recommended for large datasets):

pip install qiskit-aer-gpu

Dependencies

Python 3.9 or higher
Qiskit 1.0.0 or higher
qiskit-machine-learning 0.7.0 or higher
qiskit-aer (with optional GPU support)
scikit-learn 1.0 or higher
pandas, numpy, matplotlib, seaborn

Dataset Setup

This project uses two publicly available ICS security datasets. You can either download the raw data and process it from scratch, or use the pre-processed files if you just want to run the quantum experiments.

Option A: Start from Raw Data

SWaT Dataset

Request access from iTrust, Singapore University of Technology and Design: https://itrust.sutd.edu.sg/itrust-labs_datasets/

Place the files in data/swat/:

data/swat/
├── normal.csv
└── attack.csv

HAI Dataset

Download HAI 22.04 from ETRI: https://github.com/icsdataset/hai

Place the files in data/hai/:

data/hai/
├── train1.csv
├── train2.csv
├── ...
├── test1.csv
├── test2.csv
└── ...

Option B: Use Pre-processed Data

If you just want to run the quantum experiments without the data ingestion steps, place the processed numpy arrays in data/processed/. The required files are:

data/processed/
├── X_train_reduced.npy
├── y_train_reduced.npy
├── X_test_reduced.npy
├── y_test_reduced.npy
└── selected_feature_names.joblib

For the HAI dataset, the files go in data/processed/HAI/.
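Before running the quantum stages, it can help to confirm that the processed arrays are present and mutually consistent. The following is a minimal sketch, not part of the repository: the function name load_processed and the specific checks are assumptions, and it skips selected_feature_names.joblib to avoid a joblib dependency.

```python
from pathlib import Path

import numpy as np

# File stems listed in the README (the .joblib file is not checked here).
REQUIRED = ("X_train_reduced", "y_train_reduced", "X_test_reduced", "y_test_reduced")


def load_processed(directory):
    """Load the four .npy arrays and run basic consistency checks."""
    directory = Path(directory)
    missing = [name for name in REQUIRED if not (directory / f"{name}.npy").exists()]
    if missing:
        raise FileNotFoundError(f"Missing in {directory}: {missing}")

    arrays = {name: np.load(directory / f"{name}.npy") for name in REQUIRED}

    # Every feature row needs exactly one label.
    if len(arrays["X_train_reduced"]) != len(arrays["y_train_reduced"]):
        raise ValueError("Train features and labels differ in length")
    if len(arrays["X_test_reduced"]) != len(arrays["y_test_reduced"]):
        raise ValueError("Test features and labels differ in length")

    # Train and test must share the same feature columns (8 in this project).
    if arrays["X_train_reduced"].shape[1] != arrays["X_test_reduced"].shape[1]:
        raise ValueError("Train and test have different feature counts")
    return arrays
```

Calling load_processed("data/processed") or load_processed("data/processed/HAI") returns a dictionary keyed by file stem.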

Pipeline Execution

The experiment consists of several stages that should be run in sequence. If you have pre-processed data, you can skip to Stage 3.

Stage 1: Data Ingestion

Process the raw datasets with proper time-series handling:

# SWaT dataset
python src/swat_data_ingestion.py

# HAI dataset
python src/hai_data_ingestion.py

This stage handles:

Removing the first 6 hours of SWaT data (warm-up period per standard protocol)
Missing value imputation via forward-fill (sensor signal persistence assumption)
Fitting the scaler on training data only to prevent data leakage
Preserving time-series order (no shuffling)
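The steps listed above can be sketched with pandas as follows. This is an illustration under stated assumptions, not the code in src/: it assumes 1 Hz sampling (so 6 hours is 21600 rows), uses min-max scaling because the README does not name the scaler, and the function name preprocess is hypothetical.

```python
import numpy as np
import pandas as pd

WARMUP_ROWS = 6 * 60 * 60  # assumption: one row per second, so 6 hours = 21600 rows


def preprocess(train_df, test_df, warmup_rows=WARMUP_ROWS):
    """Drop warm-up rows, forward-fill gaps, and scale with training statistics only."""
    # Slicing keeps the original row order; nothing is shuffled.
    train = train_df.iloc[warmup_rows:].ffill()
    test = test_df.ffill()  # leading NaNs, if any, are left as-is by forward-fill

    # Scaling parameters come from the training split alone, so no test
    # information leaks into the transform.
    low = train.min()
    span = (train.max() - low).replace(0, 1.0)  # guard against constant sensors

    return (train - low) / span, (test - low) / span
```

Because the test split is scaled with training statistics, test values outside the training range fall outside [0, 1]; that is expected and is itself a signal of unusual behaviour.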

Stage 2: Feature Selection

Select the top 8 most discriminative features for the quantum circuit:

# SWaT
python src/swat_feature_selection.py

# HAI
python src/hai_feature_selection.py

Feature ranking uses RandomForest Gini importance computed on the test set, which contains both classes. Because the training set is 100% normal data for anomaly detection, the labeled test set is the only source of attack examples for supervised feature selection.
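A ranking of this kind can be sketched with scikit-learn as shown here. The function name top_k_features, the forest size, and the synthetic data are assumptions for illustration; the repository's scripts may differ in all three.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


def top_k_features(X, y, feature_names, k=8, seed=0):
    """Rank features by RandomForest Gini importance and keep the top k."""
    forest = RandomForestClassifier(n_estimators=100, random_state=seed)
    forest.fit(X, y)
    importances = forest.feature_importances_  # mean decrease in Gini impurity
    order = np.argsort(importances)[::-1][:k]  # indices, most important first
    return [(feature_names[i], float(importances[i])) for i in order]


# Synthetic stand-in for labeled sensor data: only s3 and s11 carry signal.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(600, 20))
y_demo = (X_demo[:, 3] + X_demo[:, 11] > 0).astype(int)
names = [f"s{i}" for i in range(20)]
ranking = top_k_features(X_demo, y_demo, names, k=8)
```

Each of the k selected features is then assigned to one qubit, in order of importance.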

Selected Features (SWaT):

Qubit	Sensor	Description	Importance
Q0	AIT501	Chemical analyzer (Stage 5)	18.7%
Q1	AIT201	Chemical analyzer (Stage 2)	13.3%
Q2	AIT202	Chemical analyzer (Stage 2)	8.6%
Q3	AIT504	Chemical analyzer (Stage 5)	7.8%
Q4	PIT502	Pressure indicator (Stage 5)	7.6%
Q5	PIT503	Pressure indicator (Stage 5)	5.4%
Q6	MV101	Motorized valve (Stage 1)	5.3%
Q7	FIT301	Flow transmitter (Stage 3)	5.2%

Stage 3: Quantum Kernel Computation

Compute the quantum kernel matrices using statevector simulation:

# SWaT
python src/quantum_kernel_computation.py

# HAI
python src/hai_quantum_kernel.py

This creates two kernel matrices:

Training Gram matrix (2500 x 2500)
Test kernel matrix (1000 x 2500)

The computation uses GPU acceleration if available and includes checkpointing for long-running jobs.
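Once each sample has been simulated to a statevector, both matrices reduce to squared inner products. The sketch below shows that step in NumPy at a reduced size (50 and 20 samples instead of 2500 and 1000). It uses random normalised vectors as stand-ins for the simulated states and omits GPU use and checkpointing; the function names are hypothetical.

```python
import numpy as np


def fidelity_kernel(states_a, states_b, batch_size=512):
    """K[i, j] = |<a_i|b_j>|^2, filled in row batches to bound peak memory."""
    K = np.empty((len(states_a), len(states_b)))
    for start in range(0, len(states_a), batch_size):
        block = states_a[start:start + batch_size]
        overlaps = block.conj() @ states_b.T  # inner products <a_i|b_j>
        K[start:start + batch_size] = np.abs(overlaps) ** 2
    return K


def random_states(n_samples, dim, rng):
    """Stand-in for simulated feature-map statevectors (random, unit norm)."""
    psi = rng.normal(size=(n_samples, dim)) + 1j * rng.normal(size=(n_samples, dim))
    return psi / np.linalg.norm(psi, axis=1, keepdims=True)


rng = np.random.default_rng(0)
train_states = random_states(50, 256, rng)  # 256 = 2**8 amplitudes for 8 qubits
test_states = random_states(20, 256, rng)

K_train = fidelity_kernel(train_states, train_states, batch_size=16)  # (50, 50)
K_test = fidelity_kernel(test_states, train_states, batch_size=16)    # (20, 50)
```

Note the orientation of the test matrix: rows index test samples and columns index training samples, matching the 1000 x 2500 shape above.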

Stage 4: Benchmark Evaluation

Run the QSVM vs Classical SVM comparison with statistical robustness:

python src/statistical_robustness_suite.py

This generates:

Performance metrics for each of 5 random seeds
Mean and standard deviation statistics
ROC curves and confusion matrices
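The multi-seed evaluation can be sketched as follows. This is an assumption-laden illustration rather than the suite itself: it assumes each seed controls which subset is drawn, uses a plain random split instead of stratified sampling, and substitutes a classical RBF kernel on synthetic data for the quantum kernel matrices.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from sklearn.svm import SVC


def evaluate_kernel(K_train, y_train, K_test, y_test):
    """Fit an SVM on a precomputed Gram matrix and score it on the test kernel."""
    clf = SVC(kernel="precomputed", C=1.0)
    clf.fit(K_train, y_train)
    scores = clf.decision_function(K_test)  # continuous scores for AUC-ROC
    pred = clf.predict(K_test)
    return {
        "f1": f1_score(y_test, pred),
        "auc": roc_auc_score(y_test, scores),
        "accuracy": accuracy_score(y_test, pred),
    }


def run_seeds(X, y, kernel_fn, seeds=(0, 1, 2, 3, 4), n_train=200, n_test=100):
    """Repeat the evaluation on a different random subset per seed."""
    rows = []
    for seed in seeds:
        idx = np.random.default_rng(seed).permutation(len(X))
        tr, te = idx[:n_train], idx[n_train:n_train + n_test]
        rows.append(evaluate_kernel(kernel_fn(X[tr], X[tr]), y[tr],
                                    kernel_fn(X[te], X[tr]), y[te]))
    # Report (mean, std) per metric across seeds.
    return {m: (float(np.mean([r[m] for r in rows])),
                float(np.std([r[m] for r in rows]))) for m in rows[0]}


def rbf(A, B, gamma=0.5):
    """Classical RBF kernel, used here only as a stand-in kernel function."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-gamma * d2)


rng = np.random.default_rng(42)
X_demo = rng.normal(size=(400, 8))
y_demo = (X_demo[:, 0] * X_demo[:, 1] > 0).astype(int)
summary = run_seeds(X_demo, y_demo, rbf)
```

Swapping rbf for a function that returns quantum kernel entries leaves the rest of the loop unchanged, which is what makes a like-for-like comparison possible.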

Stage 5: Hardware Validation (Optional)

Validate circuit feasibility on real IBM Quantum hardware:

python src/real_hardware_validation.py

This requires an IBM Quantum account and API token, available at https://quantum.ibm.com.

Set the token as an environment variable or in the script before running.
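Reading the token from the environment keeps it out of version control. A minimal sketch follows; the variable name IBM_QUANTUM_TOKEN and the helper load_ibm_token are hypothetical, so check real_hardware_validation.py for the name the script actually expects.

```python
import os


def load_ibm_token(env_var="IBM_QUANTUM_TOKEN"):
    """Return the API token from the environment, failing early if it is unset."""
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set {env_var} before running hardware validation")
    return token
```

Failing early with a clear message avoids submitting a job only to have authentication fail after circuits have been prepared.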

Project Structure

Quantum-IDS/
├── src/
│   ├── swat_data_ingestion.py      # SWaT preprocessing
│   ├── swat_feature_selection.py   # Feature ranking for SWaT
│   ├── hai_data_ingestion.py       # HAI preprocessing
│   ├── hai_feature_selection.py    # Feature ranking for HAI
│   ├── quantum_kernel_computation.py   # Kernel matrix computation
│   ├── hai_quantum_kernel.py       # HAI-specific kernel computation
│   ├── hai_benchmark.py            # HAI evaluation pipeline
│   ├── statistical_robustness_suite.py # Multi-seed validation
│   └── real_hardware_validation.py # IBM hardware submission
├── data/
│   ├── swat/                       # SWaT dataset (not included)
│   ├── hai/                        # HAI dataset (not included)
│   └── processed/                  # Processed numpy arrays
├── results/
│   ├── plots/                      # Generated figures
│   └── logs/                       # JSON result logs
├── requirements.txt
├── setup.py
├── LICENSE
└── README.md

Methodology Notes

Why Quantum Kernels?

The ZZFeatureMap applies entangling gates whose phase depends on products of feature pairs: with Qiskit's default data map, the pair (i, j) contributes a phase proportional to (pi - x_i) * (pi - x_j). This structure naturally captures pairwise correlations between sensors, which is relevant for ICS where attacks often manipulate related process variables simultaneously.

Classical RBF kernels compute similarity as exp(-gamma * ||x-y||^2), which depends only on the sum of per-feature squared differences and contains no explicit cross-feature terms. The quantum kernel, through its entanglement structure, can identify correlated deviations that indicate coordinated sensor manipulation.
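To make the pairwise-phase structure concrete, here is a small NumPy statevector sketch of a ZZ-style feature map with linear entanglement. It is intended to mirror Qiskit's default ZZFeatureMap construction (a Hadamard layer, single-qubit phases 2*x_q, and neighbour-pair phases 2*(pi - x_q)*(pi - x_{q+1})), but it is an independent illustration and not the repository's code; the function names are ours.

```python
import numpy as np


def zz_feature_state(x, reps=2):
    """Statevector of a ZZ-style feature map with linear (nearest-neighbour) entanglement."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    dim = 2 ** n

    # bits[b, q] is the value of qubit q in computational basis state b.
    bits = (np.arange(dim)[:, None] >> np.arange(n)) & 1

    # Single-qubit phases 2*x_q when qubit q is 1, plus pair phases that are
    # picked up when neighbouring bits differ (the CX - phase - CX pattern).
    phase = 2.0 * (bits @ x)
    for q in range(n - 1):
        pair_angle = 2.0 * (np.pi - x[q]) * (np.pi - x[q + 1])
        phase += pair_angle * (bits[:, q] ^ bits[:, q + 1])
    diagonal = np.exp(1j * phase)

    hadamard = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
    state = np.zeros(dim, dtype=complex)
    state[0] = 1.0  # start in |0...0>
    for _ in range(reps):
        psi = state.reshape([2] * n)
        for axis in range(n):  # Hadamard on every qubit
            psi = np.moveaxis(np.tensordot(hadamard, psi, axes=([1], [axis])), 0, axis)
        state = psi.reshape(dim) * diagonal  # data-dependent diagonal layer
    return state


def fidelity(x, y, reps=2):
    """Quantum kernel entry |<phi(x)|phi(y)>|^2."""
    return float(np.abs(np.vdot(zz_feature_state(x, reps), zz_feature_state(y, reps))) ** 2)
```

The pair_angle term is where products of neighbouring features enter the state, which is the part an RBF kernel has no direct counterpart for. For a single qubit and one repetition the kernel reduces to cos^2(x - y), which is a convenient sanity check.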

Data Leakage Prevention

Several measures ensure rigorous evaluation:

Scaler fitted on training data only
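Feature selection uses the labeled test set because the training set has no attacks; treat this as a known caveat rather than a safeguard, since it exposes test labels to the selection step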
Stratified sampling preserves class ratios
De-duplication ensures disjoint train/test sets for quantum experiments
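The de-duplication step can be sketched as below. The README does not say how duplicates are detected, so the matching rule here (identical feature vectors after rounding to 6 decimals) and the function name drop_overlap are assumptions.

```python
import numpy as np


def drop_overlap(X_train, X_test, y_test, decimals=6):
    """Remove test rows whose rounded feature vector also occurs in the training set."""
    train_keys = {tuple(row) for row in np.round(X_train, decimals).tolist()}
    keep = np.array([tuple(row) not in train_keys
                     for row in np.round(X_test, decimals).tolist()])
    return X_test[keep], y_test[keep]


X_tr = np.array([[1.0, 2.0], [3.0, 4.0]])
X_te = np.array([[3.0, 4.0], [5.0, 6.0], [1.0000000001, 2.0]])
y_te = np.array([1, 0, 1])
X_clean, y_clean = drop_overlap(X_tr, X_te, y_te)
```

In this toy case only the row [5.0, 6.0] survives, because the other two test rows match training rows after rounding.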

Simulation vs Hardware

All performance benchmarks use noise-free statevector simulation to establish theoretical upper bounds. Hardware execution validates physical realizability but introduces approximately 17-20% fidelity degradation due to gate errors. Error mitigation techniques can partially recover this gap but are not included in this release.

Quantum Subset Sizes

Due to the O(n^2) complexity of kernel matrix computation, we use stratified subsets:

2500 samples for training
1000 samples for testing

This is sufficient to compare the quantum kernel against the classical baseline while keeping computation tractable. The full classical baseline uses all available data.
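The subset draw and the resulting kernel-entry counts can be sketched as follows. The sampler is a hypothetical illustration of stratified selection with proportional allocation, not the repository's implementation.

```python
import numpy as np


def stratified_subset(y, size, seed=0):
    """Return sorted indices of a subset that approximately keeps the class ratios of y."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    classes, counts = np.unique(y, return_counts=True)
    # Proportional allocation with at least one sample per class; rounding
    # means the total can differ slightly from `size` in general.
    quota = np.maximum(1, np.round(size * counts / counts.sum()).astype(int))
    picked = [rng.choice(np.flatnonzero(y == c), size=min(q, n), replace=False)
              for c, q, n in zip(classes, quota, counts)]
    return np.sort(np.concatenate(picked))  # sorted so time order is preserved


n_train, n_test = 2500, 1000
train_entries = n_train * (n_train + 1) // 2  # symmetric Gram matrix: upper triangle only
test_entries = n_test * n_train               # every test sample against every training sample
```

With these sizes the training Gram matrix needs 3,126,250 distinct entries and the test kernel 2,500,000, which illustrates why the quadratic growth makes the full datasets impractical.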

Troubleshooting

GPU not detected: Make sure you have CUDA installed and then install the GPU version:

pip install qiskit-aer-gpu

Out of memory during kernel computation: Reduce the batch size in the Config class or use a smaller subset size.

Import errors with qiskit-algorithms: This codebase uses the modern Qiskit 1.x API and does not depend on the deprecated qiskit-algorithms package.

Citation

If you use this code in your research, please cite:

@article{badami2026quantum,
  title={Hardware-Agnostic Quantum Kernel Feature Mapping for Anomaly Detection
         in Critical Infrastructure: A Cross-Testbed Validation on NISQ Processors},
  author={Badami, Shujaatali},
  year={2026}
}

License

This project is licensed under the MIT License. See LICENSE for details.

Acknowledgments

iTrust Centre at SUTD for SWaT dataset access
ETRI for the HAI dataset
IBM Quantum Network for hardware resources

Contact

Shujaatali Badami
Email: shujaatali@ieee.org
ORCID: 0009-0003-5262-021X
