QAMP Project: Local and Multi-Scale Strategies to Mitigate Exponential Concentration in Quantum Kernels
Goal: Build and evaluate quantum kernels for SVMs and test whether local (patch-wise) and multi-scale kernels mitigate exponential concentration as qubit count (dimension d) and/or depth increase.
Status: Active research codebase with reproducible benchmarks, summaries, and plots (see checkpoint docs).
Project definition (QAMP 2025): qiskit-advocate/qamp-2025#6
Project summary: For a summary of this project at the end of QAMP 2025, please read this.
Project report: For a complete technical description, consult the project report and technical overview.
Quantum fidelity kernels tend to concentrate as d grows, pushing off-diagonal kernel entries toward 0 (kernel -> identity), which can reduce separability. This repo implements and benchmarks three kernel families under a unified API:
- Baseline (global fidelity): overlap of full quantum states.
- Local (patch-wise): compare subsystems via subcircuits or reduced density matrices (RDMs), then aggregate.
- Multi-scale: non-negative mix of kernels computed at multiple granularities (local + global).
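A toy NumPy sketch of why the local and multi-scale variants can help (illustrative only; `haar_state`, `rdm`, and `local_kernel` are invented for this example and are not this repo's API): for two Haar-random n-qubit states the global overlap is exponentially small in n, while overlaps of small reduced density matrices on 2-qubit patches remain O(1), and a multi-scale kernel is a non-negative mix of the two.

```python
import numpy as np

rng = np.random.default_rng(42)

def haar_state(n_qubits, rng):
    """Haar-random pure state as a complex vector of length 2**n_qubits."""
    v = rng.normal(size=2**n_qubits) + 1j * rng.normal(size=2**n_qubits)
    return v / np.linalg.norm(v)

def global_fidelity(psi, phi):
    """Baseline (global) kernel entry: |<psi|phi>|^2."""
    return abs(np.vdot(psi, phi)) ** 2

def rdm(psi, keep, n_qubits):
    """Reduced density matrix on qubits in `keep` (all others traced out)."""
    t = np.moveaxis(psi.reshape([2] * n_qubits), keep, list(range(len(keep))))
    m = t.reshape(2 ** len(keep), -1)
    return m @ m.conj().T

def local_kernel(psi, phi, partitions, n_qubits):
    """Patch-wise kernel entry: mean of Tr(rho_P sigma_P) over patches P."""
    vals = [np.real(np.trace(rdm(psi, p, n_qubits) @ rdm(phi, p, n_qubits)))
            for p in partitions]
    return float(np.mean(vals))

n = 10
psi, phi = haar_state(n, rng), haar_state(n, rng)
patches = [[0, 1], [2, 3], [4, 5], [6, 7], [8, 9]]
k_global = global_fidelity(psi, phi)          # ~2**-n for random states: concentrates
k_local = local_kernel(psi, phi, patches, n)  # per-patch overlaps stay O(1)
k_multi = 0.5 * k_local + 0.5 * k_global      # non-negative mix of scales
```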
We evaluate kernel geometry (off-diagonal statistics, effective rank, alignment) and downstream SVM accuracy, with sweep studies over d = 4..20 on multiple datasets. See docs/checkpoints/ for detailed results and plots.
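As a sketch of what those geometry diagnostics measure (illustrative only; `kernel_diagnostics` is not the repo's API, and "effective rank" here is the participation-ratio definition, which may differ from the one used in `analysis/`):

```python
import numpy as np

def kernel_diagnostics(K):
    """Concentration and rank diagnostics for a symmetric kernel matrix."""
    off = K[~np.eye(len(K), dtype=bool)]              # off-diagonal entries
    eigs = np.clip(np.linalg.eigvalsh(K), 0.0, None)  # clip numerical noise
    eff_rank = eigs.sum() ** 2 / (eigs ** 2).sum()    # participation ratio
    return {
        "offdiag_p50": float(np.percentile(off, 50)),
        "offdiag_p95": float(np.percentile(off, 95)),
        "eff_rank": float(eff_rank),
    }

# As off-diagonal entries shrink toward 0, K approaches the identity and
# eff_rank approaches n: every point looks "orthogonal" to every other.
K_near_identity = np.eye(6) + 0.01 * (np.ones((6, 6)) - np.eye(6))
stats = kernel_diagnostics(K_near_identity)
```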
```text
project/
  qkernels/    # kernel implementations + feature maps
  analysis/    # diagnostics, plotting, summaries
  scripts/     # benchmark runners + demos
  configs/     # TOML experiment configs
  datasets/    # local CSV datasets (large)
  outputs/     # kernels, metadata, summaries
  figs/        # generated figures
  docs/        # checkpoint notes and recipes
  tests/       # pytest unit tests
  PLAN.md      # project plan
  README.md    # this file
```
Python: 3.12 recommended (tested).

```bash
python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate
pip install -U pip
pip install -r requirements.txt
```

This repo tracks large kernel matrices and outputs with Git LFS.
If you need the full artifacts, install LFS and pull:

```bash
git lfs install
git lfs pull
```

Clone without downloading large .npy files (LFS pointers only):

```bash
GIT_LFS_SKIP_SMUDGE=1 git clone <repo_url>
git lfs install
```

You can later fetch artifacts selectively:

```bash
git lfs fetch --include "outputs/**,figs/**"
git lfs checkout --include "outputs/**,figs/**"
```

All experiments are controlled by TOML configs in configs/. These define:
- dataset + preprocessing options,
- seed grid + train/val/test splits,
- feature-map depth + entanglement,
- kernel variants (baseline/local/multiscale),
- optional Nyström settings.
Example (abbreviated):

```toml
[run]
dataset = "breast_cancer"
seed_grid = [42, 43]
n_features = 8
pca = true
val_size = 0.2
test_size = 0.2

[feature_map]
name = "zz_qiskit"
depth_grid = [1]
entanglement = "linear"
backend = "statevector"

[post]
normalize = true
center_grid = [false]
report_rank = true

[svm]
C_grid = [0.1, 1.0, 10.0]

[[kernels]]
name = "baseline"
enabled = true

[[kernels]]
name = "local"
enabled = true
partitions = [[0,1],[2,3]]
method = "rdm"
agg = "mean"

[[kernels]]
name = "multiscale"
enabled = true
scales = [ [[0,1],[2,3]], [[0,1,2,3]] ]
weights_grid = [[0.5, 0.5]]

[nystrom]
enabled = false
```

Notes:
- `run_experiment.py` resolves config paths relative to the repo root.
- `seed_grid`, `depth_grid`, `center_grid`, and `weights_grid` define sweeps.
- Local datasets (CSV) live in `datasets/`.
Each kernel module exposes:

```python
def build_kernel(X, feature_map="zz", depth=1, backend="statevector", seed=42, **kwargs):
    """
    Returns:
        K : np.ndarray (n, n)   # symmetric, ~PSD, float64
        meta : dict             # full config (for logging)
    """
```

Kernel-specific kwargs:
- Baseline: `entanglement`
- Local: `partitions`, `method` ("subcircuits" | "rdm"), `agg`, `weights`, `rdm_metric`
- Multi-scale: `scales`, `weights`, `normalize`
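A minimal stand-in honoring this contract (a classical RBF replaces the quantum feature map purely to illustrate the return types and invariants; `gamma` is a hypothetical kwarg, not one of the repo's):

```python
import numpy as np

def build_kernel(X, feature_map="zz", depth=1, backend="statevector",
                 seed=42, **kwargs):
    """Toy build_kernel: same signature and return contract, classical stand-in."""
    gamma = kwargs.get("gamma", 1.0)               # hypothetical kwarg
    sq = (X ** 2).sum(axis=1)
    d2 = np.clip(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0, None)
    K = np.exp(-gamma * d2).astype(np.float64)
    K = (K + K.T) / 2                              # enforce symmetry
    meta = {"feature_map": feature_map, "depth": depth,
            "backend": backend, "seed": seed, **kwargs}
    return K, meta

X = np.random.default_rng(0).normal(size=(5, 3))
K, meta = build_kernel(X, depth=2)
assert K.shape == (5, 5) and np.allclose(K, K.T)   # symmetric (n, n)
assert np.all(np.linalg.eigvalsh(K) > -1e-10)      # ~PSD up to float noise
```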
This is the recommended flow to generate all tables and plots. More runnable examples are collected in docs/RECIPES.md.

- Run benchmarks (build kernels + diagnostics + SVM metrics)

```bash
python -m scripts.run_experiment --config configs/breast_cancer_d8.toml
```

- Summarize outputs into CSV/Markdown

```bash
python -m analysis.summarize_benchmarks \
    --roots outputs/benchmarks/breast_cancer_d4 outputs/benchmarks/breast_cancer_d6 \
    --out outputs/benchmarks/summary_all.csv \
    --md outputs/benchmarks/summary_all.md
```

- Plot vs d curves (concentration / effective rank / accuracy)

```bash
python -m analysis.plot_vs_d \
    --summary outputs/benchmarks/summary_all.md \
    --out figs/checkpoint3/vs_d \
    --also-p95
```

- Delta analysis (vs baseline) + heatmaps + tradeoffs

```bash
python -m analysis.plot_deltas \
    --summary outputs/benchmarks/summary_all.csv \
    --out figs/checkpoint3/deltas
```

- Checkpoint-specific comparison figures (optional)

```bash
python -m analysis.make_checkpoint2_figure \
    --baseline <baseline_K.npy> \
    --local <local_K.npy> \
    --multiscale <multiscale_K.npy> \
    --out figs/checkpoint2/breast_cancer_compare.png
```

You can also run multiple configs and build per-dataset + global summaries:

```bash
python -m scripts.run_all_benchmarks --configs configs/breast_cancer_d4.toml configs/breast_cancer_d6.toml
```

Each benchmark run writes:

- `outputs/benchmarks/<dataset>_d*/<case>_K.npy`
- `outputs/benchmarks/<dataset>_d*/<case>_K_centered.npy` (optional)
- `outputs/benchmarks/<dataset>_d*/<case>_meta.json`
- `outputs/benchmarks/<dataset>_d*/<case>_splits.json`
- `outputs/benchmarks/<dataset>_d*/metrics.csv`
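The per-run artifacts can also be consumed directly in custom analyses. A hedged sketch (the schema of `*_splits.json` is assumed here to be lists of integer row indices; check a real run's file, since the actual layout may differ):

```python
import json
import os
import tempfile
import numpy as np

# Fabricate a tiny run directory to show the indexing convention.
# ASSUMPTION: *_splits.json stores integer row indices per split.
with tempfile.TemporaryDirectory() as root:
    K_demo = np.random.default_rng(1).random((6, 6))
    K_demo = (K_demo + K_demo.T) / 2
    np.save(os.path.join(root, "baseline_K.npy"), K_demo)
    with open(os.path.join(root, "baseline_splits.json"), "w") as f:
        json.dump({"train": [0, 1, 2, 3], "test": [4, 5]}, f)

    # Reload and slice: an SVC(kernel="precomputed") would fit on K_train
    # and predict with the (n_test, n_train) cross-block K_test.
    K = np.load(os.path.join(root, "baseline_K.npy"))
    with open(os.path.join(root, "baseline_splits.json")) as f:
        splits = json.load(f)
    tr, te = splits["train"], splits["test"]
    K_train = K[np.ix_(tr, tr)]
    K_test = K[np.ix_(te, tr)]
```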
Generated by analysis.diagnostics or run_experiment.py:

- `*_matrix.png` (kernel heatmap)
- `*_offdiag_hist.png` (off-diagonal histogram)
- `*_spectrum.png` (eigen-spectrum)

Generated by analysis.summarize_benchmarks:

- `summary.csv` / `summary.md`
- (global) `summary_all.csv` / `summary_all.md`

Generated by analysis.plot_vs_d:

- `<dataset>_concentration_p50_vs_d.png`
- `<dataset>_concentration_p95_vs_d.png`
- `<dataset>_effrank_vs_d.png`
- `<dataset>_testacc_vs_d.png`
- `<dataset>_vs_d_curves.csv`

Generated by analysis.plot_deltas:

- `delta_by_d.csv`
- `delta_by_dataset.csv`
- `delta_delta_test_acc_<dataset>.png`
- `delta_delta_offdiag_p50_<dataset>.png`
- `delta_delta_eff_rank_<dataset>.png`
- `tradeoff_<dataset>.png`
- `bar_mean_delta_test_acc.png`
- `heatmap_delta_test_acc_local.png`
- `heatmap_delta_test_acc_multiscale.png`
Checkpoint-specific figures and narratives live in docs/checkpoints/.
For large datasets, run_experiment.py supports Nyström/landmark approximation to avoid building the full kernel matrix:

```toml
[nystrom]
enabled = true
datasets = ["star_classification", "exam_score_prediction"]
n_landmarks = 1000
diag_samples = 2000
chunk_size = 256
```

In Nyström mode, the pipeline builds cross-kernels and evaluates a linear SVM on explicit features (`eval_linear_features`). This is optional and can be skipped if the full kernel is feasible.
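The idea behind the approximation can be sketched with a classical kernel (illustrative; `nystrom_features` and `rbf` are not functions from this repo): pick m landmark rows, form the cross-kernel C = K(X, L) and landmark kernel W = K(L, L), and use Phi = C W^(-1/2) as explicit features, so Phi @ Phi.T approximates K at rank m instead of computing all n^2 entries.

```python
import numpy as np

def nystrom_features(X, landmarks, kernel_fn, eps=1e-10):
    """Explicit Nystrom features Phi such that Phi @ Phi.T ~ kernel_fn(X, X)."""
    C = kernel_fn(X, landmarks)            # (n, m) cross-kernel
    W = kernel_fn(landmarks, landmarks)    # (m, m) landmark kernel
    vals, vecs = np.linalg.eigh(W)
    keep = vals > eps                      # drop numerically null directions
    W_inv_sqrt = vecs[:, keep] / np.sqrt(vals[keep])
    return C @ W_inv_sqrt                  # (n, m') explicit features

def rbf(A, B, gamma=0.5):
    """Classical RBF stand-in for a quantum kernel."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
Phi = nystrom_features(X, X[:10], rbf)     # first 10 points as landmarks
K_approx = Phi @ Phi.T                     # low-rank approximation of rbf(X, X)
```

A linear SVM on Phi then plays the role of a kernel SVM on the full matrix, which is what the pipeline's `eval_linear_features` path does.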
Supported datasets in scripts/demo_common.py:

- `make_circles`, `iris`
- `breast_cancer`, `parkinsons`
- `exam_score_prediction`, `star_classification`
- `ionosphere`, `heart_disease`
Some large datasets are run as subsets for tractability; see configs/ and checkpoint notes for details.
- Kernel not PSD: use `float64`, enforce symmetry with `(K + K.T)/2`, add small diagonal regularization.
- Slow runs: keep depth 1-2, reduce `n`, prefer statevector first.
- Mismatched splits: always reuse the `*_splits.json` produced by each run.
- Windows paths: all scripts resolve config/output paths relative to the repo root.
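The PSD fix in the first bullet can be wrapped in a small helper (a generic sketch, not a function shipped in this repo):

```python
import numpy as np

def repair_psd(K, jitter=1e-10):
    """Symmetrize, clip negative eigenvalues from float noise, add a tiny ridge."""
    K = (np.asarray(K, dtype=np.float64) + np.asarray(K, dtype=np.float64).T) / 2
    vals, vecs = np.linalg.eigh(K)
    K_psd = (vecs * np.clip(vals, 0.0, None)) @ vecs.T   # V diag(max(l,0)) V^T
    return K_psd + jitter * np.eye(len(K_psd))

# Slightly asymmetric, numerically dicey input:
K_bad = np.array([[1.0, 1.0000002], [0.9999998, 1.0]])
K_ok = repair_psd(K_bad)
```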
License: MIT (see LICENSE).
If you use this code, please cite the project and relevant QML/quantum-kernel references.
- C-grid: Discrete grid of SVM C values used for validation selection.
- MCC: Matthews Correlation Coefficient (robust classification metric).
- Nyström: Low-rank kernel approximation using landmarks.
- PCA: Principal Component Analysis (optional dimensionality reduction).
- PSD: Positive Semi-Definite.
- RDM: Reduced Density Matrix.
- SVM: Support Vector Machine.
- TOML: Simple, human-readable configuration format.
Status notice: This README reflects the current state of the project and will continue evolving as results and experiments expand.
See PLAN.md for scope, shared interfaces, artifacts, and milestones.