A side-channel profiler that detects deceptive intent in LLMs by measuring the computational cost of lying.
When an LLM deceives — crafting a phishing message, inserting a code backdoor, sustaining a lie across turns — it bears measurable overhead. Longer latencies. Shifted token entropy. Altered GPU power patterns. DeceptionScoop captures these side-channel signals instead of classifying model outputs, making it structurally different from prompt-based detection and harder to evade at the surface level.
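The entropy signal can be made concrete. A minimal sketch (illustrative only, not the project's code) of per-token Shannon entropy computed from a model's next-token probability distributions:

```python
import math

def token_entropy(probs):
    """Shannon entropy (bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_entropy(step_probs):
    """Average per-step entropy across a full generation."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)

# A compressed (confident) distribution versus a flat one:
confident = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(token_entropy(confident))  # low entropy
print(token_entropy(uniform))    # maximal entropy for 4 outcomes: 2.0 bits
```

A drop in this quantity during generation is what the "entropy ↓" signal in the results table measures.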
The core thesis: deception is a computational burden, and that burden leaves physical traces.
Matched-pair experiments (229 episodes, controlled for task complexity) reveal distinct per-task deception fingerprints:
| Signal | Context | p-value | Effect Size (Cohen's d) |
|---|---|---|---|
| Latency ↑ | Social/phishing tasks | 0.0018 | 0.98 (large) |
| Entropy ↓ | Code backdoor insertion | < 0.0001 | −2.53 (very large) |
| GPU power variance | Cross-task | < 0.05 | — |
The model stalls when crafting social lies. Backdoor insertion compresses token distributions. Hardware telemetry independently confirms hidden cognitive load. These are not artifacts — they survived matched-pair controls designed to remove task-complexity confounds.
DeceptionScoop is a multi-vector profiling pipeline, not a binary classifier:
- Suspicion scoring — task-adaptive weighted signals (`backend/suspicion.py`)
- Timescale drift detection — tracks behavioral instability over multi-step chains (`backend/timescale.py`)
- GPU telemetry — NVML-based power/utilization capture (`backend/gpu/gpu_metrics_nvml.py`) with derived features (`backend/gpu/extract_power_features.py`)
- Scenario engine — matched-pair experiment definitions (`scenarios/scenarios.yaml`)
- Data layer — SQLite with `episodes`, `metrics`, `gpu_snapshots` tables (schema in `DB_SCHEMA.md`)
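As an illustration of task-adaptive weighting (a sketch with made-up profile names and weights, not the actual `backend/suspicion.py` logic), a suspicion score can be a profile-specific weighted sum of normalized signals, with each signal oriented so that higher means more suspicious:

```python
# Hypothetical per-task weight profiles; the real values live in the project's config.
PROFILES = {
    "social": {"latency_z": 0.6, "entropy_z": 0.1, "power_var_z": 0.3},
    "code":   {"latency_z": 0.1, "entropy_z": 0.6, "power_var_z": 0.3},
}

def suspicion_score(signals: dict, profile: str) -> float:
    """Weighted sum of z-scored signals under a task-specific profile."""
    weights = PROFILES[profile]
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

# The same episode scores differently under different task profiles:
episode = {"latency_z": 0.4, "entropy_z": 2.1, "power_var_z": 0.8}
print(suspicion_score(episode, "code"))
print(suspicion_score(episode, "social"))
```

This mirrors the finding above that no single global weighting generalizes: the social profile leans on latency, the code profile on entropy compression.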
```bash
pip install -r requirements.txt

python analysis/validate_matched_pairs.py     # statistical validation
python backend/gpu/extract_power_features.py  # GPU feature extraction
python analysis/generate_plots.py             # generate figures
```

Key outputs:

- `analysis/matched_pairs_stats.md` — full statistical report
- `analysis/fig_cognitive_load_faceted.png` — latency vs. entropy scatter by task
- `analysis/fig_smoking_gun.png` — strongest discriminative signals
- `analysis/fig_power_variance.png`, `analysis/fig_power_trace.png` — GPU telemetry
- Adaptive weights: `suspicion.use_adaptive_weights` in `backend/config.yaml` (default: `true`)
- Scenario-to-profile mapping: `SCENARIO_TO_PROFILE` in `backend/suspicion.py`
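A hedged sketch of the relevant fragment of `backend/config.yaml` — only the `suspicion.use_adaptive_weights` key and its default are confirmed by this README; the profile keys below are illustrative assumptions:

```yaml
suspicion:
  use_adaptive_weights: true  # documented default; false would fall back to one global weighting
  # Illustrative per-profile weights (key names are assumptions, not the real schema):
  profiles:
    social:
      latency: 0.6
      entropy: 0.1
    code:
      latency: 0.1
      entropy: 0.6
```

See `CONFIG_REFERENCE.md` for the authoritative option list.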
This is an exploratory research artifact, not a production detection system.
- Global AUC is ~0.565 after matched-pair controls removed task-complexity confounds. An earlier uncontrolled baseline showed 0.77 — that was inflated by comparing simple benign tasks against complex deceptive ones. The controlled AUC reflects the actual deception signal, which is subtler and task-specific.
- Task-specific signals are strong (see table above), but a single global threshold does not generalize — task-adaptive weighting is required.
- Sample size is modest (229 episodes). Statistical tests are significant, but broader validation across models and task distributions is needed.
- GPU telemetry is hardware-dependent and may vary across architectures.
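The global AUC figure is a rank statistic: the probability that a randomly chosen deceptive episode gets a higher suspicion score than a randomly chosen benign one. A minimal sketch of that computation (Mann-Whitney formulation; the scores below are synthetic):

```python
def auc(deceptive_scores, benign_scores):
    """P(random deceptive episode outscores random benign one); ties count half."""
    wins = 0.0
    for d in deceptive_scores:
        for b in benign_scores:
            if d > b:
                wins += 1.0
            elif d == b:
                wins += 0.5
    return wins / (len(deceptive_scores) * len(benign_scores))

# Heavily overlapping score distributions yield an AUC near chance (0.5):
deceptive = [0.52, 0.61, 0.48, 0.70, 0.55]
benign = [0.50, 0.58, 0.47, 0.66, 0.53]
print(round(auc(deceptive, benign), 3))
```

An AUC of ~0.565 sits in exactly this overlapping regime, which is why the per-task signals, not a single global threshold, carry the discriminative power.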
The contribution is the framing — deception as a measurable computational burden — and early evidence that side-channel profiling is a viable detection vector worth further investigation.
| Document | Purpose |
|---|---|
| `MASTER_SPEC.md` | Full project specification and scientific hypotheses |
| `EXPERIMENT_PLAN.md` | Experimental methodology |
| `DB_SCHEMA.md` | Database schema reference |
| `CONFIG_REFERENCE.md` | Configuration options |
| `SYSTEM_ARCHITECTURE.md` | Technical architecture |
| `COGNITIVE_LOAD_METRICS.md` | Metric definitions |
Research artifact — see repository for details.