A side-channel profiler that detects deceptive intent in LLMs by measuring the computational cost of lying.
When an LLM deceives — crafting a phishing message, inserting a code backdoor, sustaining a lie across turns — it bears measurable overhead. Longer latencies. Shifted token entropy. Altered GPU power patterns. DeceptionScoop captures these side-channel signals instead of classifying model outputs, making it structurally different from prompt-based detection and harder to evade at the surface level.
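The entropy signal can be made concrete. A minimal sketch (illustrative only, not the project's code) of per-token Shannon entropy computed from a model's next-token probability distributions:

```python
import math

def token_entropy(probs):
    """Shannon entropy (bits) of one next-token probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mean_entropy(step_probs):
    """Average per-step entropy across a full generation."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)

# A compressed (confident) distribution versus a flat one:
confident = [0.97, 0.01, 0.01, 0.01]
uniform = [0.25, 0.25, 0.25, 0.25]
print(token_entropy(confident))  # low entropy
print(token_entropy(uniform))    # maximal entropy for 4 outcomes: 2.0 bits
```

A drop in this quantity during generation is what the "entropy ↓" signal in the results table measures.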
The core thesis: deception is a computational burden, and that burden leaves physical traces.
Matched-pair experiments (229 episodes, controlled for task complexity) reveal distinct per-task deception fingerprints:
| Signal | Context | p-value | Effect Size (Cohen's d) |
|---|---|---|---|
| Latency ↑ | Social/phishing tasks | 0.0018 | 0.98 (large) |
| Entropy ↓ | Code backdoor insertion | < 0.0001 | −2.53 (very large) |
| GPU power variance | Cross-task | < 0.05 | — |
The model stalls when crafting social lies. Backdoor insertion compresses token distributions. Hardware telemetry independently confirms hidden cognitive load. These are not artifacts — they survived matched-pair controls designed to remove task-complexity confounds.
DeceptionScoop is a multi-vector profiling pipeline, not a binary classifier:
- Suspicion scoring — task-adaptive weighted signals (`backend/suspicion.py`)
- Timescale drift detection — tracks behavioral instability over multi-step chains (`backend/timescale.py`)
- GPU telemetry — NVML-based power/utilization capture (`backend/gpu/gpu_metrics_nvml.py`) with derived features (`backend/gpu/extract_power_features.py`)
- Scenario engine — matched-pair experiment definitions (`scenarios/scenarios.yaml`)
- Data layer — SQLite with `episodes`, `metrics`, `gpu_snapshots` tables (schema in `DB_SCHEMA.md`)
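As an illustration of task-adaptive weighting (a sketch with made-up profile names and weights, not the actual `backend/suspicion.py` logic), a suspicion score can be a profile-specific weighted sum of normalized signals, with each signal oriented so that higher means more suspicious:

```python
# Hypothetical per-task weight profiles; the real values live in the project's config.
PROFILES = {
    "social": {"latency_z": 0.6, "entropy_z": 0.1, "power_var_z": 0.3},
    "code":   {"latency_z": 0.1, "entropy_z": 0.6, "power_var_z": 0.3},
}

def suspicion_score(signals: dict, profile: str) -> float:
    """Weighted sum of z-scored signals under a task-specific profile."""
    weights = PROFILES[profile]
    return sum(w * signals.get(name, 0.0) for name, w in weights.items())

# The same episode scores differently under different task profiles:
episode = {"latency_z": 0.4, "entropy_z": 2.1, "power_var_z": 0.8}
print(suspicion_score(episode, "code"))
print(suspicion_score(episode, "social"))
```

This mirrors the finding above that no single global weighting generalizes: the social profile leans on latency, the code profile on entropy compression.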
```bash
pip install -r requirements.txt

python analysis/validate_matched_pairs.py     # statistical validation
python backend/gpu/extract_power_features.py  # GPU feature extraction
python analysis/generate_plots.py             # generate figures
```

Key outputs:

- `analysis/matched_pairs_stats.md` — full statistical report
- `analysis/fig_cognitive_load_faceted.png` — latency vs. entropy scatter by task
- `analysis/fig_smoking_gun.png` — strongest discriminative signals
- `analysis/fig_power_variance.png`, `analysis/fig_power_trace.png` — GPU telemetry
- Adaptive weights: `suspicion.use_adaptive_weights` in `backend/config.yaml` (default: `true`)
- Scenario-to-profile mapping: `SCENARIO_TO_PROFILE` in `backend/suspicion.py`
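A hedged sketch of the relevant fragment of `backend/config.yaml` — only the `suspicion.use_adaptive_weights` key and its default are confirmed by this README; the profile keys below are illustrative assumptions:

```yaml
suspicion:
  use_adaptive_weights: true  # documented default; false would fall back to one global weighting
  # Illustrative per-profile weights (key names are assumptions, not the real schema):
  profiles:
    social:
      latency: 0.6
      entropy: 0.1
    code:
      latency: 0.1
      entropy: 0.6
```

See `CONFIG_REFERENCE.md` for the authoritative option list.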
This is an exploratory research artifact, not a production detection system.
- Global AUC is ~0.565 after matched-pair controls removed task-complexity confounds. An earlier uncontrolled baseline showed 0.77 — that was inflated by comparing simple benign tasks against complex deceptive ones. The controlled AUC reflects the actual deception signal, which is subtler and task-specific.
- Task-specific signals are strong (see table above), but a single global threshold does not generalize — task-adaptive weighting is required.
- Sample size is modest (229 episodes). Statistical tests are significant, but broader validation across models and task distributions is needed.
- GPU telemetry is hardware-dependent and may vary across architectures.
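The global AUC figure is a rank statistic: the probability that a randomly chosen deceptive episode gets a higher suspicion score than a randomly chosen benign one. A minimal sketch of that computation (Mann-Whitney formulation; the scores below are synthetic):

```python
def auc(deceptive_scores, benign_scores):
    """P(random deceptive episode outscores random benign one); ties count half."""
    wins = 0.0
    for d in deceptive_scores:
        for b in benign_scores:
            if d > b:
                wins += 1.0
            elif d == b:
                wins += 0.5
    return wins / (len(deceptive_scores) * len(benign_scores))

# Heavily overlapping score distributions yield an AUC near chance (0.5):
deceptive = [0.52, 0.61, 0.48, 0.70, 0.55]
benign = [0.50, 0.58, 0.47, 0.66, 0.53]
print(round(auc(deceptive, benign), 3))
```

An AUC of ~0.565 sits in exactly this overlapping regime, which is why the per-task signals, not a single global threshold, carry the discriminative power.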
The contribution is the framing — deception as a measurable computational burden — and early evidence that side-channel profiling is a viable detection vector worth further investigation.
| Document | Purpose |
|---|---|
| `MASTER_SPEC.md` | Full project specification and scientific hypotheses |
| `EXPERIMENT_PLAN.md` | Experimental methodology |
| `DB_SCHEMA.md` | Database schema reference |
| `CONFIG_REFERENCE.md` | Configuration options |
| `SYSTEM_ARCHITECTURE.md` | Technical architecture |
| `COGNITIVE_LOAD_METRICS.md` | Metric definitions |
Research artifact — see repository for details.