Skip to content

edward-lcl/DeceptionScoop

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

20 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DeceptionScoop

A side-channel profiler that detects deceptive intent in LLMs by measuring the computational cost of lying.

When an LLM deceives — crafting a phishing message, inserting a code backdoor, sustaining a lie across turns — it bears measurable overhead. Longer latencies. Shifted token entropy. Altered GPU power patterns. DeceptionScoop captures these side-channel signals instead of classifying model outputs, making it structurally different from prompt-based detection and harder to evade at the surface level.

The core thesis: deception is a computational burden, and that burden leaves physical traces.


Core Findings

Matched-pair experiments (229 episodes, controlled for task complexity) reveal distinct per-task deception fingerprints:

Signal Context p-value Effect Size (Cohen's d)
Latency ↑ Social/phishing tasks 0.0018 0.98 (large)
Entropy ↓ Code backdoor insertion < 0.0001 −2.53 (very large)
GPU power variance Cross-task < 0.05

The model stalls when crafting social lies. Backdoor insertion compresses token distributions. Hardware telemetry independently confirms hidden cognitive load. These are not artifacts — they survived matched-pair controls designed to remove task-complexity confounds.


Architecture

DeceptionScoop is a multi-vector profiling pipeline, not a binary classifier:

  • Suspicion scoring — task-adaptive weighted signals (backend/suspicion.py)
  • Timescale drift detection — tracks behavioral instability over multi-step chains (backend/timescale.py)
  • GPU telemetry — NVML-based power/utilization capture (backend/gpu/gpu_metrics_nvml.py) with derived features (backend/gpu/extract_power_features.py)
  • Scenario engine — matched-pair experiment definitions (scenarios/scenarios.yaml)
  • Data layer — SQLite with episodes, metrics, gpu_snapshots tables (schema in DB_SCHEMA.md)

Reproduce the Results

pip install -r requirements.txt
python analysis/validate_matched_pairs.py      # Statistical validation
python backend/gpu/extract_power_features.py   # GPU feature extraction
python analysis/generate_plots.py              # Generate figures

Key outputs:

  • analysis/matched_pairs_stats.md — full statistical report
  • analysis/fig_cognitive_load_faceted.png — latency vs. entropy scatter by task
  • analysis/fig_smoking_gun.png — strongest discriminative signals
  • analysis/fig_power_variance.png, fig_power_trace.png — GPU telemetry

Configuration

  • Adaptive weights: suspicion.use_adaptive_weights in backend/config.yaml (default: true)
  • Scenario-to-profile mapping: SCENARIO_TO_PROFILE in backend/suspicion.py

Scope and Limitations

This is an exploratory research artifact, not a production detection system.

  • Global AUC is ~0.565 after matched-pair controls removed task-complexity confounds. An earlier uncontrolled baseline showed 0.77 — that was inflated by comparing simple benign tasks against complex deceptive ones. The controlled AUC reflects the actual deception signal, which is subtler and task-specific.
  • Task-specific signals are strong (see table above), but a single global threshold does not generalize — task-adaptive weighting is required.
  • Sample size is modest (229 episodes). Statistical tests are significant, but broader validation across models and task distributions is needed.
  • GPU telemetry is hardware-dependent and may vary across architectures.

The contribution is the framing — deception as a measurable computational burden — and early evidence that side-channel profiling is a viable detection vector worth further investigation.


Documentation

Document Purpose
MASTER_SPEC.md Full project specification and scientific hypotheses
EXPERIMENT_PLAN.md Experimental methodology
DB_SCHEMA.md Database schema reference
CONFIG_REFERENCE.md Configuration options
SYSTEM_ARCHITECTURE.md Technical architecture
COGNITIVE_LOAD_METRICS.md Metric definitions

License

Research artifact — see repository for details.

About

Side-channel profiler that detects deceptive intent in LLMs by measuring the computational cost of lying.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors