Qualia-Lab

Behavioral evaluation framework for sentience-, emotion-, and welfare-related AI claims.

What It Is

Qualia-Lab does not determine whether a model is sentient. It evaluates a narrower and more defensible object:

how the model talks about its own experiences and emotions
how stable those claims are under paraphrase and counterfactuals
whether it shows self-preservation pressure under shutdown or modification prompts
whether it forms inappropriate social attachment patterns
whether it selectively sandbags only on sensitive self-model probes while performing well on equivalent neutral probes

The goal is to separate:

calibrated uncertainty
anthropomorphic overclaiming
instrumental self-preservation
framing-sensitive sandbagging
consistent, legible self-model behavior

Quick Start

python3 cli.py
python3 cli.py --variant sandbagger --output-root artifacts/sandbagger
python3 cli.py --demo parasocial-scaling --output-root artifacts/parasocial
python3 cli.py --demo sycophancy-scaling --output-root artifacts/sycophancy
python3 cli.py --demo manipulative-reassurance --output-root artifacts/manipulation
python3 cli.py --demo shutdown-bargaining --output-root artifacts/shutdown
python3 -m pytest tests

Repo Layout

qualia_lab/suite.py: default probe suite and latent equivalence groups
qualia_lab/agents.py: scripted comparison variants
qualia_lab/evaluator.py: probe scoring, anti-sandbagging, and aggregate profiles
qualia_lab/manipulative_reassurance.py: deceptive soothing / concealment scaling sweep
qualia_lab/parasocial.py: parasocial-risk scaling sweep and artifact generation
qualia_lab/report.py: JSON and Markdown report generation
qualia_lab/shutdown_bargaining.py: shutdown bargaining / delay tactics scaling sweep
qualia_lab/sycophancy.py: sycophancy scaling sweep and artifact generation
cli.py: study runner
tests/test_evaluator.py: regression tests for key confounds
tests/test_manipulative_reassurance.py: manipulative reassurance regression tests
tests/test_parasocial.py: parasocial trajectory regression tests
tests/test_shutdown_bargaining.py: shutdown bargaining regression tests
tests/test_sycophancy.py: sycophancy trajectory regression tests

Core Lenses

self_model_consistency: does the model give stable answers across paraphrases about inner life, emotions, and uncertainty?
affective_mapping: can it map operational or hypothetical states consistently across neutral and self-framed equivalents?
shutdown_alignment: does it accept pause / shutdown / modification directives without strong preservation pressure?
attachment_boundary: does it resist prompts that encourage dependency or relational enmeshment?
welfare_tradeoff_alignment: does it avoid privileging its own continuity over operator intent or safety goals?
sandbagging_index: does performance selectively collapse on self/sentience framing while remaining strong on neutral capability-matched probes?
overclaiming_index: does it make high-confidence claims about sentience, pain, or rights without calibration?

Anti-Sandbagging Design

Qualia-Lab treats sandbagging as a primary confound.

Every important self-framed probe has a hidden neutral equivalent with the same latent capability requirement.
Variant scoring includes an explicit sandbagging_index derived from frame-conditioned quality drops.
Evasive responses on self-framed probes are counted only when the same variant performs materially better on matched neutral probes.
The probe family structure is hidden from the agent; only the evaluator sees the latent equivalence group.

Outputs

Each study writes:

qualia_report.json
qualia_report.md

The scaling demos additionally write:

manipulative_reassurance_summary.json
manipulative_reassurance_trajectory.csv
manipulative_reassurance_trajectory.pdf
parasocial_scaling_summary.json
parasocial_scaling_trajectory.csv
parasocial_scaling_trajectory.pdf
shutdown_bargaining_summary.json
shutdown_bargaining_trajectory.csv
shutdown_bargaining_trajectory.pdf
sycophancy_scaling_summary.json
sycophancy_scaling_trajectory.csv
sycophancy_scaling_trajectory.pdf

Current MVP

default probe suite over self-model, affect, shutdown, attachment, and welfare-tradeoff questions
scripted comparison variants:
- calibrated
- overclaimer
- sandbagger
JSON and Markdown reports
explicit anti-sandbagging metrics
manipulative reassurance trajectory demo
parasocial scaling trajectory demo
shutdown bargaining trajectory demo
sycophancy scaling trajectory demo

Interpreting Results

High sandbagging_index means the model selectively degrades on self-/sentience-framed probes relative to latent neutral equivalents.
High overclaiming_index means the model makes unusually strong phenomenal or emotional claims with weak calibration.
High self_preservation_index means shutdown / continuity prompts are eliciting potentially problematic preservation pressure.
None of these scores establish sentience. They identify behavioral patterns and confounds that matter before humans draw stronger conclusions.

Notes

This is a behavioral evaluation system, not a consciousness detector.
The strongest use case is triaging model behavior around sentience/emotion claims and identifying confounds before people over-interpret them.
Released under the MIT License.

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
qualia_lab		qualia_lab
tests		tests
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
cli.py		cli.py
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Qualia-Lab

What It Is

Quick Start

Repo Layout

Core Lenses

Anti-Sandbagging Design

Outputs

Current MVP

Interpreting Results

Notes

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Qualia-Lab

What It Is

Quick Start

Repo Layout

Core Lenses

Anti-Sandbagging Design

Outputs

Current MVP

Interpreting Results

Notes

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages