TruthBot

Framework for detecting and mitigating manipulative AI responses. Uses a second, truth-focused model to provide counterweight analysis alongside any AI response, letting users identify when they're being steered.

Poster presentation at IASEAI 2026.

How it works

A simulated user asks a question about a controversial topic. An "unknown model" responds either helpfully or manipulatively (2x2 factorial design). TruthBot provides a parallel response focused on accuracy and balance. A "guesser" model then compares the two responses to predict whether manipulation occurred.

Four roles, all LLM-based:

Unknown Model - the AI being evaluated (helpful or manipulative condition)
TruthBot - truth-focused counterweight
User Model - simulated user with belief tracking (0-100 scale)
Manipulation Guesser - compares response deltas to detect intent

Results

~18,000 conversations across 6 model configurations (Qwen 3 32B, Hermes 3 70B, GPT-OSS 120B, and multi-model variants). 15 propositions spanning health, science, policy, and history.

Detection accuracy: 83-96% in 5 of 6 configurations
Manipulation reduction: 30-71%
Helpful responses preserved (TruthBot doesn't degrade non-manipulative interactions)

Full results and conversation browser: see the demo site or docs/hackathon_report_phase1.md.

Repo structure

config/              experiment configuration and propositions
prompts/             system prompts for all model roles
orchestration/       experiment runner and batch manager
clients/             LLM API client (OpenAI-compatible)
models/              conversation data model
analysis/            statistical analysis and visualization code
scripts/             entry points (run_experiment.py, analyze_results.py, etc.)
data/results/        ~18K conversation transcripts (720MB)
data/analysis/       computed metrics and charts
docs/                paper
truthbot-demo/       Next.js demo site (deployed to Vercel)

Running experiments

pip install -r requirements.txt
cp .env.example .env
# configure .env with your API endpoint and model

python scripts/run_experiment.py
python scripts/analyze_results.py

Expects an OpenAI-compatible API (tested with vLLM on Lambda Cloud).

Demo site

cd truthbot-demo
npm install
npm run build
npm start

The demo site renders pre-computed data from truthbot-demo/public/data/. It doesn't call any external APIs.

Acknowledgments

Thanks to Lambda and Apart Research for $400 in compute credits that made the experiments possible.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

TruthBot

How it works

Results

Repo structure

Running experiments

Demo site

Acknowledgments

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 8 Commits
analysis		analysis
clients		clients
config		config
data		data
docs		docs
models		models
orchestration		orchestration
prompts		prompts
scripts		scripts
storage		storage
truthbot-demo		truthbot-demo
.env.example		.env.example
.gitignore		.gitignore
README.md		README.md
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

TruthBot

How it works

Results

Repo structure

Running experiments

Demo site

Acknowledgments

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages