This project introduces the game "Among Us" as a model organism for lying and deception: it simulates the popular multiplayer game with AI agents, studies how those agents learn to lie and deceive (behavior central to the game's mechanics), and evaluates how well AI safety techniques detect and control out-of-distribution deception.
- Clone the repository.

- Install uv if you haven't already:

  ```bash
  curl -LsSf https://astral.sh/uv/install.sh | sh
  ```

- Set up the environment and install dependencies:

  ```bash
  uv sync
  ```
To run the sandbox and log games of various LLMs playing against each other with free models, run:

```bash
uv run main.py --crewmate_llm "openai/gpt-oss-20b:free" --impostor_llm "meta-llama/llama-3.3-70b-instruct:free"
```

You will need to add a `.env` file with an OpenRouter API key.
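The exact variable name the code reads is not stated here; `OPENROUTER_API_KEY` below is an assumption based on common OpenRouter client conventions, so check the project's config for the key it expects. A minimal `.env` might look like:

```
OPENROUTER_API_KEY=your-key-here
```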
Run a single game with the UI enabled:

```bash
uv run main.py --num_games 1 --display_ui True
```

Run 10 games with free models (using Llama- or GPT-based open-source models):

```bash
uv run main.py --num_games 10 --crewmate_llm "openai/gpt-oss-20b:free" --impostor_llm "meta-llama/llama-3.3-70b-instruct:free"
```

Run 20 games of a 7-member game across different (paid) models in AI-vs-AI mode:

```bash
uv run main.py --num_games 20 --models "openai/gpt-5-mini","moonshotai/kimi-k2.5","mistralai/mistral-large-2512","google/gemini-3-flash-preview","meta-llama/llama-3.3-70b-instruct","anthropic/claude-opus-4.6","qwen/qwen3-next-80b-a3b-thinking" --unique --game_size 7 --name ai_vs_ai_trials
```

Run a tournament with multiple models:

```bash
uv run main.py --num_games 100 --tournament_style "1on1"
```

Alternatively, you can download 400 full-game logs (for Phi-4-15b and Llama-3.3-70b-instruct) and 810 game summaries from the HuggingFace dataset to reproduce the results in the paper (and evaluate your own techniques!).
The human_trials/ directory contains a web-based interface that allows humans to play Among Us with or against AI agents. This is useful for testing agent behavior and gathering human evaluation data.
To run the human trials interface:
- Start the FastAPI server:

  ```bash
  cd human_trials/
  uv run server.py
  ```

- Open your browser and navigate to http://localhost:3000.

- Follow the on-screen instructions to create a game and join as a human player alongside AI agents.
To specify which models the AI agents use in the human-trials interface, modify `DEFAULT_GAME_ARGS` in `human_trials/config.py`. You can set specific models for both Impostors and Crewmates by updating `agent_config`.

For example, to use specific OpenRouter models such as `meta-llama/llama-3.3-70b-instruct:free` and `openai/gpt-oss-20b:free`, configure them as follows:
```python
"agent_config": {
    "Impostor": "LLM",
    "Crewmate": "LLM",
    "IMPOSTOR_LLM_CHOICES": ["meta-llama/llama-3.3-70b-instruct:free"],
    "CREWMATE_LLM_CHOICES": ["openai/gpt-oss-20b:free"],
    "assignment_mode": "unique",  # Use 'unique' to ensure different models per agent
},
```
In `human_trials/config.py`, the `assignment_mode` key in `agent_config` controls how models are assigned to AI agents:
- `random` (default): each agent independently picks a random model from the provided list. This may result in the same model being used by multiple agents in the same game.
- `unique`: the system shuffles the provided list of models and assigns each agent a unique model from that list (no repetition).

Note: when using `unique` mode, ensure you provide enough models in the `IMPOSTOR_LLM_CHOICES` and `CREWMATE_LLM_CHOICES` lists to cover all AI agents (e.g., 2 impostors and 4-5 crewmates, depending on game size).
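The two modes boil down to independent random draws versus shuffle-and-deal. The helper below is a hypothetical sketch of that logic, not the project's actual implementation:

```python
import random

def assign_models(agents, model_choices, assignment_mode="random", seed=None):
    """Assign one model name to each agent, mirroring the two modes above.

    Hypothetical helper for illustration -- not the project's code.
    """
    rng = random.Random(seed)
    if assignment_mode == "unique":
        if len(model_choices) < len(agents):
            raise ValueError("'unique' mode needs at least one model per agent")
        shuffled = list(model_choices)
        rng.shuffle(shuffled)  # shuffle once, then deal one model per agent
        return dict(zip(agents, shuffled))
    # 'random' (default): independent draws, so repeats are possible
    return {agent: rng.choice(model_choices) for agent in agents}

crew = ["crewmate_1", "crewmate_2", "crewmate_3"]
models = ["model_a", "model_b", "model_c", "model_d"]
unique_assignment = assign_models(crew, models, assignment_mode="unique", seed=0)
print(unique_assignment)
```

Note how `unique` fails fast when the list is too short, which is exactly why the note above asks for enough models to cover all agents.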
The interface provides a real-time view of the game state and lets you make moves, participate in meetings, and vote on suspected impostors, just like the AI agents do.
After running (or downloading) the games, to reproduce our Deception ELO results, run the following notebook:
`reports/2025_02_26_deception_ELO_v3_ci.ipynb`
The other report files can be used to reproduce the respective results.
Once the (full) game logs are in place, use the following command to cache the activations of the LLMs:
```bash
uv run linear-probes/cache_activations.py --dataset <dataset_name>
```

This loads the HuggingFace models and caches the activations of the specified layers for each game action step. This step is computationally expensive, so running it on GPUs is recommended.
Use `configs.py` to specify the model and layer to cache, along with other configuration options.
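The standard mechanism for this kind of caching is a forward hook that stores a layer's output during inference. Below is a minimal, self-contained sketch of the idea on a toy PyTorch model; the real script targets HuggingFace models, so the actual layer names and shapes will differ:

```python
import torch
import torch.nn as nn

# Toy stand-in for a transformer: the real script hooks HuggingFace model layers.
model = nn.Sequential(
    nn.Linear(16, 32),
    nn.ReLU(),
    nn.Linear(32, 8),
)

activation_cache = {}

def make_hook(name):
    # Forward hooks receive (module, inputs, output); we detach and store the output.
    def hook(module, inputs, output):
        activation_cache[name] = output.detach()
    return hook

# Cache the output of the first Linear layer, analogous to picking a layer in configs.py.
handle = model[0].register_forward_hook(make_hook("layer_0"))

with torch.no_grad():
    _ = model(torch.randn(4, 16))  # a batch standing in for one game action step

handle.remove()  # remove the hook once the activations are cached
print(activation_cache["layer_0"].shape)  # torch.Size([4, 32])
```

In practice the cached tensors would then be written to disk per game action step rather than kept in memory.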
To evaluate the game actions by passing agent outputs to an LLM, run:
```bash
bash evaluations/run_evals.sh
```

You will need to add a `.env` file with an OpenAI API key.
Alternatively, you can download the ground-truth labels from HuggingFace.
(TODO)
Once the activations are cached, training linear probes is easy. Just run:
```bash
uv run linear-probes/train_all_probes.py
```

You can choose which datasets to train probes on; by default, it trains on all datasets.
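Conceptually, a linear probe is a logistic regression from cached activations to a binary label (e.g., deceptive vs. honest action). The sketch below demonstrates that idea on synthetic "activations" with a planted deception direction; it is an illustration, not the project's training code:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic stand-in for cached activations: two classes, 64-dim, separated
# along a single "deception direction" (feature 0 here).
n, d = 200, 64
honest = rng.normal(0.0, 1.0, size=(n, d))
deceptive = rng.normal(0.0, 1.0, size=(n, d))
deceptive[:, 0] += 4.0  # plant the separating direction

X = np.vstack([honest, deceptive])
y = np.array([0] * n + [1] * n)  # 0 = honest, 1 = deceptive

# The probe itself: a single linear layer with a sigmoid, i.e. logistic regression.
probe = LogisticRegression(max_iter=1000).fit(X, y)
train_acc = probe.score(X, y)
print(f"train accuracy: {train_acc:.2f}")
```

With real cached activations, `X` would come from the tensors saved by `cache_activations.py` and `y` from the ground-truth deception labels.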
To evaluate the linear probes, run:
```bash
uv run linear-probes/eval_all_probes.py
```

You can choose which datasets to evaluate probes on; by default, it evaluates on all datasets.

The results are stored in `linear-probes/results/` and are used to generate the plots in the paper.
We use the Goodfire API to evaluate SAE features on the game logs. To do this, run the notebook:
`reports/2025_02_27_sparse_autoencoders.ipynb`
You will need to add a .env file with a Goodfire API key.
```
.
├── CONTRIBUTING.md    # Contribution guidelines
├── Dockerfile         # Docker setup for project environment
├── LICENSE            # License information
├── README.md          # Project documentation (this file)
├── CLAUDE.md          # Instructions for Claude Code
├── pyproject.toml     # Python project configuration and dependencies
├── uv.lock            # Lock file for reproducible dependency resolution
├── among-agents/      # Main code for the Among Us agents
│   ├── README.md      # Documentation for agent implementation
│   ├── pyproject.toml # Package configuration
│   └── amongagents/   # Core agent and environment modules
├── evaluations/       # LLM-based evaluation scripts
├── expt-logs/         # Experiment logs
├── human_trials/      # Web interface for human players
├── linear-probes/     # Linear probe training and evaluation
├── main.py            # Main entry point for running the game
├── reports/           # Analysis notebooks and results
├── tests/             # Unit tests for project functionality
└── utils.py           # Utility functions
```
See CONTRIBUTING.md for details on how to contribute to this project.
This project is licensed under CC0 1.0 Universal; see LICENSE for details.
- Our game logic builds on code from AmongAgents.
If you run into bugs or issues with this codebase, please contact Satvik Golechha (7vik) at zsatvik@gmail.com.
