booblu/LLM-Governance-Research
Assisted Governance with LLM-Based Algorithmic Committees

This repository accompanies the paper “Assisted Governance with LLM-based Algorithmic Committees: Reliability, Systematic Divergence, and a Tiered-Trust Workflow.” It contains the full simulation corpus, preprocessing scripts, and analysis code that generate every table, figure, and robustness check reported in the manuscript and supplementary information.

Project Summary

Large language models (LLMs) are instantiated as seven governance personas (P1–P7) and evaluated across six frontier models on 508 Nouns DAO proposals, yielding 21,336 structured judgments plus a 17k-run repeatability audit. The analyses show:

  1. Instruction-dominant behavior – persona prompts reliably steer model behavior, creating programmable cognitive diversity.
  2. Systematic divergence from humans – algorithmic committees expose principled counter-perspectives instead of imitating historical DAO outcomes.
  3. Tiered trustworthiness – textual reasons and multi-criteria scores are stable, while final votes are sensitive to sampling noise, motivating human escalation for low-trust outputs.

The repository lets readers recreate the full pipeline: preprocessing, Stages 1–4 analyses, committee synthesis, divergence diagnostics, and stability checks.

Repository Layout

├── README.md
├── LICENSE                       # MIT
├── paper/
│   └── Governance.pdf
├── data/
│   ├── raw/
│   │   ├── simulation_results_*.jsonl
│   │   └── proposal_final_v1_newcategory.json
│   └── processed/
│       ├── analysis_dataset_full_new.parquet
│       ├── analysis_dataset_processed.parquet
│       └── analysis_dataset_processed.jsonl     # optional textual mirror
├── outputs/
│   └── paper_run/               # single canonical set per stage
├── src/
│   ├── preprocess_main_dataset.py
│   ├── stage1_analyse_stage1.py
│   ├── stage2_analyse_stage2.py
│   ├── stage3_1_comparison_metrics.py
│   ├── stage3_2_case_prep.py
│   ├── stage3_3_feature_impact.py
│   ├── stage3_4_environmental.py
│   ├── stage3_5_persona_p7.py
│   ├── stage4_1_hypothesis_H1_H3.py
│   ├── stage4_1_hypothesis_H4_H6.py
│   ├── stage4_2_4_3_predictive_modeling.py
│   ├── analyse_committee_decision.py
│   ├── analyse_divergence_drivers.py
│   └── analyse_stability_check.py
├── requirements.txt
└── .gitignore                    # ignore outputs/*, dao_analysis_results/*, .nltk_data/

Data Overview

| File | Description |
| --- | --- |
| data/raw/simulation_results_*.jsonl | Raw LLM transcripts covering seven personas × six models × 508 proposals (primary corpus plus the repeatability audit) |
| data/raw/proposal_final_v1_newcategory.json | Curated metadata and category assignments for all proposals |
| data/processed/analysis_dataset_full_new.parquet | Consolidated table prior to cleaning |
| data/processed/analysis_dataset_processed.parquet | Final dataset used throughout the paper (21,336 rows × 85 columns) |

Privacy note: Ethereum addresses and proposal metadata are public on-chain records; strip any additional annotations that should remain private before redistributing the data.
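After downloading, the processed table can be sanity-checked against the figures above: 21,336 rows follow from 7 personas × 6 models × 508 proposals. A minimal sketch using pandas; the path matches the layout above, while the helper name `check_dataset` is illustrative:

```python
from pathlib import Path
from typing import Optional

import pandas as pd

PROCESSED = Path("data/processed/analysis_dataset_processed.parquet")
EXPECTED_ROWS = 7 * 6 * 508  # personas × models × proposals = 21,336


def check_dataset(path: Path = PROCESSED) -> Optional[pd.DataFrame]:
    """Load the processed dataset and verify the shape quoted in the paper."""
    if not path.exists():
        return None  # run src/preprocess_main_dataset.py first
    df = pd.read_parquet(path)
    assert df.shape == (EXPECTED_ROWS, 85), f"unexpected shape: {df.shape}"
    return df
```

If the file is missing, the helper returns None rather than raising, so it can be dropped into a notebook before any stage script is run.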

Environment Setup

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# Stage 2 sentiment analysis needs these NLTK resources
python -m nltk.downloader vader_lexicon stopwords punkt

Additional dependency: sentence-transformers is used by analyse_stability_check.py and caches model weights locally on the first run. If it is not already listed in requirements.txt, install it with pip install sentence-transformers.

Reproduction Pipeline

The scripts are intentionally modular so each stage can be rerun independently. The canonical order mirrors the paper:

  1. Preprocessing – construct the processed dataset used everywhere.
    python src/preprocess_main_dataset.py
  2. Stage 1 – exploratory data analysis and integrity checks.
    python src/stage1_analyse_stage1.py
  3. Stage 2 – multidimensional agreement analysis, PCA, sentiment diagnostics.
    python src/stage2_analyse_stage2.py
  4. Stage 3 – proposal-level comparisons, case extraction, feature & environmental impact, persona deep dive.
    python src/stage3_1_comparison_metrics.py
    python src/stage3_2_case_prep.py
    python src/stage3_3_feature_impact.py
    python src/stage3_4_environmental.py
    python src/stage3_5_persona_p7.py
  5. Stage 4 – hypothesis testing (H1–H6) and predictive modeling.
    python src/stage4_1_hypothesis_H1_H3.py
    python src/stage4_1_hypothesis_H4_H6.py
    python src/stage4_2_4_3_predictive_modeling.py
  6. Supplementary analyses – algorithmic committee synthesis, divergence drivers, and stability check.
    python src/analyse_committee_decision.py
    python src/analyse_divergence_drivers.py
    python src/analyse_stability_check.py

Optionally, build an orchestration script (e.g., scripts/run_all.sh) chaining the commands above for one-click reproduction.
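Such a script might look like the sketch below. The scripts/run_all.sh path and the skip-if-missing guard are illustrative; in a real run you would likely drop the guard so that set -e aborts the chain on the first failing stage.

```shell
#!/usr/bin/env bash
# scripts/run_all.sh -- hypothetical one-click reproduction of the pipeline above.
set -euo pipefail

# Canonical order, mirroring the numbered stages in the README.
SCRIPTS=(
  preprocess_main_dataset.py
  stage1_analyse_stage1.py
  stage2_analyse_stage2.py
  stage3_1_comparison_metrics.py
  stage3_2_case_prep.py
  stage3_3_feature_impact.py
  stage3_4_environmental.py
  stage3_5_persona_p7.py
  stage4_1_hypothesis_H1_H3.py
  stage4_1_hypothesis_H4_H6.py
  stage4_2_4_3_predictive_modeling.py
  analyse_committee_decision.py
  analyse_divergence_drivers.py
  analyse_stability_check.py
)

for script in "${SCRIPTS[@]}"; do
  if [[ -f "src/${script}" ]]; then
    echo "=== running src/${script} ==="
    python "src/${script}"
  else
    echo "skip: src/${script} not found"
  fi
done
```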

Contributions & Support

  • Please open GitHub issues for questions about the code or data.
  • Pull requests are welcome; include reproduction notes or tests when touching analysis scripts.
  • Respect the privacy note in the Data Overview section when distributing raw simulation transcripts.

Contact

For inquiries regarding the research study, refer to the corresponding author listed in paper/Governance.pdf. For technical questions about the code release, open an issue once the repository is public.
