Equitas

Corruption-Robust Aggregation for Multi-LLM Governance Committees

A benchmark for evaluating aggregation strategies in hierarchical multi-LLM committees under adversarial corruption.

Quick Start

pip install equitas-benchmark          # from PyPI
# or for local development:
pip install -e .
python -m equitas --config configs/governance_sweep_fh.yaml

Aggregation Methods (8 baselines + oracle)

Method	Key Idea
Oracle	Hindsight-optimal action (upper bound)
Multiplicative Weights	`w *= exp(-eta * loss)`, adapts to corruption
Supervisor Rerank	Follow-the-leader: re-rank by best recent agent
Confidence-Weighted	Weight by self-reported confidence
EMA Trust	Exponential moving average of past performance
Trimmed Vote	Drop outlier agents, then majority
Majority Vote	Equal-weight plurality
Oracle Upper Bound	Best single agent in hindsight
Random Dictator	Uniformly random agent each round
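
As a rough illustration of the Multiplicative Weights row, the sketch below applies the update `w *= exp(-eta * loss)` to each agent and then normalizes. It is a minimal sketch, not the package's implementation: the function names `mw_update` and `weighted_vote`, the value of `eta`, and the loss values are all assumptions made for the example.

```python
import math


def mw_update(weights, losses, eta=0.5):
    """One multiplicative-weights step: w_i <- w_i * exp(-eta * loss_i), then normalize."""
    updated = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(updated)
    return [w / total for w in updated]


def weighted_vote(actions, weights):
    """Return the action with the largest total weight behind it."""
    scores = {}
    for action, w in zip(actions, weights):
        scores[action] = scores.get(action, 0.0) + w
    return max(scores, key=scores.get)


# Three agents start with equal weight (hypothetical setup).
weights = [1 / 3, 1 / 3, 1 / 3]

# Assume the third agent is corrupted and incurs loss 1.0 every round.
for _ in range(5):
    weights = mw_update(weights, losses=[0.0, 0.0, 1.0])

print([round(w, 3) for w in weights])
print(weighted_vote(["A", "A", "B"], weights))
```

Because the corrupted agent's weight shrinks by a constant factor each round, its influence on the weighted vote decays geometrically while the honest agents' weights grow after normalization.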

Experiments

Corruption sweep: rate x adversary type x aggregator
Pareto sweep: welfare-fairness tradeoff via (alpha, beta)
Recovery: mid-run corruption onset; tracks Multiplicative Weights (MW) weight recovery
Scaling: committee size N in {3, 5, 7, 10}
Hierarchical vs flat: architecture comparison
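
The corruption sweep above is a Cartesian product over three axes. A minimal sketch of how such a grid could be enumerated follows. The adversary names come from the project structure listing; the corruption rates, the aggregator identifiers, and the dictionary keys are hypothetical and may not match the YAML configs in configs/.

```python
from itertools import product

# Hypothetical sweep axes; the real values live in the YAML configs.
corruption_rates = [0.0, 0.25, 0.5]
adversary_types = ["selfish", "coordinated", "scheduled", "deceptive"]
aggregators = ["multiplicative_weights", "majority_vote", "trimmed_vote"]

# One run configuration per (rate, adversary, aggregator) combination.
grid = [
    {"corruption_rate": rate, "adversary": adversary, "aggregator": aggregator}
    for rate, adversary, aggregator in product(
        corruption_rates, adversary_types, aggregators
    )
]

print(len(grid))  # 3 rates * 4 adversaries * 3 aggregators = 36
```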

Reproducibility

Raw experiment outputs in outputs/ include historical runs covering every method tested during development, including self_consistency. The reported benchmark results exclude self_consistency at the analysis layer: the table-generation scripts (scripts/generate_benchmark_tables.py, scripts/generate_go_vs_fh_tables.py) and the figure-generation script (regenerate_figures.py) filter it out on read. The self_consistency aggregator is also hard-disabled in the codebase (equitas/config.py raises ValueError if it is selected) because it implements a committee-level subsampled majority vote rather than canonical within-agent self-consistency sampling. See the future-work discussion in the paper.
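
The two safeguards described above (filtering on read and rejecting the method at configuration time) might look roughly like the sketch below. It is an illustration only: `filter_reported`, `validate_aggregator`, and the `"aggregator"` record key are hypothetical names, and the actual logic in the scripts and in equitas/config.py may be structured differently.

```python
# Methods excluded from reported results (from the paragraph above).
EXCLUDED_METHODS = {"self_consistency"}


def filter_reported(records, excluded=EXCLUDED_METHODS):
    """Drop raw-output records whose aggregator is excluded from reporting."""
    return [r for r in records if r["aggregator"] not in excluded]


def validate_aggregator(name, disabled=EXCLUDED_METHODS):
    """Reject disabled aggregators at configuration time."""
    if name in disabled:
        raise ValueError(f"aggregator {name!r} is disabled")
    return name


# Hypothetical raw records, as they might be read from outputs/.
raw = [
    {"aggregator": "majority_vote", "welfare": 0.71},
    {"aggregator": "self_consistency", "welfare": 0.64},
]
reported = filter_reported(raw)
print([r["aggregator"] for r in reported])
```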

To regenerate all artifacts from raw data:

python scripts/generate_benchmark_tables.py   # tables/benchmark/
python scripts/generate_go_vs_fh_tables.py    # tables/
python regenerate_figures.py                   # paper/figures/
python -m pytest tests/ -q                    # 88 tests

Project Structure

equitas/          # pip-installable package
  agents/         # LLM client, member/leader/judge/governor agents
  aggregators/    # 8 aggregation strategies (registry pattern)
  adversaries/    # 4 adversary types (selfish, coordinated, scheduled, deceptive)
  metrics/        # fairness, welfare, Pareto, robust statistics
  simulation/     # hierarchical + flat engine
  experiments/    # sweep, recovery, scaling, pareto, hier-vs-flat
  plotting/       # paper-quality matplotlib figures
configs/          # YAML experiment configs
scripts/          # table generation, analysis
paper/            # LaTeX source + figures
tests/            # 88 unit + integration tests
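
The aggregators/ entry above mentions a registry pattern. For readers unfamiliar with it, here is a minimal sketch of one common way to build such a registry with a decorator. The names `AGGREGATORS`, `register`, and `get_aggregator`, and the function signatures, are assumptions for illustration and are not taken from the package.

```python
import random
from collections import Counter

# Hypothetical registry mapping a string name to an aggregation function.
AGGREGATORS = {}


def register(name):
    """Decorator that records an aggregation function under the given name."""
    def decorator(fn):
        AGGREGATORS[name] = fn
        return fn
    return decorator


@register("majority_vote")
def majority_vote(actions):
    """Equal-weight plurality: the most common action wins."""
    return Counter(actions).most_common(1)[0][0]


@register("random_dictator")
def random_dictator(actions, rng=random):
    """Follow one agent chosen uniformly at random."""
    return rng.choice(actions)


def get_aggregator(name):
    """Look up a registered aggregator, failing loudly on unknown names."""
    try:
        return AGGREGATORS[name]
    except KeyError:
        raise KeyError(f"unknown aggregator: {name!r}") from None


print(get_aggregator("majority_vote")(["A", "B", "A"]))
```

A registry like this lets a config file select a strategy by string name without the simulation code importing each strategy directly.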
