Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 3 additions & 1 deletion .gitignore
Original file line number Diff line number Diff line change
Expand Up @@ -14,4 +14,6 @@ data/logs/
*.log

*aristotle_solns*
*solve_all.sh*
*solve_all.sh*
/proof_bench.egg-info
/.env
8 changes: 0 additions & 8 deletions .pre-commit-config.yaml

This file was deleted.

1 change: 1 addition & 0 deletions .python_version
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
1.11
27 changes: 27 additions & 0 deletions Makefile
Original file line number Diff line number Diff line change
@@ -0,0 +1,27 @@
.PHONY: help install style test

help:
@echo "Makefile for proof-bench"
@echo "Usage:"
@echo " make install Install dependencies"
@echo " make style Lint & format"
@echo " make test Run unit tests"

install:
uv venv --python 3.11
uv sync --dev

venv_check:
@if [ ! -f .venv/bin/activate ]; then \
echo "Virtualenv not found. Run 'make install' first."; \
exit 1; \
fi

format: venv_check
uv run ruff format .
lint: venv_check
uv run ruff check --fix .
style: format lint

test: venv_check
uv run pytest tests/ -x -q
9 changes: 6 additions & 3 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,17 +17,19 @@ Setup, local Loogle configuration, development commands, and platform run instru
- `problems/`: Lean theorem files plus informal statements/proofs under `problems/informal/`. The folder only has sample problems; the rest of the benchmark is private.
- `data/proof-bench.jsonl`: exported metadata used at runtime
- `proof_bench/agent.py`: agent loop and tool orchestration
- `proof_bench/tools.py`: `lean_run_code`, `lean_loogle`, and `submit_proof`
- `proof_bench/tools.py`: `lean_run_code`, `lean_loogle`, and `submit_proof` tool subclasses
- `proof_bench/mcp_client.py`: MCP client infrastructure and Lean execution
- `proof_bench/prover.py`: attempt execution, aggregation, and logging
- `proof_bench/load_problems.py`: exported dataset loader
- `proof_bench/validate_and_export.py`: metadata validation and JSONL export
- `main.py`: CLI entrypoint

## Quick Start

After following `SETUP.md`, a basic run looks like:
After following `SETUP.md`, export the dataset then run:

```bash
python proof_bench/validate_and_export.py
python main.py --dataset exported --model openai/gpt-4o --k 3
```

Expand All @@ -42,6 +44,7 @@ python main.py --dataset exported --domains logic number_theory --model openai/g
## Where To Look Next

- `SETUP.md`: installation, MCP/Loogle setup, local cache generation, development workflow, CI, and platform runs
- `proof_bench/tools.py`: tool definitions and MCP plumbing
- `proof_bench/tools.py`: tool subclasses (`lean_run_code`, `lean_loogle`, `submit_proof`)
- `proof_bench/mcp_client.py`: MCP client infrastructure and Lean execution
- `proof_bench/agent.py`: agent loop
- `proof_bench/prover.py`: attempt execution and result aggregation
25 changes: 10 additions & 15 deletions SETUP.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,19 +6,18 @@ This document contains the engineering and environment setup details for working

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
uv venv
make install
source .venv/bin/activate
uv pip install -e ".[llm]"
```

For development without LLM integrations:
Create a `.env` file with your API keys:

```bash
uv pip install -e .
OPENAI_API_KEY='sk-...'
ANTHROPIC_API_KEY='sk-ant-...'
# add other provider keys as needed
```

`[llm]` includes the public `model-library` dependency for model access.

## Lean 4

```bash
Expand Down Expand Up @@ -77,11 +76,10 @@ Install it with:
uv tool install lean-lsp-mcp
```

Run with Loogle enabled:
Loogle is enabled by default. Use `--no-loogle` to turn it off. For local search:

```bash
python main.py --dataset exported --model openai/gpt-4o --k 3 \
--enable-loogle --loogle-local
python main.py --dataset exported --model openai/gpt-4o --k 3 --loogle-local
```

Modes:
Expand Down Expand Up @@ -142,7 +140,7 @@ python -m proof_bench.loogle_daemon --port 8765

# Terminal 2+
export LOOGLE_DAEMON_URL=http://127.0.0.1:8765
python main.py --dataset exported --model openai/gpt-4o --k 8 --enable-loogle
python main.py --dataset exported --model openai/gpt-4o --k 8
```

Programmatic config:
Expand All @@ -157,12 +155,9 @@ loogle_config = {
## Development

```bash
pre-commit install
pre-commit run --all-files
ruff check .
ruff format .
make style
make test
lake build
pytest tests/
```

## Upgrading Lean Or Mathlib
Expand Down
Loading
Loading