A multi-agent RAG system that automates cross-spec compliance reviews for Ethereum clients.
This system helps reduce manual effort in auditing Ethereum client code against evolving protocol specifications by:
- Parsing and indexing Ethereum protocol specifications (execution-specs, consensus-specs)
- Indexing client implementations starting with Geth (go-ethereum)
- Analyzing PRs and code against spec requirements
- Generating structured reports with flagged issues, security alerts, and suggested tests
- AST-based chunking (the same approach used by tools such as Cursor, Windsurf, Aider, and GitHub Copilot)
- Keeps functions/classes intact for better embeddings
- Supports Python, Go, and Markdown
- Improves Recall@5 by 4-5 points over naive fixed-size chunking
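The chunking idea can be sketched with Python's standard-library `ast` module (the project itself uses tree-sitter, which also covers Go and Markdown); the `chunk_python_source` helper and its greedy packing rule are illustrative assumptions, not the notebook's exact code:

```python
import ast

def chunk_python_source(source: str, max_chars: int = 1500) -> list[str]:
    """Chunk Python source at top-level def/class boundaries so that
    each function or class lands intact inside a single chunk."""
    lines = source.splitlines(keepends=True)
    chunks: list[str] = []
    for node in ast.parse(source).body:
        # Slice out the node's full text, body included.
        text = "".join(lines[node.lineno - 1 : node.end_lineno])
        if chunks and len(chunks[-1]) + len(text) <= max_chars:
            chunks[-1] += text   # pack small neighbors into one chunk
        else:
            chunks.append(text)  # start a new chunk at an AST boundary
    return chunks
```

Because splits only ever happen between top-level definitions, no embedding sees half a function.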
- Dual-query generation: produces both spec-focused and client-focused queries from PRs
- Example: "What does EIP-6780 specify?" + "How does Geth implement it?"
- Reciprocal Rank Fusion (RRF): an industry-standard algorithm for merging heterogeneous retrieval results
- Balances results from spec and client collections
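Reciprocal Rank Fusion itself fits in a few lines; a self-contained sketch following Cormack et al. (2009), where the default `k = 60` matches the `RRF_K` setting documented below (the `rrf_fuse` name is ours):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion:
    score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both the spec and the client lists float above documents that rank highly in only one.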
- Importance scoring: each query is scored 1-10 based on relevance and complexity
- Higher-importance queries receive more retrieval quota
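A hypothetical sketch of the quota idea: distribute a fixed retrieval budget across queries in proportion to their 1-10 scores (the function name, the budget size, and the one-slot floor are assumptions, not the notebook's exact policy):

```python
def allocate_quota(query_scores: dict[str, int], total_k: int = 20) -> dict[str, int]:
    """Split a total retrieval budget across queries proportionally to
    their importance scores; every query keeps at least one slot."""
    total = sum(query_scores.values())
    return {
        query: max(1, round(total_k * score / total))
        for query, score in query_scores.items()
    }
```

Note that rounding means the allocated slots may not sum exactly to `total_k`; a production version would redistribute the remainder.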
Our approach differs from existing tools like ECR:
| Aspect | ECR | Our Approach |
|---|---|---|
| Code Chunking | Text-based | AST-based (tree-sitter) |
| Retrieval | Single collection | Dual-collection (spec + client) |
| Query Strategy | Direct LLM | Dual-query + importance scoring |
| Ranking | Basic similarity | RRF fusion + quota allocation |
| Spec Mapping | Doc lookup | Direct function mapping |
See UPDATED_PROPOSAL.md for detailed comparison.
We use a phased approach with RAG as the core delivery:
| Phase | Approach | What It Optimizes | Status |
|---|---|---|---|
| Phase 1 | RAG | Knowledge retrieval (specs, code) | Core Delivery |
| Phase 2 | ACE | Instructions/playbooks (audit strategies) | Future Enhancement |
| Phase 3 | Fine-tuning | Model weights | If needed |
Why RAG first? Specs change frequently: re-indexing takes minutes, while re-training takes weeks. Every finding also cites its source documents, which keeps results explainable.
Phase 2 (ACE): After the baseline is validated, we can add Agentic Context Engineering: an evolving "Ethereum Audit Playbook" that improves with each audit cycle (+10.6% on agent benchmarks). Requires 50-100 audited PRs for feedback.
Phase 3 (Fine-tuning): Only if RAG + ACE proves insufficient. Supervised fine-tuning (SFT) only; no continued pre-training (CPT) needed. See the proposal for costs: $10K-31K initial (includes a 3x buffer for iteration) plus $1.5K-6K/month hosting.
```bash
# Clone the repository
git clone https://github.com/Straits-AI/ETH-AI-Code-Reviewer.git
cd ETH-AI-Code-Reviewer

# Install dependencies
pip install -r requirements.txt

# Or install directly
pip install openai chromadb sentence-transformers gitpython tqdm tenacity orjson pyyaml tree-sitter tree-sitter-python tree-sitter-go
```

Run the Jupyter notebook:

```bash
jupyter notebook multiagentrag-4-ethereum.ipynb
```

Kaggle link: https://www.kaggle.com/code/whymelabs/multiagentrag-4-ethereum
| Variable | Default | Description |
|---|---|---|
| `MODEL_GPT_OSS` | `gpt-oss:20b` | Heavy model for auditing |
| `MODEL_LIGHT` | `llama3:8b` | Lightweight model for coordination |
| `USE_CROSS_ENCODER` | `0` | Enable cross-encoder reranking |
| `RRF_K` | `60` | RRF fusion constant |
| `SPEC_TO_CLIENT_RATIO` | `0.5` | Balance of spec vs client results |
| `CHUNK_MAX_CHARS` | `1500` | Maximum characters per chunk |
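These settings can be read at startup with plain `os.environ` lookups; this `load_config` helper is a sketch built from the defaults above, not code taken from the notebook:

```python
import os

def load_config() -> dict:
    """Read tunable settings from the environment, falling back to the
    documented defaults."""
    env = os.environ.get
    return {
        "MODEL_GPT_OSS": env("MODEL_GPT_OSS", "gpt-oss:20b"),
        "MODEL_LIGHT": env("MODEL_LIGHT", "llama3:8b"),
        "USE_CROSS_ENCODER": env("USE_CROSS_ENCODER", "0") == "1",
        "RRF_K": int(env("RRF_K", "60")),
        "SPEC_TO_CLIENT_RATIO": float(env("SPEC_TO_CLIENT_RATIO", "0.5")),
        "CHUNK_MAX_CHARS": int(env("CHUNK_MAX_CHARS", "1500")),
    }
```

For example, `USE_CROSS_ENCODER=1 jupyter notebook multiagentrag-4-ethereum.ipynb` would switch the reranker on without editing the notebook.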
| Component | Technology |
|---|---|
| LLM (Heavy) | GPT-OSS 20B / Ollama |
| LLM (Light) | Llama 3 8B |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | ChromaDB |
| Code Parsing | tree-sitter |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Source | Repository | Content |
|---|---|---|
| Execution Specs | ethereum/execution-specs | EVM, state, transactions |
| Consensus Specs | ethereum/consensus-specs | Beacon chain, validators |
| Geth Client | ethereum/go-ethereum | Reference execution client |
| Client | Language | Status |
|---|---|---|
| Geth | Go | In Progress |
| Prysm | Go | Planned |
| Lighthouse | Rust | Planned |
| Nethermind | C# | Planned |
| Besu / Teku | Java | Planned |
| Lodestar | TypeScript | Planned |
- Cormack et al., "Reciprocal Rank Fusion," SIGIR 2009
- Sweep AI, "Chunking 2M+ files a day for Code Search," 2023
- Zhang et al., "cAST: Enhancing Code RAG with AST," arXiv:2506.15655, 2025
- Wang et al., "CodeRAG-Bench," arXiv:2406.14497, 2024
MIT
Quantum3Labs - AI/LLM solutions for blockchain ecosystems
Jomluz Tech Sdn. Bhd. - Software engineering and blockchain development
