A multi-agent RAG system that automates cross-spec compliance reviews for Ethereum clients.
This system helps reduce manual effort in auditing Ethereum client code against evolving protocol specifications by:
- Parsing and indexing Ethereum protocol specifications (execution-specs, consensus-specs)
- Indexing client implementations starting with Geth (go-ethereum)
- Analyzing PRs and code against spec requirements
- Generating structured reports with flagged issues, security alerts, and suggested tests
- AST-based chunking (the same approach used by tools such as Cursor, Windsurf, Aider, and GitHub Copilot)
- Keeps functions/classes intact for better embeddings
- Supports Python, Go, and Markdown
- Improves Recall@5 by 4-5 points over naive fixed-size chunking
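The chunking idea can be sketched with Python's standard-library `ast` module (the project itself uses tree-sitter, which also covers Go and Markdown); the `chunk_python_source` helper and its greedy packing rule are illustrative assumptions, not the notebook's exact code:

```python
import ast

def chunk_python_source(source: str, max_chars: int = 1500) -> list[str]:
    """Chunk Python source at top-level def/class boundaries so that
    each function or class lands intact inside a single chunk."""
    lines = source.splitlines(keepends=True)
    chunks: list[str] = []
    for node in ast.parse(source).body:
        # Slice out the node's full text, body included.
        text = "".join(lines[node.lineno - 1 : node.end_lineno])
        if chunks and len(chunks[-1]) + len(text) <= max_chars:
            chunks[-1] += text   # pack small neighbors into one chunk
        else:
            chunks.append(text)  # start a new chunk at an AST boundary
    return chunks
```

Because splits only ever happen between top-level definitions, no embedding sees half a function.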
- Dual-query generation: produces both spec-focused and client-focused queries from PRs
- Example: "What does EIP-6780 specify?" + "How does Geth implement it?"
- Reciprocal Rank Fusion (RRF): an industry-standard algorithm for merging heterogeneous retrieval results
- Balances results from spec and client collections
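Reciprocal Rank Fusion itself fits in a few lines; a self-contained sketch following Cormack et al. (2009), where the default `k = 60` matches the `RRF_K` setting documented below (the `rrf_fuse` name is ours):

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked result lists with Reciprocal Rank Fusion:
    score(d) = sum over lists of 1 / (k + rank of d in that list)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)
```

Documents that rank well in both the spec and the client lists float above documents that rank highly in only one.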
- Importance scoring: each query is scored 1-10 based on relevance and complexity
- Higher-importance queries receive more retrieval quota
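A hypothetical sketch of the quota idea: distribute a fixed retrieval budget across queries in proportion to their 1-10 scores (the function name, the budget size, and the one-slot floor are assumptions, not the notebook's exact policy):

```python
def allocate_quota(query_scores: dict[str, int], total_k: int = 20) -> dict[str, int]:
    """Split a total retrieval budget across queries proportionally to
    their importance scores; every query keeps at least one slot."""
    total = sum(query_scores.values())
    return {
        query: max(1, round(total_k * score / total))
        for query, score in query_scores.items()
    }
```

Note that rounding means the allocated slots may not sum exactly to `total_k`; a production version would redistribute the remainder.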
Our approach differs from existing tools like ECR:
| Aspect | ECR | Our Approach |
|---|---|---|
| Code Chunking | Text-based | AST-based (tree-sitter) |
| Retrieval | Single collection | Dual-collection (spec + client) |
| Query Strategy | Direct LLM | Dual-query + importance scoring |
| Ranking | Basic similarity | RRF fusion + quota allocation |
| Spec Mapping | Doc lookup | Direct function mapping |
See UPDATED_PROPOSAL.md for detailed comparison.
We use a phased approach with RAG as the core delivery:
| Phase | Approach | What It Optimizes | Status |
|---|---|---|---|
| Phase 1 | RAG | Knowledge retrieval (specs, code) | Core Delivery |
| Phase 2 | ACE | Instructions/playbooks (audit strategies) | Future Enhancement |
| Phase 3 | Fine-tuning | Model weights | If needed |
Why RAG first? Specs change frequently: re-indexing takes minutes, while re-training takes weeks. Every finding also cites its source documents, which keeps results explainable.
Phase 2 (ACE): After the baseline is validated, we can add Agentic Context Engineering: an evolving "Ethereum Audit Playbook" that improves with each audit cycle (+10.6% on agent benchmarks). Requires 50-100 audited PRs for feedback.
Phase 3 (Fine-tuning): Only if RAG + ACE proves insufficient. Supervised fine-tuning (SFT) only; no continued pre-training (CPT) needed. See the proposal for costs: $10K-31K initial (includes a 3x buffer for iteration) plus $1.5K-6K/month hosting.
```bash
# Clone the repository
git clone https://github.com/Straits-AI/ETH-AI-Code-Reviewer.git
cd ETH-AI-Code-Reviewer

# Install dependencies
pip install -r requirements.txt

# Or install directly
pip install openai chromadb sentence-transformers gitpython tqdm tenacity orjson pyyaml tree-sitter tree-sitter-python tree-sitter-go
```

Run the Jupyter notebook:

```bash
jupyter notebook multiagentrag-4-ethereum.ipynb
```

Kaggle link: https://www.kaggle.com/code/whymelabs/multiagentrag-4-ethereum
| Variable | Default | Description |
|---|---|---|
| `MODEL_GPT_OSS` | `gpt-oss:20b` | Heavy model for auditing |
| `MODEL_LIGHT` | `llama3:8b` | Lightweight model for coordination |
| `USE_CROSS_ENCODER` | `0` | Enable cross-encoder reranking |
| `RRF_K` | `60` | RRF fusion constant |
| `SPEC_TO_CLIENT_RATIO` | `0.5` | Balance of spec vs client results |
| `CHUNK_MAX_CHARS` | `1500` | Maximum characters per chunk |
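These settings can be read at startup with plain `os.environ` lookups; this `load_config` helper is a sketch built from the defaults above, not code taken from the notebook:

```python
import os

def load_config() -> dict:
    """Read tunable settings from the environment, falling back to the
    documented defaults."""
    env = os.environ.get
    return {
        "MODEL_GPT_OSS": env("MODEL_GPT_OSS", "gpt-oss:20b"),
        "MODEL_LIGHT": env("MODEL_LIGHT", "llama3:8b"),
        "USE_CROSS_ENCODER": env("USE_CROSS_ENCODER", "0") == "1",
        "RRF_K": int(env("RRF_K", "60")),
        "SPEC_TO_CLIENT_RATIO": float(env("SPEC_TO_CLIENT_RATIO", "0.5")),
        "CHUNK_MAX_CHARS": int(env("CHUNK_MAX_CHARS", "1500")),
    }
```

For example, `USE_CROSS_ENCODER=1 jupyter notebook multiagentrag-4-ethereum.ipynb` would switch the reranker on without editing the notebook.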
| Component | Technology |
|---|---|
| LLM (Heavy) | GPT-OSS 20B / Ollama |
| LLM (Light) | Llama 3 8B |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | ChromaDB |
| Code Parsing | tree-sitter |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 |
| Source | Repository | Content |
|---|---|---|
| Execution Specs | ethereum/execution-specs | EVM, state, transactions |
| Consensus Specs | ethereum/consensus-specs | Beacon chain, validators |
| Geth Client | ethereum/go-ethereum | Reference execution client |
| Client | Language | Status |
|---|---|---|
| Geth | Go | In Progress |
| Prysm | Go | Planned |
| Lighthouse | Rust | Planned |
| Nethermind | C# | Planned |
| Besu / Teku | Java | Planned |
| Lodestar | TypeScript | Planned |
- Cormack et al., "Reciprocal Rank Fusion," SIGIR 2009
- Sweep AI, "Chunking 2M+ files a day for Code Search," 2023
- Zhang et al., "cAST: Enhancing Code RAG with AST," arXiv:2506.15655, 2025
- Wang et al., "CodeRAG-Bench," arXiv:2406.14497, 2024
MIT
Quantum3Labs - AI/LLM solutions for blockchain ecosystems
Jomluz Tech Sdn. Bhd. - Software engineering and blockchain development
