Quantum3-Labs/ETH-AI-Code-Reviewer

Ethereum AI Code Reviewer

A multi-agent RAG system that automates cross-spec compliance reviews for Ethereum clients.

Overview

This system helps reduce manual effort in auditing Ethereum client code against evolving protocol specifications by:

  1. Parsing and indexing Ethereum protocol specifications (execution-specs, consensus-specs)
  2. Indexing client implementations, starting with Geth (go-ethereum)
  3. Analyzing PRs and code against spec requirements
  4. Generating structured reports with flagged issues, security alerts, and suggested tests

Architecture

[Figure: architecture diagram]

Key Features

Tree-sitter Code Chunking

  • AST-based chunking (same approach as Cursor, Windsurf, Aider, GitHub Copilot)
  • Keeps functions/classes intact for better embeddings
  • Supports Python, Go, and Markdown
  • Improves Recall@5 by 4-5 points over naive chunking
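
The idea of keeping functions and classes intact can be sketched as follows. This is a minimal illustration, not the project's chunker: it uses Python's standard-library `ast` module as a stand-in for tree-sitter, so it only handles Python source. The function name `chunk_python_source` is hypothetical, and the 1500-character default simply mirrors the `CHUNK_MAX_CHARS` setting.

```python
import ast


def chunk_python_source(source: str, max_chars: int = 1500) -> list[str]:
    """Greedily pack whole top-level AST nodes into chunks of at most max_chars.

    A node is never split: a function or class larger than max_chars
    becomes a chunk of its own. Comments between nodes are dropped in
    this simplified sketch.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[str] = []
    current = ""
    for node in tree.body:
        # Include decorator lines, which sit above the node's own lineno.
        decorators = getattr(node, "decorator_list", [])
        start = min([node.lineno] + [d.lineno for d in decorators])
        text = "\n".join(lines[start - 1 : node.end_lineno])
        candidate = f"{current}\n\n{text}" if current else text
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # Flush the chunk built so far and start a new one at this node.
            if current:
                chunks.append(current)
            current = text
    if current:
        chunks.append(current)
    return chunks
```

Compared with splitting every N characters, each chunk here starts and ends on a syntactic boundary, so an embedding never sees half of a function body.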

Dual-Query Generation

  • Generates both spec-focused and client-focused queries from PRs
  • Example: "What does EIP-6780 specify?" + "How does Geth implement it?"
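
The shape of a dual query can be illustrated with a small sketch. It uses fixed templates and a regular expression as a stand-in for whatever query generation the notebook performs; `DualQuery`, `dual_queries_from_pr`, and the template wording are assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class DualQuery:
    spec_query: str    # sent to the spec collection
    client_query: str  # sent to the client collection


EIP_PATTERN = re.compile(r"\bEIP-(\d+)\b", re.IGNORECASE)


def dual_queries_from_pr(pr_text: str, client: str = "Geth") -> list[DualQuery]:
    """Build one spec/client query pair per distinct EIP mentioned in a PR."""
    seen: list[str] = []
    for number in EIP_PATTERN.findall(pr_text):
        if number not in seen:  # keep first-mention order, drop duplicates
            seen.append(number)
    return [
        DualQuery(
            spec_query=f"What does EIP-{n} specify?",
            client_query=f"How does {client} implement EIP-{n}?",
        )
        for n in seen
    ]
```

Each half of the pair targets a different collection, which is what allows the two result lists to be fused afterwards.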

Reciprocal Rank Fusion (RRF)

  • Industry-standard algorithm for merging heterogeneous retrieval results
  • Balances results from spec and client collections
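
As a minimal sketch of the technique, RRF gives each document the score sum of 1 / (k + rank) over every ranked list in which it appears, with ranks starting at 1. The function name and the tie-breaking by document ID below are illustrative choices, not taken from the notebook; `k=60` matches the `RRF_K` default.

```python
from collections import defaultdict
from typing import Iterable, Sequence


def reciprocal_rank_fusion(
    ranked_lists: Iterable[Sequence[str]], k: int = 60
) -> list[str]:
    """Merge ranked lists of document IDs into a single fused ranking."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            # Only the rank matters; raw similarity scores are ignored,
            # so lists from different retrievers are directly comparable.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))
```

For example, fusing a spec ranking `["s1", "s2", "c1"]` with a client ranking `["c1", "s1"]` puts `s1` first, because it ranks near the top of both lists.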

Query Importance Scoring

  • Scores queries 1-10 based on relevance and complexity
  • Higher-importance queries receive more retrieval quota
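
One way to turn importance scores into retrieval quotas is proportional allocation with largest-remainder rounding, sketched below. The README does not state the allocation rule the project uses, so `allocate_quota` and this rounding scheme are assumptions.

```python
def allocate_quota(importance: dict[str, int], total_k: int) -> dict[str, int]:
    """Split total_k retrieval slots across queries in proportion to importance."""
    total = sum(importance.values())
    if total <= 0:
        raise ValueError("importance scores must sum to a positive value")
    # Ideal (fractional) share of the budget for each query.
    raw = {q: total_k * score / total for q, score in importance.items()}
    quota = {q: int(share) for q, share in raw.items()}
    # Hand out slots lost to rounding down, largest fractional part first.
    leftover = total_k - sum(quota.values())
    by_remainder = sorted(raw, key=lambda q: raw[q] - quota[q], reverse=True)
    for q in by_remainder[:leftover]:
        quota[q] += 1
    return quota
```

The quotas always sum to `total_k`, so the overall retrieval budget stays fixed while high-importance queries receive more of it.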

Comparison with Existing Solutions

Our approach differs from existing tools like ECR:

| Aspect | ECR | Our Approach |
|---|---|---|
| Code Chunking | Text-based | AST-based (tree-sitter) |
| Retrieval | Single collection | Dual-collection (spec + client) |
| Query Strategy | Direct LLM | Dual-query + importance scoring |
| Ranking | Basic similarity | RRF fusion + quota allocation |
| Spec Mapping | Doc lookup | Direct function mapping |

See UPDATED_PROPOSAL.md for a detailed comparison.

Why RAG-First? (Enhancement Roadmap)

We use a phased approach with RAG as the core delivery:

| Phase | Approach | What It Optimizes | Status |
|---|---|---|---|
| Phase 1 | RAG | Knowledge retrieval (specs, code) | Core Delivery |
| Phase 2 | ACE | Instructions/playbooks (audit strategies) | Future Enhancement |
| Phase 3 | Fine-tuning | Model weights | If needed |

Why RAG first? Specs change frequently, and re-indexing takes minutes whereas re-training takes weeks. Every finding also cites its source documents, which keeps the output explainable.

Phase 2 (ACE): Once the baseline is validated, we can add Agentic Context Engineering: an evolving "Ethereum Audit Playbook" that improves with each audit cycle (+10.6% on agent benchmarks). This phase requires 50-100 audited PRs as feedback.

Phase 3 (Fine-tuning): Only if RAG + ACE proves insufficient. Supervised fine-tuning (SFT) only; continued pre-training (CPT) is not needed. See the proposal for costs: $10K-31K initial (includes a 3x buffer for iteration) plus $1.5K-6K/month for hosting.

Installation

# Clone the repository
git clone https://github.com/Straits-AI/ETH-AI-Code-Reviewer.git
cd ETH-AI-Code-Reviewer

# Install dependencies
pip install -r requirements.txt

# Or install directly
pip install openai chromadb sentence-transformers gitpython tqdm tenacity orjson pyyaml tree-sitter tree-sitter-python tree-sitter-go

Usage

Run the Jupyter notebook:

jupyter notebook multiagentrag-4-ethereum.ipynb

Kaggle notebook: https://www.kaggle.com/code/whymelabs/multiagentrag-4-ethereum

Configuration

| Variable | Default | Description |
|---|---|---|
| MODEL_GPT_OSS | gpt-oss:20b | Heavy model for auditing |
| MODEL_LIGHT | llama3:8b | Lightweight model for coordination |
| USE_CROSS_ENCODER | 0 | Enable cross-encoder reranking |
| RRF_K | 60 | RRF fusion constant |
| SPEC_TO_CLIENT_RATIO | 0.5 | Balance of spec vs client results |
| CHUNK_MAX_CHARS | 1500 | Maximum characters per chunk |
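
A minimal sketch of loading these settings with their defaults is shown below. It assumes the variables are read from environment variables, which this README does not confirm; `Settings`, `load_settings`, and `split_budget` are hypothetical names. Reading `SPEC_TO_CLIENT_RATIO` as the fraction of results drawn from the spec collection is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    model_gpt_oss: str
    model_light: str
    use_cross_encoder: bool
    rrf_k: int
    spec_to_client_ratio: float
    chunk_max_chars: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (os.environ by default), using the table's defaults."""
    env = os.environ if env is None else env
    return Settings(
        model_gpt_oss=env.get("MODEL_GPT_OSS", "gpt-oss:20b"),
        model_light=env.get("MODEL_LIGHT", "llama3:8b"),
        use_cross_encoder=env.get("USE_CROSS_ENCODER", "0") == "1",
        rrf_k=int(env.get("RRF_K", "60")),
        spec_to_client_ratio=float(env.get("SPEC_TO_CLIENT_RATIO", "0.5")),
        chunk_max_chars=int(env.get("CHUNK_MAX_CHARS", "1500")),
    )


def split_budget(total_k: int, ratio: float) -> tuple[int, int]:
    """Assumed interpretation: ratio is the share of slots given to spec results."""
    n_spec = round(total_k * ratio)
    return n_spec, total_k - n_spec
```

Passing an explicit mapping, for example `load_settings({"USE_CROSS_ENCODER": "1"})`, makes it easy to try a configuration without touching the process environment.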

Technical Stack

| Component | Technology |
|---|---|
| LLM (Heavy) | GPT-OSS 20B / Ollama |
| LLM (Light) | Llama 3 8B |
| Embeddings | all-MiniLM-L6-v2 |
| Vector DB | ChromaDB |
| Code Parsing | tree-sitter |
| Reranking | cross-encoder/ms-marco-MiniLM-L-6-v2 |

Data Sources

| Source | Repository | Content |
|---|---|---|
| Execution Specs | ethereum/execution-specs | EVM, state, transactions |
| Consensus Specs | ethereum/consensus-specs | Beacon chain, validators |
| Geth Client | ethereum/go-ethereum | Reference execution client |

Future Expansion

| Client | Language | Status |
|---|---|---|
| Geth | Go | In Progress |
| Prysm | Go | Planned |
| Lighthouse | Rust | Planned |
| Nethermind | C# | Planned |
| Besu / Teku | Java | Planned |
| Lodestar | TypeScript | Planned |

References

  1. Cormack et al., "Reciprocal Rank Fusion," SIGIR 2009
  2. Sweep AI, "Chunking 2M+ files a day for Code Search," 2023
  3. Zhang et al., "cAST: Enhancing Code RAG with AST," arXiv:2506.15655, 2025
  4. Wang et al., "CodeRAG-Bench," arXiv:2406.14497, 2024

License

MIT

Team

Quantum3Labs - AI/LLM solutions for blockchain ecosystems

Jomluz Tech Sdn. Bhd. - Software engineering and blockchain development
