RepoMind



📖 Chinese Documentation · 📝 Changelog

A highly token-efficient code-aware RAG system specialized for repository understanding.

~80% token reduction vs naive RAG on large codebases, while maintaining comparable accuracy.


Highlights · Performance · Use Cases · Deep Dive · Architecture · Quick Start · Results · Modules · Metrics


🔥 Highlights

  • Specialized Code Chunking: AST-aware file/class/function/block chunking with structured data extraction
  • LLM Summary Generation: Auto-generated chunk summaries during indexing for better retrieval quality
  • Multi-stage Retrieval Pipeline: Query expansion + vector search + metadata filtering + reranking
  • Chinese Optimization: n-gram matching with meaningless pronoun exclusion
  • Token Efficiency: up to ~88% token reduction vs non-optimized RAG on large repos (14100 → 1634 tokens)
  • Dual Model Strategy: Fast model for simple questions, strong model for complex questions, ensuring accuracy while optimizing cost and latency
  • Extensible Architecture: Vector storage abstraction layer for future migration
  • FastAPI & MCP Support: Production-ready API and Model Context Protocol for easy integration

📊 Performance / Key Insight

The Trade-off

| Approach | Recall | Cost |
|---|---|---|
| Naive RAG | High | Very high (full files) |
| RepoMind | Comparable | ~80% lower (summaries + structured data) |

Key Results

  • Small repos: Comparable or slightly better accuracy than naive RAG
  • Large repos: ~5-10% lower accuracy in single-query setting, but massive token savings
  • Token reduction: ~88% on medium-large projects (14100 → 1634 tokens), ~21% on small projects (3163 → 2502 tokens)

See full baseline results below for detailed metrics.

🎯 Use Cases

  • AI Agent Context Provider: Integrate with Claude Desktop or other AI tools via MCP to provide codebase context with minimal token overhead
  • Large Repo Exploration: Efficiently navigate and understand internal tools or niche open-source projects without sending entire files to LLMs
  • Team Knowledge Base: Help new team members onboard faster by answering codebase questions with grounded, verifiable answers

🔧 Deep Dive

1. Chunker Design: Multi-level Chunking

Challenge: Balancing granularity and context for optimal retrieval

Solution:

  • File-level: Whole module overview with imports and top-level structure
  • Class-level: Class responsibilities and methods
  • Function-level: Function inputs, outputs, and call relationships
  • Block-level: Code blocks in script files

Trade-offs: Finer granularity improves precision but can lose surrounding context; this is mitigated by summaries from a low-cost fast LLM, which preserve context while keeping individual chunks focused.
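As an illustration, the file/class/function levels can be derived from a Python AST. This is a hypothetical sketch using the standard `ast` module, not the project's actual chunker; the chunk field names are assumptions:

```python
import ast

def chunk_source(source: str, path: str) -> list[dict]:
    """Split one Python file into file-, class-, and function-level chunks."""
    tree = ast.parse(source)
    # File-level chunk: module overview via top-level imports.
    chunks = [{
        "level": "file",
        "path": path,
        "imports": [ast.unparse(n) for n in tree.body
                    if isinstance(n, (ast.Import, ast.ImportFrom))],
    }]
    for node in ast.walk(tree):
        if isinstance(node, ast.ClassDef):
            # Class-level chunk: class name and its method names.
            chunks.append({"level": "class", "path": path, "name": node.name,
                           "methods": [m.name for m in node.body
                                       if isinstance(m, ast.FunctionDef)]})
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # Function-level chunk: name plus a reconstructed signature.
            args = ", ".join(a.arg for a in node.args.args)
            chunks.append({"level": "function", "path": path,
                           "name": node.name,
                           "signature": f"{node.name}({args})"})
    return chunks
```

Each chunk would then be summarized by the fast LLM before indexing, so retrieval can match against summaries rather than raw code.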

2. Reranker Design: Multi-factor Optimization

Challenge: Chinese queries require different handling, and diversity matters in retrieval results

Solution:

  • Chinese n-gram Matching: 2-gram + 3-gram for better Chinese keyword matching
  • Meaningless Word Filter: Exclusion table for Chinese pronouns ("我", "我们", "你", "你们", etc.)
  • Bucket Guarantee: At least 1 document chunk + 1 code chunk to ensure diversity
  • MMR Diversity: Maximal Marginal Relevance for result diversity
  • Weight Tuning: alpha=0.85 (cosine similarity), beta=0.15 (keyword score) - keywords as "icing on the cake"
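The combined score can be sketched as follows. The alpha/beta weights and the pronoun exclusion mirror the values above, while the helper names and exact overlap formula are hypothetical, not the project's implementation:

```python
ALPHA, BETA = 0.85, 0.15  # cosine similarity weight vs keyword score weight
STOP_PRONOUNS = {"我", "我们", "你", "你们"}  # meaningless-pronoun exclusion table

def char_ngrams(text: str, n: int) -> set[str]:
    """Overlapping character n-grams, suited to unsegmented Chinese text."""
    return {text[i:i + n] for i in range(len(text) - n + 1)}

def keyword_score(query: str, chunk_text: str) -> float:
    """2-gram + 3-gram overlap ratio, ignoring grams containing pronouns."""
    grams = set()
    for n in (2, 3):
        grams |= {g for g in char_ngrams(query, n)
                  if not any(p in g for p in STOP_PRONOUNS)}
    if not grams:
        return 0.0
    return sum(1 for g in grams if g in chunk_text) / len(grams)

def rerank_score(cosine_sim: float, query: str, chunk_text: str) -> float:
    """Keywords act as 'icing on the cake' on top of vector similarity."""
    return ALPHA * cosine_sim + BETA * keyword_score(query, chunk_text)
```

With this weighting, a keyword match can nudge ordering among close vector-similarity candidates but cannot override a large similarity gap.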

3. Token Efficiency Optimization

Challenge: Reducing token usage while maintaining answer quality

Solution:

  • LLM Summaries: Use low-cost fast LLM (default: qwen-flash) to generate concise summaries instead of sending full code
  • Dual Model Strategy: Simple questions use fast model (qwen-flash), complex questions use strong model (qwen3.5-plus), saving cost and optimizing response speed
  • Structured Data: Extract imports, signatures, calls instead of using full code
  • Smart Context Packing: Prioritize summary > structured data > code
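The packing priority can be sketched as a greedy, budgeted fill: cheap summaries first, then structured data, then raw code only if room remains. The field names and the rough characters-per-token estimate are assumptions for illustration:

```python
def pack_context(chunks: list[dict], max_tokens: int = 2000) -> str:
    """Greedily pack each chunk's cheapest useful representation first."""
    est = lambda s: len(s) // 4 + 1  # rough token estimate (~4 chars/token)
    parts, used = [], 0
    # Pass 1: summaries; pass 2: structured data; pass 3: raw code.
    for field in ("summary", "structured", "code"):
        for chunk in chunks:
            text = chunk.get(field, "")
            if text and used + est(text) <= max_tokens:
                parts.append(text)
                used += est(text)
    return "\n\n".join(parts)
```

This ordering is what drives the token savings: most questions are answerable from summaries and signatures, so full code rarely enters the prompt.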

🏗️ System Architecture

```mermaid
graph TD
    A[Query Input] --> B[Query Expansion<br/>MQE]
    B --> C[Query Classification<br/>Simple/Complex]
    C --> D[Multi-stage Retrieval Pipeline]

    subgraph D_Pipeline[Multi-stage Retrieval Pipeline]
        D1[1. Vector Retrieval<br/>FAISS Top 20]
        D2[2. Bucket Guarantee<br/>Docs + Code]
        D3[3. Keyword Scoring<br/>Chinese n-gram]
        D4[4. MMR Reranking<br/>Diversity]
        D5[5. Final Selection<br/>Top 5]
    end

    D --> D1
    D1 --> D2
    D2 --> D3
    D3 --> D4
    D4 --> D5

    D5 --> E[Context Building]

    subgraph Context[Context Building]
        E1[Chunk Summary<br/>LLM Generated]
        E2[Structured Data<br/>imports, signatures, calls]
        E3[Raw Code<br/>Optional]
    end

    E --> E1
    E --> E2
    E --> E3

    E --> F[Answer Generation<br/>Dual Model Strategy]

    subgraph Gen[Answer Generation]
        F1[Simple Question<br/>qwen-flash]
        F2[Complex Question<br/>qwen3.5-plus]
    end

    C -->|Simple| F1
    C -->|Complex| F2

    F1 --> G[Answer Output]
    F2 --> G
```

🚀 Quick Start

Environment Requirements

  • Python 3.9+
  • Conda environment: RepoMind

Installation

conda create -n RepoMind python=3.11
conda activate RepoMind
pip install -r requirements.txt

Configuration

Copy .env.example to .env and configure:

cp .env.example .env
# Edit .env file, set QWEN_API_KEY

Core Interface (Recommended)

Use the unified RepoMind class with all configurable options:

from repomind import RepoMind

# Initialize with default configuration
repomind = RepoMind()

# Or with custom configuration
repomind = RepoMind(
    enable_query_expansion=True,      # Enable query expansion (MQE)
    enable_query_classification=True,  # Enable question classification
    query_expansion_variants=2,         # Number of query expansion variants
    use_fast_llm_for_expansion=True,    # Use fast LLM for query expansion
    use_hybrid_answer_generation=True,  # Hybrid answer generation (fast for simple)
)

# Index a repository
repomind.index_repository("/path/to/repo")

# Query
result = repomind.query("What does this project do?")
print(result["answer"])

Run Demo

conda activate RepoMind && python scripts/test_core.py

Start API Service

conda activate RepoMind && uvicorn repomind.api.main:app --reload

Index Repository

POST /index
{
  "repo_path": "/path/to/repository"
}

Query Repository

POST /query
{
  "question": "What does this project do?"
}

Full API documentation: http://localhost:8000/docs
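A minimal client for the two endpoints above might look like this. It assumes the service is running on the default uvicorn port and that `/query` responses carry an `answer` field, as in the core-interface example; `query_repo` accepts any object with a requests-style `post` method:

```python
BASE = "http://localhost:8000"

def query_repo(http, repo_path: str, question: str) -> str:
    """Index a repository via POST /index, then ask one question via POST /query."""
    http.post(f"{BASE}/index", json={"repo_path": repo_path})
    resp = http.post(f"{BASE}/query", json={"question": question})
    return resp.json()["answer"]

# Usage with the `requests` library once the server is up:
# import requests
# print(query_repo(requests, "/path/to/repository", "What does this project do?"))
```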

Start MCP Service

RepoMind supports MCP (Model Context Protocol) for integration with Claude Desktop, Claude Code, and other AI tools:

conda activate RepoMind && python scripts/start_mcp_server.py

MCP Tools:

  • index_repository(repo_path) - Index a code repository
  • query_repository(question) - Query an indexed repository
  • get_health() - Check service health
  • save_index(index_path) - Save index to disk
  • load_index(index_path) - Load index from disk

Claude Desktop Configuration: Add to Claude Desktop config:

{
  "mcpServers": {
    "repomind": {
      "command": "conda",
      "args": ["run", "-n", "RepoMind", "python", "/path/to/RepoMind/scripts/start_mcp_server.py"]
    }
  }
}

📦 Core Modules

See docs/MODULES.md.

📈 Baseline Results

Test Projects

For evaluation metrics, see docs/METRICS.md. Tested projects:

  1. travel_agent (small): LLM-based travel assistant agent (see 测试仓库/)
  2. cuezero (medium-large): High-performance billiards AI system (https://github.com/sadlavaarsc/CueZero)

Tested Systems

| System | Description |
|---|---|
| LLM-only | No retrieval (specific files provided as context, with large files truncated as needed to save cost) |
| Naive RAG | Non-optimized generic RAG implementation, using file-level chunks to avoid recall degradation from fragmented splitting |
| Structured RAG | Complete ingestion pipeline + naive retrieval + naive rerank |
| Full System | Full optimization (qwen3.5-plus) |
| Full System Fast | Full optimization + dual model strategy (qwen-flash + qwen3.5-plus) |

travel_agent Results

| System | Avg Recall | Avg Hit Rate | Answerable Rate | E2E Success Rate | Avg Correctness | Avg Grounding | Avg Total Tokens | Avg Latency (ms) |
|---|---|---|---|---|---|---|---|---|
| llm_only | 0.000 | 0.000 | 0.0% | 40.0% | 2.00 | 0.80 | 3136 | 14463.6 |
| naive_rag | 1.000 | 1.000 | 90.0% | 100.0% | 2.00 | 2.00 | 3163 | 12789.5 |
| structured_rag | 0.975 | 1.000 | 80.0% | 100.0% | 2.00 | 2.00 | 2686 | 13869.1 |
| full_system | 0.975 | 1.000 | 90.0% | 100.0% | 2.00 | 2.00 | 2845 | 37362.6 |
| full_system_fast | 0.975 | 1.000 | 90.0% | 100.0% | 2.00 | 2.00 | 2502 | 15157.2 |

cuezero Results

| System | Avg Recall | Avg Hit Rate | Answerable Rate | E2E Success Rate | Avg Correctness | Avg Grounding | Avg Total Tokens | Avg Latency (ms) |
|---|---|---|---|---|---|---|---|---|
| llm_only | 0.000 | 0.000 | 0.0% | 50.0% | 2.00 | 1.00 | 3590 | 21760.5 |
| naive_rag | 0.500 | 1.000 | 100.0% | 100.0% | 2.00 | 2.00 | 14100 | 15034.3 |
| structured_rag | 0.400 | 0.900 | 70.0% | 70.0% | 1.70 | 2.00 | 3420 | 20691.7 |
| full_system | 0.450 | 1.000 | 100.0% | 80.0% | 1.70 | 2.00 | 2313 | 48915.8 |
| full_system_fast | 0.450 | 1.000 | 100.0% | 90.0% | 1.80 | 2.00 | 1634 | 14342.8 |

Note: average latency may be inflated by network conditions. Treat these numbers as relative comparisons between systems (with llm_only as the baseline) rather than as absolute performance; measure in your own environment for production estimates.


📁 Project Structure

repomind/
├── repomind/
│   ├── ingestion/          # Data parsing and preprocessing
│   ├── indexing/           # Embedding and vector indexing
│   ├── storage/            # Vector storage abstraction
│   ├── retrieval/          # Multi-stage retrieval pipeline
│   ├── generation/         # LLM answer generation
│   ├── evaluation/         # Evaluation metrics
│   ├── api/                # FastAPI service
│   ├── mcp/                # MCP service
│   ├── configs/            # Configuration management
│   ├── baselines/          # Baseline systems
│   └── core.py             # RepoMind core class
├── test_suite/             # Test suite
├── scripts/                # Utility scripts
├── tests/                  # Test suite
├── requirements.txt
├── README.md
├── README_zh.md
└── CHANGELOG.md

🛠️ Tech Stack

  • Vector Storage: FAISS (Facebook AI Similarity Search)
  • Embedding Model: text-embedding-v4
  • Strong LLM: qwen3.5-plus - for final answer generation
  • Fast LLM: qwen-flash - for query expansion, question classification, chunk summary generation, LLM evaluation
  • API Framework: FastAPI
  • Data Modeling: Pydantic v2

📝 Changelog

See CHANGELOG.md for a detailed history of changes.


📄 License

MIT License
