uprockcom/code-index

Code Index

A standalone, AI-oriented code indexing framework that speeds up code discovery for AI agents (like Claude or Gemini). Think of it as the indexing engine behind an IDE like IntelliJ or Eclipse, but exposed as a standalone service.

  • Supported Languages: Java, Kotlin, Rust, Go, C, C++
  • Interfaces: MCP (stdio) for AI agents, JSON-RPC (HTTP) for custom scripts
  • Storage: Fast, local SQLite database

What it does

Instead of forcing AI agents to grep blindly through hundreds of files, Code Index parses your source code and builds a relational graph of your codebase.

An AI agent can instantly ask precise questions like:

  • "What methods does the UserController class have?"
  • "Find every function that calls processPayment()."
  • "What classes implement the PaymentGateway interface?"
  • "Show me the definition of the User struct."

It supports full cross-file resolution, tracks type hierarchies, and even updates incrementally in real-time as you modify files. Mixed-language projects (e.g., Chromium with C++ and Java, or Android apps with Java and Kotlin) are indexed into a single unified graph.
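For example, the first question above can be answered with one structured tool call instead of a series of greps. A minimal sketch of what such a call might look like — the tool name and argument shape here are illustrative placeholders, not the server's actual API:

```python
# Hypothetical MCP tool call (tool and argument names are illustrative only):
tool_call = {
    "tool": "list_class_members",
    "arguments": {"class": "UserController", "kind": "method"},
}

# The index answers with structured records (name, file, line) rather than
# raw file text, which is what keeps agent token usage low.
print(tool_call["tool"])
```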

Features

  • Blazing Fast Symbol Lookup: Instantly jump to classes, functions, methods, traits, and macros.
  • Call Graphs & Type Hierarchies: Navigate up and down the call stack and inheritance trees.
  • Cross-References: Find all reads, writes, and usages of a specific symbol.
  • Mixed-Language Projects: All supported languages are indexed into one database — query across language boundaries in a single project.
  • Live File Watching: The index stays up-to-date automatically as you type and save files.
  • Full Text & Semantic Search: Search by exact keywords (FTS5) or use AI embeddings to search by natural language meaning.
  • Build-System Aware: Automatically detects source roots for Gradle, Maven, and Cargo projects, ignoring generated folders like build/ or target/.

Installation

You need the Rust toolchain installed.

# Clone the repository
git clone https://github.com/your-username/code-index.git
cd code-index

# Build the release binary with semantic search enabled
cargo build --release -p code-index-server --features semantic

# The binary will be available at target/release/code-index-server

(Note: the first build may take a while, since it compiles the C sources for the Tree-sitter parsers and the bundled SQLite.)

Usage: Connecting AI Agents (MCP)

The easiest way to use Code Index is by connecting it to an AI CLI tool using the Model Context Protocol (MCP). The configuration is exactly the same for both Gemini CLI and Claude CLI.

Create or update the appropriate settings file for your agent:

  • Gemini CLI: .gemini/settings.json (in your project root)
  • Claude CLI: .mcp.json (in your project root)

Add the following configuration:

{
  "mcpServers": {
    "code-index": {
      "command": "/absolute/path/to/code-index/target/release/code-index-server",
      "args": ["--transport", "stdio", "--root", "."]
    }
  }
}

The CLI will automatically detect the server and provide the indexing and search tools.

Setting up Semantic Search (Optional)

By default, Code Index uses exact-match and structural search. If you want the ability to search your codebase using natural language (e.g., "How is the user profile updated?"), you can enable Semantic Search.

Semantic Search requires an embedding provider. The server is compatible with both paid OpenAI models and free, locally-hosted models (like Ollama).

Using a Free Local Model (Ollama)

  1. Install Ollama on Linux or macOS:
    curl -fsSL https://ollama.com/install.sh | sh
  2. Download the local embedding model (nomic-embed-text is a small ~274MB model perfect for code):
    ollama pull nomic-embed-text
  3. Start the code-index-server with the following environment variables (you can put these in your .bashrc or export them before running your AI CLI):
    export CODE_INDEX_EMBEDDING_API_URL="http://localhost:11434/v1/embeddings"
    export CODE_INDEX_EMBEDDING_MODEL="nomic-embed-text"
    export CODE_INDEX_EMBEDDING_API_KEY="dummy"  # Required by the client, but ignored by Ollama
    export CODE_INDEX_EMBEDDING_DIMENSIONS="768"
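Before pointing the server at Ollama, it can be worth sanity-checking the endpoint. The sketch below uses only the Python standard library and assumes Ollama's default port and its OpenAI-compatible /v1/embeddings route; the actual network call is left commented out since it needs a running Ollama.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/v1/embeddings"  # Ollama's default port

def embedding_request(text, model="nomic-embed-text"):
    """Build the JSON body for an OpenAI-compatible /v1/embeddings call."""
    return {"model": model, "input": text}

def embed(text, url=OLLAMA_URL):
    """POST the request to a local Ollama instance and return the vector."""
    body = json.dumps(embedding_request(text)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["data"][0]["embedding"]

# Requires Ollama to be running. nomic-embed-text produces 768-dimensional
# vectors, which is why CODE_INDEX_EMBEDDING_DIMENSIONS is set to 768 above.
# print(len(embed("fn main() {}")))
```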

Using OpenAI

Export your API key. The server will default to text-embedding-3-small.

export CODE_INDEX_EMBEDDING_API_KEY="sk-your-openai-key"

Note: If no API key is provided, the server will gracefully disable semantic search and continue providing fast structural and text searches.

Advanced Usage

Running as a standalone JSON-RPC Server

You can run the server directly and query it using HTTP/JSON-RPC (defaults to port 9120):

cargo run -p code-index-server -- --root /path/to/your/project
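Once the server is up, any script can talk to it with plain JSON-RPC 2.0 over HTTP. Below is a minimal stdlib-only client sketch — the method name find_symbol and its parameters are hypothetical placeholders; see the design docs for the real API surface.

```python
import json
import urllib.request

def rpc_request(method, params, req_id=1):
    """Build a JSON-RPC 2.0 request body."""
    return {"jsonrpc": "2.0", "id": req_id, "method": method, "params": params}

def call(method, params, url="http://localhost:9120"):
    """POST a JSON-RPC request to the server and return the decoded response."""
    body = json.dumps(rpc_request(method, params)).encode()
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

# Hypothetical method and params -- check the API reference for real names.
# print(call("find_symbol", {"name": "UserController"}))
```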

Brand / Flavor Configuration (Android)

For Android projects with build variants (flavors), use --brand and --source-rule to control which source sets get indexed:

{
  "mcpServers": {
    "code-index": {
      "command": "/absolute/path/to/code-index/target/release/code-index-server",
      "args": [
        "--transport", "stdio",
        "--root", ".",
        "--brand", "exampleBrand",
        "--source-rule", "common:src/main/java",
        "--source-rule", "brand:src/{brand}/java"
      ]
    }
  }
}

To switch brands at runtime without restarting, agents can use the set_brand MCP tool.

Large codebases

For projects with more than 10,000 source files, the default per-file indexing may be slow. Use --bulk to switch to a faster bulk indexing pipeline that resolves imports, calls, and type relations via SQL batch operations:

code-index index --root /path/to/project --bulk

Or in .mcp.json:

{
  "mcpServers": {
    "code-index": {
      "command": "code-index-server",
      "args": ["--transport", "stdio", "--root", ".", "--bulk"]
    }
  }
}

Benchmarks

We run ablation benchmarks to measure the impact of Code Index on AI agent performance. Each benchmark task is executed twice — once with MCP tools enabled, once without — to compare token usage, cost, and speed.

Methodology: Average of 3 runs, Claude CLI with Opus 4.6 (1M context).

Android project (~860 files, ~8000 symbols)

| Task type | With MCP | Without MCP | Winner |
|---|---|---|---|
| File symbol listing | 18.4s, $0.13 | 69.4s, $0.30 | MCP (3.8x faster, 2.3x cheaper) |
| Call graph traversal | 44.5s, $0.24 | 79.2s, $0.30 | MCP (1.8x faster, 1.3x cheaper) |
| Cross-module tracing | 39.3s, $0.19 | 66.8s, $0.24 | MCP (1.7x faster, 1.2x cheaper) |
| Find implementations | 22.5s, $0.14 | 29.4s, $0.16 | MCP (1.3x faster, 1.2x cheaper) |
| Find definition | 20.3s, $0.23 | 16.8s, $0.21 | Grep (faster and cheaper for simple lookup) |

Quality: 100% in both modes across all tasks. Overall: MCP is 44% faster and 23% cheaper ($2.78 vs $3.63).

Chromium project (~86K files, ~1.8M symbols, C++ and Java)

| Task type | With MCP | Without MCP | Winner |
|---|---|---|---|
| C++ cross-module trace | 314s, $0.50, 93% quality | 334s, $1.39, 100% quality | MCP (2.8x cheaper, similar speed) |
| C++ call graph | 253s, $0.51, 93% quality | 344s, $0.61, 93% quality | MCP (1.4x faster, 1.2x cheaper) |
| C++ find definition | 66s, $0.31, 100% quality | 84s, $0.39, 100% quality | MCP (1.3x faster, 1.3x cheaper) |
| C++ file symbols | 268s, $0.89, 80% quality | 333s, $1.03, 100% quality | MCP faster/cheaper, grep better quality |
| C++ interface impls | 232s, $0.45, 93% quality | 162s, $0.42, 60% quality | MCP better quality, grep faster |
| Java interface impls | 50s, $0.16, 100% quality | 72s, $0.23, 100% quality | MCP (1.4x faster, 1.5x cheaper) |
| Java call graph | 106s, $0.30, 83% quality | 111s, $0.35, 100% quality | MCP cheaper, grep better quality |
| Java find definition | 49s, $0.15, 100% quality | 33s, $0.12, 100% quality | Grep (faster and cheaper) |

Overall: MCP 28% cheaper ($9.81 vs $13.67) and 9% faster (4014s vs 4421s). Quality is similar (MCP 93% vs grep 94%) — MCP occasionally misses facts on complex exploration, while grep occasionally misses implementations that require structural traversal to find.

Summary

| Metric | Android (860 files) | Chromium (86K files) |
|---|---|---|
| MCP speed advantage | 44% faster | 9% faster |
| MCP cost advantage | 23% cheaper | 28% cheaper |
| Quality | identical (100%) | similar (93% vs 94%) |
| MCP cost-effective wins | 4/5 tasks | 5/8 tasks |

Takeaway: Code Index consistently reduces cost across both project sizes. The speed advantage is largest on the small project (44%) where grep has to search many files per query. On the large project, the cost savings are more significant (28%) because grep operations on 86K files are expensive. Quality is comparable — MCP sometimes explores too aggressively (missing facts in 26-turn call graph sessions), while grep sometimes misses implementations that require structural traversal.

When Code Index helps most

Best for:

  • Cost reduction on large codebases — grep on 86K files is expensive; structured index queries are cheap per call
  • Multi-hop structural queries — call graphs, cross-module tracing, type hierarchies
  • Completeness-sensitive tasks — finding all implementations of a widely-inherited class
  • Disambiguation — querying a specific connect method out of thousands of matches

Not needed for:

  • Simple "where is X defined?" lookups — grep is faster for single-hop queries
  • Tasks where answer quality is critical and exploration depth is unpredictable

Running your own benchmarks

Results and scripts are in tests/benchmark/.

./tests/benchmark/bench.sh --prompts prompts_example.txt --runs 3

See prompts_example.txt for the prompt format.

Contributing and Technical Details

If you are looking to contribute, understand the architecture, or view the raw JSON-RPC API payloads, please see:

  • CLAUDE.md - For workspace layout, build instructions, and development guidelines.
  • Design Docs - For architecture, API reference, and resolution algorithm.
