A standalone, AI-oriented code indexing framework that speeds up code discovery for AI agents (like Claude or Gemini). Think of it as the indexing engine behind an IDE like IntelliJ or Eclipse, but exposed as a standalone service.
- Supported Languages: Java, Kotlin, Rust, Go, C, C++
- Interfaces: MCP (stdio) for AI agents, JSON-RPC (HTTP) for custom scripts
- Storage: Fast, local SQLite database
Instead of forcing AI agents to grep blindly through hundreds of files, Code Index parses your source code and builds a relational graph of your codebase.
An AI agent can instantly ask precise questions like:
- "What methods does the UserController class have?"
- "Find every function that calls processPayment()."
- "What classes implement the PaymentGateway interface?"
- "Show me the definition of the User struct."
It supports full cross-file resolution, tracks type hierarchies, and updates the index incrementally in real time as you modify files. Mixed-language projects (e.g., Chromium with C++ and Java, or Android apps with Java and Kotlin) are indexed into a single unified graph.
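To make the idea of a relational code graph concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names (symbols, calls, caller_id, callee_id) and the sample rows are hypothetical, chosen only for illustration; they are not Code Index's actual schema. The sketch shows how a question like "find every function that calls processPayment()" can reduce to a join instead of a text search.

```python
import sqlite3

# Hypothetical schema for illustration only; not Code Index's real tables.
SCHEMA = """
CREATE TABLE symbols (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    kind TEXT NOT NULL,
    file TEXT NOT NULL
);
CREATE TABLE calls (
    caller_id INTEGER NOT NULL REFERENCES symbols(id),
    callee_id INTEGER NOT NULL REFERENCES symbols(id)
);
"""


def build_demo_index() -> sqlite3.Connection:
    """Create an in-memory database with a few made-up symbols and call edges."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO symbols (id, name, kind, file) VALUES (?, ?, ?, ?)",
        [
            (1, "processPayment", "method", "PaymentService.java"),
            (2, "checkout", "method", "CartController.kt"),
            (3, "retryFailed", "function", "billing.rs"),
            (4, "renderCart", "method", "CartView.java"),
        ],
    )
    conn.executemany(
        "INSERT INTO calls (caller_id, callee_id) VALUES (?, ?)",
        [(2, 1), (3, 1), (4, 2)],
    )
    return conn


def callers_of(conn: sqlite3.Connection, name: str):
    """Return (caller name, file) pairs for every direct caller of `name`."""
    return conn.execute(
        """
        SELECT caller.name, caller.file
        FROM calls
        JOIN symbols AS callee ON callee.id = calls.callee_id
        JOIN symbols AS caller ON caller.id = calls.caller_id
        WHERE callee.name = ?
        ORDER BY caller.name
        """,
        (name,),
    ).fetchall()


if __name__ == "__main__":
    for caller, path in callers_of(build_demo_index(), "processPayment"):
        print(f"{caller} ({path})")
```

Because callers in Kotlin, Rust, and Java all live in the same tables here, the same query crosses language boundaries without any extra work, which is the property the unified graph is meant to provide.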
- Blazing Fast Symbol Lookup: Instantly jump to classes, functions, methods, traits, and macros.
- Call Graphs & Type Hierarchies: Navigate up and down the call stack and inheritance trees.
- Cross-References: Find all reads, writes, and usages of a specific symbol.
- Mixed-Language Projects: All supported languages are indexed into one database — query across language boundaries in a single project.
- Live File Watching: The index stays up-to-date automatically as you type and save files.
- Full Text & Semantic Search: Search by exact keywords (FTS5) or use AI embeddings to search by natural language meaning.
- Build-System Aware: Automatically detects source roots for Gradle, Maven, and Cargo projects, ignoring generated folders like build/ or target/.
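As a rough illustration of what build-system awareness can look like, the sketch below identifies a project by its conventional marker files (build.gradle, pom.xml, Cargo.toml) and skips generated directories while collecting sources. The function names and the rule "skip any directory named build or target" are assumptions made for this example; the real detection logic in Code Index may differ.

```python
from pathlib import Path
import tempfile

# Marker files that conventionally identify each build system.
BUILD_MARKERS = {
    "gradle": ("build.gradle", "build.gradle.kts", "settings.gradle", "settings.gradle.kts"),
    "maven": ("pom.xml",),
    "cargo": ("Cargo.toml",),
}

# Generated-output directories to skip (a simplification for this sketch).
IGNORED_DIRS = {"build", "target"}


def detect_build_system(root: Path):
    """Return 'gradle', 'maven', 'cargo', or None based on marker files in root."""
    for system, markers in BUILD_MARKERS.items():
        if any((root / marker).is_file() for marker in markers):
            return system
    return None


def source_files(root: Path, suffixes):
    """List files under root with the given suffixes, skipping ignored directories."""
    found = []
    for path in root.rglob("*"):
        rel = path.relative_to(root)
        in_ignored_dir = bool(IGNORED_DIRS & set(rel.parts[:-1]))
        if path.is_file() and path.suffix in suffixes and not in_ignored_dir:
            found.append(rel.as_posix())
    return sorted(found)


def demo():
    """Build a throwaway Cargo-style tree and run both helpers on it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Cargo.toml").write_text('[package]\nname = "demo"\n')
        (root / "src").mkdir()
        (root / "src" / "main.rs").write_text("fn main() {}\n")
        (root / "target" / "debug").mkdir(parents=True)
        (root / "target" / "debug" / "gen.rs").write_text("// generated\n")
        return detect_build_system(root), source_files(root, {".rs"})


if __name__ == "__main__":
    print(demo())
```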
You need the Rust toolchain installed.
# Clone the repository
git clone https://github.com/your-username/code-index.git
cd code-index
# Build the release binary with semantic search enabled
cargo build --release -p code-index-server --features semantic
# The binary will be available at target/release/code-index-server
(Note: The first build may take a moment as it compiles the Tree-sitter C parsers and bundled SQLite.)
The easiest way to use Code Index is by connecting it to an AI CLI tool using the Model Context Protocol (MCP). The configuration is exactly the same for both Gemini CLI and Claude CLI.
Create or update the appropriate settings file for your agent:
- Gemini CLI: .gemini/settings.json (in your project root)
- Claude CLI: .mcp.json (in your project root)
Add the following configuration:
{
"mcpServers": {
"code-index": {
"command": "/absolute/path/to/code-index/target/release/code-index-server",
"args": ["--transport", "stdio", "--root", "."]
}
}
}
The CLI will automatically detect the server and provide the indexing and search tools.
By default, Code Index uses exact-match and structural search. If you want the ability to search your codebase using natural language (e.g., "How is the user profile updated?"), you can enable Semantic Search.
Semantic Search requires an embedding provider. The server is compatible with both paid OpenAI models and free, locally hosted models (for example, models served through Ollama).
- Install Ollama on Linux or macOS:
  curl -fsSL https://ollama.com/install.sh | sh
- Download the local embedding model (nomic-embed-text is a small ~274MB model that works well for code):
  ollama pull nomic-embed-text
- Start the code-index-server with the following environment variables (you can put these in your .bashrc or export them before running your AI CLI):
  export CODE_INDEX_EMBEDDING_API_URL="http://localhost:11434/v1/embeddings"
  export CODE_INDEX_EMBEDDING_MODEL="nomic-embed-text"
  export CODE_INDEX_EMBEDDING_API_KEY="dummy" # Required by the client, but ignored by Ollama
  export CODE_INDEX_EMBEDDING_DIMENSIONS="768"
To use OpenAI instead, export your API key. The server will default to text-embedding-3-small.
export CODE_INDEX_EMBEDDING_API_KEY="sk-your-openai-key"
Note: If no API key is provided, the server will gracefully disable semantic search and continue providing fast structural and text searches.
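The following sketch shows how these environment variables could map onto a request to an OpenAI-compatible embeddings endpoint, including the fallback when no API key is set. It is an illustration, not the server's actual code: the function name embedding_request is hypothetical, the default URL is the public OpenAI endpoint and is assumed here, and CODE_INDEX_EMBEDDING_DIMENSIONS is left out because the source does not say how the server uses it. The function only builds the request and never sends it.

```python
import json
import os

# Assumed defaults: the model name comes from the text above,
# the URL is the public OpenAI embeddings endpoint.
DEFAULT_URL = "https://api.openai.com/v1/embeddings"
DEFAULT_MODEL = "text-embedding-3-small"


def embedding_request(text, env=None):
    """Return (url, headers, body) for an embeddings call, or None if no key is set."""
    env = os.environ if env is None else env
    api_key = env.get("CODE_INDEX_EMBEDDING_API_KEY")
    if not api_key:
        # Mirrors the documented behavior: without a key, semantic search is off.
        return None
    url = env.get("CODE_INDEX_EMBEDDING_API_URL", DEFAULT_URL)
    model = env.get("CODE_INDEX_EMBEDDING_MODEL", DEFAULT_MODEL)
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    body = json.dumps({"model": model, "input": text})
    return url, headers, body


if __name__ == "__main__":
    ollama_env = {
        "CODE_INDEX_EMBEDDING_API_URL": "http://localhost:11434/v1/embeddings",
        "CODE_INDEX_EMBEDDING_MODEL": "nomic-embed-text",
        "CODE_INDEX_EMBEDDING_API_KEY": "dummy",
    }
    print(embedding_request("How is the user profile updated?", ollama_env))
    print(embedding_request("How is the user profile updated?", {}))
```

With the Ollama variables the request targets the local endpoint and the nomic-embed-text model; with an empty environment the function returns None, which corresponds to semantic search being disabled.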
You can run the server directly and query it using HTTP/JSON-RPC (defaults to port 9120):
cargo run -p code-index-server -- --root /path/to/your/project
For Android projects with build variants (flavors), use --brand and --source-rule to control which source sets get indexed:
{
"mcpServers": {
"code-index": {
"command": "/absolute/path/to/code-index/target/release/code-index-server",
"args": [
"--transport", "stdio",
"--root", ".",
"--brand", "exampleBrand",
"--source-rule", "common:src/main/java",
"--source-rule", "brand:src/{brand}/java"
]
}
}
}
To switch brands at runtime without restarting, agents can use the set_brand MCP tool.
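For readers curious about what such a tool call looks like on the wire, here is a small sketch of the JSON-RPC 2.0 message an MCP client sends over stdio. The tools/call method and the name/arguments structure come from the MCP specification; the argument key "brand" and the value "otherBrand" are assumptions, since the tool's input schema is not shown here. In practice the agent CLI builds this message for you after completing the MCP initialization handshake.

```python
import json


def tools_call_message(request_id, tool_name, arguments):
    """Serialize an MCP tools/call request as one newline-terminated JSON line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # The stdio transport sends one JSON message per line, so the serialized
    # form must not contain raw newlines; json.dumps escapes any inside strings.
    return json.dumps(message, separators=(",", ":")) + "\n"


if __name__ == "__main__":
    # "brand" is a hypothetical argument name used for illustration.
    print(tools_call_message(1, "set_brand", {"brand": "otherBrand"}), end="")
```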
For projects with more than 10,000 source files, the default per-file indexing may be slow. Use --bulk to switch to a faster bulk indexing pipeline that resolves imports, calls, and type relations via SQL batch operations:
code-index index --root /path/to/project --bulk
Or in .mcp.json:
{
"mcpServers": {
"code-index": {
"command": "code-index-server",
"args": ["--transport", "stdio", "--root", ".", "--bulk"]
}
}
}
We run ablation benchmarks to measure the impact of Code Index on AI agent performance. Each benchmark task is executed twice (once with MCP tools enabled, once without) to compare token usage, cost, and speed.
Methodology: Average of 3 runs, Claude CLI with Opus 4.6 (1M context).
Android project (860 files):
| Task type | With MCP (time, cost) | Without MCP (time, cost) | Winner |
|---|---|---|---|
| File symbol listing | 18.4s, $0.13 | 69.4s, $0.30 | MCP (3.8x faster, 2.3x cheaper) |
| Call graph traversal | 44.5s, $0.24 | 79.2s, $0.30 | MCP (1.8x faster, 1.3x cheaper) |
| Cross-module tracing | 39.3s, $0.19 | 66.8s, $0.24 | MCP (1.7x faster, 1.2x cheaper) |
| Find implementations | 22.5s, $0.14 | 29.4s, $0.16 | MCP (1.3x faster, 1.2x cheaper) |
| Find definition | 20.3s, $0.23 | 16.8s, $0.21 | Grep (faster and cheaper for simple lookup) |
Quality: 100% in both modes across all tasks. Overall: MCP was 44% faster and 23% cheaper ($2.78 vs $3.63).
Chromium (86K files):
| Task type | With MCP (time, cost, quality) | Without MCP (time, cost, quality) | Winner |
|---|---|---|---|
| C++ cross-module trace | 314s, $0.50, 93% | 334s, $1.39, 100% | MCP (2.8x cheaper, similar speed) |
| C++ call graph | 253s, $0.51, 93% | 344s, $0.61, 93% | MCP (1.4x faster, 1.2x cheaper) |
| C++ find definition | 66s, $0.31, 100% | 84s, $0.39, 100% | MCP (1.3x faster, 1.3x cheaper) |
| C++ file symbols | 268s, $0.89, 80% | 333s, $1.03, 100% | MCP faster/cheaper, grep better quality |
| C++ interface impls | 232s, $0.45, 93% | 162s, $0.42, 60% | MCP better quality, grep faster |
| Java interface impls | 50s, $0.16, 100% | 72s, $0.23, 100% | MCP (1.4x faster, 1.5x cheaper) |
| Java call graph | 106s, $0.30, 83% | 111s, $0.35, 100% | MCP cheaper, grep better quality |
| Java find definition | 49s, $0.15, 100% | 33s, $0.12, 100% | Grep (faster and cheaper) |
Overall: MCP 28% cheaper ($9.81 vs $13.67) and 9% faster (4014s vs 4421s). Quality is similar (MCP 93% vs grep 94%) — MCP occasionally misses facts on complex exploration, while grep occasionally misses implementations that require structural traversal to find.
| Metric | Android (860 files) | Chromium (86K files) |
|---|---|---|
| MCP speed advantage | 44% faster | 9% faster |
| MCP cost advantage | 23% cheaper | 28% cheaper |
| Quality | identical (100%) | similar (93% vs 94%) |
| MCP cost-effective wins | 4/5 tasks | 5/8 tasks |
Takeaway: Code Index consistently reduces cost across both project sizes. The speed advantage is largest on the small project (44%) where grep has to search many files per query. On the large project, the cost savings are more significant (28%) because grep operations on 86K files are expensive. Quality is comparable — MCP sometimes explores too aggressively (missing facts in 26-turn call graph sessions), while grep sometimes misses implementations that require structural traversal.
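The headline percentages can be reproduced from the totals quoted above. The short sketch below treats "N% cheaper" or "N% faster" as the relative reduction against the without-MCP baseline, and "Nx" as the ratio of the two values; the helper names are ours, and only figures already stated in this section are used.

```python
def percent_reduction(baseline, improved):
    """Relative saving of `improved` against `baseline`, in percent."""
    return (baseline - improved) / baseline * 100


def ratio(baseline, improved):
    """How many times larger the baseline is than the improved value."""
    return baseline / improved


if __name__ == "__main__":
    # Overall cost, Android: $3.63 without MCP vs $2.78 with MCP.
    print(f"Android cost saving:  {percent_reduction(3.63, 2.78):.0f}%")    # 23%
    # Overall cost, Chromium: $13.67 vs $9.81.
    print(f"Chromium cost saving: {percent_reduction(13.67, 9.81):.0f}%")   # 28%
    # Overall time, Chromium: 4421s vs 4014s.
    print(f"Chromium time saving: {percent_reduction(4421, 4014):.0f}%")    # 9%
    # Single row, Android file symbol listing: 69.4s vs 18.4s.
    print(f"File symbol listing:  {ratio(69.4, 18.4):.1f}x faster")         # 3.8x
```

Note that "44% faster" in this sense means 44% less wall-clock time, which is a different scale from the "x faster" ratios in the per-task tables.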
Best for:
- Cost reduction on large codebases — grep on 86K files is expensive; structured index queries are cheap per call
- Multi-hop structural queries — call graphs, cross-module tracing, type hierarchies
- Completeness-sensitive tasks — finding all implementations of a widely-inherited class
- Disambiguation: querying a specific connect method out of thousands of matches
Not needed for:
- Simple "where is X defined?" lookups — grep is faster for single-hop queries
- Tasks where answer quality is critical and exploration depth is unpredictable
Results and scripts are in tests/benchmark/.
./tests/benchmark/bench.sh --prompts prompts_example.txt --runs 3
See prompts_example.txt for the prompt format.
If you are looking to contribute, understand the architecture, or view the raw JSON-RPC API payloads, please see:
- CLAUDE.md - For workspace layout, build instructions, and development guidelines.
- Design Docs - For architecture, API reference, and resolution algorithm.