Stop feeding entire files to your AI. Start querying your codebase.
Based on Recursive Language Models research (MIT/Stanford, 2025)
- About This Project
- The Problem
- The Solution
- How It Works
- Performance
- Quick Start
- Setup for AI Agents
- Commands
- Contributing
- License
- Contact
- Acknowledgments
This is a proof of concept / MVP to validate and make practical use of the ideas presented in the Recursive Language Models research (MIT/Stanford, 2025). The paper makes compelling claims about progressive disclosure and query-based retrieval for code understanding — this tool is my attempt to see whether those ideas hold up in real-world usage. Built with Rust for memory safety and blazing-fast indexing, it aims to make querying your codebase feel instantaneous and lightweight.
It's also my first public project. I'm making the source code available so you can see for yourself what's going on under the hood — no data collection, no shady business, just the code doing what it says.
Honestly, I don't know yet where this project is headed. I want to see how it's received and how people use it before deciding on the next steps. That's why I'm keeping my licensing options open for now. Maybe it'll become fully open source someday, maybe it'll stay as it is — we'll see.
Feedback is welcome. Pull requests are not, at least for now.
When AI agents work with code, they typically do this:
Agent: "I need to understand this codebase"
→ Reads file1.rs (500 tokens)
→ Reads file2.rs (800 tokens)
→ Reads file3.rs (600 tokens)
→ ...
→ Context window fills up
→ Earlier files get "forgotten" (Context Rot)
→ Agent makes mistakes or asks to re-read files
The result: Thousands of tokens wasted, context rot, slower responses, higher costs.
rlm treats your codebase like a database, not a pile of files.
Agent: "I need to understand this codebase"
→ rlm overview (~200 tokens) — sees project structure and purpose of each file
→ rlm refs Config (~50 tokens) — finds all usages and impact
→ rlm read src/config.rs --symbol load (~100 tokens) — reads only the relevant function
→ Done. Total: ~350 tokens instead of thousands.
The principle: Never load what you don't need. Query, don't dump.
Instead of reading entire files, rlm lets you zoom in progressively:
graph LR
A[overview minimal: ~50 tok] --> B[overview standard: ~200 tok]
B --> C[search]
C --> D[read symbol]
D --> E[read file: Last Resort]
style E stroke:#f96,stroke-width:2px
Most tasks can be completed without ever reading a full file.
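In practice, a zoom-in session might look like this (paths, symbols, and queries are hypothetical; token counts are the rough estimates from the diagram above):

rlm overview --detail minimal          # ~50 tokens: structure only
rlm overview                           # ~200 tokens: files + purposes
rlm search "syntax guard"              # locate the relevant code
rlm read src/edit.rs --symbol replace  # read just one function

Stop at whichever step answers the question; each level only costs what it returns.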
Traditional approach:
# Load everything into context, hope for the best
context = read("file1.rs") + read("file2.rs") + read("file3.rs")
llm.generate(context + prompt)

rlm approach:
# Query what you need, when you need it
structure = rlm.overview() # What files exist and why?
usages = rlm.refs("Config") # Where is Config used?
code = rlm.read("config.rs", symbol="load") # Just this function
llm.generate(structure + usages + code + prompt)

The codebase stays outside the context window, queryable on demand.
Traditional AI editing:
Agent: *reads 500-line file*
Agent: *rewrites entire file with one small change*
→ Risk of unintended changes
→ 1000+ tokens for input + output
rlm editing:
rlm replace src/lib.rs --symbol helper --code "fn helper(x: i32) -> i32 { x * 3 }"

- AST-based: finds the exact node to replace
- Syntax Guard: validates the change compiles before writing
- Minimal: only the changed code goes through the LLM
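A minimal preview-then-apply flow (file, symbol, and replacement body are illustrative):

# Preview: show the change without writing to disk
rlm replace src/lib.rs --symbol helper --code "fn helper(x: i32) -> i32 { x * 3 }" --preview

# Apply: Syntax Guard validates the result before the file is written
rlm replace src/lib.rs --symbol helper --code "fn helper(x: i32) -> i32 { x * 3 }"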
graph TD
A[LLM Suggestion] --> B{rlm replace}
B --> C[Locate exact AST Node]
C --> D[Inject New Code]
D --> E{Syntax Guard}
E -- Valid --> F[Write to Disk]
E -- Syntax Error --> G[Abort & Report]
style E stroke:#f96,stroke-width:2px
style F stroke:#28a745,stroke-width:2px
style G stroke:#d73a49,stroke-width:2px,stroke-dasharray: 5 5
rlm's response design produces measurable time and token savings in three separable ways. Each is additive; agents that use rlm heavily see all three stack.
Every tool call is a full LLM round: parse the request context, format the tool arguments, execute, parse the response. At 3–8 s per round depending on session size, rounds are the dominant latency cost. rlm packages the follow-up into the first response:
| Task | Manual rounds (Grep + Read + Edit) | rlm rounds | Time saved |
|---|---|---|---|
| Edit a function, verify it compiles | 4 (Grep → Read → Edit → `cargo check`) | 1 (`rlm replace` — response includes `build: { passed, errors }`) | 9–24 s |
| Look up a method's signature to call it | 2–4 (Grep → Read, repeat on wrong match) | 1 (`rlm read --metadata`) | 3–24 s |
| Find callers of a symbol (unique name) | 1–5 (Grep → Read each match) | 1 (`rlm refs`) | 3–32 s |
| Find callers of a common method (`.open()`, `.new()`, `.parse()`) | 5–15+ (each grep hit needs a Read to identify the receiver type before the list is useful) | 1 (`rlm refs Database::open` — AST-filtered to the specific symbol) | 15–120 s |
| See a symbol's body + callers + callees + type info | 4+ (Read + Grep + Read + type-lookup) | 1 (`rlm context --graph`) | 9–32 s |
The ambiguity multiplier matters most on common method names. A codebase typically has five or more functions called `open` (File::open, Database::open, Connection::open, …). `grep "\.open\("` returns them all; the agent then has to Read each call site to determine the receiver type before the list is useful. `rlm refs` starts from the semantic identity and returns only the callers of that specific `open` — grep-level noise eliminated by construction.
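Concretely, the two approaches to the `.open()` example look like this (`Database::open` as in the table above):

# String match: every .open( call site, regardless of receiver type
grep -rn "\.open(" src/

# Semantic match: only the callers of Database::open, resolved via the AST
rlm refs Database::open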
A grep-based workflow over-matches (string hits that aren't semantic refs) and under-matches (trait dispatch, re-exports, macro-generated methods). Acting on those results costs a second cycle: edit-based-on-wrong-info → compile error → re-investigate → fix. One rework cycle is typically 3–5 rounds (failure → diagnosis → correction → retry → verify), worth 15–40 s per avoided cycle.
rlm's AST-backed queries return exactly the semantic matches, so edits land first-try. This isn't just a time savings — an agent working from a noise-contaminated grep list will undercount and overcount simultaneously, producing edits that miss some real call sites and break unrelated code.
Some rlm outputs resolve questions that would take 20+ rounds of manual assembly and still produce approximations:
- Call-graphs with correct method-receiver resolution.
- Transitive impact of a symbol change through the ref graph.
- Lexical scope at a specific line (which symbols are visible).
- Which tests transitively exercise a given symbol (new in 0.5.0).
At 3–8 s per round, a 20-round manual attempt is 60–160 s, and the result is often still wrong enough to need a rework cycle. rlm answers in one round with ground truth from the index — orders of magnitude faster, not a small constant factor.
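Each of these corresponds to a single documented query, for example (symbol and location hypothetical):

rlm context MyStruct --graph      # body + callers + callees + full callgraph
rlm refs MyStruct                 # usages + transitive impact
rlm scope src/main.rs --line 42   # symbols visible at that line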
TOON format (automatically configured by `rlm setup`, see Setup for AI Agents) shrinks flat responses (search, refs, files, stats) by 30–50% versus JSON. Each saved token reduces both the LLM's input-processing time on subsequent calls and the prompt-cache pressure over a long session.
A typical coding session with 30–80 rlm calls saves 1–3 minutes of wall-clock latency through class 1 alone, plus 1–2 avoided rework cycles (class 2) worth another 30–80 s, plus any class-3 tasks that otherwise wouldn't have been feasible at all. Numbers vary with session size and task mix — the structural point is that the savings compound across rounds, not just within a single response.
Download pre-built binaries:
| Platform | Download |
|---|---|
| Linux | rlm-linux |
| macOS | rlm-macos |
| Windows | rlm-windows.exe |
Or build from source:
# Requires Rust 1.75+
cargo build --release

# Add to PATH
export PATH="$PWD/target/release:$PATH"

cd your-project
rlm index .

Note: Indexing respects `.gitignore` — files and directories listed there are automatically skipped. Hidden files (starting with `.`) and common build directories (`node_modules/`, `target/`, etc.) are also excluded.
# Get oriented (~200 tokens)
rlm overview
# Find where something is used
rlm refs MyStruct
# Read just the function you need
rlm read src/main.rs --symbol main
# Search across the codebase
rlm search "error handling"rlm is designed to be used by AI agents, not manually. There are two ways to integrate it:
The Model Context Protocol (MCP) gives the agent native access to rlm tools.
# Find where rlm is installed
which rlm
# Example output: /home/user/projects/rlm/target/release/rlm
# Register with Claude Code (use absolute path)
claude mcp add rlm -- /home/user/projects/rlm/target/release/rlm mcp
# Or if you just built it:
claude mcp add rlm -- "$(pwd)/target/release/rlm" mcp
# Verify it's registered
claude mcp list

Note: Use the absolute path to the `rlm` binary. MCP servers run as separate processes and may not have access to your shell's PATH.
That's it. The agent now has direct access to all rlm commands as native tools.
What the agent sees: 18 MCP tools organized in 5 tiers:

| Tier | Tools | Purpose |
|---|---|---|
| Orient | `overview` (minimal/standard/tree) | Project structure at 3 zoom levels |
| Search | `search`, `read` (symbol/section + metadata) | Find and read code |
| Analyze | `refs` (with impact), `context` (with callgraph), `deps`, `scope` | Understand code |
| Edit | `replace`, `insert` | Modify code with Syntax Guard |
| Utility | `diff`, `partition`, `summarize`, `files`, `stats`, `savings`, `verify`, `supported`, `index` | Maintenance |

Note: Both MCP and CLI offer the same 18-tool surface. Key consolidations: `peek`/`map`/`tree` → `overview`, `type_info`/`signature` → `read --metadata`, `callgraph` → `context --graph`, `impact` → `refs`.
If you prefer CLI mode, add instructions to your project's CLAUDE.md:
## rlm Available
This project is indexed with rlm. Use Bash commands for efficient code exploration.
### Quick Reference
- `rlm help` — list all commands
- `rlm help <command>` — detailed help for a command
### Workflow: Start Cheap, Zoom In
1. `rlm overview --detail minimal` — structure only (~50 tokens)
2. `rlm overview` — project overview (~200 tokens)
3. `rlm refs <symbol>` — find usages + impact analysis
4. `rlm read <path> --symbol <n>` — read one function
5. Use Claude Code's Read for full files (last resort)
### Editing
- `rlm replace <path> --symbol <n> --code "<new>" --preview` — preview
- `rlm replace <path> --symbol <n> --code "<new>"` — apply
### Output Format (Minified JSON)
| Key | Meaning |
|-----|---------|
| `r` | results (array) |
| `k` | kind (fn, struct, class, enum, trait, etc.) |
| `n` | name / identifier |
| `l` | lines [start, end] or single line number |
| `c` | content (code) or count |
| `s` | symbol name |
| `f` | file path |
| `t` | token estimate `{"in": N, "out": N}` |
| `q` | quality warning — if `fallback_recommended: true`, use Claude Code's Read for affected lines |

| Mode | Pros | Cons |
|---|---|---|
| MCP | Native integration, no prompting needed | Requires MCP support in agent |
| CLI | Works with any agent | Agent must be instructed via CLAUDE.md |
For Claude Code, MCP is recommended. For other agents or simpler setups, CLI works well.
Important: Most commands (`overview`, `search`, `refs`, etc.) only operate on indexed files. If a file wasn't indexed (unsupported extension, excluded by gitignore), it won't appear in results. Use `rlm files` to see all files regardless of index status.
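For example (queries illustrative):

rlm files             # every file on disk, indexed or not
rlm search "config"   # hits only content that made it into the index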
| Command | Use When |
|---|---|
| `rlm overview` | Project overview with descriptions (~200 tokens) |
| `rlm overview --detail minimal` | Quick structure check (~50 tokens) |
| `rlm overview --detail tree` | Directory hierarchy with symbol annotations |
| `rlm files` | See ALL files including those with unsupported extensions |
| Command | Use When |
|---|---|
| `rlm search <query>` | Full-text search. AND by default (`foo bar`), OR explicit (`foo OR bar`), `"phrase"` for contiguous match, `prefix*` for wildcard |
| `rlm search <query> --fields minimal` | Same, but hits drop content — names + line ranges only, for existence / file-list queries. Saves ~5k tokens per call |
| `rlm read <path> --symbol <n>` | Read one function/struct/class |
| `rlm read <path> --symbol <n> --metadata` | Read with type info + signature |
| `rlm read <path> --section <heading>` | Read a markdown section |
`rlm search` vs Claude Code's Grep — both are fast; pick the one that matches the question:

- Reach for `rlm search` when you're hunting for code symbols, documented intent, or content that rlm already indexed (AST-aware, skips `node_modules`/`target`/`.rlm` automatically). Supports AND (default), `OR`, `"phrase"`, and `prefix*`.
- Reach for `Grep` when you need regex, literal punctuation, line anchors, or a file rlm doesn't index (yaml, toml, build output, lockfiles, binaries).
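Spelled out, the `rlm search` syntax from the table above (quoting is shell-dependent; queries are illustrative):

rlm search "config load"        # AND: both terms must match
rlm search "config OR load"     # OR: either term matches
rlm search '"error handling"'   # "phrase": contiguous match
rlm search "pars*"              # prefix wildcard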
| Command | Use When |
|---|---|
| `rlm refs <symbol>` | Find all usages + impact analysis |
| `rlm context <symbol>` | Full understanding: body + callers + callees |
| `rlm context <symbol> --graph` | Include full callgraph |
| `rlm deps <path>` | See file dependencies |
| `rlm scope <path> --line N` | What's visible at a location |
| Command | Use When |
|---|---|
| `rlm replace <path> --symbol <n> --code "<new>"` | Replace a function/struct (inline code) |
| `rlm replace <path> --symbol <n> --parent <Foo> --code "<new>"` | Disambiguate when two symbols share the ident (e.g. `Foo::new` vs. `Bar::new`) |
| `cat patch.rs \| rlm replace <path> --symbol <n> --code-stdin` | Replace, reading code from stdin (no escape headaches) |
| `rlm replace <path> --symbol <n> --code-file patch.rs` | Replace, reading code from a file |
| `rlm replace ... --preview` | Preview the change first |
| `rlm delete <path> --symbol <n> [--parent <Foo>]` | Delete a function/struct — takes the leading doc-comment / attribute block with it |
| `rlm delete <path> --symbol <n> --keep-docs` | Delete but preserve the doc/attribute sidecar (for replace-via-delete-then-insert workflows) |
| `rlm extract <src> --symbols A,B,C --to <dest>` | Move symbols to a new or existing file atomically (docs/attrs travel along) |
| `rlm insert <path> --code "<new>" --position top` | Insert code at a position |
| `rlm insert <path> --code-stdin --position bottom` | Insert, reading code from stdin |
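For multi-line bodies, the stdin variant sidesteps shell escaping entirely; a hypothetical sketch:

# Write the replacement body to a file, then stream it to rlm
cat > patch.rs <<'EOF'
fn helper(x: i32) -> i32 {
    // new implementation
    x * 3
}
EOF
cat patch.rs | rlm replace src/lib.rs --symbol helper --code-stdin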
| Command | Use When |
|---|---|
| `rlm partition <path> --strategy <semantic\|uniform:N\|keyword:q>` | Split a file into chunks (semantic symbols, fixed line count, or keyword-anchored) |
| `rlm summarize <path>` | Condensed summary (symbols + description) |
| Command | Use When |
|---|---|
| `rlm index .` | Initial indexing or full re-index |
| `rlm stats` | See index statistics |
| `rlm stats --savings` | Token savings report (vs Claude Code tools) |
| `rlm diff <path>` | Compare indexed vs current content |
| `rlm verify` | Check index integrity |
| `rlm quality` | Check for parse quality issues |
| `rlm supported` | List all supported file extensions + parser types |
| `rlm setup` | Configure Claude Code integration (settings.json + CLAUDE.local.md) |
| `rlm mcp` | Start the MCP server (stdio transport) |
All output is minified JSON to minimize token consumption:
{"r":[{"id":1,"k":"fn","n":"main","l":[1,5],"c":"fn main() {...}"}],"t":{"in":0,"out":45}}| Key | Meaning |
|---|---|
r |
results |
k |
kind (fn, struct, enum, trait, etc.) |
n |
name |
l |
lines [start, end] |
c |
content |
s |
symbol |
t |
token estimate {"in": N, "out": N} |
f |
file path |
sig |
signature |
dc |
doc comment (///, /**, docstrings) |
at |
attributes/decorators (#[derive], @Override) |
q |
parse quality warning (see below) |
Example with quality warning:
{"r":[...],"t":{"in":100,"out":50},"q":{"fallback_recommended":true,"el":[15,23],"m":"File has 2 parse errors. Consider using read/grep for affected lines."}}rlm uses tree-sitter for AST-based parsing. Tree-sitter grammars may not support the latest language features. When a file contains unsupported syntax, rlm still indexes it but marks the result with a quality warning.
- During indexing, each file's parse result is checked for tree-sitter ERROR nodes
- Files with errors get a `parse_quality` value stored in the database
- Query responses include a `q` field when quality issues are detected
| Level | Meaning |
|---|---|
| `complete` | No parse errors, all AST nodes resolved correctly |
| `partial` | Some ERROR nodes found; most of the file was parsed successfully |
| `failed` | Majority of the file could not be parsed |
Response received
├── No "q" field → AST data is reliable, use normally
└── "q" field present
├── fallback_recommended: false → Minor issues, AST data mostly reliable
└── fallback_recommended: true
├── For reading code → use Claude Code's Read tool
├── For searching → use `rlm search <query>`
└── For refs/context → results may be incomplete
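A scripted version of that decision tree (a sketch: assumes `jq` is installed and relies on the `q` / `fallback_recommended` keys documented above; file and symbol are hypothetical):

# Succeeds only when no fallback is recommended for this response
rlm read src/lib.rs --symbol parse | jq -e '.q.fallback_recommended != true' > /dev/null \
  && echo "AST data reliable"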
rlm silently tracks how many tokens each operation saves compared to what Claude Code's native tools (Read/Grep/Glob) would have consumed for the same result.
# See cumulative savings report
rlm stats --savings
# Filter by date
rlm stats --savings --since "2026-03-01"

Example output:

{"ops":42,"output":3200,"alternative":48000,"saved":44800,"pct":93.3,"by_cmd":[{"cmd":"overview","ops":12,...}]}

The `savings` MCP tool provides the same report for AI agents.
# See files with parse quality issues in stats
rlm stats
# Inspect detailed quality issues
rlm quality --summary # Summary statistics
rlm quality --unknown-only # Only issues without test coverage
rlm quality --all          # All logged issues

| Language | Parser | Extensions | Chunks Extracted |
|---|---|---|---|
| Rust | tree-sitter | .rs | fn, struct, enum, enum_variant, impl, method, mod, trait |
| Go | tree-sitter | .go | func, type, interface, struct |
| Java | tree-sitter | .java | class, interface, method, enum |
| C# | tree-sitter | .cs | class, struct, interface, method, enum |
| Python | tree-sitter | .py, .pyi | class, def, async def |
| PHP | tree-sitter | .php | class, function, interface, trait |
| JavaScript | tree-sitter | .js, .jsx | function, class, arrow, export |
| TypeScript | tree-sitter | .ts | interface, type, enum, namespace |
| TSX | tree-sitter | .tsx | JSX components + TS features |
| HTML | tree-sitter | .html, .htm | element IDs, script/style blocks |
| CSS | tree-sitter | .css | rules, media queries, keyframes |
| YAML | serde | .yaml, .yml | top-level keys, nested objects |
| TOML | serde | .toml | tables, arrays of tables |
| JSON | serde | .json | semantic keys (scripts, deps) |
| Markdown | structural | .md | headings as sections |
| PDF | pdf-extract | .pdf | pages as chunks |
| bash, sql, xml, c, cpp | plaintext | various | FTS-searchable |
rlm works with any agent that can execute shell commands or connect via MCP (stdio transport). See Setup for AI Agents for details.
rlm can complement your IDE's built-in features:
# Use rlm for cross-file analysis that IDEs struggle with
rlm refs Config # What breaks if I change this?
rlm context main --graph   # Full call tree across modules

# Check for parse quality issues before merge
rlm quality --summary --exit-code
# Generate codebase overview for documentation
rlm overview > docs/architecture.json

Coming soon: Comparative analysis of token consumption and accuracy on real-world tasks.
Preliminary results suggest 60–80% token reduction on typical code exploration tasks, with improved accuracy due to reduced context rot.
rlm is inspired by the Recursive Language Models paper (MIT/Stanford, 2025), which demonstrated that:
- Progressive disclosure beats full-context loading for code understanding
- AST-aware chunking preserves semantic boundaries better than line-based splits
- Query-based retrieval reduces context pollution and improves task accuracy
We've adapted these principles into a practical tool for everyday use with AI coding assistants.
- Core indexing and search
- AST-based code intelligence (refs with impact, context with callgraph)
- Surgical editing with Syntax Guard
- MCP server integration
- Parse quality detection and fallback recommendations
- Configuration file support
- Extended language support (JS, TS, HTML, CSS, YAML, TOML, JSON)
- Test impact analysis — write responses name the covering tests + the command to run them (0.5.0)
- Native compiler check post-write (`cargo check` surfaces name-resolution / type errors that Syntax Guard can't see — 0.5.0)
- `rlm extract` — atomic module-split primitive (0.5.0)
- `--parent` disambiguation for same-ident symbols (0.5.0)
- TOON output default for agent-scoped projects (0.5.0)
- Native compiler check for Go / TypeScript / Python
- Automatic import-inference on `rlm extract`
- Benchmark suite with published results
- Language Server Protocol (LSP) integration
- Web UI for visualization
- More languages (C++, Ruby, Kotlin)
rlm is part of a larger vision: making powerful tools accessible to everyone, not just experts.
The same principle applies to AI coding assistants: they shouldn't require you to understand tokenization, context windows, or prompt engineering. They should just work efficiently.
"The best interface is no interface." — Golden Krishna
This project is currently a solo-run exploration. While I am grateful for the interest, I am not accepting Pull Requests at this time.
- Have a bug or idea? Please open an Issue. I value your feedback!
- Want to contribute code? Not yet, but feel free to "Star" the project to show your support.
This approach helps me maintain architectural integrity while the project is in its early MVP stage.
This project uses a Source Available License.
You are free to use the software (including binaries) and inspect the source code. However, modification, redistribution, and creating derivative works are not permitted.
See LICENSE for the full terms.
- Recursive Language Models research (MIT/Stanford)
- tree-sitter for robust parsing
- The Rust community for excellent tooling