Rust-native codebase dependency analysis. A single binary that scans your repo with tree-sitter AST parsers, builds a file-level import graph and a function-level call graph, and exposes 54 analysis actions — PageRank, HITS, articulation points, community detection, backward slicing, taint analysis, cross-language bridges, and more — through a flat CLI.
No servers. No databases. No API keys. One static binary, a `.codemap/cache.bincode` next to your repo for incremental scans, and a `/codemap` Claude Code skill that wraps the same binary.
Version: 5.0.0 | Workspace: codemap-core (library) + codemap-cli (binary) + codemap-napi (Node.js bindings) | License: MIT
- Why codemap?
- Installation
- Usage
- Actions
- Examples
- Output formats
- Supported languages
- Architecture
- Performance
- Configuration
- Troubleshooting
- Contributing
- License
Most code-analysis tools are either language-specific (great for one stack, useless for the rest of your repo), GUI-bound (point and click through a web app), or Python-based (slow on anything larger than a toy). codemap is the opposite: one Rust binary, one `codemap --dir <path> <action>` invocation, multi-language, cache-accelerated, parallel.
What makes it work:
- Tree-sitter AST for every supported language. Imports, exports, function definitions, call sites, and data-flow nodes are all extracted from real parse trees. Not regex. Not heuristics. Regex is only a fallback, for YAML/CMake and for files tree-sitter fails to parse.
- 54 actions, one dispatch. Every analysis is a single CLI verb: `codemap --dir src pagerank` ranks files, `codemap --dir src taint req.body db.query` traces taint, `codemap --dir src risk HEAD~3` scores a PR. No subcommands, no flag trees to memorize.
- Cross-language bridge detection. PyO3, pybind11, `TORCH_LIBRARY`, Triton, CUDA kernels, monkey-patches, and YAML native-function dispatch tables are all first-class edges. A Python function calling into a C++ op registered via `TORCH_LIBRARY` shows up in the call graph. Most tools quietly drop these edges.
- Incremental cache. The first scan parses every file in parallel with rayon. Subsequent scans only re-parse files whose mtime changed, using a bincode-serialized cache at `.codemap/cache.bincode`. Re-running analyses on a warm cache skips parsing for every unchanged file.
- Function-level call graph. Not just "file A imports file B": `codemap call-graph` resolves call sites to exported function targets across the imported files. It powers `dead-functions`, `complexity`, `clones`, `diff-impact`, and `entry-points`.
- Code property graph (CPG) for data flow. `data-flow`, `taint`, `slice`, `trace-value`, and `sinks` run on a lazily built CPG of def/use edges. Backward slicing finds everything that contributes to a value; forward tracing finds everything a value reaches.
- Multi-repo scans. Repeat `--dir` to scan several repos in one pass. Imports that cross the directory boundary become real edges in the merged graph.
- Used by other tools. The auditor plugin integrates codemap as of v2.2.0 to feed its reviewers real structural data instead of whatever they find by grep.
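
The incremental-cache decision can be pictured as a pure function: compare the mtimes recorded by the previous scan with the current ones and re-parse only files that are new or whose mtime moved by more than the ±1 ms tolerance described in the scan pipeline. This is a hypothetical sketch, not codemap's code; `files_to_reparse` and the path-to-milliseconds `HashMap` are assumptions standing in for the real `CacheData` layout.

```rust
use std::collections::HashMap;

/// Hypothetical sketch (not codemap's CacheData API): pick files to re-parse.
/// `cached` and `current` map a relative path to its mtime in milliseconds
/// since the Unix epoch; that representation is an assumption.
pub fn files_to_reparse(
    cached: &HashMap<String, u128>,
    current: &HashMap<String, u128>,
) -> Vec<String> {
    let mut stale: Vec<String> = current
        .iter()
        .filter(|&(path, &mtime)| match cached.get(path) {
            // Cached: stale only if the mtime moved by more than 1 ms.
            Some(&old) => old.abs_diff(mtime) > 1,
            // Never seen before: must be parsed.
            None => true,
        })
        .map(|(path, _)| path.clone())
        .collect();
    stale.sort(); // deterministic order for reporting
    stale
}
```

A `u128` in this shape comes from `std::fs::metadata(p)?.modified()?`, converted with `duration_since(UNIX_EPOCH)` and `as_millis()`. Files that disappeared from disk are simply absent from `current` and never re-parsed.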
```bash
git clone https://github.com/charleschenai/codemap.git \
  ~/.claude/plugins/marketplaces/codemap \
  && bash ~/.claude/plugins/marketplaces/codemap/install.sh
```

This clones the repo into the Claude Code plugin marketplace directory and runs the installer, which verifies the plugin structure, merges entries into `~/.claude/settings.json` (non-destructive; requires `python3`), then builds the `codemap` binary with `cargo build --release` and drops it into `~/bin/codemap` (or `/usr/local/bin/codemap` if running as root). Requires a Rust toolchain.

To build the binary by hand instead:

```bash
git clone https://github.com/charleschenai/codemap.git
cd codemap
cargo build --release -p codemap-cli
mkdir -p ~/bin
ln -sf "$(pwd)/target/release/codemap" ~/bin/codemap
```

To verify an install:

```bash
bash install.sh --check   # verifies plugin files, settings.json, binary on PATH
codemap --help
```

To uninstall:

```bash
bash install.sh --uninstall
```

This removes the plugin directory, cache, and `~/bin/codemap`. `/usr/local/bin/codemap` is left alone (removing it needs sudo).
```text
codemap [OPTIONS] <ACTION> [TARGET...]
```

| Flag | Purpose |
|---|---|
| `--dir <PATH>` | Directory to scan. Repeat for multi-repo scans. Defaults to the current working directory. |
| `--include-path <PATH>` | Extra C/C++ include search path, used during import resolution. Repeatable. |
| `--json` | Emit a JSON envelope (`{action, target, files, result}`) instead of human-readable text. |
| `--tree` | For data-flow actions, render results as an ASCII tree instead of a flat list. |
| `--no-cache` | Force a fresh scan, ignoring `.codemap/cache.bincode`. |
| `--watch [SECS]` | Re-run the action every `SECS` seconds (default 2), clearing the screen each tick. |
| `-q`, `--quiet` | Suppress scan/cache status messages on stderr. |
```bash
codemap --dir /path/to/src stats
codemap --dir src call-graph some_function
codemap --dir src taint req.body db.query
codemap --dir src --include-path ./third_party/include trace src/main.cpp
codemap --dir ./service --dir ./shared hotspots   # multi-repo
```

Target arguments are joined with spaces, so `codemap why a.rs b.rs` and `codemap why 'a.rs -> b.rs'` both work (the `->` separator is stripped). Quote the arrow: unquoted, the shell treats `>` as a redirect and would overwrite `b.rs`.
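
The target-joining rule can be mimicked in a few lines. This is an assumed reconstruction of the behavior described above, not codemap's source; `join_target` is a hypothetical name. It handles both the separate-arguments form and a single quoted argument.

```rust
/// Hypothetical sketch: turn trailing CLI arguments into one target string.
/// All arguments are joined with spaces, then any bare "->" token is dropped,
/// so `a.rs b.rs`, `a.rs -> b.rs`, and one quoted "a.rs -> b.rs" agree.
pub fn join_target(args: &[&str]) -> String {
    args.join(" ")
        .split_whitespace()
        .filter(|tok| *tok != "->")
        .collect::<Vec<_>>()
        .join(" ")
}
```

Normalizing after the join (rather than filtering raw arguments) is what makes the quoted single-argument form behave the same as the unquoted one.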
All 54 actions grouped by category. Every action runs against the full graph unless it takes a target. Targets are files, function names, git refs, or patterns depending on the action.

| Action | What it does |
|---|---|
| `stats` | File count, line count, import edges, external URLs, exports. Extension breakdown. |
| `trace <file>` | Imports, importers, URLs, and exports for one file. |
| `blast-radius <file>` | BFS over `imported_by`: every file that transitively depends on the target. |
| `phone-home` | Files containing external URLs, grouped and sorted by URL count. |
| `coupling <substring>` | Files that import anything containing the substring. |
| `dead-files` | Files with zero importers, excluding common entry-point basenames (`index`, `main`, `cli`, `app`, `server`, `entry`). |
| `circular` | DFS-based cycle detection with canonical-rotation dedup. Reports the top 20. |
| `exports <file>` / `functions <file>` | Exported symbols for one file (one action, two names). |
| `callers <symbol>` | Word-boundary regex search across all scanned files, excluding export/definition lines. Caps at 5000 hits. |
| `hotspots` | Top 30 most-coupled files (imports + importers). |
| `size` | Top 30 largest files by line count, with each file's percentage of the codebase. |
| `layers` | BFS depth from roots. Labels entry points / orchestration / services / utilities / leaf modules, and flags cross-layer violations (a deeper layer importing a shallower one, or skipped layers). |
| `diff <git-ref>` | `git diff --name-only <ref>` intersected with the scanned files, plus their combined blast radius. |
| `orphan-exports` | Exported symbols never referenced in the files that import them. Per-file reads are cached. |
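
`circular` deduplicates by canonical rotation because one cycle is found once per file on it (`a → b → c` and `b → c → a` are the same cycle). A minimal sketch, assuming cycles arrive as vectors of file names and that the canonical form starts at the lexicographically smallest file; codemap's exact rule is not documented here, and `canonical_rotation` / `dedup_cycles` are hypothetical names.

```rust
use std::collections::HashSet;

/// Rotate a directed cycle so it starts at its smallest element.
/// (Assumed rule: any fixed choice of start works for dedup.)
pub fn canonical_rotation(cycle: &[String]) -> Vec<String> {
    if cycle.is_empty() {
        return Vec::new();
    }
    // Index of the smallest element; min_by keeps the first on ties.
    let start = (0..cycle.len())
        .min_by(|&i, &j| cycle[i].cmp(&cycle[j]))
        .unwrap();
    cycle[start..].iter().chain(cycle[..start].iter()).cloned().collect()
}

/// Keep one representative (in canonical form) of each distinct cycle.
pub fn dedup_cycles(cycles: Vec<Vec<String>>) -> Vec<Vec<String>> {
    let mut seen = HashSet::new();
    let mut out = Vec::new();
    for cycle in cycles {
        let key = canonical_rotation(&cycle);
        if seen.insert(key.clone()) {
            out.push(key);
        }
    }
    out
}
```

Rotation preserves edge direction, so `a → b → c` and `a → c → b` remain distinct cycles, as they should in a directed import graph.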

| Action | What it does |
|---|---|
| `health` | 0-100 score across 4 dimensions (cycles, coupling, dead files, complexity), each worth 0-25. Letter grade A–F. Emits recommendations when the score is below 80. |
| `summary` | One-screen, box-drawn dashboard: file/line/function/export counts, language mix, cycle count, top 5 coupled files, top 5 most complex functions. |
| `decorators <pattern>` | Finds Python/TS `@decorator` and Rust `#[attribute]` usages matching the pattern (case-insensitive), resolved to the symbol they annotate. |
| `rename <old> <new>` | Previews a word-boundary rename across all scanned files as a unified diff. No files are modified. |
| `context [budget]` | PageRank-ranked, token-budgeted repo map (file, line count, short imports, function signatures). The budget accepts a raw number or an `Nk` suffix; default 8000. |
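
The `context` budget rule (raw number, `Nk` suffix, default 8000) fits in one small parser. A sketch under stated assumptions: `k` means ×1000 (consistent with `context 8k` being an 8000-token budget), uppercase `K` is accepted too, and invalid input yields `None`. `parse_budget` is a hypothetical name, not codemap's API.

```rust
/// Hypothetical parser for the `context [budget]` argument.
/// - missing or empty -> Some(8000)  (the documented default)
/// - "12000"          -> Some(12000)
/// - "8k" / "8K"      -> Some(8000)  (assumption: k = 1000, either case)
/// - anything else    -> None
pub fn parse_budget(arg: Option<&str>) -> Option<usize> {
    let s = match arg {
        None => return Some(8000),
        Some(s) => s.trim(),
    };
    if s.is_empty() {
        return Some(8000);
    }
    if let Some(num) = s.strip_suffix('k').or_else(|| s.strip_suffix('K')) {
        // checked_mul avoids overflow on absurd inputs like "99999999999999999k".
        return num.parse::<usize>().ok()?.checked_mul(1000);
    }
    s.parse::<usize>().ok()
}
```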

| Action | What it does |
|---|---|
| `why <A> <B>` | BFS shortest path A→B via imports. Falls back to reverse edges (`imported_by`) if there is no forward path. |
| `paths <A> <B>` | DFS for all paths A→B, depth ≤ 10, capped at 20 paths. If none exist forward, tries B→A. |
| `subgraph <pattern>` | BFS in both directions from every file matching the substring: the full connected component around a keyword. |
| `similar <file>` | Top 20 files ranked by Jaccard similarity over local imports + importers. |
| `structure [pattern]` | File tree with per-function outlines (line, name, params, `[pub]` marker). |
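
`similar` scores files by Jaccard similarity, |A ∩ B| / |A ∪ B|, over neighbor sets. A sketch, assuming each file's neighbor set is the union of its local imports and its importers (as the table says), stored as plain strings, and that two files with no neighbors at all score 0. `neighbors` and `jaccard` are hypothetical helpers.

```rust
use std::collections::HashSet;

/// Neighbor set of a file: its local imports plus its importers.
pub fn neighbors(imports: &[&str], imported_by: &[&str]) -> HashSet<String> {
    imports
        .iter()
        .chain(imported_by.iter())
        .map(|s| s.to_string())
        .collect()
}

/// Jaccard similarity |A ∩ B| / |A ∪ B|.
/// Returns 0.0 when both sets are empty (an assumed convention).
pub fn jaccard(a: &HashSet<String>, b: &HashSet<String>) -> f64 {
    let union = a.union(b).count();
    if union == 0 {
        return 0.0;
    }
    a.intersection(b).count() as f64 / union as f64
}
```

For example, neighbor sets `{x, y, main}` and `{y, main, z}` share 2 of 4 distinct files, giving 0.5.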

| Action | What it does |
|---|---|
| `pagerank` | 20 iterations, damping 0.85, with dangling-node redistribution. Top 30, scores × 1000. |
| `hubs` | HITS: 20 iterations, Jacobi update, L2 normalization. Top 20 hubs (orchestrators) + top 20 authorities (core). |
| `bridges` | Iterative Tarjan articulation-point detection on the undirected projection. Ranked by connections. |
| `clusters` | Label propagation (seeded LCG PRNG, Fisher-Yates shuffle, 15 iterations). Reports groups of ≥ 2 members with internal-coupling %. |
| `islands` | BFS connected components, sorted by size. |
| `dot [target]` | Graphviz DOT. Full graph, or the 2-hop BFS neighborhood when a target is given. |
| `mermaid [target]` | Mermaid `graph LR`, suitable for pasting into GitHub docs. 2-hop BFS neighborhood when targeted. |
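
The `pagerank` row names three parameters: 20 iterations, damping 0.85, and dangling-node redistribution. Below is a self-contained sketch of that textbook power iteration, not codemap's implementation. It assumes `edges[i]` lists the files that file `i` imports, so rank flows toward imported files, and it leaves out the top-30 cut and the ×1000 display scaling.

```rust
/// Power-iteration PageRank with dangling-node redistribution.
/// `edges[i]` holds the indices of nodes that node `i` links to
/// (assumed here: the files that file `i` imports).
/// Returns one score per node; the scores sum to 1.0.
pub fn pagerank(edges: &[Vec<usize>], iterations: usize, damping: f64) -> Vec<f64> {
    let n = edges.len();
    if n == 0 {
        return Vec::new();
    }
    let nf = n as f64;
    let mut rank = vec![1.0 / nf; n];
    for _ in 0..iterations {
        // Rank held by nodes with no outgoing edges is spread over all nodes.
        let dangling: f64 = (0..n).filter(|&i| edges[i].is_empty()).map(|i| rank[i]).sum();
        let base = (1.0 - damping) / nf + damping * dangling / nf;
        let mut next = vec![base; n];
        for (i, targets) in edges.iter().enumerate() {
            if targets.is_empty() {
                continue;
            }
            // Each outgoing edge carries an equal share of node i's rank.
            let share = damping * rank[i] / targets.len() as f64;
            for &t in targets {
                next[t] += share;
            }
        }
        rank = next;
    }
    rank
}
```

Without the dangling term, leaf files (which import nothing) would leak rank every iteration and the scores would no longer sum to 1. Calling `pagerank(&edges, 20, 0.85)` reproduces the documented parameters.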

| Action | What it does |
|---|---|
| `call-graph [file]` | Cross-file function calls resolved via an export map. Top 50, grouped by source function. |
| `dead-functions` | Exported functions with no callers outside their own file. Top 100. |
| `fn-info <file>` | Per-function listing for one file: start/end line, exported marker, outgoing calls. |
| `diff-functions <git-ref>` | Added / removed / modified functions between the working tree and `<ref>`, via regex over `git show <ref>:<file>`. Covers JS/TS, Rust, Python, Go, Ruby, and Java/PHP signatures. |
| `complexity [file]` | Cyclomatic complexity + max brace nesting depth per function. Top 30, or the full listing for a target file. Flags `[moderate]` (>5) and `[HIGH]` (>10). |
| `import-cost <file>` | Transitive import weight: total files and lines pulled in, plus the 15 heaviest dependencies. |
| `churn <git-ref>` | Files changed in `<ref>..HEAD`, weighted by coupling into a churn risk score. Top 30. |
| `api-diff <git-ref>` | Added / removed exports vs `<ref>`, via a JS/TS export-declaration regex. |
| `clones` | Structural clone groups: functions fingerprinted by `(line_count, call_count, param_count, is_exported)`. Skips functions under 3 lines. |
| `git-coupling [N]` | Co-change analysis over the last N commits (default 200). Flags pairs as `import` (expected) or `HIDDEN` (co-change without an import link, the dangerous kind). |
| `risk <git-ref>` | Composite PR risk score 0-100 across blast radius (30), coupling (30), complexity (20), and scope (20). Levels: LOW / MEDIUM / HIGH / CRITICAL. |
| `diff-impact <git-ref>` | `diff` + function-level changes + per-file blast radius with source attribution. |
| `entry-points` | Detects main / test / route entries: main patterns (`main`, `cli`, `run`, `serve`, …), test-file heuristics, and Flask/FastAPI/Django-style `@route` / `@app.*` / `@router.*` decorators. |
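
`clones` groups functions whose fingerprint `(line_count, call_count, param_count, is_exported)` matches exactly and skips functions under 3 lines. A sketch of that grouping, where `Func` is a hypothetical stand-in for codemap's `FunctionInfo` and reporting only groups of two or more is an assumption:

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a parsed function (not codemap's FunctionInfo).
#[derive(Debug, Clone)]
pub struct Func {
    pub name: String,
    pub line_count: usize,
    pub call_count: usize,
    pub param_count: usize,
    pub is_exported: bool,
}

/// Group functions by (line_count, call_count, param_count, is_exported).
/// Functions under 3 lines are skipped; singleton groups are dropped.
pub fn clone_groups(funcs: &[Func]) -> Vec<Vec<String>> {
    let mut groups: BTreeMap<(usize, usize, usize, bool), Vec<String>> = BTreeMap::new();
    for f in funcs.iter().filter(|f| f.line_count >= 3) {
        let key = (f.line_count, f.call_count, f.param_count, f.is_exported);
        groups.entry(key).or_default().push(f.name.clone());
    }
    groups.into_values().filter(|g| g.len() >= 2).collect()
}
```

A fingerprint this coarse is cheap and catches copy-paste clones, but it will also group unrelated functions that happen to share the same shape, so results are candidates for review rather than proof of duplication.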

These actions are backed by the CPG (code property graph), which is built lazily on the first data-flow action and kept in memory for the lifetime of the process.

| Action | What it does |
|---|---|
| `data-flow <file> [fn]` | Def/use edges per function: params → uses, local defs → uses, return lines. |
| `taint <source> <sink>` | Intersection of the forward trace from source nodes and the backward slice from sink nodes. If there is no path, falls back to the backward slice alone. Source/sink patterns are configurable via `.codemap/dataflow.json`. |
| `slice <file>:<line>` | Backward slice: every CPG node that contributes to the target, up to 20 hops. |
| `trace-value <file>:<line>:<name>` | Forward reachability from a def. Reached nodes that match sink patterns are marked `SINK`. |
| `sinks [file]` | All sink nodes grouped by category (filesystem, database, xss, etc.). Categories come from defaults + `.codemap/dataflow.json` overrides. |

Pass `--tree` to `taint` / `slice` / `trace-value` for ASCII-tree rendering instead of a flat list.
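
The `taint` description reduces to two bounded graph walks and a set intersection. A sketch over plain adjacency lists, assuming CPG nodes are `usize` ids, `succ` / `pred` hold def→use edges in each direction, and the 20-hop cap documented for `slice` also applies to both walks here (the table does not state a cap for `taint`). `reach` and `taint` are hypothetical functions, not codemap's `cpg` API.

```rust
use std::collections::{HashSet, VecDeque};

/// BFS: nodes reachable from `starts` within `max_hops` steps along `adj`.
/// Start nodes are included (hop 0).
pub fn reach(adj: &[Vec<usize>], starts: &[usize], max_hops: usize) -> HashSet<usize> {
    let mut seen: HashSet<usize> = starts.iter().copied().collect();
    let mut queue: VecDeque<(usize, usize)> = starts.iter().map(|&s| (s, 0)).collect();
    while let Some((node, depth)) = queue.pop_front() {
        if depth == max_hops {
            continue; // hop budget exhausted on this branch
        }
        for &next in &adj[node] {
            if seen.insert(next) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    seen
}

/// Forward trace from sources ∩ backward slice from sinks, sorted.
/// Falls back to the backward slice alone when the intersection is empty.
pub fn taint(succ: &[Vec<usize>], pred: &[Vec<usize>], sources: &[usize], sinks: &[usize]) -> Vec<usize> {
    const MAX_HOPS: usize = 20; // assumed to match the slice cap
    let forward = reach(succ, sources, MAX_HOPS);
    let backward = reach(pred, sinks, MAX_HOPS);
    let mut path: Vec<usize> = forward.intersection(&backward).copied().collect();
    if path.is_empty() {
        path = backward.into_iter().collect();
    }
    path.sort_unstable();
    path
}
```

Intersecting the two walks keeps only nodes that both depend on the source and feed the sink, which is exactly the set a reviewer needs to inspect.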

| Action | What it does |
|---|---|
| `lang-bridges [file]` | Every detected bridge edge: `torch_library`, `torch_ops`, `pybind11`, `pyo3_class`, `pyo3_function`, `pyo3_methods`, `triton_kernel`, `triton_launch`, `cuda_kernel`, `cuda_launch`, `monkey_patch`, `autograd_func`, `yaml_dispatch`, `build_dep`, `dispatch_key`, `trait_impl`. |
| `gpu-functions` | Bridges tagged as GPU kernels: Triton JIT and CUDA `__global__`. |
| `monkey-patches` | Python `module.Class = Replacement` reassignments detected across files. |
| `dispatch-map` | Op name → per-device implementations (`TORCH_LIBRARY` `m.impl` + YAML `native_functions.yaml`). |

| Action | What it does |
|---|---|
| `compare <other-dir>` | Re-scans `<other-dir>` as a second graph and diffs the two: files added/removed, line delta, coupling changes per common file, new / removed external URLs. |

```bash
codemap --dir src health
```

```text
=== Project Health: 82/100 (B) ===
Circular deps  [████████████████████] 25/25  0 cycles
Coupling       [████████████████░░░░] 20/25  hottest file touches 42% of codebase
Dead code      [████████████████░░░░] 20/25  8% dead files
Complexity     [█████████████░░░░░░░] 17/25  12% high-complexity fns
Files: 143  Functions: 891  Exports: 312
```
```bash
codemap --dir src blast-radius src/parser.rs
```

Every file that transitively depends on `parser.rs`.
```bash
codemap --dir src pagerank
codemap --dir src hubs
```

PageRank gives importance; `hubs` separates orchestrators (depend on many) from authorities (depended on by many).
```bash
codemap --dir src taint req.body db.query
codemap --dir src taint req.body db.query --tree
```

The CPG forward walk from the source, intersected with the backward walk from the sink. The tree form is usually what you want when showing this to someone.
```bash
codemap --dir src risk HEAD~1
codemap --dir src diff-impact main
```

`risk` returns a composite score and severity. `diff-impact` adds function-level change detail and per-file blast-radius attribution.
```bash
codemap --dir src context 8k
```

PageRank-ranked files + function signatures, fitted to an 8000-token budget. Drop it into a prompt for orientation.
```bash
codemap --dir src dot > graph.dot && dot -Tsvg graph.dot -o graph.svg
codemap --dir src mermaid parser > graph.mmd   # 2-hop neighborhood around "parser"
```

Find hidden dependencies via git history:
```bash
codemap --dir src git-coupling 500
```

Files that co-change but have no import link: the most dangerous class of coupling, because nothing in the code tells you the two files are related.
```bash
codemap --dir src --watch 5 health
codemap --dir src --watch complexity src/parser.rs
```

```bash
codemap --dir ./service --dir ./shared-libs hotspots
```

Imports that cross the boundary become real edges in the merged graph.
The default human-readable output uses category headers, fixed-width columns, and Unicode box drawing where it helps. It is stable enough to grep.
```bash
codemap --dir src pagerank --json
```

```json
{
  "action": "pagerank",
  "target": null,
  "files": 143,
  "result": "=== PageRank (top 30 most important files) ===\n\n 123.45 rank src/parser.rs\n ..."
}
```

The envelope is stable (`action`, `target`, `files`, `result`). The `result` field currently holds the human-readable rendering as a string: it is a simple wrapper for scripting, not a structured data export. For real data extraction, parse the human output (it is designed to be grep-stable) or use `dot` / `mermaid` for graph formats.
dot and mermaid emit pure graph text. Pipe to Graphviz or paste into a GitHub markdown block.
Pass --tree to taint, slice, or trace-value for ASCII-tree rendering of the CPG backward or forward walk.
Every language whose parser below is a tree-sitter grammar gets full AST extraction. YAML and CMake are scanned with regex for URLs and bridge-detection patterns, since they have no functions or imports in the same sense.

| Language | Extensions | Parser | Notes |
|---|---|---|---|
| TypeScript | `.ts` | tree-sitter-typescript | |
| TSX | `.tsx` | tree-sitter-typescript (TSX grammar) | |
| JavaScript | `.js`, `.jsx`, `.mjs`, `.cjs` | tree-sitter-javascript | |
| Python | `.py` | tree-sitter-python | |
| Rust | `.rs` | tree-sitter-rust | |
| Go | `.go` | tree-sitter-go | |
| Java | `.java` | tree-sitter-java | |
| Ruby | `.rb` | tree-sitter-ruby | |
| PHP | `.php` | tree-sitter-php | |
| C | `.c`, `.h` | tree-sitter-c | |
| C++ | `.cpp`, `.cc`, `.cxx`, `.hpp`, `.hxx` | tree-sitter-cpp | |
| CUDA | `.cu`, `.cuh` | tree-sitter-cpp | Parsed as a C++ superset. |
| YAML | `.yaml`, `.yml` | regex | URLs + YAML dispatch tables (e.g. `native_functions.yaml`). |
| CMake | `.cmake` | regex | URLs + build-dep detection. |

Directories skipped during the walk: `node_modules`, `.git`, `dist`, `build`, `.codemap`, `target`. Symlinks are not followed. Files larger than 10 MB are skipped. Recursion depth is capped at 50.
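
The walk rules (skip-listed directories, no symlinks, a 10 MB file cap, depth capped at 50) can be sketched with `std::fs` alone. This is an illustrative reimplementation under assumptions, not codemap's `walk_dir`: the supported-extension filter is omitted, 10 MB is taken as 10 MiB, and `walk` / `walk_inner` are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

const SKIP_DIRS: &[&str] = &["node_modules", ".git", "dist", "build", ".codemap", "target"];
const MAX_DEPTH: usize = 50;
const MAX_FILE_BYTES: u64 = 10 * 1024 * 1024; // assumption: 10 MB = 10 MiB

/// Collect regular files under `root`, honoring the skip rules. Sorted output.
pub fn walk(root: &Path) -> io::Result<Vec<PathBuf>> {
    let mut out = Vec::new();
    walk_inner(root, 0, &mut out)?;
    out.sort();
    Ok(out)
}

fn walk_inner(dir: &Path, depth: usize, out: &mut Vec<PathBuf>) -> io::Result<()> {
    if depth > MAX_DEPTH {
        return Ok(());
    }
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        // symlink_metadata does not follow links, so symlinks are detected and skipped.
        let meta = fs::symlink_metadata(&path)?;
        let ft = meta.file_type();
        if ft.is_symlink() {
            continue;
        }
        if ft.is_dir() {
            let name = entry.file_name();
            let skip = name.to_str().map_or(false, |n| SKIP_DIRS.contains(&n));
            if !skip {
                walk_inner(&path, depth + 1, out)?;
            }
        } else if ft.is_file() && meta.len() <= MAX_FILE_BYTES {
            out.push(path);
        }
    }
    Ok(())
}
```

Skipping symlinks rather than following them is what keeps the walk from looping on a link that points back up the tree.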
```text
codemap/
├── Cargo.toml                    # Workspace root
├── codemap-cli/                  # CLI binary
│   └── src/main.rs               # Clap parsing, dispatch, --watch, --json
├── codemap-core/                 # Library (all analysis lives here)
│   └── src/
│       ├── lib.rs                # scan() + execute() public API
│       ├── scanner.rs            # Walk → cache → parallel parse → resolve → bridge edges
│       ├── parser.rs             # ext_to_grammar + AST extractors + regex fallback
│       ├── resolve.rs            # Import specifier → file path resolution
│       ├── types.rs              # Graph, GraphNode, FunctionInfo, Bridge*, DataFlowConfig
│       ├── cpg.rs                # Code property graph: def/use edges, forward/backward, tree render
│       ├── utils.rs              # format_number, truncate, pad_end
│       └── actions/
│           ├── mod.rs            # dispatch(action, target) → String
│           ├── analysis.rs       # 14 file-level actions + health
│           ├── insights.rs       # summary, decorators, rename, context
│           ├── navigation.rs     # why, paths, subgraph, similar, structure
│           ├── graph_theory.rs   # pagerank, hubs, bridges, clusters, islands, dot, mermaid
│           ├── functions.rs      # 13 function-level actions
│           ├── dataflow.rs       # data-flow, taint, slice, trace-value, sinks
│           ├── bridges.rs        # lang-bridges, gpu-functions, monkey-patches, dispatch-map
│           └── compare.rs        # compare two repos
├── codemap-napi/                 # Node.js bindings (same core, napi-rs wrapper)
├── plugin/skills/codemap/SKILL.md  # The /codemap Claude Code skill
├── .claude-plugin/marketplace.json
└── install.sh
```
1. Walk. `walk_dir` recursively enumerates files with supported extensions, skipping common build/junk directories and symlinks. Depth is capped at 50.
2. Cache lookup. Load `.codemap/cache.bincode` (bincode-encoded `CacheData`). Entries with a matching mtime (±1 ms) reuse the cached imports, exports, functions, data-flow, and bridges. A cache version stamp forces invalidation on schema changes (currently `CACHE_VERSION = 8`).
3. Parallel parse. Cache misses are parsed with rayon across all cores. `parse_file` dispatches on extension → tree-sitter grammar → AST extractors (`extract_imports_from_ast`, `extract_exports_from_ast`, `extract_functions_from_ast`, `extract_data_flow_from_ast`). Parser instances are thread-local and cached per grammar to amortize setup.
4. Import resolution. Each import specifier is resolved against the scan directory, the `--include-path` list, and sibling files via `resolve::resolve_import`. Unresolved specifiers stay as strings (still visible via `trace` and `phone-home`).
5. Reverse edges. `imported_by` is populated after all nodes are built.
6. Bridge resolution. Cross-language bridge registrations (`TORCH_LIBRARY` ops, pybind11 defs, PyO3 `#[pymodule]`, YAML dispatch rows, Triton/CUDA kernel launches, monkey-patches) are matched across files and added as extra import edges, so graph-theory actions see them too.
7. Save cache. Atomic write via `.bincode.tmp` → rename.
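
The final step, writing `.bincode.tmp` and then renaming it over the cache, is the standard atomic-replace pattern: a crash mid-write leaves the previous cache intact instead of a truncated one. A std-only sketch, assuming the serialized bytes are already in hand (codemap encodes them with bincode; here they are just a byte slice) and that the temp file sits beside the target so the rename stays on one filesystem. `atomic_write` is a hypothetical helper.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::Path;

/// Write `bytes` to `target` atomically: write a sibling temp file,
/// flush it to disk, then rename it over the destination.
pub fn atomic_write(target: &Path, bytes: &[u8]) -> io::Result<()> {
    if let Some(dir) = target.parent() {
        fs::create_dir_all(dir)?;
    }
    // e.g. .codemap/cache.bincode -> .codemap/cache.bincode.tmp
    let mut tmp_name = target.as_os_str().to_owned();
    tmp_name.push(".tmp");
    let tmp = Path::new(&tmp_name);
    {
        let mut f = fs::File::create(tmp)?;
        f.write_all(bytes)?;
        f.sync_all()?; // data reaches disk before the rename publishes it
    }
    fs::rename(tmp, target)
}
```

Readers only ever see the old file or the complete new one, because the rename replaces the directory entry in a single step.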
Built lazily the first time a data-flow action runs (`cpg::ensure_cpg`). Nodes are typed (`Def`, `Use`, `Call`, `Return`, `Param`, …) with file/line/name/expr. Edges are def→use chains plus param→use. Backward slicing and forward tracing are BFS on those edges, capped at 20 hops by default. `build_tree` + `render_tree` produce the `--tree` output.
Source/sink/sanitizer patterns for `taint` and `sinks` come from a default list in `types.rs` (`process.env`, `req.body`, `exec`, `eval`, `fs.writeFile`, `db.query`, …) and can be extended per repo via `.codemap/dataflow.json`:

```json
{
  "sinks": [{ "pattern": "myLogger.send", "category": "logging" }],
  "sources": [{ "pattern": "request.form", "category": "user-input" }],
  "sanitizers": [{ "pattern": "sanitize_html" }]
}
```

Patterns support trailing wildcards (`foo.*`) and last-segment matching.
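
The matching rules are only summarized as "trailing wildcards and last-segment matching". One plausible reading, sketched here as an assumption rather than codemap's behavior: `foo.*` matches any expression that starts with `foo.`, and any other pattern matches exactly or against the trailing dot-separated segments of the expression (so `query` and `db.query` both match `app.db.query`). `pattern_matches` is a hypothetical name.

```rust
/// Hypothetical matcher for dataflow.json patterns (assumed semantics).
/// - "foo.*"   matches any expression that starts with "foo."
/// - otherwise exact match, or a match on trailing dot-separated segments
pub fn pattern_matches(pattern: &str, expr: &str) -> bool {
    if let Some(prefix) = pattern.strip_suffix(".*") {
        // Require "prefix." followed by at least one more character.
        return expr.len() > prefix.len() + 1
            && expr.starts_with(prefix)
            && expr[prefix.len()..].starts_with('.');
    }
    // Suffix must begin on a segment boundary: "app.db." + "query".
    expr == pattern
        || (expr.ends_with(pattern) && expr[..expr.len() - pattern.len()].ends_with('.'))
}
```

Anchoring on the `.` boundary is what keeps `query` from matching `subquery`, and `request.*` from matching `requests.get`.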
When --dir is repeated, each directory is scanned independently (its own cache), then merged. Cross-repo imports are re-resolved against the merged node set so a file in one repo importing a file in another becomes a real edge.
The following can be checked against the code and `EVOLUTION.log`:

- Parallel parse. `rayon::par_iter` over cache misses, so parse throughput scales with cores.
- Incremental cache. Warm runs reparse only modified files. On codemap's own source (~8k lines of Rust), a warm `stats` runs in well under a second.
- Thread-local parser pool. `tree_sitter::Parser` instances are created once per grammar per thread and reused (the `PARSER_CACHE` thread-local).
- Regex hoisting. All analysis regexes are compiled once per action, not per file or per line.
- Zero clippy warnings at v5.0.0, and 31 integration tests (self-referential: codemap scans its own `codemap-core/src/`).
- Cache sanity. Loading caps the bincode file at 256 MB and rejects entries with path traversal (`..` or a leading `/`). Files > 10 MB are skipped at parse time.
For numbers on your own codebase, `codemap --dir <path> stats` prints file and line counts; wrap it in `time` (adding `--no-cache` for a cold run) to get baseline throughput.
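
The thread-local parser pool follows a common Rust pattern: a `thread_local!` map from grammar to parser, created on first use and reused afterwards. To stay self-contained, this sketch uses a placeholder `Parser` struct instead of `tree_sitter::Parser` and a grammar-name string instead of a `Language`; both, along with the `with_parser` helper and the `CREATED` counter, are assumptions for illustration.

```rust
use std::cell::{Cell, RefCell};
use std::collections::HashMap;

/// Placeholder for tree_sitter::Parser: expensive to build, cheap to reuse.
pub struct Parser {
    pub grammar: String,
}

thread_local! {
    // One parser per grammar per thread; never shared across threads.
    static PARSER_CACHE: RefCell<HashMap<String, Parser>> = RefCell::new(HashMap::new());
    // Counts parser constructions on this thread (illustration only).
    pub static CREATED: Cell<usize> = Cell::new(0);
}

/// Run `f` with this thread's parser for `grammar`, creating it on first use.
/// Note: calling `with_parser` again from inside `f` would panic (RefCell borrow).
pub fn with_parser<R>(grammar: &str, f: impl FnOnce(&mut Parser) -> R) -> R {
    PARSER_CACHE.with(|cache| {
        let mut cache = cache.borrow_mut();
        let parser = cache.entry(grammar.to_string()).or_insert_with(|| {
            CREATED.with(|c| c.set(c.get() + 1));
            Parser { grammar: grammar.to_string() }
        });
        f(parser)
    })
}
```

Because each rayon worker thread owns its own map, no locking is needed, and setup cost is paid at most once per grammar per thread.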
codemap is stateless except for two per-repo files, both under `.codemap/` in the scanned directory:

| File | Purpose | Required? |
|---|---|---|
| `.codemap/cache.bincode` | Bincode-encoded scan cache. Auto-managed. Delete it or pass `--no-cache` to force a fresh scan. | No. |
| `.codemap/dataflow.json` | Per-repo sink / source / sanitizer patterns layered on top of the defaults. Shape shown in the CPG section above. | No; defaults cover common web framework / ORM patterns. |

Add `.codemap/` to `.gitignore`.

| Symptom | Cause | Fix |
|---|---|---|
| `codemap: command not found` | Binary not on `PATH` | Add `~/bin` (or `/usr/local/bin`) to `PATH`, or re-run `bash install.sh` with a Rust toolchain installed. |
| `Unknown action: foo` | Typo or old version | `codemap --help` prints every action. |
| `File not found: src/main.rs` | Target path is not relative to the scan dir | Paths are resolved relative to `--dir`. If `--dir` is `/home/me/proj`, use `src/main.rs`, not the full path. A bare basename also works if it is unique (`find_node` accepts it). |
| Results feel stale | Cache didn't invalidate | Use `--no-cache`, or `rm -rf .codemap/`. Invalidation is mtime-based, so a build tool that preserves mtimes can fool the cache. |
| `lang-bridges` shows nothing in a PyTorch repo | You scanned only the Python side or only the C++ side | Scan both with multiple `--dir` flags, or one directory that contains both. Bridge resolution matches registrations to call sites across the merged graph. |
| `taint`: "Source not found" / "Sink not found" | Pattern doesn't match any call site in the CPG | Run `codemap --dir src sinks` to see which sinks were actually detected, then extend `.codemap/dataflow.json`. |
| C/C++ imports resolve to nothing | Missing include paths | Pass `--include-path /path/to/include`, repeated for each path. |
| Slow first scan on a large repo | The first pass parses everything | Subsequent scans use the cache. When running many actions in a row, only the first pays the parse cost. |
| `diff` / `churn` / `risk` fail | Not a git repo, or the ref doesn't exist | codemap shells out to `git` in the scan dir. Make sure the ref resolves, e.g. `git rev-parse HEAD~3`. |
| PHP files analyzed as regex-only (pre-v4.2) | Old install | v4.2+ uses tree-sitter-php. Update. |
| `/codemap` not appearing in Claude Code | Plugin not enabled | Run `bash install.sh --check`; it verifies plugin files, `settings.json` entries, and the binary on `PATH`. |
The core library (`codemap-core/`) is where all the work happens. Adding an action looks like this:

- Implement `pub fn my_action(graph: &Graph, target: &str) -> String` in the appropriate file under `codemap-core/src/actions/`.
- Wire it into the `match` in `codemap-core/src/actions/mod.rs::dispatch`.
- Add a line to the `after_help` block in `codemap-cli/src/main.rs` so `codemap --help` advertises it.
- Add an integration test in `codemap-core/tests/integration.rs` (the suite scans codemap's own `src/`; self-referential testing keeps the feedback loop fast).
- Add an entry to `EVOLUTION.log`.
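
The first two steps (an action function plus a `match` arm in `dispatch`) can be pictured with this stand-alone sketch. `Graph` here is a hypothetical one-field stub rather than the real type in `types.rs`, and `my_action` / `file-count` are made-up names; only the `fn(&Graph, &str) -> String` shape and the `Unknown action:` message come from this README.

```rust
use std::collections::HashMap;

/// Hypothetical stub; the real Graph lives in codemap-core/src/types.rs.
pub struct Graph {
    /// file -> files it imports
    pub imports: HashMap<String, Vec<String>>,
}

/// Example action (hypothetical): count files whose path contains `target`.
pub fn my_action(graph: &Graph, target: &str) -> String {
    let n = graph.imports.keys().filter(|f| f.contains(target)).count();
    format!("=== Files matching {:?}: {} ===", target, n)
}

/// Shape of dispatch(action, target) -> String: one match arm per action.
pub fn dispatch(graph: &Graph, action: &str, target: &str) -> String {
    match action {
        "file-count" => my_action(graph, target),
        // ... existing actions ...
        other => format!("Unknown action: {}", other),
    }
}
```

Because every action returns a `String`, the CLI can print it directly or wrap it in the `--json` envelope without per-action glue.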
For new languages: add the tree-sitter crate to `codemap-core/Cargo.toml`, extend `SUPPORTED_EXTS` in `scanner.rs`, wire `ext_to_grammar` + `grammar_to_language` in `parser.rs`, and teach the AST extractors the language's node types.
MIT.