Merged
7 changes: 7 additions & 0 deletions .coveragerc-unit
@@ -0,0 +1,7 @@
[run]
omit =
    src/theow/_codegraph/*
    src/theow/codegraph.py

[report]
fail_under = 85
15 changes: 15 additions & 0 deletions README.md
@@ -391,6 +391,21 @@ def run_safe_command(cmd: str) -> dict:

This is the key to secure automation. You define the blast radius. The LLM operates within those boundaries.

### CodeGraph

CodeGraph gives the explorer structural awareness of your codebase. Instead of reading entire files to orient, the LLM queries a tree-sitter based graph for symbols, call chains, imports, and class hierarchies.

```python
from theow.codegraph import CodeGraph

graph = CodeGraph(root="./src")
pipeline_agent.tool()(graph.search_code)
```

The LLM gets a single `search_code` tool that supports multiple scopes: find symbols by name, trace callers/callees, list file contents, follow class hierarchies, and find paths between symbols.

CodeGraph is an optional dependency; install it with `pip install "theow[codegraph]"` (the quotes keep shells such as zsh from expanding the brackets). See the [CodeGraph README](src/theow/_codegraph/README.md) for full documentation.

## LLM Based Actions

Rules can invoke the LLM directly on match instead of running a deterministic action. Useful for failures that need dynamic investigation rather than a fixed fix.
8 changes: 8 additions & 0 deletions pyproject.toml
@@ -24,6 +24,11 @@ theow = "theow._cli:app"
daemon = [
    # Future: trio, httpx, etc. for server mode
]
codegraph = [
    "tree-sitter>=0.23",
    "tree-sitter-python>=0.23",
    "tree-sitter-go>=0.23",
]

[dependency-groups]
dev = [
@@ -45,6 +50,9 @@ path = "src/theow/_version.py"
[tool.hatch.build.targets.wheel]
packages = ["src/theow"]

[tool.ty.src]
exclude = ["src/theow/_codegraph/examples/"]

[tool.ruff]
line-length = 100
target-version = "py312"
106 changes: 106 additions & 0 deletions src/theow/_codegraph/README.md
@@ -0,0 +1,106 @@
<div align="center">

# CodeGraph

Tree-sitter based code structure graph for Theow's LLM explorer. Instead of reading entire files to orient (~4000+ tokens), the explorer queries the graph for symbols, call chains, imports, and class hierarchies (~260 tokens).

</div>

<div align="center">
<img src="examples/theow_graph.png" alt="Theow CodeGraph" width="500">
<br>
<sub>Theow's own code graph — generated with <a href="examples/visualize.py">examples/visualize.py</a></sub>
</div>

## Install

CodeGraph is an optional dependency:

```bash
pip install "theow[codegraph]"
```
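
Because the tree-sitter packages are only installed with the `codegraph` extra, code that depends on them usually fails with a bare `ImportError` when the extra is missing. The following is a minimal sketch of a guard that turns this into an actionable message. The helper name `require_optional` is hypothetical and is not part of Theow; it only illustrates the pattern.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str, package: str = "theow") -> ModuleType:
    """Import an optional module, or raise an ImportError that names the extra to install.

    Hypothetical helper for illustration; not Theow's actual implementation.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"{module_name!r} is required for this feature. "
            f'Install it with: pip install "{package}[{extra}]"'
        ) from exc
```

A caller could then write `require_optional("tree_sitter", "codegraph")` before touching any parser code, so a missing extra is reported once, with the install command attached.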

## Usage

```python
from theow import Theow
from theow.codegraph import CodeGraph

graph = CodeGraph(root="./src")

engine = Theow(theow_dir=".theow", llm="anthropic/claude-sonnet-4-20250514")
engine.tool()(graph.search_code)
```

The graph is built automatically on the first `search_code` call. The LLM gets a single tool that covers all navigation needs.

## `search_code` API

| Parameter | Description |
|-----------|-------------|
| `query` | Symbol name or substring to search for |
| `kind` | Filter by type: `"function"`, `"class"`, `"module"` |
| `scope` | What to search (see below) |
| `file` | Filter to a specific file |
| `line` | Find the symbol at this line number in `file` |
| `target` | Target symbol for `"path"` scope |

### Scopes

| Scope | Description | Example |
|-------|-------------|---------|
| `symbol` | Find symbols by name (default) | `search_code(query="Rule", kind="class")` |
| `callers` | Who calls this symbol? | `search_code(query="matches", scope="callers")` |
| `callees` | What does this symbol call? | `search_code(query="build", scope="callees")` |
| `references` | All incoming/outgoing relationships | `search_code(query="LLMGateway", scope="references")` |
| `definition` | Where is this symbol defined? | `search_code(scope="definition", file="models.py", line=42)` |
| `file` | List all symbols in a file | `search_code(scope="file", file="_core/_models.py")` |
| `path` | Find relationship path between two symbols | `search_code(query="module.py", scope="path", target="Rule")` |
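
To make the `callers` and `callees` scopes concrete, here is a small sketch of how the two directions can be answered from forward and reverse adjacency maps. It is an illustration under assumptions, not CodeGraph's internals: the edge list, the symbol names, and the helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical call edges as (caller, callee) pairs; not taken from Theow.
CALLS = [
    ("Engine.run", "Rule.matches"),
    ("Engine.run", "Engine.build"),
    ("Engine.build", "load_rules"),
    ("cli_main", "Engine.run"),
]


def build_adjacency(edges):
    """Return (outgoing, incoming) maps so both directions are a dict lookup."""
    outgoing = defaultdict(set)
    incoming = defaultdict(set)
    for caller, callee in edges:
        outgoing[caller].add(callee)
        incoming[callee].add(caller)
    return outgoing, incoming


def callees(outgoing, symbol):
    """What does this symbol call?"""
    return sorted(outgoing.get(symbol, ()))


def callers(incoming, symbol):
    """Who calls this symbol?"""
    return sorted(incoming.get(symbol, ()))


outgoing, incoming = build_adjacency(CALLS)
print(callers(incoming, "Engine.run"))   # ['cli_main']
print(callees(outgoing, "Engine.run"))   # ['Engine.build', 'Rule.matches']
```

A `references` style query would, under the same assumptions, be the union of both lookups for a symbol.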

## Language Support

Visitors extract structure from source files using tree-sitter. Currently supported:

- **Python** — functions, classes, methods, imports, calls, decorators, docstrings
- **Go** — functions, methods with receivers, structs, interfaces, imports, calls, struct embedding

Languages are configured explicitly:

```python
graph = CodeGraph(root="./src", languages=["python", "go"])
```

Defaults to `["python"]` if not specified.

## Configuration

```python
graph = CodeGraph(
    root="./src",
    languages=["python", "go"],       # languages to parse
    excludes={"vendor", "testdata"},  # directories to skip
    max_file_size=1_000_000,          # skip files larger than this (bytes)
)
```

Default excludes: `__pycache__`, `.git`, `.tox`, `.venv`, `venv`, `node_modules`, `dist`, `build`, `.mypy_cache`, `.ruff_cache`, `.pytest_cache`.
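
The effect of `languages`, `excludes`, and `max_file_size` can be sketched as a plain directory walk. This is an assumption about the general approach, not the actual CodeGraph code: the function `discover_files`, the `SUFFIXES` mapping, and the shortened default exclude set are hypothetical.

```python
import os
from pathlib import Path

# Assumed mapping from language name to file suffixes (hypothetical).
SUFFIXES = {"python": {".py"}, "go": {".go"}}


def discover_files(
    root,
    languages=("python",),
    excludes=frozenset({"__pycache__", ".git", ".venv", "node_modules"}),
    max_file_size=1_000_000,
):
    """Return source files under root, skipping excluded directories and oversized files."""
    wanted = set().union(*(SUFFIXES[lang] for lang in languages))
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into excluded directories.
        dirnames[:] = [d for d in dirnames if d not in excludes]
        for name in filenames:
            path = Path(dirpath) / name
            if path.suffix in wanted and path.stat().st_size <= max_file_size:
                found.append(path)
    return sorted(found)
```

Pruning `dirnames` in place matters for large trees: an excluded directory such as `node_modules` is never traversed, rather than being walked and filtered afterwards.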

## Serialization

```python
# Save to JSON
graph.to_json("graph.json")

# Load from cache
graph = CodeGraph.from_json("graph.json")

# Get JSON string
json_str = graph.to_json()
```
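
For intuition about what a JSON cache of a graph can contain, the sketch below round-trips a list of nodes and edges through `json`. The `Node` and `Edge` field names here are hypothetical and chosen for illustration; the real classes and the format written by `to_json` may differ.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Node:
    # Hypothetical fields; the real Node may carry more or different data.
    id: str
    kind: str
    file: str
    line: int


@dataclass(frozen=True)
class Edge:
    # Hypothetical fields; kind could be "calls", "imports", "inherits", ...
    src: str
    dst: str
    kind: str


def dumps_graph(nodes, edges) -> str:
    """Serialize nodes and edges to a JSON string."""
    payload = {
        "nodes": [asdict(n) for n in nodes],
        "edges": [asdict(e) for e in edges],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def loads_graph(text):
    """Rebuild (nodes, edges) from a string produced by dumps_graph."""
    payload = json.loads(text)
    nodes = [Node(**n) for n in payload["nodes"]]
    edges = [Edge(**e) for e in payload["edges"]]
    return nodes, edges
```

Since a cache like this is a snapshot, it would need to be regenerated whenever the source tree changes.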

## How It Works

1. **Parse**: Tree-sitter visitors walk source files and extract `Node` (symbols) and `Edge` (relationships) objects
2. **Index**: Nodes are indexed by file path and short name for fast lookup
3. **Resolve**: Symbolic call targets (short names like `helper`) are resolved to fully qualified node IDs, preferring same-file matches
4. **Query**: `search_code` navigates the graph using adjacency lists and BFS — no external graph library needed
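
The resolve and query steps above can be sketched with the standard library alone. This is a simplified illustration, not the CodeGraph implementation: the `file::name` node ID format, the index layout, and the fallback to the first candidate when no same-file match exists are assumptions.

```python
from collections import deque


def resolve(short_name, caller_file, index):
    """Resolve a short call target to a node ID, preferring a same-file match.

    index maps short name -> list of (node_id, file) pairs (assumed layout).
    """
    candidates = index.get(short_name, [])
    for node_id, file in candidates:
        if file == caller_file:
            return node_id
    # Assumption: fall back to the first known candidate, or None if unknown.
    return candidates[0][0] if candidates else None


def find_path(adjacency, start, target):
    """Breadth-first search over adjacency lists; returns a shortest path or None."""
    if start == target:
        return [start]
    parents = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency.get(current, ()):
            if neighbor in parents:
                continue  # already visited
            parents[neighbor] = current
            if neighbor == target:
                # Walk parent links back to the start, then reverse.
                path = [neighbor]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(neighbor)
    return None
```

BFS visits nodes in order of distance from `start`, so the first time `target` is reached the recovered path has the fewest edges, which is what a `path` style query wants when edges are unweighted.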
5 changes: 5 additions & 0 deletions src/theow/_codegraph/__init__.py
@@ -0,0 +1,5 @@
"""CodeGraph: tree-sitter based code structure graph for LLM exploration."""

from theow._codegraph._graph import CodeGraph

__all__ = ["CodeGraph"]