Merged
134 changes: 134 additions & 0 deletions API.md
@@ -0,0 +1,134 @@
# API Reference: Structural Memory Protocol (SMP)

SMP exposes a **JSON-RPC 2.0** API. All requests must be sent as POST requests to `/rpc` with `Content-Type: application/json`.

## 📡 General Request Format
```json
{
  "jsonrpc": "2.0",
  "method": "smp/method_name",
  "params": { ... },
  "id": 1
}
```
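
A minimal client sketch for the envelope above, using only the Python standard library. The helper names `build_request` and `call`, and the `base_url` argument, are assumptions for illustration; only the `/rpc` path, the POST method, and the `Content-Type` header come from the text.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any


def build_request(method: str, params: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    return {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}


def call(base_url: str, method: str, params: dict[str, Any], request_id: int = 1) -> dict[str, Any]:
    """POST the envelope to <base_url>/rpc and decode the JSON reply."""
    body = json.dumps(build_request(method, params, request_id)).encode("utf-8")
    req = urllib.request.Request(
        base_url.rstrip("/") + "/rpc",  # base_url is a placeholder for your deployment
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

`build_request` is kept separate from `call` so the envelope can be inspected or tested without a running server.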

---

## 🔍 Discovery & Search
medium

This section is missing documentation for several implemented methods found in smp/protocol/handlers/query.py, specifically smp/impact (handled by ImpactHandler) and smp/trace (handled by TraceHandler). These should be included to provide a complete API reference.


### `smp/locate`
Finds relevant code entities using Community-Routed Graph RAG.
- **Params:**
  - `query` (string): The natural language description of what to find.
  - `seed_k` (int, optional): Number of initial vector seeds. Default: 3.
  - `hops` (int, optional): Depth of graph traversal. Default: 2.
  - `top_k` (int, optional): Number of final results. Default: 10.
- **Returns:** `LocateResponse` containing ranked results and a structural map of relationships.
Comment on lines +23 to +26
medium

There are several discrepancies between the documentation and the implementation of smp/locate:

  1. The parameters seed_k and hops are listed as optional, but the SeedWalkEngine.locate method (smp/engine/seed_walk.py:347) does not accept them; they are hardcoded to defaults inside the method.
  2. The return type is documented as LocateResponse, but the LocateHandler (smp/protocol/handlers/query.py:108) wraps the result in a {"matches": ...} object.
  3. SeedWalkEngine.locate returns a list containing the response dictionary, which results in a nested structure: {"matches": [{...}]}.
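
To make the nesting in point 3 concrete, here is a small sketch of how a caller might unwrap a `{"matches": [{...}]}` result. The helper name `unwrap_locate` and the inner keys (`results`, `structural_map`, `node_id`, `score`) are hypothetical; only the outer `matches` wrapper and the single-element list are taken from the comment above.

```python
from __future__ import annotations

from typing import Any


def unwrap_locate(result: dict[str, Any]) -> dict[str, Any]:
    """Return the single response dict nested inside {"matches": [{...}]}."""
    matches = result.get("matches", [])
    if not matches:
        return {}
    return matches[0]


# Hypothetical payload; the inner keys are illustrative only.
sample = {
    "matches": [
        {
            "results": [{"node_id": "a", "score": 0.9}],
            "structural_map": {"a": ["b"]},
        }
    ]
}
inner = unwrap_locate(sample)
```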


### `smp/search`
BM25-ranked full-text search across enriched metadata.
- **Params:**
  - `query` (string): Keywords to search.
  - `match` (string): `"all"` (AND) or `"any"` (OR).
  - `filter` (object, optional):
    - `node_types` (list): e.g., `["Function", "Class"]`.
    - `tags` (list): e.g., `["billing"]`.
    - `scope` (string): e.g., `"package:src/payments"`.
  - `top_k` (int): Number of results.
- **Returns:** List of matches ranked by BM25 score.
medium

The return value for smp/search is documented as a "List of matches", but the implementation in DefaultQueryEngine.search (smp/engine/query.py:448) returns a dictionary: {"matches": results, "total": len(results)}. The documentation should be updated to reflect this structure.
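
A sketch of what a caller would see under the structure described in this comment. The parameter values reuse the examples from the documented `smp/search` params; the helper `read_search_result` and the contents of each match are hypothetical.

```python
from __future__ import annotations

from typing import Any

# Example params, built from the documented fields and example values.
search_params: dict[str, Any] = {
    "query": "refund invoice",
    "match": "all",
    "filter": {
        "node_types": ["Function", "Class"],
        "tags": ["billing"],
        "scope": "package:src/payments",
    },
    "top_k": 5,
}


def read_search_result(result: dict[str, Any]) -> tuple[list[Any], int]:
    """Split a {"matches": [...], "total": n} result into its two parts."""
    return result["matches"], result["total"]


# Hypothetical result; the shape of each match is an assumption.
matches, total = read_search_result({"matches": [{"node_id": "n1"}, {"node_id": "n2"}], "total": 2})
```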


---

## 🛠 Enrichment & Annotation

### `smp/enrich`
Extracts static metadata (docstrings, decorators) from a specific node.
- **Params:**
  - `node_id` (string): ID of the node to enrich.
  - `force` (bool, optional): Re-enrich even if source hash is unchanged.
- **Returns:** Extracted metadata or status (`enriched`, `skipped`, `no_metadata`).

### `smp/enrich/batch`
Enriches all nodes within a given scope.
- **Params:**
  - `scope` (string): `"full"`, `"package:<path>"`, or `"file:<path>"`.
  - `force` (bool): Force re-enrichment.
- **Returns:** Counts of enriched, skipped, and failed nodes.

### `smp/enrich/stale`
Lists nodes whose source code has changed since the last enrichment.
- **Params:** `scope` (string).
- **Returns:** List of stale nodes with `current_hash` vs `enriched_hash`.
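
The staleness check can be pictured as a hash comparison. This is a sketch under stated assumptions: SHA-256 as the hash function, the helper names `source_hash` and `find_stale`, and the in-memory dictionaries are all illustrative, not the project's actual implementation.

```python
from __future__ import annotations

import hashlib


def source_hash(source: str) -> str:
    """Hash a node's source text (SHA-256 is an assumed choice)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_stale(nodes: dict[str, str], enriched_hashes: dict[str, str]) -> list[dict[str, str]]:
    """List nodes whose current hash differs from the hash recorded at enrichment."""
    stale: list[dict[str, str]] = []
    for node_id, source in nodes.items():
        enriched = enriched_hashes.get(node_id)
        if enriched is None:
            continue  # never enriched, so not "changed since last enrichment"
        current = source_hash(source)
        if current != enriched:
            stale.append({"node_id": node_id, "current_hash": current, "enriched_hash": enriched})
    return stale
```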

### `smp/annotate`
Manually set metadata on a node (used for `no_metadata` nodes).
- **Params:**
  - `node_id` (string).
  - `description` (string).
  - `tags` (list[string]).
- **Returns:** Confirmation of annotation.

### `smp/tag`
Bulk-apply or remove tags across a scope.
- **Params:**
  - `scope` (string).
  - `tags` (list[string]).
  - `action` (string): `"add"`, `"remove"`, or `"replace"`.
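
One plausible reading of the three `action` values, sketched as set operations on a single node's tags. The function `apply_tags` is hypothetical, and the exact semantics (for example, whether `"replace"` discards all existing tags) are an assumption rather than documented behavior.

```python
from __future__ import annotations


def apply_tags(existing: set[str], tags: list[str], action: str) -> set[str]:
    """Return the node's new tag set after applying the action."""
    incoming = set(tags)
    if action == "add":
        return existing | incoming
    if action == "remove":
        return existing - incoming
    if action == "replace":
        return incoming
    raise ValueError(f"unknown action: {action!r}")
```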

---

## 🌐 Community & Architecture

### `smp/community/detect`
Runs the Louvain algorithm to partition the codebase into Coarse (L0) and Fine (L1) communities.
- **Params:**
  - `algorithm` (string): `"louvain"`.
  - `relationship_types` (list): Types to consider (e.g., `["CALLS_STATIC", "IMPORTS"]`).
  - `levels` (list): Resolution settings for L0 and L1.
- **Returns:** Community statistics and list of detected communities.

### `smp/community/list`
Lists all detected communities.
- **Params:** `level` (int): `0` (coarse), `1` (fine), or omit for both.
- **Returns:** List of community objects (labels, member counts, etc.).

### `smp/community/get`
Gets all nodes within a specific community.
- **Params:**
  - `community_id` (string).
  - `node_types` (list, optional).
  - `include_bridges` (bool): Include edges crossing into other communities.

### `smp/community/boundaries`
Calculates coupling strength between community pairs.
- **Params:**
  - `level` (int): `0` or `1`.
  - `min_coupling` (float): Filter out pairs below this weight.
- **Returns:** Coupling weights and the specific "bridge nodes" responsible for the coupling.
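
A sketch of how coupling between community pairs could be computed. The weight definition used here (a pair's share of all cross-community edges) is an assumption, as are the function name `coupling` and the output keys; the documentation does not specify the formula.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Any


def coupling(
    edges: list[tuple[str, str]],
    community_of: dict[str, str],
    min_coupling: float = 0.0,
) -> dict[tuple[str, str], dict[str, Any]]:
    """Compute per-pair coupling weights and the bridge nodes behind them."""
    counts: Counter[tuple[str, str]] = Counter()
    bridges: defaultdict[tuple[str, str], set[str]] = defaultdict(set)
    for src, dst in edges:
        a, b = community_of[src], community_of[dst]
        if a == b:
            continue  # edge stays inside one community
        pair = (a, b) if a <= b else (b, a)
        counts[pair] += 1
        bridges[pair].update((src, dst))
    total = sum(counts.values())
    result: dict[tuple[str, str], dict[str, Any]] = {}
    for pair, n in counts.items():
        weight = n / total  # assumed definition: share of cross-community edges
        if weight >= min_coupling:
            result[pair] = {"weight": weight, "bridge_nodes": sorted(bridges[pair])}
    return result
```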

---

## 🧠 Agent Context

### `smp/context`
The primary method for agents to get a "mental model" of a file.
- **Params:**
  - `file_path` (string).
  - `scope` (string): `"edit"`, `"review"`, or `"architect"`.
  - `depth` (int): Traversal depth for related patterns.
- **Returns:** A comprehensive context object containing:
  - `self`: The file node.
  - `imports` / `imported_by`: Dependency graph.
  - `defines`: Symbols defined in the file.
  - `summary`: A pre-computed structural summary (blast radius, complexity, heat score).

---

## ⚠️ Error Codes

| Code | Message | Description |
| :--- | :--- | :--- |
| `-32600` | Invalid Request | The JSON sent is not a valid JSON-RPC request object. |
| `-32601` | Method not found | The requested SMP method does not exist. |
| `-32001` | Node not found | The specified `node_id` does not exist in the graph. |
| `-32002` | Conflict | Attempted to overwrite a docstring without `force: true`. |
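
A client-side sketch for handling these codes. It relies on the standard JSON-RPC 2.0 response shape (a `result` member on success, an `error` object with `code` and `message` on failure); the `SMPError` class and `raise_for_error` helper are hypothetical names.

```python
from __future__ import annotations

from typing import Any

# Code-to-name mapping copied from the table above.
ERROR_NAMES: dict[int, str] = {
    -32600: "Invalid Request",
    -32601: "Method not found",
    -32001: "Node not found",
    -32002: "Conflict",
}


class SMPError(Exception):
    """Raised when a JSON-RPC response carries an error object."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def raise_for_error(response: dict[str, Any]) -> Any:
    """Return the result, or raise SMPError if the response is an error."""
    error = response.get("error")
    if error is None:
        return response.get("result")
    code = error.get("code")
    raise SMPError(code, error.get("message") or ERROR_NAMES.get(code, "Unknown error"))
```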
92 changes: 92 additions & 0 deletions ARCHITECTURE.md
@@ -0,0 +1,92 @@
# Architecture Guide: Structural Memory Protocol (SMP)

The Structural Memory Protocol (SMP) is designed to provide AI agents with a "programmer's mental model" of a codebase. Unlike traditional RAG, which treats code as a series of text chunks, SMP treats code as a structured graph of interrelated entities.

## 🎯 Design Goals
- **Precision over Probability:** Replace "likely" text matches with "exact" structural relationships.
- **Architectural Awareness:** Enable agents to understand domain boundaries and module coupling.
- **Scalability:** Support massive codebases by routing queries to specific structural communities.
- **Hybrid Truth:** Combine the "what the code says" (static) with "what the code does" (runtime).

---

## ⚙️ The Ingestion Pipeline

The ingestion pipeline transforms raw source code into a queryable knowledge graph.

### 1. Parser (AST Extraction)
SMP uses **Tree-sitter** to perform fast, incremental parsing of multiple languages. It extracts high-level entities:
- **Nodes:** Classes, Functions, Variables, Interfaces.
- **Metadata:** Signatures, docstrings, modifiers (e.g., `async`, `export`).
- **Dependencies:** Import statements and export lists.

### 2. Graph Builder & The Linker
The Graph Builder creates the initial nodes and relationships. The **Linker** then resolves these relationships to ensure accuracy.

#### Static Linking (Namespaced Resolution)
To avoid ambiguity (e.g., two different files having a `save()` function), the Static Linker uses the file's `imports` as a namespace map. It traces each call to its exact origin file and marks the edge as `resolved: true`; a call whose origin cannot be determined is recorded as `CALLS_UNRESOLVED`.
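
The namespace-map idea can be sketched in a few lines. This is an illustration under assumptions: the function `resolve_call`, the `file::name` target format, and the rule that local definitions are checked before imports are all hypothetical.

```python
from __future__ import annotations


def resolve_call(
    name: str,
    imports: dict[str, str],
    local_defs: set[str],
    current_file: str,
) -> dict[str, object]:
    """Resolve a called name to its origin file using the file's import map."""
    if name in local_defs:
        origin = current_file
    elif name in imports:
        origin = imports[name]  # imports maps an imported symbol to its source file
    else:
        return {"type": "CALLS_UNRESOLVED", "target": name, "resolved": False}
    return {"type": "CALLS_STATIC", "target": f"{origin}::{name}", "resolved": True}
```

With `imports = {"save": "src/db/user_repo.py"}`, a call to `save()` resolves to that file even if another module also defines a `save()` function.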

#### Runtime Linking (eBPF Traces)
Static analysis cannot resolve Dependency Injection or Metaprogramming. SMP uses a **Runtime Linker** that:
1. Spawns a sandboxed environment.
2. Executes the code (e.g., via a test suite).
3. Captures kernel-level function entries/exits using **eBPF**.
4. Injects `CALLS_RUNTIME` edges into the graph.

### 3. Enricher
The Enricher attaches human-readable semantic metadata to structural nodes without using an LLM. It extracts:
- Docstrings and inline comments.
- Decorators and type annotations.
- Source hashes (to detect when a node becomes "stale" and needs re-enrichment).

### 4. Community Detection
SMP uses the **Louvain Algorithm** via Neo4j GDS to partition the graph into two levels of structural clusters:
medium

The documentation states that the Louvain algorithm is executed via Neo4j GDS. However, the implementation in smp/engine/community.py (lines 241-288) is a custom Python-based Louvain implementation. This should be corrected to accurately describe the system architecture.

- **Level 0 (Coarse):** High-level architectural domains (e.g., `api_gateway`, `data_layer`).
- **Level 1 (Fine):** Detailed functional modules (e.g., `auth_oauth`, `payments_stripe`).

Each community is assigned a **centroid embedding** (the mean of its members' embeddings), enabling efficient query routing.
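
The centroid is a component-wise mean. A minimal pure-Python sketch follows; the function name `centroid` is illustrative, and a real implementation would likely use a numeric library.

```python
from __future__ import annotations


def centroid(embeddings: list[list[float]]) -> list[float]:
    """Return the component-wise mean of a community's member embeddings."""
    if not embeddings:
        raise ValueError("a community needs at least one member embedding")
    dim = len(embeddings[0])
    count = len(embeddings)
    return [sum(vec[i] for vec in embeddings) / count for i in range(dim)]
```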

---

## 🔍 The Query Engine: SeedWalkEngine

The `SeedWalkEngine` implements a five-phase pipeline (Phases 0 through 4) to find the most relevant code for a given query.

### Phase 0: Route
The query embedding is compared against the **Level-1 Community Centroids** in ChromaDB. If the confidence exceeds a threshold, the search is scoped to that specific community (~200 nodes), drastically reducing noise.
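
A sketch of the routing decision, done in memory rather than in ChromaDB. Cosine similarity as the confidence measure and the default threshold of `0.5` are assumptions; the text does not name the metric or the threshold value.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def route(query: list[float], centroids: dict[str, list[float]], threshold: float = 0.5) -> str | None:
    """Return the best-matching community ID, or None to search unscoped."""
    best_id: str | None = None
    best_score = -1.0
    for community_id, center in centroids.items():
        score = cosine(query, center)
        if score > best_score:
            best_id, best_score = community_id, score
    return best_id if best_score > threshold else None
```

Returning `None` models the fallback case where no community is confident enough and the search stays global.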

### Phase 1: Seed
A vector search is performed in ChromaDB to find the top-K "seed" nodes whose signatures or docstrings most closely match the query.

### Phase 2: Walk
From the seeds, the engine performs a multi-hop traversal in Neo4j, following `CALLS_STATIC`, `CALLS_RUNTIME`, and `IMPORTS` edges. This captures the structural context (who calls this? what does this call?).
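
The walk can be pictured as a bounded breadth-first search. This sketch swaps the Neo4j traversal for an in-memory edge list so that it stays self-contained; the function `walk`, the `(source, type, target)` edge tuples, and following edges in both directions are assumptions.

```python
from __future__ import annotations

from collections import defaultdict, deque

WALK_EDGE_TYPES = {"CALLS_STATIC", "CALLS_RUNTIME", "IMPORTS"}


def walk(seeds: list[str], edges: list[tuple[str, str, str]], hops: int = 2) -> dict[str, int]:
    """Return every node within `hops` of a seed, mapped to its hop distance."""
    adjacency: defaultdict[str, set[str]] = defaultdict(set)
    for src, edge_type, dst in edges:
        if edge_type in WALK_EDGE_TYPES:
            adjacency[src].add(dst)  # what does this call?
            adjacency[dst].add(src)  # who calls this?
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] >= hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance
```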

### Phase 3: Rank
Nodes are ranked using a composite score:
$$\text{Score} = \alpha \cdot \text{VectorSimilarity} + \beta \cdot \text{NormalizedPageRank} + \gamma \cdot \text{HeatScore}$$
- **Vector Similarity:** Relevance to the query.
- **PageRank:** Structural importance in the graph.
- **Heat Score:** Frequency of execution (from telemetry/runtime traces).
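
The composite score translates directly into code. The weights below (`alpha=0.6`, `beta=0.3`, `gamma=0.1`) are placeholder assumptions, as the text gives no values; all three inputs are assumed to be normalized to the range 0 to 1.

```python
from __future__ import annotations


def composite_score(
    vector_similarity: float,
    normalized_pagerank: float,
    heat_score: float,
    alpha: float = 0.6,
    beta: float = 0.3,
    gamma: float = 0.1,
) -> float:
    """Weighted sum of the three ranking signals."""
    return alpha * vector_similarity + beta * normalized_pagerank + gamma * heat_score


def rank(candidates: dict[str, tuple[float, float, float]]) -> list[tuple[str, float]]:
    """Sort node IDs by composite score, highest first."""
    scored = [(node_id, composite_score(*signals)) for node_id, signals in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

Under these weights, a structurally central and frequently executed node can outrank a node with a closer text match.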

### Phase 4: Assemble
The engine produces a ranked list of `RankedResult` objects and a `structural_map` (adjacency list) allowing the agent to visualize the call chain.

---

## 💾 Persistence Layer

SMP utilizes a dual-store strategy to balance speed and structure.

| Store | Technology | Role | Data Held |
| :--- | :--- | :--- | :--- |
| **Graph Store** | **Neo4j** | Structural Truth | Entities, Relationships, Communities, PageRank, Full-Text Index. |
| **Vector Store** | **ChromaDB** | Entry Point | Node Embeddings, Community Centroids. |

---

## 🔌 MCP Integration

SMP implements the **Model Context Protocol (MCP)**. This allows it to serve as a "Codebase Memory Server" for any MCP-compatible client. Instead of the agent reading files blindly, it calls SMP tools to:
1. `locate`: Find the right starting point in a massive repo.
2. `get_context`: Get a structural summary of a file and its dependencies.
3. `assess_impact`: Find all nodes affected by a potential change.
95 changes: 95 additions & 0 deletions CONTRIBUTING.md
@@ -0,0 +1,95 @@
# Contributing to SMP

Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines.

## 🛠 Development Environment

### Python Version
SMP requires **Python 3.11** explicitly. It relies on `tomllib`, which was added to the standard library in 3.11, and on `X | Y` union syntax, which requires 3.10 or later.

### Setup
1. **Create a Virtual Environment:**
   ```bash
   python3.11 -m venv .venv
   source .venv/bin/activate
   ```
2. **Install Dependencies:**
   ```bash
   pip install -e ".[dev]"
   ```
3. **Configure Environment:**
   Copy `.env.example` to `.env` and configure your Neo4j credentials.

---

## 📝 Coding Standards

We enforce a strict set of style rules to ensure the codebase remains maintainable for both humans and AI agents.

### Imports
- Every file must start with `from __future__ import annotations`.
- Group imports: `stdlib` $\rightarrow$ `third-party` $\rightarrow$ `local`, separated by blank lines.
- Use absolute imports for local modules: `from smp.core.models import GraphNode`.

### Type Annotations
- **Strict Typing:** All function signatures must have full type annotations.
- **Modern Unions:** Use `X | Y` instead of `Optional[X]` or `Union[X, Y]`.
- **Built-in Generics:** Use `list[...]`, `dict[...]`, `set[...]` instead of `List`, `Dict`, `Set`.

### Naming & Style
- **Classes:** `PascalCase`
- **Functions/Methods:** `snake_case`
- **Private Members:** `_leading_underscore`
- **Docstrings:** Use triple double-quotes, imperative mood, and Google style.
- **Line Length:** Max 120 characters.

### Architectural Patterns
- **Layered Design:** `core` (models) $\rightarrow$ `engine` (logic) $\rightarrow$ `protocol` (API) $\rightarrow$ `store` (persistence).
- **Interfaces:** Use `abc.ABC` and `@abc.abstractmethod` for all store and parser interfaces.
- **Models:** Use `msgspec.Struct` for data models; prefer `frozen=True` for immutability.
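
A short module that follows the rules above, as an illustration. The class names `NodeStore` and `InMemoryNodeStore` and their methods are hypothetical and do not come from the SMP codebase; data models are left out so the sketch depends only on the standard library.

```python
from __future__ import annotations

import abc


class NodeStore(abc.ABC):
    """Define the persistence interface for graph nodes."""

    @abc.abstractmethod
    def get_node(self, node_id: str) -> dict[str, str] | None:
        """Return the node with the given ID, or None if it is absent."""

    @abc.abstractmethod
    def list_node_ids(self) -> list[str]:
        """Return all known node IDs in sorted order."""


class InMemoryNodeStore(NodeStore):
    """Keep nodes in a dictionary; intended for tests only."""

    def __init__(self, nodes: dict[str, dict[str, str]]) -> None:
        self._nodes = nodes

    def get_node(self, node_id: str) -> dict[str, str] | None:
        """Return the node with the given ID, or None if it is absent."""
        return self._nodes.get(node_id)

    def list_node_ids(self) -> list[str]:
        """Return all known node IDs in sorted order."""
        return sorted(self._nodes)
```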

---

## 🔄 Development Workflow

### Branching
- Use `feature/description` for new functionality.
- Use `fix/description` for bug fixes.

### Linting & Formatting
We use **Ruff** for both linting and formatting.
```bash
# Check for lint errors
ruff check .

# Automatically format code
ruff format .
```

### Type Checking
We use **Mypy** in strict mode.
```bash
mypy smp/
```

### Testing
We use **pytest** with `pytest-asyncio`.
```bash
# Run all tests
pytest

# Run a specific test file
pytest tests/test_query.py
```

---

## ✅ Pre-Commit Checklist

Before submitting a Pull Request, ensure you have completed these four steps:
1. [ ] `ruff check .` — No lint errors.
2. [ ] `ruff format .` — Code is perfectly formatted.
3. [ ] `mypy smp/` — No type errors.
4. [ ] `pytest` — All tests pass.

For detailed agent-specific instructions, please refer to `AGENTS.md`.