offx-zinth · offx-zinth · Apr 19, 2026 · Apr 19, 2026 · gemini-code-assist · Apr 19, 2026
diff --git a/API.md b/API.md
@@ -0,0 +1,134 @@
+# API Reference: Structural Memory Protocol (SMP)
+
+SMP exposes a **JSON-RPC 2.0** API. All requests must be sent as POST requests to `/rpc` with `Content-Type: application/json`.
+
+## 📡 General Request Format
+```json
+{
+  "jsonrpc": "2.0",
+  "method": "smp/method_name",
+  "params": { ... },
+  "id": 1
+}
+```
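A minimal sketch of building such a request in Python with only the standard library. The base URL and the helper name `build_rpc_request` are assumptions for illustration. The reference itself only fixes the `/rpc` path, the POST method, and the content type.

```python
from __future__ import annotations

import json
import urllib.request
from typing import Any

# Assumed base URL; the reference only specifies the "/rpc" path.
BASE_URL = "http://localhost:8000"


def build_rpc_request(method: str, params: dict[str, Any], request_id: int = 1) -> urllib.request.Request:
    """Build (but do not send) a JSON-RPC 2.0 POST request for SMP."""
    payload = {"jsonrpc": "2.0", "method": method, "params": params, "id": request_id}
    return urllib.request.Request(
        url=f"{BASE_URL}/rpc",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_rpc_request("smp/locate", {"query": "where are refunds issued?"})
# Sending is left to the caller, e.g. urllib.request.urlopen(request).
```

Building the request separately from sending it keeps the payload easy to inspect in tests without a running server.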
+
+---
+
+## 🔍 Discovery & Search
+
+### `smp/locate`
+Finds relevant code entities using Community-Routed Graph RAG.
+- **Params:**
+    - `query` (string): The natural language description of what to find.
+    - `seed_k` (int, optional): Number of initial vector seeds. Default: 3.
+    - `hops` (int, optional): Depth of graph traversal. Default: 2.
+    - `top_k` (int, optional): Number of final results. Default: 10.
+- **Returns:** `LocateResponse` containing ranked results and a structural map of relationships.
+
+### `smp/search`
+BM25-ranked full-text search across enriched metadata.
+- **Params:**
+    - `query` (string): Keywords to search.
+    - `match` (string): `"all"` (AND) or `"any"` (OR).
+    - `filter` (object, optional):
+        - `node_types` (list): e.g., `["Function", "Class"]`.
+        - `tags` (list): e.g., `["billing"]`.
+        - `scope` (string): e.g., `"package:src/payments"`.
+    - `top_k` (int): Number of results.
+- **Returns:** List of matches ranked by BM25 score.
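As an illustration of the parameter shape above, the following sketch assembles a `params` object for `smp/search`. The helper `search_params` and its default values for `match` and `top_k` are hypothetical. The reference does not state defaults for this method.

```python
from __future__ import annotations

from typing import Any


def search_params(
    query: str,
    match: str = "all",  # Assumed default, not stated in the reference.
    top_k: int = 10,  # Assumed default, not stated in the reference.
    node_types: list[str] | None = None,
    tags: list[str] | None = None,
    scope: str | None = None,
) -> dict[str, Any]:
    """Assemble the `params` object for `smp/search`, omitting unset filters."""
    if match not in ("all", "any"):
        raise ValueError('match must be "all" or "any"')
    params: dict[str, Any] = {"query": query, "match": match, "top_k": top_k}
    # Only include the filter keys the caller actually set.
    filters = {"node_types": node_types, "tags": tags, "scope": scope}
    active = {key: value for key, value in filters.items() if value is not None}
    if active:
        params["filter"] = active
    return params


params = search_params("refund invoice", match="any", tags=["billing"], scope="package:src/payments")
```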
+
+---
+
+## 🛠 Enrichment & Annotation
+
+### `smp/enrich`
+Extracts static metadata (docstrings, decorators) from a specific node.
+- **Params:**
+    - `node_id` (string): ID of the node to enrich.
+    - `force` (bool, optional): Re-enrich even if source hash is unchanged.
+- **Returns:** Extracted metadata or status (`enriched`, `skipped`, `no_metadata`).
+
+### `smp/enrich/batch`
+Enriches all nodes within a given scope.
+- **Params:**
+    - `scope` (string): `"full"`, `"package:<path>"`, or `"file:<path>"`.
+    - `force` (bool): Force re-enrichment.
+- **Returns:** Counts of enriched, skipped, and failed nodes.
+
+### `smp/enrich/stale`
+Lists nodes whose source code has changed since the last enrichment.
+- **Params:** `scope` (string).
+- **Returns:** List of stale nodes with `current_hash` vs `enriched_hash`.
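The staleness check can be pictured as a hash comparison. This is a sketch under assumptions: SHA-256 as the hash function, plain dictionaries standing in for the graph, and the function names `source_hash` and `find_stale` are all illustrative and not taken from the SMP codebase.

```python
from __future__ import annotations

import hashlib


def source_hash(source: str) -> str:
    """Hash a node's source text (SHA-256 is an assumption)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_stale(current_sources: dict[str, str], enriched_hashes: dict[str, str]) -> list[dict[str, str]]:
    """Return nodes whose current hash differs from the hash stored at enrichment time."""
    stale = []
    for node_id, source in current_sources.items():
        current = source_hash(source)
        enriched = enriched_hashes.get(node_id)
        # Nodes that were never enriched are not "stale"; they are simply unenriched.
        if enriched is not None and enriched != current:
            stale.append({"node_id": node_id, "current_hash": current, "enriched_hash": enriched})
    return stale


enriched = {"func:save": source_hash("def save(): ...")}
stale_nodes = find_stale({"func:save": "def save(): return True"}, enriched)
```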
+
+### `smp/annotate`
+Manually sets metadata on a node, typically one that enrichment reported as `no_metadata`.
+- **Params:**
+    - `node_id` (string).
+    - `description` (string).
+    - `tags` (list[string]).
+- **Returns:** Confirmation of annotation.
+
+### `smp/tag`
+Bulk-applies, removes, or replaces tags across a scope.
+- **Params:**
+    - `scope` (string).
+    - `tags` (list[string]).
+    - `action` (string): `"add"`, `"remove"`, or `"replace"`.
+
+---
+
+## 🌐 Community & Architecture
+
+### `smp/community/detect`
+Runs the Louvain algorithm to partition the codebase into coarse (L0) and fine (L1) communities.
+- **Params:**
+    - `algorithm` (string): `"louvain"`.
+    - `relationship_types` (list): Types to consider (e.g., `["CALLS_STATIC", "IMPORTS"]`).
+    - `levels` (list): Resolution settings for L0 and L1.
+- **Returns:** Community statistics and list of detected communities.
+
+### `smp/community/list`
+Lists all detected communities.
+- **Params:** `level` (int): `0` (coarse), `1` (fine), or omit for both.
+- **Returns:** List of community objects (labels, member counts, etc.).
+
+### `smp/community/get`
+Gets all nodes within a specific community.
+- **Params:**
+    - `community_id` (string).
+    - `node_types` (list, optional).
+    - `include_bridges` (bool): Include edges crossing into other communities.
+
+### `smp/community/boundaries`
+Calculates coupling strength between community pairs.
+- **Params:**
+    - `level` (int): `0` or `1`.
+    - `min_coupling` (float): Filter out pairs below this weight.
+- **Returns:** Coupling weights and the specific "bridge nodes" responsible for the coupling.
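One plausible reading of "coupling strength" is the number of edges that cross between two communities. The sketch below works under that assumption. The reference does not define the weight formula, and `community_coupling` and the result shape are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


def community_coupling(
    edges: list[tuple[str, str]],
    membership: dict[str, str],
    min_coupling: float = 0.0,
) -> dict[tuple[str, str], dict[str, object]]:
    """Count cross-community edges per community pair and collect the bridge nodes."""
    weights: dict[tuple[str, str], int] = defaultdict(int)
    bridges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for source, target in edges:
        a, b = membership[source], membership[target]
        if a == b:
            continue  # Intra-community edges do not contribute to coupling.
        pair = (min(a, b), max(a, b))  # Treat (a, b) and (b, a) as the same pair.
        weights[pair] += 1
        bridges[pair].update((source, target))
    return {
        pair: {"weight": weight, "bridge_nodes": sorted(bridges[pair])}
        for pair, weight in weights.items()
        if weight >= min_coupling
    }


membership = {"login": "auth", "charge": "payments", "refund": "payments"}
edges = [("login", "charge"), ("charge", "refund"), ("refund", "login")]
result = community_coupling(edges, membership, min_coupling=2)
```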
+
+---
+
+## 🧠 Agent Context
+
+### `smp/context`
+The primary method for agents to get a "mental model" of a file.
+- **Params:**
+    - `file_path` (string).
+    - `scope` (string): `"edit"`, `"review"`, or `"architect"`.
+    - `depth` (int): Traversal depth for related patterns.
+- **Returns:** A comprehensive context object containing:
+    - `self`: The file node.
+    - `imports` / `imported_by`: Dependency graph.
+    - `defines`: Symbols defined in the file.
+    - `summary`: A pre-computed structural summary (blast radius, complexity, heat score).
+
+---
+
+## ⚠️ Error Codes
+
+| Code | Message | Description |
+| :--- | :--- | :--- |
+| `-32600` | Invalid Request | The JSON sent is not a valid JSON-RPC 2.0 request object. |
+| `-32601` | Method not found | The requested SMP method does not exist. |
+| `-32001` | Node not found | The specified `node_id` does not exist in the graph. |
+| `-32002` | Conflict | Attempted to overwrite a docstring without `force: true`. |
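A client can turn these codes into exceptions. The sketch below relies only on the JSON-RPC 2.0 response shape, where a response carries either `result` or an `error` object with `code` and `message`. The `SmpError` class and the `unwrap` helper are hypothetical names.

```python
from __future__ import annotations

from typing import Any


class SmpError(Exception):
    """Raised when an SMP response carries a JSON-RPC `error` object."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"[{code}] {message}")
        self.code = code


# SMP-specific codes from the error table.
NODE_NOT_FOUND = -32001
CONFLICT = -32002


def unwrap(response: dict[str, Any]) -> Any:
    """Return `result`, or raise SmpError if the response holds an error."""
    error = response.get("error")
    if error is not None:
        raise SmpError(error["code"], error.get("message", ""))
    return response["result"]


try:
    unwrap({"jsonrpc": "2.0", "error": {"code": -32001, "message": "Node not found"}, "id": 1})
except SmpError as exc:
    caught_code = exc.code
```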
diff --git a/ARCHITECTURE.md b/ARCHITECTURE.md
@@ -0,0 +1,92 @@
+# Architecture Guide: Structural Memory Protocol (SMP)
+
+The Structural Memory Protocol (SMP) is designed to provide AI agents with a "programmer's mental model" of a codebase. Unlike traditional RAG, which treats code as a series of text chunks, SMP treats code as a structured graph of interrelated entities.
+
+## 🎯 Design Goals
+- **Precision over Probability:** Replace "likely" text matches with "exact" structural relationships.
+- **Architectural Awareness:** Enable agents to understand domain boundaries and module coupling.
+- **Scalability:** Support massive codebases by routing queries to specific structural communities.
+- **Hybrid Truth:** Combine what the code says (static analysis) with what the code does (runtime traces).
+
+---
+
+## ⚙️ The Ingestion Pipeline
+
+The ingestion pipeline transforms raw source code into a queryable knowledge graph.
+
+### 1. Parser (AST Extraction)
+SMP uses **Tree-sitter** to perform fast, incremental parsing of multiple languages. It extracts high-level entities:
+- **Nodes:** Classes, Functions, Variables, Interfaces.
+- **Metadata:** Signatures, docstrings, modifiers (e.g., `async`, `export`).
+- **Dependencies:** Import statements and export lists.
+
+### 2. Graph Builder & The Linker
+The Graph Builder creates the initial nodes and relationships. The **Linker** then resolves these relationships to ensure accuracy.
+
+#### Static Linking (Namespaced Resolution)
+To avoid ambiguity (e.g., two different files each defining a `save()` function), the Static Linker uses the calling file's `imports` as a namespace map. It traces each call to its exact origin file and marks the edge `resolved: true` when the origin is found, or `CALLS_UNRESOLVED` when it is not.
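Namespaced resolution can be sketched as a lookup through the caller's import map. The edge dictionary shape, the function name `resolve_call`, and the rule that local definitions are checked before imports are simplifying assumptions and do not describe the actual Linker.

```python
from __future__ import annotations


def resolve_call(
    callee: str,
    file_imports: dict[str, str],
    definitions: dict[str, set[str]],
    current_file: str,
) -> dict[str, object]:
    """Resolve a called name to its defining file via the caller's import map."""
    # Simplification: a definition in the calling file is checked first.
    if callee in definitions.get(current_file, set()):
        return {"type": "CALLS_STATIC", "target_file": current_file, "resolved": True}
    # Otherwise the import map says which file the name came from.
    origin = file_imports.get(callee)
    if origin is not None and callee in definitions.get(origin, set()):
        return {"type": "CALLS_STATIC", "target_file": origin, "resolved": True}
    return {"type": "CALLS_UNRESOLVED", "target_file": None, "resolved": False}


definitions = {
    "db/users.py": {"save"},
    "db/orders.py": {"save"},
    "api/handlers.py": {"create_user"},
}
imports = {"save": "db/users.py"}  # from db.users import save
edge = resolve_call("save", imports, definitions, "api/handlers.py")
unknown = resolve_call("notify", imports, definitions, "api/handlers.py")
```

Both `db/users.py` and `db/orders.py` define `save`, but the import map pins the call to `db/users.py`, which is the ambiguity the paragraph above describes.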
+
+#### Runtime Linking (eBPF Traces)
+Static analysis cannot resolve calls that go through dependency injection or metaprogramming. SMP uses a **Runtime Linker** that:
+1. Spawns a sandboxed environment.
+2. Executes the code (e.g., via a test suite).
+3. Captures kernel-level function entries/exits using **eBPF**.
+4. Injects `CALLS_RUNTIME` edges into the graph.
+
+### 3. Enricher
+The Enricher attaches human-readable semantic metadata to structural nodes without using an LLM. It extracts:
+- Docstrings and inline comments.
+- Decorators and type annotations.
+- Source hashes (to detect when a node becomes "stale" and needs re-enrichment).
+
+### 4. Community Detection
+SMP uses the **Louvain Algorithm** via Neo4j GDS to partition the graph into two levels of structural clusters:
+- **Level 0 (Coarse):** High-level architectural domains (e.g., `api_gateway`, `data_layer`).
+- **Level 1 (Fine):** Detailed functional modules (e.g., `auth_oauth`, `payments_stripe`).
+
+Each community is assigned a **centroid embedding** (the mean of its members' embeddings), enabling efficient query routing.
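The centroid described above is an element-wise mean. A dependency-free sketch, with `centroid` as an illustrative name (a real implementation would more likely use a numeric library):

```python
from __future__ import annotations


def centroid(embeddings: list[list[float]]) -> list[float]:
    """Return the element-wise mean of equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("a community needs at least one member embedding")
    dimension = len(embeddings[0])
    if any(len(vector) != dimension for vector in embeddings):
        raise ValueError("all embeddings must share the same dimension")
    count = len(embeddings)
    return [sum(vector[i] for vector in embeddings) / count for i in range(dimension)]


community_centroid = centroid([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]])
```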
+
+---
+
+## 🔍 The Query Engine: SeedWalkEngine
+
+The `SeedWalkEngine` implements a five-phase pipeline (Phases 0 to 4) to find the most relevant code for a given query.
+
+### Phase 0: Route
+The query embedding is compared against the **Level-1 Community Centroids** in ChromaDB. If the confidence exceeds a threshold, the search is scoped to that specific community (~200 nodes), drastically reducing noise.
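A sketch of the routing decision, assuming cosine similarity serves as the confidence measure and `0.75` as the threshold. Neither is specified in this guide, and the in-memory `centroids` dictionary stands in for the ChromaDB lookup.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def route(query: list[float], centroids: dict[str, list[float]], threshold: float = 0.75) -> str | None:
    """Return the best-matching community ID, or None to search the whole graph."""
    best_id: str | None = None
    best_score = -1.0
    for community_id, vector in centroids.items():
        score = cosine(query, vector)
        if score > best_score:
            best_id, best_score = community_id, score
    # Scope the search only when the best match exceeds the threshold.
    return best_id if best_score > threshold else None


centroids = {"auth_oauth": [1.0, 0.0], "payments_stripe": [0.0, 1.0]}
scoped = route([0.9, 0.1], centroids)
unscoped = route([1.0, 1.0], centroids)
```

Returning `None` for an ambiguous query lets the caller fall back to an unscoped search instead of committing to a weak match.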
+
+### Phase 1: Seed
+A vector search is performed in ChromaDB to find the top-K "seed" nodes whose signatures or docstrings most closely match the query.
+
+### Phase 2: Walk
+From the seeds, the engine performs a multi-hop traversal in Neo4j, following `CALLS_STATIC`, `CALLS_RUNTIME`, and `IMPORTS` edges. This captures the structural context (who calls this? what does this call?).
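The walk can be illustrated in memory as a hop-limited breadth-first search. In SMP this traversal runs in Neo4j. The edge-tuple format, the function name `walk`, the choice to follow edges in both directions, and the `TAGGED` edge type in the example are all assumptions made for this sketch.

```python
from __future__ import annotations

from collections import deque

# Edge types named in the guide as walkable.
WALK_EDGE_TYPES = {"CALLS_STATIC", "CALLS_RUNTIME", "IMPORTS"}


def walk(seeds: list[str], edges: list[tuple[str, str, str]], hops: int = 2) -> dict[str, int]:
    """Breadth-first walk from the seeds; returns each reached node with its hop distance."""
    neighbors: dict[str, set[str]] = {}
    for source, edge_type, target in edges:
        if edge_type not in WALK_EDGE_TYPES:
            continue
        # Follow edges in both directions to capture callers and callees.
        neighbors.setdefault(source, set()).add(target)
        neighbors.setdefault(target, set()).add(source)
    distance = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        node = queue.popleft()
        if distance[node] == hops:
            continue  # Do not expand beyond the hop limit.
        for nxt in neighbors.get(node, ()):
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return distance


graph_edges = [
    ("a", "CALLS_STATIC", "b"),
    ("b", "IMPORTS", "c"),
    ("c", "CALLS_RUNTIME", "d"),
    ("a", "TAGGED", "x"),  # Hypothetical edge type that the walk ignores.
]
reached = walk(["a"], graph_edges, hops=2)
```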
+
+### Phase 3: Rank
+Nodes are ranked using a composite score:
+$$\text{Score} = \alpha \cdot \text{VectorSimilarity} + \beta \cdot \text{NormalizedPageRank} + \gamma \cdot \text{HeatScore}$$
+- **Vector Similarity:** Relevance to the query.
+- **PageRank:** Structural importance in the graph.
+- **Heat Score:** Frequency of execution (from telemetry/runtime traces).
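A worked instance of the composite score. The weights `alpha=0.6`, `beta=0.3`, `gamma=0.1` and the candidate values are made up for illustration, since the guide does not state the actual weights.

```python
from __future__ import annotations


def composite_score(
    vector_similarity: float,
    normalized_pagerank: float,
    heat_score: float,
    alpha: float = 0.6,  # Assumed weight.
    beta: float = 0.3,  # Assumed weight.
    gamma: float = 0.1,  # Assumed weight.
) -> float:
    """Weighted sum of the three ranking signals."""
    return alpha * vector_similarity + beta * normalized_pagerank + gamma * heat_score


# (vector similarity, normalized PageRank, heat score) per candidate node.
candidates = {
    "func:charge_card": (0.9, 0.2, 0.5),
    "func:refund": (0.7, 0.8, 0.1),
}
ranked = sorted(candidates, key=lambda node_id: composite_score(*candidates[node_id]), reverse=True)
```

With these weights `func:refund` scores about 0.67 and `func:charge_card` about 0.65, so a structurally central node can outrank a closer text match.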
+
+### Phase 4: Assemble
+The engine produces a ranked list of `RankedResult` objects and a `structural_map` (adjacency list) allowing the agent to visualize the call chain.
+
+---
+
+## 💾 Persistence Layer
+
+SMP utilizes a dual-store strategy to balance speed and structure.
+
+| Store | Technology | Role | Data Held |
+| :--- | :--- | :--- | :--- |
+| **Graph Store** | **Neo4j** | Structural Truth | Entities, Relationships, Communities, PageRank, Full-Text Index. |
+| **Vector Store** | **ChromaDB** | Entry Point | Node Embeddings, Community Centroids. |
+
+---
+
+## 🔌 MCP Integration
+
+SMP implements the **Model Context Protocol (MCP)**. This allows it to serve as a "Codebase Memory Server" for any MCP-compatible client. Instead of the agent reading files blindly, it calls SMP tools to:
+1. `locate`: Find the right starting point in a massive repo.
+2. `get_context`: Get a structural summary of a file and its dependencies.
+3. `assess_impact`: Find all nodes affected by a potential change.
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -0,0 +1,95 @@
+# Contributing to SMP
+
+Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines.
+
+## 🛠 Development Environment
+
+### Python Version
+SMP requires **Python 3.11** explicitly. It relies on `tomllib`, which was added to the standard library in 3.11, and on `X | Y` union types, which require 3.10 or later.
+
+### Setup
+1. **Create a Virtual Environment:**
+   ```bash
+   python3.11 -m venv .venv
+   source .venv/bin/activate
+   ```
+2. **Install Dependencies:**
+   ```bash
+   pip install -e ".[dev]"
+   ```
+3. **Configure Environment:**
+   Copy `.env.example` to `.env` and configure your Neo4j credentials.
+
+---
+
+## 📝 Coding Standards
+
+We enforce a strict set of style rules to ensure the codebase remains maintainable for both humans and AI agents.
+
+### Imports
+- Every file must start with `from __future__ import annotations`.
+- Group imports: `stdlib` $\rightarrow$ `third-party` $\rightarrow$ `local`, separated by blank lines.
+- Use absolute imports for local modules: `from smp.core.models import GraphNode`.
+
+### Type Annotations
+- **Strict Typing:** All function signatures must have full type annotations.
+- **Modern Unions:** Use `X | Y` instead of `Optional[X]` or `Union[X, Y]`.
+- **Built-in Generics:** Use `list[...]`, `dict[...]`, `set[...]` instead of `List`, `Dict`, `Set`.
+
+### Naming & Style
+- **Classes:** `PascalCase`
+- **Functions/Methods:** `snake_case`
+- **Private Members:** `_leading_underscore`
+- **Docstrings:** Use triple double-quotes, imperative mood, and Google style.
+- **Line Length:** Max 120 characters.
+
+### Architectural Patterns
+- **Layered Design:** `core` (models) $\rightarrow$ `engine` (logic) $\rightarrow$ `protocol` (API) $\rightarrow$ `store` (persistence).
+- **Interfaces:** Use `abc.ABC` and `@abc.abstractmethod` for all store and parser interfaces.
+- **Models:** Use `msgspec.Struct` for data models; prefer `frozen=True` for immutability.
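A short sketch of what these conventions look like together. The `GraphStore` interface and `InMemoryGraphStore` class are hypothetical and not taken from the SMP codebase. The `msgspec.Struct` models are left out so the example runs on the standard library alone.

```python
from __future__ import annotations

import abc


class GraphStore(abc.ABC):
    """Hypothetical store interface; only the abstract contract lives here."""

    @abc.abstractmethod
    def get_node(self, node_id: str) -> dict[str, str] | None:
        """Return the node with the given ID, or None if it does not exist."""


class InMemoryGraphStore(GraphStore):
    """Toy implementation backed by a dictionary."""

    def __init__(self, nodes: dict[str, dict[str, str]]) -> None:
        self._nodes = nodes

    def get_node(self, node_id: str) -> dict[str, str] | None:
        """Look the node up in the backing dictionary."""
        return self._nodes.get(node_id)


store = InMemoryGraphStore({"func:save": {"name": "save"}})
```

Because `GraphStore` declares an abstract method, instantiating it directly raises `TypeError`, which keeps callers coded against the interface and not a concrete store.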
+
+---
+
+## 🔄 Development Workflow
+
+### Branching
+- Use `feature/description` for new functionality.
+- Use `fix/description` for bug fixes.
+
+### Linting & Formatting
+We use **Ruff** for both linting and formatting.
+```bash
+# Check for lint errors
+ruff check .
+
+# Automatically format code
+ruff format .
+```
+
+### Type Checking
+We use **Mypy** in strict mode.
+```bash
+mypy smp/
+```
+
+### Testing
+We use **pytest** with `pytest-asyncio`.
+```bash
+# Run all tests
+pytest
+
+# Run a specific test file
+pytest tests/test_query.py
+```
+
+---
+
+## ✅ Pre-Commit Checklist
+
+Before submitting a Pull Request, ensure you have completed these four steps:
+1. [ ] `ruff check .` — No lint errors.
+2. [ ] `ruff format .` — Code is perfectly formatted.
+3. [ ] `mypy smp/` — No type errors.
+4. [ ] `pytest` — All tests pass.
+
+For detailed agent-specific instructions, please refer to `AGENTS.md`.