documentation work #26
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,134 @@ | ||
| # API Reference: Structural Memory Protocol (SMP) | ||
|
|
||
| SMP exposes a **JSON-RPC 2.0** API. All requests must be sent as POST requests to `/rpc` with `Content-Type: application/json`. | ||
|
|
||
| ## 📡 General Request Format | ||
| ```json | ||
| { | ||
| "jsonrpc": "2.0", | ||
| "method": "smp/method_name", | ||
| "params": { ... }, | ||
| "id": 1 | ||
| } | ||
| ``` | ||
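A minimal sketch of building this envelope in Python with only the standard library. The helper name `build_request` is hypothetical and not part of SMP; actually sending the body is left to whatever HTTP client you use to POST to `/rpc`.

```python
from __future__ import annotations

import json
from typing import Any


def build_request(method: str, params: dict[str, Any], request_id: int = 1) -> str:
    """Serialize a JSON-RPC 2.0 request envelope (hypothetical helper)."""
    envelope = {
        "jsonrpc": "2.0",
        "method": method,
        "params": params,
        "id": request_id,
    }
    return json.dumps(envelope)


# The query text is illustrative only.
body = build_request("smp/locate", {"query": "where are invoices created?"})
```

The resulting string is what goes in the POST body, with `Content-Type: application/json`.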
|
|
||
| --- | ||
|
|
||
| ## 🔍 Discovery & Search | ||
|
|
||
| ### `smp/locate` | ||
| Finds relevant code entities using Community-Routed Graph RAG. | ||
| - **Params:** | ||
| - `query` (string): The natural language description of what to find. | ||
| - `seed_k` (int, optional): Number of initial vector seeds. Default: 3. | ||
| - `hops` (int, optional): Depth of graph traversal. Default: 2. | ||
| - `top_k` (int, optional): Number of final results. Default: 10. | ||
| - **Returns:** `LocateResponse` containing ranked results and a structural map of relationships. | ||
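As an illustration, the `params` object for `smp/locate` could be assembled as below. This is a sketch under assumptions: `locate_params` is a hypothetical client-side helper, and the defaults are copied from the list above rather than from the handler, so they may not match what the server applies.

```python
from __future__ import annotations

from typing import Any


def locate_params(query: str, seed_k: int = 3, hops: int = 2, top_k: int = 10) -> dict[str, Any]:
    """Build `params` for smp/locate using the defaults documented above."""
    if not query.strip():
        raise ValueError("query must be a non-empty string")
    return {"query": query, "seed_k": seed_k, "hops": hops, "top_k": top_k}


# Override only the traversal depth; the other values fall back to the documented defaults.
params = locate_params("retry logic for failed payments", hops=3)
```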
|
Comment on lines +23 to +26: There are several discrepancies between the documentation and the implementation of
|
||
|
|
||
| ### `smp/search` | ||
| BM25-ranked full-text search across enriched metadata. | ||
| - **Params:** | ||
| - `query` (string): Keywords to search. | ||
| - `match` (string): `"all"` (AND) or `"any"` (OR). | ||
| - `filter` (object, optional): | ||
| - `node_types` (list): e.g., `["Function", "Class"]`. | ||
| - `tags` (list): e.g., `["billing"]`. | ||
| - `scope` (string): e.g., `"package:src/payments"`. | ||
| - `top_k` (int): Number of results. | ||
| - **Returns:** List of matches ranked by BM25 score. | ||
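A sketch of how a client might assemble `smp/search` parameters. Several things here are assumptions, not documented behaviour: the helper name `search_params`, the default values for `match` and `top_k`, and the reading that `top_k` sits at the top level while `node_types`, `tags`, and `scope` nest under `filter`.

```python
from __future__ import annotations

from typing import Any

VALID_MATCH_MODES = {"all", "any"}


def search_params(
    query: str,
    match: str = "all",  # assumed default; the reference does not state one
    node_types: list[str] | None = None,
    tags: list[str] | None = None,
    scope: str | None = None,
    top_k: int = 10,  # assumed default
) -> dict[str, Any]:
    """Build `params` for smp/search, omitting filter keys that are unset."""
    if match not in VALID_MATCH_MODES:
        raise ValueError(f"match must be one of {sorted(VALID_MATCH_MODES)}")
    candidates = {"node_types": node_types, "tags": tags, "scope": scope}
    filters = {key: value for key, value in candidates.items() if value is not None}
    params: dict[str, Any] = {"query": query, "match": match, "top_k": top_k}
    if filters:
        params["filter"] = filters
    return params


params = search_params("refund invoice", match="any", tags=["billing"])
```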
|
|
||
|
|
||
| --- | ||
|
|
||
| ## 🛠 Enrichment & Annotation | ||
|
|
||
| ### `smp/enrich` | ||
| Extracts static metadata (docstrings, decorators) from a specific node. | ||
| - **Params:** | ||
| - `node_id` (string): ID of the node to enrich. | ||
| - `force` (bool, optional): Re-enrich even if source hash is unchanged. | ||
| - **Returns:** Extracted metadata or status (`enriched`, `skipped`, `no_metadata`). | ||
|
|
||
| ### `smp/enrich/batch` | ||
| Enriches all nodes within a given scope. | ||
| - **Params:** | ||
| - `scope` (string): `"full"`, `"package:<path>"`, or `"file:<path>"`. | ||
| - `force` (bool): Force re-enrichment. | ||
| - **Returns:** Counts of enriched, skipped, and failed nodes. | ||
|
|
||
| ### `smp/enrich/stale` | ||
| Lists nodes whose source code has changed since the last enrichment. | ||
| - **Params:** `scope` (string). | ||
| - **Returns:** List of stale nodes with `current_hash` vs `enriched_hash`. | ||
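The staleness check amounts to comparing a hash of the current source with the hash stored at enrichment time. The sketch below assumes SHA-256 and plain dictionaries keyed by node ID; the hash function SMP actually uses and the names `source_hash` and `find_stale` are not taken from the codebase.

```python
from __future__ import annotations

import hashlib


def source_hash(source: str) -> str:
    """Hash source text (SHA-256 is an assumption, not SMP's confirmed choice)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def find_stale(current_sources: dict[str, str], enriched_hashes: dict[str, str]) -> list[dict[str, str]]:
    """Return nodes whose current hash differs from the hash recorded at enrichment."""
    stale: list[dict[str, str]] = []
    for node_id, source in sorted(current_sources.items()):
        current = source_hash(source)
        enriched = enriched_hashes.get(node_id)
        # Nodes that were never enriched are not "stale"; they simply have no record yet.
        if enriched is not None and enriched != current:
            stale.append({"node_id": node_id, "current_hash": current, "enriched_hash": enriched})
    return stale


sources = {"fn:save": "def save(): ...", "fn:load": "def load(path): ..."}
recorded = {"fn:save": source_hash("def save(): ..."), "fn:load": source_hash("def load(): ...")}
stale_nodes = find_stale(sources, recorded)
```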
|
|
||
| ### `smp/annotate` | ||
| Manually sets metadata on a node (intended for nodes that `smp/enrich` reports as `no_metadata`). | ||
| - **Params:** | ||
| - `node_id` (string). | ||
| - `description` (string). | ||
| - `tags` (list[string]). | ||
| - **Returns:** Confirmation of annotation. | ||
|
|
||
| ### `smp/tag` | ||
| Adds, removes, or replaces tags on every node in a scope. | ||
| - **Params:** | ||
| - `scope` (string). | ||
| - `tags` (list[string]). | ||
| - `action` (string): `"add"`, `"remove"`, or `"replace"`. | ||
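The three `action` values can be read as set operations on a node's existing tags. The sketch below shows that interpretation for a single node; it is an assumption about the semantics, and `apply_tags` is a hypothetical name.

```python
from __future__ import annotations


def apply_tags(existing: set[str], tags: list[str], action: str) -> set[str]:
    """Return the node's new tag set after applying one smp/tag action."""
    requested = set(tags)
    if action == "add":
        return existing | requested
    if action == "remove":
        return existing - requested
    if action == "replace":
        return requested
    raise ValueError(f"unknown action: {action!r}")


current = {"billing", "legacy"}
added = apply_tags(current, ["payments"], "add")
removed = apply_tags(current, ["legacy"], "remove")
replaced = apply_tags(current, ["payments"], "replace")
```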
|
|
||
| --- | ||
|
|
||
| ## 🌐 Community & Architecture | ||
|
|
||
| ### `smp/community/detect` | ||
| Runs the Louvain algorithm to partition the codebase into Coarse (L0) and Fine (L1) communities. | ||
| - **Params:** | ||
| - `algorithm` (string): `"louvain"`. | ||
| - `relationship_types` (list): Types to consider (e.g., `["CALLS_STATIC", "IMPORTS"]`). | ||
| - `levels` (list): Resolution settings for L0 and L1. | ||
| - **Returns:** Community statistics and list of detected communities. | ||
|
|
||
| ### `smp/community/list` | ||
| Lists all detected communities. | ||
| - **Params:** `level` (int): `0` (coarse), `1` (fine), or omit for both. | ||
| - **Returns:** List of community objects (labels, member counts, etc.). | ||
|
|
||
| ### `smp/community/get` | ||
| Gets all nodes within a specific community. | ||
| - **Params:** | ||
| - `community_id` (string). | ||
| - `node_types` (list, optional). | ||
| - `include_bridges` (bool): Include edges crossing into other communities. | ||
|
|
||
| ### `smp/community/boundaries` | ||
| Calculates coupling strength between community pairs. | ||
| - **Params:** | ||
| - `level` (int): `0` or `1`. | ||
| - `min_coupling` (float): Filter out pairs below this weight. | ||
| - **Returns:** Coupling weights and the specific "bridge nodes" responsible for the coupling. | ||
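One plausible way to compute such a result is to count the edges that cross each community pair and collect their endpoints as bridge nodes. The weighting scheme (one unit per crossing edge), the function name, and the return shape below are all assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any


def community_coupling(
    edges: list[tuple[str, str]],
    membership: dict[str, str],
    min_coupling: float = 0.0,
) -> dict[tuple[str, str], dict[str, Any]]:
    """Count cross-community edges per pair and record the nodes they touch."""
    weights: dict[tuple[str, str], float] = defaultdict(float)
    bridges: dict[tuple[str, str], set[str]] = defaultdict(set)
    for src, dst in edges:
        a, b = membership[src], membership[dst]
        if a == b:
            continue  # edge stays inside one community
        pair = (a, b) if a < b else (b, a)  # treat pairs as unordered
        weights[pair] += 1.0
        bridges[pair].update((src, dst))
    return {
        pair: {"weight": weight, "bridge_nodes": sorted(bridges[pair])}
        for pair, weight in weights.items()
        if weight >= min_coupling
    }


edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")]
membership = {"a": "x", "b": "x", "c": "y", "d": "y"}
coupling = community_coupling(edges, membership)
```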
|
|
||
| --- | ||
|
|
||
| ## 🧠 Agent Context | ||
|
|
||
| ### `smp/context` | ||
| The primary method for agents to get a "mental model" of a file. | ||
| - **Params:** | ||
| - `file_path` (string). | ||
| - `scope` (string): `"edit"`, `"review"`, or `"architect"`. | ||
| - `depth` (int): Traversal depth for related patterns. | ||
| - **Returns:** A comprehensive context object containing: | ||
| - `self`: The file node. | ||
| - `imports` / `imported_by`: Dependency graph. | ||
| - `defines`: Symbols defined in the file. | ||
| - `summary`: A pre-computed structural summary (blast radius, complexity, heat score). | ||
|
|
||
| --- | ||
|
|
||
| ## ⚠️ Error Codes | ||
|
|
||
| | Code | Message | Description | | ||
| | :--- | :--- | :--- | | ||
| | `-32600` | Invalid Request | The payload is not a valid JSON-RPC 2.0 request object (JSON-RPC 2.0 reserves `-32700` for JSON parse errors). | ||
| | `-32601` | Method not found | The requested SMP method does not exist. | | ||
| | `-32001` | Node not found | The specified `node_id` does not exist in the graph. | | ||
| | `-32002` | Conflict | Attempted to overwrite a docstring without `force: true`. | | ||
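On the client side, these codes arrive in the standard JSON-RPC 2.0 `error` member (`code` and `message`). A minimal sketch of unwrapping a decoded response follows; `SmpError` and `unwrap` are hypothetical names, not part of SMP.

```python
from __future__ import annotations

from typing import Any


class SmpError(Exception):
    """Raised when a JSON-RPC response carries an `error` member (hypothetical class)."""

    def __init__(self, code: int, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code
        self.message = message


def unwrap(response: dict[str, Any]) -> Any:
    """Return `result` from a decoded response, or raise SmpError on failure."""
    if "error" in response:
        error = response["error"]
        raise SmpError(error["code"], error["message"])
    return response["result"]


ok = unwrap({"jsonrpc": "2.0", "result": {"status": "enriched"}, "id": 1})
```

Callers can then branch on `exc.code`, for example treating `-32001` as a missing node.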
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,92 @@ | ||
| # Architecture Guide: Structural Memory Protocol (SMP) | ||
|
|
||
| The Structural Memory Protocol (SMP) is designed to provide AI agents with a "programmer's mental model" of a codebase. Unlike traditional RAG, which treats code as a series of text chunks, SMP treats code as a structured graph of interrelated entities. | ||
|
|
||
| ## 🎯 Design Goals | ||
| - **Precision over Probability:** Replace "likely" text matches with "exact" structural relationships. | ||
| - **Architectural Awareness:** Enable agents to understand domain boundaries and module coupling. | ||
| - **Scalability:** Support massive codebases by routing queries to specific structural communities. | ||
| - **Hybrid Truth:** Combine the "what the code says" (static) with "what the code does" (runtime). | ||
|
|
||
| --- | ||
|
|
||
| ## ⚙️ The Ingestion Pipeline | ||
|
|
||
| The ingestion pipeline transforms raw source code into a queryable knowledge graph. | ||
|
|
||
| ### 1. Parser (AST Extraction) | ||
| SMP uses **Tree-sitter** to perform fast, incremental parsing of multiple languages. It extracts high-level entities: | ||
| - **Nodes:** Classes, Functions, Variables, Interfaces. | ||
| - **Metadata:** Signatures, docstrings, modifiers (e.g., `async`, `export`). | ||
| - **Dependencies:** Import statements and export lists. | ||
|
|
||
| ### 2. Graph Builder & The Linker | ||
| The Graph Builder creates the initial nodes and relationships. The **Linker** then resolves these relationships to ensure accuracy. | ||
|
|
||
| #### Static Linking (Namespaced Resolution) | ||
| To avoid ambiguity (e.g., two different files each defining a `save()` function), the Static Linker uses the file's `imports` as a namespace map. It traces each call to its exact origin file: edges it can resolve are marked `resolved: true`, and the rest are recorded as `CALLS_UNRESOLVED`. | ||
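The idea of using imports as a namespace map can be sketched as follows. The edge dictionaries, the `file::name` target format, and the function name `resolve_call` are assumptions for illustration; only the `CALLS_STATIC` and `CALLS_UNRESOLVED` labels and the `resolved` flag come from this guide.

```python
from __future__ import annotations

from typing import Any


def resolve_call(
    name: str,
    current_file: str,
    local_defs: set[str],
    imports: dict[str, str],
) -> dict[str, Any]:
    """Resolve a called name to its origin file via local definitions, then imports."""
    if name in local_defs:
        origin: str | None = current_file
    else:
        origin = imports.get(name)  # imported name -> file that defines it
    if origin is None:
        return {"type": "CALLS_UNRESOLVED", "target": name}
    return {"type": "CALLS_STATIC", "target": f"{origin}::{name}", "resolved": True}


# Two files define save(); the import map decides which one this call refers to.
imports = {"save": "src/payments/store.py"}
edge = resolve_call("save", "src/api/checkout.py", {"checkout"}, imports)
unknown = resolve_call("inject", "src/api/checkout.py", {"checkout"}, imports)
```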
|
|
||
| #### Runtime Linking (eBPF Traces) | ||
| Static analysis cannot resolve Dependency Injection or Metaprogramming. SMP uses a **Runtime Linker** that: | ||
| 1. Spawns a sandboxed environment. | ||
| 2. Executes the code (e.g., via a test suite). | ||
| 3. Captures kernel-level function entries/exits using **eBPF**. | ||
| 4. Injects `CALLS_RUNTIME` edges into the graph. | ||
|
|
||
| ### 3. Enricher | ||
| The Enricher attaches human-readable semantic metadata to structural nodes without using an LLM. It extracts: | ||
| - Docstrings and inline comments. | ||
| - Decorators and type annotations. | ||
| - Source hashes (to detect when a node becomes "stale" and needs re-enrichment). | ||
|
|
||
| ### 4. Community Detection | ||
| SMP uses the **Louvain Algorithm** via Neo4j GDS to partition the graph into two levels of structural clusters: | ||
|
|
||
| - **Level 0 (Coarse):** High-level architectural domains (e.g., `api_gateway`, `data_layer`). | ||
| - **Level 1 (Fine):** Detailed functional modules (e.g., `auth_oauth`, `payments_stripe`). | ||
|
|
||
| Each community is assigned a **centroid embedding** (the mean of its members' embeddings), enabling efficient query routing. | ||
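The centroid described here is an element-wise mean. A pure-Python sketch, with `centroid` as a hypothetical helper name and plain lists standing in for whatever vector type the implementation uses:

```python
from __future__ import annotations


def centroid(embeddings: list[list[float]]) -> list[float]:
    """Return the element-wise mean of equally sized embedding vectors."""
    if not embeddings:
        raise ValueError("a community needs at least one member embedding")
    dim = len(embeddings[0])
    if any(len(vector) != dim for vector in embeddings):
        raise ValueError("all embeddings must have the same dimension")
    count = len(embeddings)
    return [sum(vector[i] for vector in embeddings) / count for i in range(dim)]


community_centroid = centroid([[1.0, 2.0], [3.0, 4.0]])
```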
|
|
||
| --- | ||
|
|
||
| ## 🔍 The Query Engine: SeedWalkEngine | ||
|
|
||
| The `SeedWalkEngine` implements a five-phase pipeline (Phases 0 through 4) to find the most relevant code for a given query. | ||
|
|
||
| ### Phase 0: Route | ||
| The query embedding is compared against the **Level-1 Community Centroids** in ChromaDB. If the confidence exceeds a threshold, the search is scoped to that specific community (~200 nodes), drastically reducing noise. | ||
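A sketch of the routing decision in plain Python rather than ChromaDB. It assumes cosine similarity serves as the confidence measure and uses an arbitrary threshold of 0.5; neither detail is confirmed by this guide, and `route` is a hypothetical name.

```python
from __future__ import annotations

import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def route(query: list[float], centroids: dict[str, list[float]], threshold: float = 0.5) -> str | None:
    """Return the best-matching community ID, or None to search the whole graph."""
    best_id: str | None = None
    best_score = -1.0
    for community_id, vector in centroids.items():
        score = cosine(query, vector)
        if score > best_score:
            best_id, best_score = community_id, score
    return best_id if best_score >= threshold else None


centroids = {"auth_oauth": [1.0, 0.0], "payments_stripe": [0.0, 1.0]}
scoped = route([0.9, 0.1], centroids)
unscoped = route([1.0, 1.0], centroids, threshold=0.9)
```

Returning `None` models the fallback where no community is confident enough and the search stays unscoped.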
|
|
||
| ### Phase 1: Seed | ||
| A vector search is performed in ChromaDB to find the top-K "seed" nodes whose signatures or docstrings most closely match the query. | ||
|
|
||
| ### Phase 2: Walk | ||
| From the seeds, the engine performs a multi-hop traversal in Neo4j, following `CALLS_STATIC`, `CALLS_RUNTIME`, and `IMPORTS` edges. This captures the structural context (who calls this? what does this call?). | ||
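The walk can be pictured as a breadth-first traversal bounded by hop count. The sketch below runs over an in-memory edge list instead of Neo4j and treats edges as undirected so that both callers and callees are reached; that directionality, like the name `walk`, is an assumption.

```python
from __future__ import annotations

from collections import deque

WALK_EDGE_TYPES = {"CALLS_STATIC", "CALLS_RUNTIME", "IMPORTS"}


def walk(seeds: list[str], edges: list[tuple[str, str, str]], hops: int = 2) -> set[str]:
    """Collect every node within `hops` steps of a seed along walkable edge types."""
    neighbours: dict[str, set[str]] = {}
    for src, rel, dst in edges:
        if rel in WALK_EDGE_TYPES:
            neighbours.setdefault(src, set()).add(dst)
            neighbours.setdefault(dst, set()).add(src)
    visited = set(seeds)
    frontier = deque((seed, 0) for seed in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # hop budget exhausted for this branch
        for neighbour in neighbours.get(node, ()):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append((neighbour, depth + 1))
    return visited


edges = [
    ("a", "CALLS_STATIC", "b"),
    ("b", "IMPORTS", "c"),
    ("c", "CALLS_RUNTIME", "d"),
    ("a", "DEFINES", "x"),  # hypothetical edge type that the walk ignores
]
reached = walk(["a"], edges, hops=2)
```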
|
|
||
| ### Phase 3: Rank | ||
| Nodes are ranked using a composite score: | ||
| $$\text{Score} = \alpha \cdot \text{VectorSimilarity} + \beta \cdot \text{NormalizedPageRank} + \gamma \cdot \text{HeatScore}$$ | ||
| - **Vector Similarity:** Relevance to the query. | ||
| - **PageRank:** Structural importance in the graph. | ||
| - **Heat Score:** Frequency of execution (from telemetry/runtime traces). | ||
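The composite score is a weighted sum, which can be sketched directly. The weights below (0.6, 0.3, 0.1) are placeholders chosen for illustration; this guide does not state the actual values of alpha, beta, and gamma.

```python
from __future__ import annotations


def composite_score(
    vector_similarity: float,
    normalized_pagerank: float,
    heat_score: float,
    alpha: float = 0.6,  # placeholder weight
    beta: float = 0.3,  # placeholder weight
    gamma: float = 0.1,  # placeholder weight
) -> float:
    """Weighted sum of the three ranking signals."""
    return alpha * vector_similarity + beta * normalized_pagerank + gamma * heat_score


# (vector similarity, normalized PageRank, heat score) per candidate node
candidates = {
    "fn:parse_invoice": (0.9, 0.1, 0.0),
    "fn:charge_card": (0.5, 0.9, 0.8),
}
ranked = sorted(candidates, key=lambda node: composite_score(*candidates[node]), reverse=True)
```

With these placeholder weights, a structurally central and frequently executed node can outrank one that is only textually closer to the query.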
|
|
||
| ### Phase 4: Assemble | ||
| The engine produces a ranked list of `RankedResult` objects and a `structural_map` (adjacency list) allowing the agent to visualize the call chain. | ||
|
|
||
| --- | ||
|
|
||
| ## 💾 Persistence Layer | ||
|
|
||
| SMP utilizes a dual-store strategy to balance speed and structure. | ||
|
|
||
| | Store | Technology | Role | Data Held | | ||
| | :--- | :--- | :--- | :--- | | ||
| | **Graph Store** | **Neo4j** | Structural Truth | Entities, Relationships, Communities, PageRank, Full-Text Index. | | ||
| | **Vector Store** | **ChromaDB** | Entry Point | Node Embeddings, Community Centroids. | | ||
|
|
||
| --- | ||
|
|
||
| ## 🔌 MCP Integration | ||
|
|
||
| SMP implements the **Model Context Protocol (MCP)**. This allows it to serve as a "Codebase Memory Server" for any MCP-compatible client. Instead of the agent reading files blindly, it calls SMP tools to: | ||
| 1. `locate`: Find the right starting point in a massive repo. | ||
| 2. `get_context`: Get a structural summary of a file and its dependencies. | ||
| 3. `assess_impact`: Find all nodes affected by a potential change. | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,95 @@ | ||
| # Contributing to SMP | ||
|
|
||
| Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines. | ||
|
|
||
| ## 🛠 Development Environment | ||
|
|
||
| ### Python Version | ||
| SMP requires **Python 3.11** explicitly. It relies on `tomllib`, which joined the standard library in 3.11, and on `X | Y` union syntax, which needs 3.10 or later. | ||
|
|
||
| ### Setup | ||
| 1. **Create a Virtual Environment:** | ||
| ```bash | ||
| python3.11 -m venv .venv | ||
| source .venv/bin/activate | ||
| ``` | ||
| 2. **Install Dependencies:** | ||
| ```bash | ||
| pip install -e ".[dev]" | ||
| ``` | ||
| 3. **Configure Environment:** | ||
| Copy `.env.example` to `.env` and configure your Neo4j credentials. | ||
|
|
||
| --- | ||
|
|
||
| ## 📝 Coding Standards | ||
|
|
||
| We enforce a strict set of style rules to ensure the codebase remains maintainable for both humans and AI agents. | ||
|
|
||
| ### Imports | ||
| - Every file must start with `from __future__ import annotations`. | ||
| - Group imports: `stdlib` $\rightarrow$ `third-party` $\rightarrow$ `local`, separated by blank lines. | ||
| - Use absolute imports for local modules: `from smp.core.models import GraphNode`. | ||
|
|
||
| ### Type Annotations | ||
| - **Strict Typing:** All function signatures must have full type annotations. | ||
| - **Modern Unions:** Use `X | Y` instead of `Optional[X]` or `Union[X, Y]`. | ||
| - **Built-in Generics:** Use `list[...]`, `dict[...]`, `set[...]` instead of `List`, `Dict`, `Set`. | ||
|
|
||
| ### Naming & Style | ||
| - **Classes:** `PascalCase` | ||
| - **Functions/Methods:** `snake_case` | ||
| - **Private Members:** `_leading_underscore` | ||
| - **Docstrings:** Use triple double-quotes, imperative mood, and Google style. | ||
| - **Line Length:** Max 120 characters. | ||
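A short function written to the import, typing, and naming rules above may help as a reference. The function itself (`short_hash`) is invented for illustration and does not exist in SMP.

```python
from __future__ import annotations

import hashlib


def short_hash(source: str, length: int | None = None) -> str:
    """Return a SHA-256 hex digest of the given source text.

    Args:
        source: Source code to hash.
        length: Number of leading hex characters to keep, or None for all.

    Returns:
        The hex digest, truncated to `length` characters when given.
    """
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()
    return digest if length is None else digest[:length]


full = short_hash("def save(): ...")
brief = short_hash("def save(): ...", length=8)
```

Note the `int | None` union in place of `Optional[int]`, the Google-style docstring in imperative mood, and the `snake_case` name.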
|
|
||
| ### Architectural Patterns | ||
| - **Layered Design:** `core` (models) $\rightarrow$ `engine` (logic) $\rightarrow$ `protocol` (API) $\rightarrow$ `store` (persistence). | ||
| - **Interfaces:** Use `abc.ABC` and `@abc.abstractmethod` for all store and parser interfaces. | ||
| - **Models:** Use `msgspec.Struct` for data models; prefer `frozen=True` for immutability. | ||
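The interface rule can be illustrated with a small abstract store. `NodeStore` and `InMemoryNodeStore` are hypothetical names, and nodes are plain dictionaries here only to keep the sketch free of third-party dependencies; real models would follow the `msgspec.Struct` rule above.

```python
from __future__ import annotations

import abc


class NodeStore(abc.ABC):
    """Abstract interface that every node store must implement (hypothetical)."""

    @abc.abstractmethod
    def put_node(self, node_id: str, data: dict[str, str]) -> None:
        """Persist a node under the given ID."""

    @abc.abstractmethod
    def get_node(self, node_id: str) -> dict[str, str] | None:
        """Return the stored node, or None if it does not exist."""


class InMemoryNodeStore(NodeStore):
    """Dictionary-backed store, useful as a test double."""

    def __init__(self) -> None:
        self._nodes: dict[str, dict[str, str]] = {}

    def put_node(self, node_id: str, data: dict[str, str]) -> None:
        self._nodes[node_id] = dict(data)

    def get_node(self, node_id: str) -> dict[str, str] | None:
        return self._nodes.get(node_id)


store = InMemoryNodeStore()
store.put_node("fn:save", {"kind": "Function"})
```

Because `NodeStore` declares abstract methods, Python refuses to instantiate it directly, which keeps engine code dependent on the interface rather than a concrete backend.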
|
|
||
| --- | ||
|
|
||
| ## 🔄 Development Workflow | ||
|
|
||
| ### Branching | ||
| - Use `feature/description` for new functionality. | ||
| - Use `fix/description` for bug fixes. | ||
|
|
||
| ### Linting & Formatting | ||
| We use **Ruff** for both linting and formatting. | ||
| ```bash | ||
| # Check for lint errors | ||
| ruff check . | ||
|
|
||
| # Automatically format code | ||
| ruff format . | ||
| ``` | ||
|
|
||
| ### Type Checking | ||
| We use **Mypy** in strict mode. | ||
| ```bash | ||
| mypy smp/ | ||
| ``` | ||
|
|
||
| ### Testing | ||
| We use **pytest** with `pytest-asyncio`. | ||
| ```bash | ||
| # Run all tests | ||
| pytest | ||
|
|
||
| # Run a specific test file | ||
| pytest tests/test_query.py | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## ✅ Pre-Commit Checklist | ||
|
|
||
| Before submitting a Pull Request, ensure you have completed these four steps: | ||
| 1. [ ] `ruff check .` — No lint errors. | ||
| 2. [ ] `ruff format .` — Code is perfectly formatted. | ||
| 3. [ ] `mypy smp/` — No type errors. | ||
| 4. [ ] `pytest` — All tests pass. | ||
|
|
||
| For detailed agent-specific instructions, please refer to `AGENTS.md`. |
This section is missing documentation for several implemented methods found in `smp/protocol/handlers/query.py`, specifically `smp/impact` (handled by `ImpactHandler`) and `smp/trace` (handled by `TraceHandler`). These should be included to provide a complete API reference.