doc work 2#27
Code Review
This pull request significantly expands and matures the documentation for the Structural Memory Protocol (SMP), providing a comprehensive API reference, architectural deep-dives, and detailed contribution guidelines. The updates transition the documentation from high-level summaries to production-ready specifications. Feedback focuses on refining the professional tone by removing LLM-generated conversational filler, eliminating redundant content between the architecture and contribution guides, and ensuring that method names and project structures accurately reflect the underlying implementation and registration logic.
| Here are the detailed, production-ready `ARCHITECTURE.md` and `CONTRIBUTING.md` files based on the comprehensive Structural Memory Protocol (SMP) specifications. | ||
|
|
||
| ### 1. `ARCHITECTURE.md` | ||
|
|
||
| ```markdown |
This file contains LLM-generated conversational filler and nested markdown blocks. The preamble and the extra backticks should be removed to maintain a professional documentation style.
| # Architecture Guide: Structural Memory Protocol (SMP) |
| ### 2. `CONTRIBUTING.md` | ||
|
|
||
| The `SeedWalkEngine` implements a 4-phase pipeline to find the most relevant code for a given query. | ||
| ```markdown | ||
| # Contributing to SMP | ||
|
|
||
| ### Phase 0: Route | ||
| The query embedding is compared against the **Level-1 Community Centroids** in ChromaDB. If the confidence exceeds a threshold, the search is scoped to that specific community (~200 nodes), drastically reducing noise. | ||
| Thank you for contributing to the Structural Memory Protocol (SMP)! To maintain the integrity, safety, and high performance of this agentic architecture, we enforce strict guidelines. | ||
|
|
||
| ### Phase 1: Seed | ||
| A vector search is performed in ChromaDB to find the top-K "seed" nodes whose signatures or docstrings most closely match the query. | ||
| ## 🛠 Development Environment | ||
|
|
||
| ### Phase 2: Walk | ||
| From the seeds, the engine performs a multi-hop traversal in Neo4j, following `CALLS_STATIC`, `CALLS_RUNTIME`, and `IMPORTS` edges. This captures the structural context (who calls this? what does this call?). | ||
| ### Python Version | ||
| SMP requires **Python 3.11** explicitly. We heavily utilize modern features like `X | Y` unions, `tomllib`, and performance optimizations not present in older versions. | ||
|
|
||
| ### Phase 3: Rank | ||
| Nodes are ranked using a composite score: | ||
| $$\text{Score} = \alpha \cdot \text{VectorSimilarity} + \beta \cdot \text{NormalizedPageRank} + \gamma \cdot \text{HeatScore}$$ | ||
| - **Vector Similarity:** Relevance to the query. | ||
| - **PageRank:** Structural importance in the graph. | ||
| - **Heat Score:** Frequency of execution (from telemetry/runtime traces). | ||
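A minimal sketch of the composite score above may help when reviewing the ranking description. The `Candidate` type, the default weights, and the assumption that all three signals are already normalized to [0, 1] are illustrative only and are not taken from the SMP codebase.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for the three signals named in the formula."""

    node_id: str
    vector_similarity: float    # relevance to the query, assumed in [0, 1]
    normalized_pagerank: float  # structural importance, assumed in [0, 1]
    heat_score: float           # execution frequency, assumed in [0, 1]


def composite_score(
    c: Candidate, alpha: float = 0.6, beta: float = 0.3, gamma: float = 0.1
) -> float:
    """Compute alpha * VectorSimilarity + beta * NormalizedPageRank + gamma * HeatScore."""
    return (
        alpha * c.vector_similarity
        + beta * c.normalized_pagerank
        + gamma * c.heat_score
    )


def rank(candidates: list[Candidate], **weights: float) -> list[Candidate]:
    """Return candidates sorted by descending composite score."""
    return sorted(candidates, key=lambda c: composite_score(c, **weights), reverse=True)
```

With these assumed weights, a node with moderate similarity but high PageRank can outrank a node with high similarity alone, which is the trade-off the three coefficients are meant to expose.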
| ### Setup Instructions | ||
| 1. **Create a Virtual Environment:** | ||
| ```bash | ||
| python3.11 -m venv .venv | ||
| source .venv/bin/activate | ||
| ``` | ||
| 2. **Install Dependencies:** | ||
| ```bash | ||
| pip install -e ".[dev]" | ||
| ``` | ||
| 3. **Configure Environment:** | ||
| Copy `.env.example` to `.env` and configure your Neo4j and ChromaDB credentials. Note that Neo4j requires the GDS (Graph Data Science) plugin for Louvain and PageRank calculations. | ||
|
|
||
| --- | ||
|
|
||
| ### Phase 4: Assemble | ||
| The engine produces a ranked list of `RankedResult` objects and a `structural_map` (adjacency list) allowing the agent to visualize the call chain. | ||
| ## 🏛️ Architecture TL;DR | ||
| Before contributing, review `ARCHITECTURE.md`. SMP uses a layered design: | ||
| - `core/`: AST parsing, Linking (Static + eBPF), Enrichment, and persistence mapping. | ||
| - `engine/`: Query resolution (`SeedWalkEngine`), structural aggregations, context generation. | ||
| - `sandbox/`: MicroVM/Docker isolation, eBPF telemetry capture, and Mutation Testing. | ||
| - `protocol/`: JSON-RPC 2.0 endpoints utilizing the Dispatcher pattern. | ||
|
|
||
| --- | ||
|
|
||
| ## 💾 Persistence Layer | ||
| ## 📝 Coding Standards | ||
|
|
||
| SMP utilizes a dual-store strategy to balance speed and structure. | ||
| SMP is designed to be read by humans and navigated by AI agents. Predictability is paramount. | ||
|
|
||
| | Store | Technology | Role | Data Held | | ||
| | :--- | :--- | :--- | :--- | | ||
| | **Graph Store** | **Neo4j** | Structural Truth | Entities, Relationships, Communities, PageRank, Full-Text Index. | | ||
| | **Vector Store** | **ChromaDB** | Entry Point | Node Embeddings, Community Centroids. | | ||
| ### Imports | ||
| - Every file must start with `from __future__ import annotations`. | ||
| - Group imports: `stdlib` $\rightarrow$ `third-party` $\rightarrow$ `local`, separated by blank lines. | ||
| - **Always use absolute imports** for local modules: | ||
| `from smp.core.linker import StaticLinker` (Never `from ..linker import StaticLinker`). | ||
|
|
||
| ### Type Annotations & Data Models | ||
| - **Strict Typing:** All function signatures must have full type annotations. No implicit `Any`. | ||
| - **Modern Unions:** Use `X | Y` instead of `Optional[X]` or `Union[X, Y]`. | ||
| - **Built-in Generics:** Use `list[...]`, `dict[...]`, `set[...]` instead of the `typing` module equivalents. | ||
| - **Msgspec Structs:** All data flowing through the protocol and engine must be defined as `msgspec.Struct` classes with `frozen=True` to ensure zero-copy immutability and fast JSON serialization. | ||
|
|
||
| ```python | ||
| import msgspec | ||
|
|
||
| class RankedResult(msgspec.Struct, frozen=True): | ||
|     node_id: str | ||
|     node_type: str | ||
|     vector_score: float | ||
|     pagerank: float | ||
|     is_seed: bool = False | ||
| ``` | ||
|
|
||
| ### Naming & Style | ||
| - **Classes:** `PascalCase` | ||
| - **Functions/Methods:** `snake_case` | ||
| - **Private Members:** Prefix with `_leading_underscore`. | ||
| - **Docstrings:** Use triple double-quotes, imperative mood, and Google style. Docstrings are heavily relied upon by the Graph RAG engine, so be descriptive. | ||
| - **Line Length:** Max 120 characters. | ||
|
|
||
| --- | ||
|
|
||
| ## 🔌 MCP Integration | ||
| ## 🔌 Adding Protocol Methods (The Dispatcher) | ||
|
|
||
| We do not use massive `if/elif` routers. If you are adding a new JSON-RPC endpoint to SMP, implement it in the appropriate module under `smp/protocol/handlers/` and use the `@rpc_method` decorator. | ||
|
|
||
| ```python | ||
| # smp/protocol/handlers/telemetry.py | ||
| from smp.protocol.dispatcher import rpc_method | ||
| from smp.core.models import ServerContext | ||
|
|
||
| @rpc_method("smp/telemetry/hot") | ||
| async def handle_telemetry_hot(params: dict, ctx: ServerContext) -> dict: | ||
|     """Return nodes with high churn and high blast radius.""" | ||
|     window = params.get("window_days", 30) | ||
|     return await ctx.engine.telemetry.get_hot_nodes(window) | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## 🔄 Development Workflow | ||
|
|
||
| ### Branching | ||
| - `feature/description` for new functionality. | ||
| - `fix/description` for bug fixes. | ||
| - `docs/description` for documentation updates. | ||
|
|
||
| ### Linting & Formatting | ||
| We use **Ruff** to enforce formatting and linting rules. | ||
| ```bash | ||
| # Check for lint errors | ||
| ruff check . | ||
|
|
||
| # Automatically format code | ||
| ruff format . | ||
| ``` | ||
|
|
||
| ### Type Checking | ||
| We rely on strict type boundaries. Run **Mypy** before committing: | ||
| ```bash | ||
| mypy smp/ | ||
| ``` | ||
|
|
||
| ### Testing | ||
| We use **pytest** combined with `pytest-asyncio` for all asynchronous graph engine tests. | ||
| ```bash | ||
| # Run all tests | ||
| pytest | ||
|
|
||
| # Run a specific module | ||
| pytest tests/engine/test_seed_walk.py | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## ✅ Pre-Commit Checklist | ||
|
|
||
| Before submitting a Pull Request, ensure you have completed these steps. Pull Requests failing CI will not be reviewed. | ||
|
|
||
| 1. [ ] Read `ARCHITECTURE.md` to ensure your change fits the architectural direction. | ||
| 2. [ ] `ruff check .` — No lint errors. | ||
| 3. [ ] `ruff format .` — Code is formatted. | ||
| 4. [ ] `mypy smp/` — Zero type errors. | ||
| 5. [ ] `pytest` — All tests pass, including integration tests spanning Neo4j and ChromaDB. | ||
|
|
||
| SMP implements the **Model Context Protocol (MCP)**. This allows it to serve as a "Codebase Memory Server" for any MCP-compatible client. Instead of the agent reading files blindly, it calls SMP tools to: | ||
| 1. `locate`: Find the right starting point in a massive repo. | ||
| 2. `get_context`: Get a structural summary of a file and its dependencies. | ||
| 3. `assess_impact`: Find all nodes affected by a potential change. | ||
| For detailed agent-specific interactions and JSON-RPC payloads, refer to the `PROTOCOL.md` spec. | ||
| ``` |
| Here is an expanded, standalone, and highly detailed `CONTRIBUTING.md` file. It goes deeper into the specific developer workflows, testing graph databases, adding protocol methods, and code standards required for the Structural Memory Protocol (SMP). | ||
|
|
||
| Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines. | ||
| *** | ||
|
|
||
| ## 🛠 Development Environment | ||
| # Contributing to the Structural Memory Protocol (SMP) |
Please remove the conversational preamble at the beginning of the file. Documentation should start directly with the title.
| # Contributing to the Structural Memory Protocol (SMP) |
| * `proposed_content` (string). | ||
| * **Result:** `nodes_added`, `nodes_removed`, `nodes_modified`, and relationship deltas. | ||
|
|
||
| ### `smp/graph/why` |
| ### `smp/linker/report` | ||
| Lists all unresolved static edges (e.g., ambiguous calls where the target function exists in multiple files but wasn't explicitly imported). | ||
|
|
||
| * **Params:** | ||
| * `scope` (string): `"full"`, `"package:<path>"`, or `"file:<path>"`. | ||
| * **Result:** Array of `unresolved` edge definitions indicating caller and candidates. | ||
|
|
||
| ### `smp/linker/runtime` | ||
| Retrieves all `CALLS_RUNTIME` edges for a node (captured via eBPF trace execution). | ||
|
|
||
| * **Params:** | ||
| * `node_id` (string): Target node ID. | ||
| * `commit_sha` (string): Specific commit hash. | ||
| * **Result:** Arrays of `runtime_callees` and `static_only_callees`. | ||
|
|
The methods `smp/linker/report` and `smp/linker/runtime` are documented here but are not registered in the `RpcDispatcher` (see `smp/protocol/dispatcher.py`). Conversely, several registered methods such as `smp/reindex`, `smp/session/recover`, `smp/lock`, and `smp/unlock` are missing from this API reference. Please ensure the documentation accurately reflects the implemented and registered JSON-RPC methods.
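One way to catch this kind of drift automatically is a small check run in CI. The sketch below assumes documented methods appear as level-3 headings wrapping the method name in backticks (as in the diff above) and that the registered names are available as a plain set. How `RpcDispatcher` actually exposes its registered names is not shown in this review, so `registered` is passed in as data and the helper names are hypothetical.

```python
import re

# Matches headings such as: ### `smp/linker/report`
_HEADING = re.compile(r"^#{3}[ \t]+`(smp/[^`]+)`[ \t]*$", re.MULTILINE)


def documented_methods(markdown: str) -> set[str]:
    """Collect every method name that has its own level-3 heading."""
    return set(_HEADING.findall(markdown))


def drift(markdown: str, registered: set[str]) -> tuple[set[str], set[str]]:
    """Return (documented but not registered, registered but not documented)."""
    documented = documented_methods(markdown)
    return documented - registered, registered - documented


# Illustrative data only, using the method names mentioned in this comment.
doc = "### `smp/linker/report`\nText.\n\n### `smp/linker/runtime`\nText.\n"
registered = {"smp/reindex", "smp/session/recover", "smp/lock", "smp/unlock"}
stale, missing = drift(doc, registered)
```

A test could then fail the build whenever either `stale` or `missing` is non-empty.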
| │ │ ├── community.py # smp/community/detect, list, get, boundaries | ||
| │ │ ├── query.py # smp/navigate, trace, context, impact, locate, flow, diff, why | ||
| │ │ ├── enrichment.py # smp/enrich, annotate, tag, search | ||
| │ │ ├── safety.py # smp/session/*, guard/check, dryrun, checkpoint, lock, audit | ||
| │ │ ├── planning.py # smp/plan, conflict | ||
| │ │ ├── sandbox.py # smp/sandbox/spawn, execute, destroy | ||
| │ │ ├── verify.py # smp/verify/integrity | ||
| │ │ ├── handoff.py # smp/handoff/review, pr | ||
| │ │ └── telemetry.py # smp/telemetry/* |
The project structure described here is inaccurate compared to the actual file organization and handler registration in `smp/protocol/dispatcher.py`. For example, `smp/plan` and `smp/conflict` are in `query_ext.py`, not `planning.py` (which does not exist), and `smp/annotate` is in `annotation.py`, not `enrichment.py`.
| │ ├── memory.py # smp/update, batch_update, reindex | |
| │ ├── merkle.py # smp/sync, merkle/tree, index/* | |
| │ ├── query.py # smp/navigate, trace, context, impact, locate, search, flow | |
| │ ├── query_ext.py # smp/diff, plan, conflict, why | |
| │ ├── enrichment.py # smp/enrich/* | |
| │ ├── annotation.py # smp/annotate/*, tag | |
| │ ├── safety.py # smp/session/*, guard/check, dryrun, checkpoint, rollback, lock, unlock, audit, verify/integrity | |
| │ ├── sandbox.py # smp/sandbox/* | |
| │ ├── handoff.py # smp/handoff/* | |
| │ └── telemetry.py # smp/telemetry/* |
No description provided.