
doc work 2 #27

Merged: offx-zinth merged 1 commit into main from master on Apr 19, 2026


@offx-zinth
Owner

No description provided.

@offx-zinth offx-zinth merged commit ec92d4e into main Apr 19, 2026
1 check failed

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request significantly expands and matures the documentation for the Structural Memory Protocol (SMP), providing a comprehensive API reference, architectural deep-dives, and detailed contribution guidelines. The updates transition the documentation from high-level summaries to production-ready specifications. Feedback focuses on refining the professional tone by removing LLM-generated conversational filler, eliminating redundant content between the architecture and contribution guides, and ensuring that method names and project structures accurately reflect the underlying implementation and registration logic.

Comment thread ARCHITECTURE.md
Comment on lines +1 to +5
Here are the detailed, production-ready `ARCHITECTURE.md` and `CONTRIBUTING.md` files based on the comprehensive Structural Memory Protocol (SMP) specifications.

### 1. `ARCHITECTURE.md`

```markdown

medium

This file contains LLM-generated conversational filler and nested markdown blocks. The preamble and the extra backticks should be removed to maintain a professional documentation style.

Suggested change
Here are the detailed, production-ready `ARCHITECTURE.md` and `CONTRIBUTING.md` files based on the comprehensive Structural Memory Protocol (SMP) specifications.
### 1. `ARCHITECTURE.md`
```markdown
# Architecture Guide: Structural Memory Protocol (SMP)

Comment thread ARCHITECTURE.md
Comment on lines +189 to +325
### 2. `CONTRIBUTING.md`

The `SeedWalkEngine` implements a 4-phase pipeline to find the most relevant code for a given query.
```markdown
# Contributing to SMP

### Phase 0: Route
The query embedding is compared against the **Level-1 Community Centroids** in ChromaDB. If the confidence exceeds a threshold, the search is scoped to that specific community (~200 nodes), drastically reducing noise.
Thank you for contributing to the Structural Memory Protocol (SMP)! To maintain the integrity, safety, and high performance of this agentic architecture, we enforce strict guidelines.

### Phase 1: Seed
A vector search is performed in ChromaDB to find the top-K "seed" nodes whose signatures or docstrings most closely match the query.
## 🛠 Development Environment

### Phase 2: Walk
From the seeds, the engine performs a multi-hop traversal in Neo4j, following `CALLS_STATIC`, `CALLS_RUNTIME`, and `IMPORTS` edges. This captures the structural context (who calls this? what does this call?).
### Python Version
SMP requires **Python 3.11** explicitly. We heavily utilize modern features like `X | Y` unions, `tomllib`, and performance optimizations not present in older versions.

### Phase 3: Rank
Nodes are ranked using a composite score:
$$\text{Score} = \alpha \cdot \text{VectorSimilarity} + \beta \cdot \text{NormalizedPageRank} + \gamma \cdot \text{HeatScore}$$
- **Vector Similarity:** Relevance to the query.
- **PageRank:** Structural importance in the graph.
- **Heat Score:** Frequency of execution (from telemetry/runtime traces).
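The composite score in the Phase 3 text quoted above can be sketched in a few lines. This is only an illustration: the `Candidate` class, the function names, and the default weights are assumptions and are not taken from the SMP codebase, and all three inputs are assumed to be normalized to the range [0, 1].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical container for the three ranking signals of one node."""

    node_id: str
    vector_similarity: float    # relevance to the query (assumed in [0, 1])
    normalized_pagerank: float  # structural importance (assumed in [0, 1])
    heat_score: float           # execution frequency (assumed in [0, 1])


def composite_score(c: Candidate, alpha: float, beta: float, gamma: float) -> float:
    """Weighted sum: alpha * similarity + beta * pagerank + gamma * heat."""
    return alpha * c.vector_similarity + beta * c.normalized_pagerank + gamma * c.heat_score


def rank(
    candidates: list[Candidate],
    alpha: float = 0.5,  # illustrative weight, not a documented default
    beta: float = 0.3,   # illustrative weight, not a documented default
    gamma: float = 0.2,  # illustrative weight, not a documented default
) -> list[Candidate]:
    """Return candidates sorted by composite score, highest first."""
    return sorted(candidates, key=lambda c: composite_score(c, alpha, beta, gamma), reverse=True)
```

With these illustrative weights, a node with moderate similarity but high PageRank and heat can outrank a node that matches the query text more closely, which is the trade-off the formula is meant to expose.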
### Setup Instructions
1. **Create a Virtual Environment:**
```bash
python3.11 -m venv .venv
source .venv/bin/activate
```
2. **Install Dependencies:**
```bash
pip install -e ".[dev]"
```
3. **Configure Environment:**
Copy `.env.example` to `.env` and configure your Neo4j and ChromaDB credentials. Note that Neo4j requires the GDS (Graph Data Science) plugin for Louvain and PageRank calculations.

---

### Phase 4: Assemble
The engine produces a ranked list of `RankedResult` objects and a `structural_map` (adjacency list) allowing the agent to visualize the call chain.
## 🏛️ Architecture TL;DR
Before contributing, review `ARCHITECTURE.md`. SMP uses a layered design:
- `core/`: AST parsing, Linking (Static + eBPF), Enrichment, and persistence mapping.
- `engine/`: Query resolution (`SeedWalkEngine`), structural aggregations, context generation.
- `sandbox/`: MicroVM/Docker isolation, eBPF telemetry capture, and Mutation Testing.
- `protocol/`: JSON-RPC 2.0 endpoints utilizing the Dispatcher pattern.

---

## 💾 Persistence Layer
## 📝 Coding Standards

SMP utilizes a dual-store strategy to balance speed and structure.
SMP is designed to be read by humans and navigated by AI agents. Predictability is paramount.

| Store | Technology | Role | Data Held |
| :--- | :--- | :--- | :--- |
| **Graph Store** | **Neo4j** | Structural Truth | Entities, Relationships, Communities, PageRank, Full-Text Index. |
| **Vector Store** | **ChromaDB** | Entry Point | Node Embeddings, Community Centroids. |
### Imports
- Every file must start with `from __future__ import annotations`.
- Group imports: `stdlib` $\rightarrow$ `third-party` $\rightarrow$ `local`, separated by blank lines.
- **Always use absolute imports** for local modules:
`from smp.core.linker import StaticLinker` (Never `from ..linker import StaticLinker`).

### Type Annotations & Data Models
- **Strict Typing:** All function signatures must have full type annotations. No implicit `Any`.
- **Modern Unions:** Use `X | Y` instead of `Optional[X]` or `Union[X, Y]`.
- **Built-in Generics:** Use `list[...]`, `dict[...]`, `set[...]` instead of the `typing` module equivalents.
- **Msgspec Structs:** All data flowing through the protocol and engine must be defined as `msgspec.Struct` classes with `frozen=True` to ensure zero-copy immutability and fast JSON serialization.

```python
import msgspec

class RankedResult(msgspec.Struct, frozen=True):
    node_id: str
    node_type: str
    vector_score: float
    pagerank: float
    is_seed: bool = False
```

### Naming & Style
- **Classes:** `PascalCase`
- **Functions/Methods:** `snake_case`
- **Private Members:** Prefix with `_leading_underscore`.
- **Docstrings:** Use triple double-quotes, imperative mood, and Google style. Docstrings are heavily relied upon by the Graph RAG engine, so be descriptive.
- **Line Length:** Max 120 characters.

---

## 🔌 MCP Integration
## 🔌 Adding Protocol Methods (The Dispatcher)

We do not use massive `if/elif` routers. If you are adding a new JSON-RPC endpoint to SMP, implement it in the appropriate module under `smp/protocol/handlers/` and use the `@rpc_method` decorator.

```python
# smp/protocol/handlers/telemetry.py
from smp.protocol.dispatcher import rpc_method
from smp.core.models import ServerContext

@rpc_method("smp/telemetry/hot")
async def handle_telemetry_hot(params: dict, ctx: ServerContext) -> dict:
    """Returns nodes with high churn and high blast radius."""
    window = params.get("window_days", 30)
    return await ctx.engine.telemetry.get_hot_nodes(window)
```
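For readers unfamiliar with the pattern, a decorator-based registry of this kind might look roughly like the sketch below. It is not the actual `smp.protocol.dispatcher` implementation: `_REGISTRY`, `dispatch`, and the `demo/echo` handler are hypothetical names used only to show how a lookup table can replace an `if/elif` router. The only external fact relied on is the JSON-RPC 2.0 envelope and its `-32601` "Method not found" error code.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import Any

# A handler takes (params, ctx) and returns a JSON-serializable dict.
Handler = Callable[[dict[str, Any], Any], Awaitable[dict[str, Any]]]

# Hypothetical module-level registry mapping method names to handlers.
_REGISTRY: dict[str, Handler] = {}


def rpc_method(name: str) -> Callable[[Handler], Handler]:
    """Register the decorated coroutine under the given method name."""

    def decorator(func: Handler) -> Handler:
        if name in _REGISTRY:
            raise ValueError(f"duplicate RPC method: {name}")
        _REGISTRY[name] = func
        return func

    return decorator


async def dispatch(request: dict[str, Any], ctx: Any) -> dict[str, Any]:
    """Look up the handler by name and wrap its result in a JSON-RPC 2.0 response."""
    handler = _REGISTRY.get(request.get("method", ""))
    if handler is None:
        # -32601 is the JSON-RPC 2.0 "Method not found" error code.
        error = {"code": -32601, "message": "Method not found"}
        return {"jsonrpc": "2.0", "id": request.get("id"), "error": error}
    result = await handler(request.get("params", {}), ctx)
    return {"jsonrpc": "2.0", "id": request.get("id"), "result": result}


@rpc_method("demo/echo")  # hypothetical method, for illustration only
async def handle_echo(params: dict[str, Any], ctx: Any) -> dict[str, Any]:
    """Return the received params unchanged."""
    return {"echo": params}
```

Calling `asyncio.run(dispatch({"jsonrpc": "2.0", "id": 1, "method": "demo/echo", "params": {"x": 1}}, None))` routes the request through the registry. Because registration happens at import time, a handler module that is never imported is never registered, which is one way documentation and dispatcher can drift apart.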

---

## 🔄 Development Workflow

### Branching
- `feature/description` for new functionality.
- `fix/description` for bug fixes.
- `docs/description` for documentation updates.

### Linting & Formatting
We use **Ruff** to enforce formatting and linting rules.
```bash
# Check for lint errors
ruff check .

# Automatically format code
ruff format .
```

### Type Checking
We rely on strict type boundaries. Run **Mypy** before committing:
```bash
mypy smp/
```

### Testing
We use **pytest** combined with `pytest-asyncio` for all asynchronous graph engine tests.
```bash
# Run all tests
pytest

# Run a specific module
pytest tests/engine/test_seed_walk.py
```

---

## ✅ Pre-Commit Checklist

Before submitting a Pull Request, ensure you have completed these steps. Pull Requests failing CI will not be reviewed.

1. [ ] Read `ARCHITECTURE.md` to ensure your change fits the architectural direction.
2. [ ] `ruff check .` — No lint errors.
3. [ ] `ruff format .` — Code is formatted.
4. [ ] `mypy smp/` — Zero type errors.
5. [ ] `pytest` — All tests pass, including integration tests spanning Neo4j and ChromaDB.

SMP implements the **Model Context Protocol (MCP)**. This allows it to serve as a "Codebase Memory Server" for any MCP-compatible client. Instead of the agent reading files blindly, it calls SMP tools to:
1. `locate`: Find the right starting point in a massive repo.
2. `get_context`: Get a structural summary of a file and its dependencies.
3. `assess_impact`: Find all nodes affected by a potential change.
For detailed agent-specific interactions and JSON-RPC payloads, refer to the `PROTOCOL.md` spec.
```

medium

This section redundantly includes the entire content of `CONTRIBUTING.md` within `ARCHITECTURE.md`. Documentation should be modular; please remove this redundant section and rely on the standalone `CONTRIBUTING.md` file.

Comment thread CONTRIBUTING.md
Comment on lines +1 to +5
Here is an expanded, standalone, and highly detailed `CONTRIBUTING.md` file. It goes deeper into the specific developer workflows, testing graph databases, adding protocol methods, and code standards required for the Structural Memory Protocol (SMP).

Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines.
***

## 🛠 Development Environment
# Contributing to the Structural Memory Protocol (SMP)

medium

Please remove the conversational preamble at the beginning of the file. Documentation should start directly with the title.

Suggested change
Here is an expanded, standalone, and highly detailed `CONTRIBUTING.md` file. It goes deeper into the specific developer workflows, testing graph databases, adding protocol methods, and code standards required for the Structural Memory Protocol (SMP).
Thank you for contributing to the Structural Memory Protocol! To maintain high code quality and architectural consistency, please follow these guidelines.
***
## 🛠 Development Environment
# Contributing to the Structural Memory Protocol (SMP)
# Contributing to the Structural Memory Protocol (SMP)

Comment thread API.md
* `proposed_content` (string).
* **Result:** `nodes_added`, `nodes_removed`, `nodes_modified`, and relationship deltas.

### `smp/graph/why`

medium

Inconsistent method name. The implementation in `RpcDispatcher` and the description in `README.md` use `smp/why`, but it is documented here as `smp/graph/why`. Please use `smp/why` for consistency.

Comment thread API.md
Comment on lines +134 to +148
### `smp/linker/report`
Lists all unresolved static edges (e.g., ambiguous calls where the target function exists in multiple files but wasn't explicitly imported).

* **Params:**
* `scope` (string): `"full"`, `"package:<path>"`, or `"file:<path>"`.
* **Result:** Array of `unresolved` edge definitions indicating caller and candidates.

### `smp/linker/runtime`
Retrieves all `CALLS_RUNTIME` edges for a node (captured via eBPF trace execution).

* **Params:**
* `node_id` (string): Target node ID.
* `commit_sha` (string): Specific commit hash.
* **Result:** Arrays of `runtime_callees` and `static_only_callees`.


medium

The methods `smp/linker/report` and `smp/linker/runtime` are documented here but are not registered in the `RpcDispatcher` (see `smp/protocol/dispatcher.py`). Conversely, several registered methods such as `smp/reindex`, `smp/session/recover`, `smp/lock`, and `smp/unlock` are missing from this API reference. Please ensure the documentation accurately reflects the implemented and registered JSON-RPC methods.
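
A small script can catch this kind of drift mechanically. The sketch below is an assumption-laden illustration, not part of the repository: it assumes methods are documented as Markdown headings of the form ``### `smp/...` `` (as in the `API.md` excerpts above), and it takes the set of registered names as a plain argument because how `RpcDispatcher` exposes its registry is not shown in this review. The sample data reuses only method names mentioned in these comments.

```python
import re

# Matches headings such as: ### `smp/linker/report`
HEADING_RE = re.compile(r"^#{1,6}\s+`(smp/[^`]+)`\s*$", re.MULTILINE)


def documented_methods(markdown: str) -> set[str]:
    """Collect method names that appear as backticked Markdown headings."""
    return set(HEADING_RE.findall(markdown))


def diff_methods(markdown: str, registered: set[str]) -> tuple[set[str], set[str]]:
    """Return (documented but not registered, registered but not documented)."""
    documented = documented_methods(markdown)
    return documented - registered, registered - documented


# Sample input built from the method names mentioned in this review.
sample_doc = """
### `smp/graph/why`
### `smp/linker/report`
### `smp/linker/runtime`
"""
sample_registered = {"smp/why", "smp/reindex", "smp/session/recover", "smp/lock", "smp/unlock"}

not_registered, not_documented = diff_methods(sample_doc, sample_registered)
```

Run in CI against the real `API.md` and the real registry, a non-empty result in either set would flag the mismatch before merge.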

Comment thread README.md
Comment on lines +1140 to +1148
│ │ ├── community.py # smp/community/detect, list, get, boundaries
│ │ ├── query.py # smp/navigate, trace, context, impact, locate, flow, diff, why
│ │ ├── enrichment.py # smp/enrich, annotate, tag, search
│ │ ├── safety.py # smp/session/*, guard/check, dryrun, checkpoint, lock, audit
│ │ ├── planning.py # smp/plan, conflict
│ │ ├── sandbox.py # smp/sandbox/spawn, execute, destroy
│ │ ├── verify.py # smp/verify/integrity
│ │ ├── handoff.py # smp/handoff/review, pr
│ │ └── telemetry.py # smp/telemetry/*

medium

The project structure described here is inaccurate compared to the actual file organization and handler registration in `smp/protocol/dispatcher.py`. For example, `smp/plan` and `smp/conflict` are in `query_ext.py`, not `planning.py` (which does not exist), and `smp/annotate` is in `annotation.py`, not `enrichment.py`.

Suggested change
│ │ ├── community.py # smp/community/detect, list, get, boundaries
│ │ ├── query.py # smp/navigate, trace, context, impact, locate, flow, diff, why
│ │ ├── enrichment.py # smp/enrich, annotate, tag, search
│ │ ├── safety.py # smp/session/*, guard/check, dryrun, checkpoint, lock, audit
│ │ ├── planning.py # smp/plan, conflict
│ │ ├── sandbox.py # smp/sandbox/spawn, execute, destroy
│ │ ├── verify.py # smp/verify/integrity
│ │ ├── handoff.py # smp/handoff/review, pr
│ │ └── telemetry.py # smp/telemetry/*
│ ├── memory.py # smp/update, batch_update, reindex
│ ├── merkle.py # smp/sync, merkle/tree, index/*
│ ├── query.py # smp/navigate, trace, context, impact, locate, search, flow
│ ├── query_ext.py # smp/diff, plan, conflict, why
│ ├── enrichment.py # smp/enrich/*
│ ├── annotation.py # smp/annotate/*, tag
│ ├── safety.py # smp/session/*, guard/check, dryrun, checkpoint, rollback, lock, unlock, audit, verify/integrity
│ ├── sandbox.py # smp/sandbox/*
│ ├── handoff.py # smp/handoff/*
│ └── telemetry.py # smp/telemetry/*
