Description
Summary
Explore implementing a JSON-backed repository as an alternative to Neo4j, enabling pip install vre and a working demo with zero infrastructure. This is a proposal for discussion — not a committed direction.
Problem Statement
VRE currently requires a running Neo4j instance for any usage. This creates a significant barrier to entry:
- New users must install and configure Neo4j before they can experiment with VRE
- Simple demos, tests, and prototyping require database infrastructure
- CI/CD pipelines need Neo4j services or containers
- The "try VRE in 5 minutes" experience doesn't exist
Neo4j is the right choice for production workloads — native graph traversal, relationship indexing, and Cypher queries are genuinely valuable for complex epistemic graphs. But requiring it from day one makes VRE feel heavier than it needs to be for exploration and adoption.
Prior Art
The test suite already uses a stub repository (StubRepository) that implements the repository interface in-memory. This stub is the direct inspiration for this proposal — it demonstrates that VRE's core engine (grounding, policy, learning) works correctly without Neo4j. The question is whether to formalize and extend that pattern into a persistent, user-facing backend.
Proposed Idea
Note: This is an idea being explored, not a definitive architectural direction.
The proposal has three parts:
1. Repository Protocol/ABC
Extract the implicit repository interface from PrimitiveRepository into an explicit Repository protocol or ABC:
    from abc import ABC, abstractmethod

    # Primitive, RelationType and Relatum are VRE's existing model types.
    class Repository(ABC):
        @abstractmethod
        def get(self, concept: str) -> Primitive | None: ...

        @abstractmethod
        def save(self, primitive: Primitive) -> None: ...

        @abstractmethod
        def get_related(self, primitive_id: str, relation_type: RelationType) -> list[Relatum]: ...

        # ... etc

This formalizes what VRE expects from its storage layer and makes backend-swappability a first-class property.
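Since the proposal leaves "protocol or ABC" open, here is a minimal sketch of the Protocol variant for comparison. The `Primitive` dataclass and `InMemoryRepository` below are hypothetical stand-ins, not VRE's actual classes, and only two of the methods are shown.

```python
from dataclasses import dataclass
from typing import Optional, Protocol, runtime_checkable


@dataclass
class Primitive:
    """Hypothetical stand-in for VRE's real Primitive model."""
    id: str
    concept: str


@runtime_checkable
class Repository(Protocol):
    """Structural interface: any class with these methods qualifies."""

    def get(self, concept: str) -> Optional[Primitive]: ...
    def save(self, primitive: Primitive) -> None: ...


class InMemoryRepository:
    """Does not inherit from Repository, yet satisfies it structurally."""

    def __init__(self) -> None:
        self._by_concept: dict[str, Primitive] = {}

    def get(self, concept: str) -> Optional[Primitive]:
        return self._by_concept.get(concept)

    def save(self, primitive: Primitive) -> None:
        self._by_concept[primitive.concept] = primitive


repo = InMemoryRepository()
repo.save(Primitive(id="p1", concept="file"))
print(isinstance(repo, Repository))  # True
print(repo.get("file"))
```

With a Protocol, an existing class such as the test stub would not need to change its base class to count as a repository. An ABC fails earlier (at instantiation) when a method is missing, so the choice is between looser coupling and stricter enforcement.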
2. Refactor PrimitiveRepository
Refactor PrimitiveRepository (the Neo4j implementation) to implement the new Repository ABC. Existing behavior is unchanged — this is a structural refactor, not a behavioral one.
3. JsonRepository
Build a JsonRepository that implements Repository backed by an in-memory dict persisted to a JSON file:
- On init, load the JSON file into memory (or start empty)
- All reads are dict lookups — fast, no network
- Writes update the in-memory dict and flush to disk
- BFS-based subgraph resolution: The Neo4j repository uses Cypher's native graph traversal to resolve subgraphs during grounding. The JSON repository will need to implement BFS over its in-memory dict structure to replicate this — walking relata from anchor nodes to build the same resolved subgraph that Neo4j returns natively. This is the most significant implementation challenge.
This makes the following possible:
    pip install vre
    python -c "from vre import VRE, JsonRepository; vre = VRE(JsonRepository('my_graph.json')); ..."

Neo4j becomes the production upgrade path, not the entry requirement.
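The load-on-init, dict-lookup, flush-on-write behaviour described for JsonRepository could be sketched roughly as follows. This is an illustration under assumptions, not the proposed implementation: primitives are plain dicts keyed by a `"concept"` field (a stand-in for VRE's `Primitive`), and the temp-file-then-rename flush is one possible way to avoid leaving a half-written file behind.

```python
import json
import os
import tempfile
from pathlib import Path
from typing import Any, Optional


class JsonRepository:
    """Sketch: a dict in memory, mirrored to a JSON file on every write."""

    def __init__(self, path: str) -> None:
        self._path = Path(path)
        self._data: dict[str, dict[str, Any]] = {}
        # Load the existing file, or start empty if it does not exist yet.
        if self._path.exists():
            self._data = json.loads(self._path.read_text(encoding="utf-8"))

    def get(self, concept: str) -> Optional[dict[str, Any]]:
        # Reads never touch the disk: a plain dict lookup.
        return self._data.get(concept)

    def save(self, primitive: dict[str, Any]) -> None:
        # Assumption: each primitive dict carries its own "concept" key.
        self._data[primitive["concept"]] = primitive
        self._flush()

    def _flush(self) -> None:
        # Write to a temp file in the same directory, then swap it into
        # place, so an interrupted write does not truncate the graph file.
        fd, tmp = tempfile.mkstemp(dir=str(self._path.parent), suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(self._data, fh, indent=2, sort_keys=True)
        os.replace(tmp, self._path)


# Usage: write with one instance, read back with a fresh one.
with tempfile.TemporaryDirectory() as workdir:
    graph_path = os.path.join(workdir, "my_graph.json")
    JsonRepository(graph_path).save({"concept": "file", "id": "p1"})
    print(JsonRepository(graph_path).get("file"))
```

Flushing the whole dict on every write keeps the code simple but rewrites the full file each time, which fits the small-graph scope of this backend.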
VRE Design Alignment
- Minimal footprint: CLAUDE.md Section 8 emphasizes minimal dependencies. A JSON backend removes the Neo4j requirement for getting started.
- VRE contract preserved: The agent–VRE contract (Section 5) is with VRE, not with Neo4j. Swapping the storage backend does not change the epistemic guarantees — grounding, depth gating, and policy evaluation work identically.
- Inspectability: A JSON file is arguably more inspectable than a Neo4j database — users can open it in any text editor.
- Technology stack: Section 8.2 specifies Neo4j for its graph properties. A JSON backend would not replace Neo4j for production use — it complements it as an on-ramp.
Design Considerations
- BFS correctness: The BFS implementation must produce identical subgraphs to Neo4j's Cypher traversal. The existing test suite (which uses StubRepository) provides the correctness baseline: if all tests pass with JsonRepository, the traversal is equivalent for every case the suite covers.
- Performance: JSON repository will not scale to large graphs. This is acceptable, since it is meant for exploration, demos, and small projects. The docs should be clear about when to graduate to Neo4j.
- Graph traversal: Without native graph indexing, BFS traversal is linear in the number of nodes and relata it visits (O(V + E)). Acceptable for small graphs, but the performance cliff should be documented.
- Concurrency: JSON file writes are not concurrent-safe. Single-agent usage only, or add file locking.
- Schema evolution: JSON files need a version field so future VRE versions can migrate old graph files.
- What about SQLite?: SQLite is another zero-infrastructure option with better query capabilities. Worth considering, but JSON is simpler and more inspectable for a first pass.
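To make the BFS considerations above concrete, here is a minimal sketch of breadth-first subgraph resolution over an in-memory adjacency dict. The graph shape, the relation names, and the `max_depth` parameter are all assumptions made for illustration; the real traversal would need to match whatever the Cypher query returns.

```python
from collections import deque

# Hypothetical in-memory shape: node id -> list of (relation_type, target id).
Graph = dict[str, list[tuple[str, str]]]


def resolve_subgraph(graph: Graph, anchors: list[str], max_depth: int) -> dict[str, int]:
    """Breadth-first walk from the anchors; returns node id -> hop distance."""
    depth_of = {a: 0 for a in anchors if a in graph}
    queue = deque(depth_of)
    while queue:
        node = queue.popleft()
        if depth_of[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for _relation, target in graph.get(node, []):
            if target not in depth_of:  # visited check also handles cycles
                depth_of[target] = depth_of[node] + 1
                queue.append(target)
    return depth_of


graph = {
    "file": [("REQUIRES", "path"), ("APPLIES_TO", "filesystem")],
    "path": [("REQUIRES", "string")],
    "filesystem": [("REQUIRES", "file")],  # cycle back to the anchor
    "string": [],
    "unrelated": [],
}
print(resolve_subgraph(graph, ["file"], max_depth=1))
# {'file': 0, 'path': 1, 'filesystem': 1}
```

Each node is enqueued at most once and each outgoing edge is examined at most once, so the cost grows with the size of the reachable subgraph rather than with the whole file, provided the adjacency lists are already in memory.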
Open Questions
- Is this the right abstraction boundary? Should the Repository ABC mirror the current PrimitiveRepository interface exactly, or should it be redesigned?
- Should the JSON file format match the Neo4j schema (node/relationship structure) or use a more natural JSON representation?
- How should seed scripts work across backends: the same scripts with backend-agnostic calls, or separate seed formats?
- Should this be a separate package (vre-json) or included in the core vre package?
- Is there interest in other lightweight backends (SQLite, DuckDB) beyond JSON?
- Can the existing StubRepository from tests be promoted directly, or does it need significant rework for persistence and BFS?
Dependencies
None — this can be built independently, though it would benefit from a clean Repository interface extracted first.