Proposal: JSON-backed repository for zero-infrastructure VRE #38

@anormang1992

Summary

Explore implementing a JSON-backed repository as an alternative to Neo4j, enabling pip install vre and a working demo with zero infrastructure. This is a proposal for discussion — not a committed direction.

Problem Statement

VRE currently requires a running Neo4j instance for any usage. This creates a significant barrier to entry:

  • New users must install and configure Neo4j before they can experiment with VRE
  • Simple demos, tests, and prototyping require database infrastructure
  • CI/CD pipelines need Neo4j services or containers
  • The "try VRE in 5 minutes" experience doesn't exist

Neo4j is the right choice for production workloads — native graph traversal, relationship indexing, and Cypher queries are genuinely valuable for complex epistemic graphs. But requiring it from day one makes VRE feel heavier than it needs to be for exploration and adoption.

Prior Art

The test suite already uses a stub repository (StubRepository) that implements the repository interface in-memory. This stub is the direct inspiration for this proposal — it demonstrates that VRE's core engine (grounding, policy, learning) works correctly without Neo4j. The question is whether to formalize and extend that pattern into a persistent, user-facing backend.

Proposed Idea

Note: This is an idea being explored, not a definitive architectural direction.

The proposal has three parts:

1. Repository Protocol/ABC

Extract the implicit repository interface from PrimitiveRepository into an explicit Repository protocol or ABC:

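from abc import ABC, abstractmethod

# Primitive, Relatum, and RelationType are VRE's existing domain types.
class Repository(ABC):
    # Look up a primitive by concept name; None if it is not in the graph.
    @abstractmethod
    def get(self, concept: str) -> Primitive | None: ...
    # Persist a new or updated primitive.
    @abstractmethod
    def save(self, primitive: Primitive) -> None: ...
    # Return the relata of one primitive for a given relation type.
    @abstractmethod
    def get_related(self, primitive_id: str, relation_type: RelationType) -> list[Relatum]: ...
    # ... etc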

This formalizes what VRE expects from its storage layer and makes backend-swappability a first-class property.
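For comparison, here is a rough sketch of the Protocol flavor of the same idea. It is only meant to show the trade-off: with typing.Protocol a backend qualifies by having the right methods, without inheriting from anything. RepositoryProtocol and InMemoryRepository are hypothetical names, and plain dicts stand in for VRE's Primitive type.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class RepositoryProtocol(Protocol):
    """Structural interface: any class with these methods qualifies."""

    def get(self, concept: str) -> dict | None: ...

    def save(self, primitive: dict) -> None: ...


class InMemoryRepository:
    """Hypothetical backend. Note that it does not subclass anything."""

    def __init__(self) -> None:
        self._items: dict[str, dict] = {}

    def get(self, concept: str) -> dict | None:
        return self._items.get(concept)

    def save(self, primitive: dict) -> None:
        # Assumption for this sketch: primitives are keyed by a "name" field.
        self._items[primitive["name"]] = primitive


repo = InMemoryRepository()
repo.save({"name": "file"})
# runtime_checkable isinstance only checks that the methods exist,
# not that their signatures match.
print(isinstance(repo, RepositoryProtocol))
```

An ABC fails loudly at instantiation when a method is missing, while a Protocol leans on the type checker. Either would work for the refactor described in part 2.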

2. Refactor PrimitiveRepository

Refactor PrimitiveRepository (the Neo4j implementation) to implement the new Repository ABC. Existing behavior is unchanged — this is a structural refactor, not a behavioral one.

3. JsonRepository

Build a JsonRepository that implements Repository backed by an in-memory dict persisted to a JSON file:

  • On init, load the JSON file into memory (or start empty)
  • All reads are dict lookups — fast, no network
  • Writes update the in-memory dict and flush to disk
  • BFS-based subgraph resolution: The Neo4j repository uses Cypher's native graph traversal to resolve subgraphs during grounding. The JSON repository will need to implement BFS over its in-memory dict structure to replicate this — walking relata from anchor nodes to build the same resolved subgraph that Neo4j returns natively. This is the most significant implementation challenge.
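To make the bullets above concrete, here is a minimal sketch of the load / flush / BFS shape. It is not the real VRE data model: primitives are plain dicts keyed by an "id" field, relata are dicts with "type" and "target" keys, and JsonRepositorySketch, resolve_subgraph, and max_depth are hypothetical names chosen for illustration.

```python
import json
from collections import deque
from pathlib import Path


class JsonRepositorySketch:
    """Illustrative only. Assumed on-disk layout:

    {"primitives": {"<id>": {"id": "<id>", "relata": [{"type": ..., "target": "<id>"}]}}}
    """

    def __init__(self, path):
        self._path = Path(path)
        # On init, load the JSON file into memory, or start empty.
        if self._path.exists():
            data = json.loads(self._path.read_text(encoding="utf-8"))
        else:
            data = {"primitives": {}}
        self._primitives = data["primitives"]

    def get(self, primitive_id):
        # Reads are plain dict lookups.
        return self._primitives.get(primitive_id)

    def save(self, primitive):
        # Writes update the in-memory dict, then flush the whole file.
        self._primitives[primitive["id"]] = primitive
        payload = {"primitives": self._primitives}
        self._path.write_text(json.dumps(payload, indent=2), encoding="utf-8")

    def resolve_subgraph(self, anchor_ids, max_depth=None):
        """Breadth-first walk over relata, starting from the anchor nodes.

        Returns a dict of visited primitives in visit order. Unknown ids
        are skipped and cycles are handled by the visited check.
        """
        visited = {}
        queue = deque((a, 0) for a in anchor_ids if a in self._primitives)
        while queue:
            primitive_id, depth = queue.popleft()
            if primitive_id in visited:
                continue
            node = self._primitives[primitive_id]
            visited[primitive_id] = node
            if max_depth is not None and depth >= max_depth:
                continue  # keep this node but do not expand it further
            for relatum in node.get("relata", []):
                target = relatum["target"]
                if target in self._primitives and target not in visited:
                    queue.append((target, depth + 1))
        return visited
```

Whether the real traversal should filter by relation type, or how its depth limit interacts with VRE's depth gating, is left open here. The sketch only shows that the walk itself is small once the graph is a dict.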

This makes the following possible:

pip install vre
python -c "from vre import VRE, JsonRepository; vre = VRE(JsonRepository('my_graph.json')); ..."

Neo4j becomes the production upgrade path, not the entry requirement.

VRE Design Alignment

  • Minimal footprint: CLAUDE.md Section 8 emphasizes minimal dependencies. A JSON backend removes the Neo4j requirement for getting started.
  • VRE contract preserved: The agent–VRE contract (Section 5) is with VRE, not with Neo4j. Swapping the storage backend does not change the epistemic guarantees — grounding, depth gating, and policy evaluation work identically.
  • Inspectability: A JSON file is arguably more inspectable than a Neo4j database — users can open it in any text editor.
  • Technology stack: Section 8.2 specifies Neo4j for its graph properties. A JSON backend would not replace Neo4j for production use — it complements it as an on-ramp.

Design Considerations

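  • BFS correctness: The BFS implementation must produce the same subgraphs as Neo4j's Cypher traversal. The existing test suite (which uses StubRepository) is the starting baseline: JsonRepository should pass every test. Passing alone does not prove equivalence with Neo4j, because those tests exercise the stub rather than Cypher, so running the same fixtures against both backends would give stronger evidence.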
  • Performance: JSON repository will not scale to large graphs. This is acceptable — it's for exploration, demos, and small projects. The docs should be clear about when to graduate to Neo4j.
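  • Graph traversal: Without native graph indexing, BFS has to visit every reachable node and edge, so its cost grows linearly with the size of the resolved subgraph, i.e. O(V + E). Acceptable for small graphs, but the performance cliff should be documented.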
  • Concurrency: JSON file writes are not concurrent-safe. Single-agent usage only, or add file locking.
  • Schema evolution: JSON files need a version field so future VRE versions can migrate old graph files.
  • What about SQLite?: SQLite is another zero-infrastructure option with better query capabilities. Worth considering, but JSON is simpler and more inspectable for a first pass.
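As a sketch of how the concurrency and schema-evolution points might look in practice: write to a temporary file and swap it into place, and stamp every file with a version. SCHEMA_VERSION, write_graph_atomically, read_graph, and the "version" / "primitives" keys are all hypothetical. This guards against half-written files, not against two writers racing each other, so file locking would still be a separate decision.

```python
import json
import os
import tempfile

SCHEMA_VERSION = 1  # hypothetical; bump when the file layout changes


def write_graph_atomically(path, primitives):
    """Write the graph to a temp file, then swap it in with os.replace."""
    payload = {"version": SCHEMA_VERSION, "primitives": primitives}
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file lives in the target directory so the rename stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle, indent=2)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if anything failed before the swap.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def read_graph(path):
    """Load a graph file, refusing versions this sketch does not know."""
    with open(path, encoding="utf-8") as handle:
        payload = json.load(handle)
    version = payload.get("version")
    if version != SCHEMA_VERSION:
        # A real implementation would dispatch to a migration here.
        raise ValueError(f"unsupported graph file version: {version!r}")
    return payload["primitives"]
```

Readers never see a partially written file with this approach, which matters more for a JSON backend than for Neo4j because every write rewrites the whole graph.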

Open Questions

  • Is this the right abstraction boundary? Should the Repository ABC mirror the current PrimitiveRepository interface exactly, or should it be redesigned?
  • Should the JSON file format match the Neo4j schema (node/relationship structure) or use a more natural JSON representation?
  • How should seed scripts work across backends — same scripts with backend-agnostic calls, or separate seed formats?
  • Should this be a separate package (vre-json) or included in the core vre package?
  • Is there interest in other lightweight backends (SQLite, DuckDB) beyond JSON?
  • Can the existing StubRepository from tests be promoted directly, or does it need significant rework for persistence and BFS?

Dependencies

None — this can be built independently, though it would benefit from a clean Repository interface extracted first.
