An embedded memory database for AI agents. Facts get written with confidence scores, expire, get contradicted, decay by recency. Retrieval fuses vector similarity, BM25, graph structure, and recency in one call. Everything goes through a typed DSL. Runs in-process, persists to SQLite.
Status: v0.3.0, alpha.
```bash
pip install graphstore
```

Core ships with model2vec as the default embedder. Swap in Jina v5, bge-*, EmbeddingGemma, or any ONNX / GGUF model via `graphstore install-embedder`. PDFs, images, audio, GPU, and the web UI are opt-in extras:

```bash
pip install 'graphstore[ingest]'      # PDF / DOCX / HTML
pip install 'graphstore[vision]'      # local VLM for images + scanned PDFs
pip install 'graphstore[audio]'       # faster-whisper speech-to-text
pip install 'graphstore[playground]'  # FastAPI web UI
pip install 'graphstore[gpu]'         # onnxruntime-gpu, Linux x86_64, CUDA 12
```

Full extras matrix: Installation.
```python
from graphstore import GraphStore

g = GraphStore(path="./brain")

g.execute('CREATE NODE "mem:paris" kind = "memory" '
          'DOCUMENT "Paris is the capital of France, famous for the Eiffel Tower."')
g.execute('CREATE NODE "mem:rome" kind = "memory" '
          'DOCUMENT "Rome is the capital of Italy, home to the Colosseum."')
g.execute('CREATE EDGE "mem:paris" -> "mem:rome" kind = "both_european_capitals"')

g.execute('REMEMBER "European history" LIMIT 5')       # hybrid fusion
g.execute('RECALL FROM "mem:paris" DEPTH 2 LIMIT 10')  # graph walk
g.execute('LEXICAL SEARCH "Eiffel Tower" LIMIT 5')     # BM25
g.execute('SIMILAR TO "capital city" LIMIT 5')         # vector only
```

`DOCUMENT "text"` populates the vector index, the FTS5 index, and blob storage in one shot. Without it, a node is structured data only.
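For contrast, a node created without `DOCUMENT` is invisible to `SIMILAR` and `LEXICAL SEARCH` but still reachable through `RECALL` walks and `WHERE` filters. A minimal sketch; the extra `role = "..."` attribute assumes non-reserved columns use the same `key = "value"` form as `kind`:

```python
# Structured-only node: nothing is embedded or FTS-indexed for it.
g.execute('CREATE NODE "person:caroline" kind = "person" role = "counselor"')
```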
Three engines behind one DSL.
- Graph: columnar numpy arrays + scipy CSR edge matrices. Reserved columns `__event_at__`, `__confidence__`, `__retracted__`, `__source__` are first-class (see the sketch after this list).
- Vector: usearch HNSW, cosine. Auto-embedding on `DOCUMENT` or `EMBED content` schemas.
- Document: SQLite + FTS5 for BM25 and blobs. Single-owner advisory lock on the path.
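Reserved columns behave like ordinary attributes at write time. A hedged sketch: the literal forms used for `__confidence__` (float) and `__event_at__` (ISO-8601 string) are assumptions, not documented syntax.

```python
# Assumed: reserved columns are set with the same key = value form as kind.
g.execute('CREATE NODE "mem:q2" kind = "memory" '
          '__confidence__ = 0.8 __event_at__ = "2026-04-01T09:00:00Z" '
          'DOCUMENT "Q2 roadmap agreed: ship reranking behind a flag."')
```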
The DSL grammar is LALR(1), parsed with Lark. Every write, read, `INGEST`, and `SYS *` command goes through it.
Deep dive: Architecture · Edge matrix.
`REMEMBER` fuses four signals at retrieval time; `SIMILAR`, `LEXICAL`, and `RECALL` each expose a single leg.
| Signal | Default weight | Source |
|---|---|---|
| `vec_signal` | 0.52 | max sentence cosine over usearch ANN |
| `bm25_signal` | 0.25 | SQLite FTS5 over `doc_fts` |
| `recency` | 0.15 | `exp(-age / half_life)` from `__event_at__` |
| `graph_signal` | 0.08 | sum of entity degrees |
Weights are configurable via graphstore.json, GRAPHSTORE_DSL_* env vars, or constructor kwargs.
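As a back-of-envelope illustration of how the defaults combine (plain Python, not graphstore's code; the BM25 and graph normalizations here are assumptions, and the real pipeline also applies co-occurrence bonuses, recall boosts, and optional reranking):

```python
import math

WEIGHTS = {"vec": 0.52, "bm25": 0.25, "recency": 0.15, "graph": 0.08}

def fused_score(vec_sim, bm25_norm, age_days, graph_norm, half_life_days=30.0):
    # recency = exp(-age / half_life), as in the table above
    recency = math.exp(-age_days / half_life_days)
    return (WEIGHTS["vec"] * vec_sim + WEIGHTS["bm25"] * bm25_norm
            + WEIGHTS["recency"] * recency + WEIGHTS["graph"] * graph_norm)

fused_score(vec_sim=0.81, bm25_norm=0.40, age_days=3, graph_norm=0.40)  # ≈ 0.69
```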
Every result returns per-signal scores on every node and a `meta["signals"]` block with the full pipeline state (fusion weights, per-stage candidate counts, reranker status):

```python
r = g.execute('REMEMBER "Caroline counseling" LIMIT 1 WHERE kind = "message"')
n = r.data[0]
print(n["_remember_score"], n["_vector_sim"], n["_bm25_score"],
      n["_recency_score"], n["_graph_score"], n["_co_bonus"],
      n["_recall_boost"], n["_rank_stage"])
r.meta["signals"]  # {fusion, recency, stages, reranker, nucleus, ...}
```

Dry-run the pipeline without mutating recall counts:

```python
g.execute('SYS EXPLAIN REMEMBER "Caroline counseling" LIMIT 3')
# kind="plan", candidates with per-signal scores, full meta["signals"]
```

Deep dive: REMEMBER pipeline.
For a full retrieve + synthesize loop, wire a reader callable and use ANSWER:
```python
def my_reader(prompt: str, max_tokens: int = 1000) -> str:
    ...  # call any LLM (openai, litellm, local, ...)

g = GraphStore(path="./brain", reader=my_reader)
r = g.execute('ANSWER "What is the capital of France?" LIMIT 3')
r.data["answer"]       # "Paris"
r.data["cited_slots"]  # ["mem:paris", ...]
r.meta["signals"]      # same telemetry as REMEMBER
```

graphstore ships no LLM dependency. The reader is a plain callable; bring your own. Named readers (`GraphStore(readers={"fast": a, "careful": b})`) enable A/B testing via `ANSWER "q" USING "careful"`.
Every DSL verb has a typed function. Same grammar, IDE autocomplete, injection-safe.
```python
from graphstore import q, F, Time

q.create_node("mem:paris", kind="memory",
              document="Paris is the capital of France.").execute(g)

recent = F.gte("__event_at__", Time.now_minus(7, "d"))
q.nodes(where=F.eq("kind", "memory") & recent & ~F.eq("__retracted__", True))

q.batch(
    q.var("x", q.create_node("n1", kind="memory", document="a")),
    q.var("y", q.create_node("n2", kind="memory", document="b")),
    q.create_edge("$x", "$y", kind="next"),
).execute(g)
```

Full reference: Query builder.
LongMemEval-S, 500 records, Jina v5 Small 1024d, Kaggle T4 GPU, 2026-04-19. Public kernel: kaggle.com/code/superkaiii/graphstore-jina-v5-small.
| Overall | knowledge-update | single-session-assistant | single-session-user | multi-session | temporal | preference |
|---|---|---|---|---|---|---|
| 97.0% | 100.0% | 100.0% | 98.6% | 98.5% | 94.7% | 83.3% |
Query p50 46 ms / p95 76 ms. Retrieval-only, no LLM judge.
LoCoMo, 50Q sample, MiniMax M2.7 reader. Overall F1 0.357. Retrieval recall: top-10 80%, top-50 96%.
Full methodology: Benchmarks.
- Embedded, one writer per path. For multi-tenant use, wrap it in your own service (see the sketch after this list).
- No SQL, no Cypher, no distributed cluster. Graph ops exist because agent memory is a graph.
- Fusion weights are hand-tuned. Reranking is opt-in, off by default.
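A minimal sketch of such a wrapper, assuming FastAPI (the route name and payload shape are invented for illustration): one process owns the path, and tenants go through HTTP instead of opening the store directly.

```python
from fastapi import Body, FastAPI
from graphstore import GraphStore

app = FastAPI()
g = GraphStore(path="./brain")  # the single writer for this path

@app.post("/execute")
def execute(dsl: str = Body(..., embed=True)):
    # Naive pass-through; a real service would add auth, tenant scoping,
    # and an allow-list of DSL verbs.
    r = g.execute(dsl)
    return {"data": r.data, "meta": r.meta}
```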
```bash
git clone https://github.com/orkait/graphstore.git
cd graphstore
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev,ingest,vision,embedders-extra,playground]"
pytest
```

Docs site under website/ (Docusaurus). Run locally:

```bash
cd website && bun install && bun run start
```

AGPL-3.0. See LICENSE.