# Memory System
Sparks maintains persistent cross-session memory using a local embedding pipeline, an HNSW approximate nearest-neighbor index, and SQLite full-text search. No external embedding API is required — all vector generation happens locally via an ONNX model.
```
Task / conversation
  → Generate embedding (ONNX, local)
  → Deduplicate check (cosine threshold)
  → Store in SQLite + update HNSW index
  → At next task: retrieve top-K by similarity + recency weight
  → Inject into system prompt context
```
| Property | Value |
|---|---|
| Model | all-MiniLM-L6-v2 |
| Format | ONNX |
| Vector dimensions | 384 |
| Location | ~/.sparks/models/all-MiniLM-L6-v2/ |
| External API required | No |
Verify the model is present:
```sh
cargo run --quiet -- doctor --skip-llm | grep -i embed
```

If the directory is missing, memory and semantic search fail silently. The `doctor` command will flag this.
Sparks uses two complementary retrieval methods:

**HNSW vector search**
- Approximate nearest-neighbor search over the 384-dim embedding space
- Scales well as memory grows
- Falls back to exact cosine search for small/early datasets (below a threshold)

**SQLite FTS5 keyword search**
- Fast keyword-based retrieval
- Useful for exact terms, identifiers, and structured content that embeddings may handle poorly
At retrieval time, results from both methods are merged and re-ranked.
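A minimal sketch of how such a merge-and-rerank step might look. The `Hit` type and `merge_and_rerank` name are illustrative assumptions, not Sparks' actual API; the real re-ranking in `src/memory.rs` may combine scores differently.

```rust
use std::collections::HashMap;

/// A retrieval hit: (memory id, relevance score). Hypothetical type
/// for illustration; Sparks' real structs may differ.
type Hit = (u64, f64);

/// Merge hits from the vector index and the FTS index, keeping the
/// best score per memory id, then re-rank by descending score.
fn merge_and_rerank(vector_hits: &[Hit], keyword_hits: &[Hit]) -> Vec<Hit> {
    let mut best: HashMap<u64, f64> = HashMap::new();
    for &(id, score) in vector_hits.iter().chain(keyword_hits.iter()) {
        let entry = best.entry(id).or_insert(score);
        if score > *entry {
            *entry = score;
        }
    }
    let mut merged: Vec<Hit> = best.into_iter().collect();
    merged.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    merged
}

fn main() {
    let vector = [(1, 0.91), (2, 0.75)];
    let keyword = [(2, 0.88), (3, 0.60)];
    // Memory 2 appears in both lists; its best score (0.88) wins.
    println!("{:?}", merge_and_rerank(&vector, &keyword));
}
```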
Older memories lose relevance weight over time. The decay follows an exponential half-life:
```
weight = cosine_similarity × 2^(-age_days / half_life_days)
```
Default half_life_days = 30. This means a memory from 30 days ago has half the effective weight of an otherwise identical memory from today.
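The decay formula above is straightforward to express directly; this is a sketch of the computation, not Sparks' actual code:

```rust
/// Effective retrieval weight: cosine similarity decayed by an
/// exponential half-life over the memory's age in days.
fn recency_weight(cosine_similarity: f64, age_days: f64, half_life_days: f64) -> f64 {
    cosine_similarity * 2f64.powf(-age_days / half_life_days)
}

fn main() {
    // With the default 30-day half-life, a 30-day-old memory
    // carries half its raw similarity (0.8 → ~0.4).
    println!("{}", recency_weight(0.8, 30.0, 30.0));
}
```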
Configure in config.toml:
```toml
[memory]
half_life_days = 30
```

Before storing a new memory, Sparks checks cosine similarity against recent memories. If similarity exceeds `dedup_threshold`, the new memory is discarded as a duplicate.
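A minimal sketch of this dedup check. The function names and the shape of the "recent memories" collection are assumptions for illustration; the real check lives in `src/memory.rs`.

```rust
/// Cosine similarity between two embedding vectors (e.g. 384-dim MiniLM).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Returns true if `candidate` is too close to any recent memory
/// and should be discarded as a duplicate.
fn is_duplicate(candidate: &[f32], recent: &[Vec<f32>], dedup_threshold: f32) -> bool {
    recent.iter().any(|m| cosine_similarity(candidate, m) > dedup_threshold)
}

fn main() {
    let recent = vec![vec![1.0, 0.0], vec![0.0, 1.0]];
    // Nearly parallel to the first stored vector: flagged as duplicate.
    println!("{}", is_duplicate(&[0.99, 0.01], &recent, 0.92));
}
```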
```toml
[memory]
dedup_threshold = 0.92  # 0.0–1.0; higher = less aggressive deduplication
```

At task start, Sparks retrieves the top-K most relevant memories and injects them into the ghost's system prompt. The number is configurable:
```toml
[memory]
max_results = 10
```

Memory injection counts against the task's context budget (default 128K tokens). If the context is tight, reduce `max_results` or trim soul file size.
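One way to picture the interaction between `max_results` and the context budget. This sketch uses a crude chars-per-token heuristic that is purely an assumption, not the tokenizer Sparks actually uses:

```rust
/// Rough token estimate (~4 chars per token); a heuristic for
/// illustration only, not Sparks' real tokenizer.
fn estimate_tokens(text: &str) -> usize {
    text.chars().count() / 4 + 1
}

/// Take up to `max_results` memories in ranked order, stopping early
/// if the running token estimate would exceed `budget_tokens`.
fn select_for_injection<'a>(
    ranked: &'a [String],
    max_results: usize,
    budget_tokens: usize,
) -> Vec<&'a str> {
    let mut used = 0;
    let mut out = Vec::new();
    for text in ranked.iter().take(max_results) {
        let cost = estimate_tokens(text);
        if used + cost > budget_tokens {
            break;
        }
        used += cost;
        out.push(text.as_str());
    }
    out
}

fn main() {
    let ranked = vec![
        "prefers tabs over spaces".to_string(),
        "project targets stable Rust".to_string(),
    ];
    println!("{:?}", select_for_injection(&ranked, 10, 8));
}
```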
The complete memory-related configuration with defaults:

```toml
[memory]
enabled = true
half_life_days = 30
dedup_threshold = 0.92
max_results = 10

[embedding]
enabled = true
model_dir = "~/.sparks/models/all-MiniLM-L6-v2"
```

All memories are persisted in SQLite (`~/.sparks/sparks.db`):
- `memories` table: text content, embedding vector (BLOB), timestamp, source
- HNSW index: rebuilt/updated incrementally on writes; exact-search fallback below a threshold
- FTS5 virtual table: mirrors memory text for keyword search
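One plausible shape for this storage layout, written out as DDL. This is purely illustrative; the actual schema is defined in `src/memory.rs` and may differ in names and columns:

```rust
/// Illustrative schema only; not copied from Sparks' source.
const SCHEMA: &str = "
CREATE TABLE IF NOT EXISTS memories (
    id         INTEGER PRIMARY KEY,
    content    TEXT NOT NULL,
    embedding  BLOB NOT NULL,     -- 384 x f32 vector
    created_at INTEGER NOT NULL,  -- unix timestamp
    source     TEXT
);
-- FTS5 virtual table mirroring memory text for keyword search.
CREATE VIRTUAL TABLE IF NOT EXISTS memories_fts USING fts5(content);
";

fn main() {
    println!("{}", SCHEMA);
}
```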
Memory survives restarts. There is currently no manual memory management CLI — memories accumulate indefinitely, filtered by recency weight.
```sh
# Check embedding model and memory pipeline health
cargo run --quiet -- doctor --skip-llm

# Check memory count (via observer)
cargo run --quiet -- observe
```

The `doctor` command checks:
- Embedding model directory exists and is readable
- SQLite DB is writable
- HNSW index initializes without error
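The first two checks boil down to ordinary filesystem probes. A simplified sketch (illustrative, not the real implementation in Sparks):

```rust
use std::fs;
use std::path::Path;

/// Model directory exists and is readable.
fn model_dir_ok(dir: &Path) -> bool {
    dir.is_dir() && fs::read_dir(dir).is_ok()
}

/// DB file is writable: opening in append mode verifies write
/// permission without truncating existing data.
fn db_writable(db_path: &Path) -> bool {
    fs::OpenOptions::new()
        .create(true)
        .append(true)
        .open(db_path)
        .is_ok()
}

fn main() {
    let home = std::env::var("HOME").unwrap_or_default();
    let model_dir = Path::new(&home).join(".sparks/models/all-MiniLM-L6-v2");
    println!("model dir ok: {}", model_dir_ok(&model_dir));
}
```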
- `src/memory.rs` — MemoryStore, HNSW index, FTS5, deduplication
- `src/embeddings.rs` — ONNX runtime, vector generation
- `docs/memory-hot-path-lru.md` — hot path design and benchmark contract