
feat(bonsai): compact-output mode + persistent KV cache #169

Merged
KailasMahavarkar merged 1 commit into main from feat/bonsai-compact-output on Apr 20, 2026

Conversation

@KailasMahavarkar (Contributor)

Two generic speed levers on top of BonsaiIngestor

Skipped the `-march=native` rebuild: it's machine-specific and would ship binaries that crash on CPUs missing the same ISA. Both levers stay CPU-agnostic.

1. Compact-output mode (compact=True)

LLM emits ~30 tokens of ENTS/BELIEFS/RETRACTS tags instead of ~150 tokens of full DSL. Python synthesizes DSL deterministically.
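
For scale, one compact turn might look like this (the exact tag syntax is illustrative, not copied from the skill file):

  ENTS: green
  BELIEFS: fact:favorite_color=green
  RETRACTS: (none)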

Wins (4B TQ1_0, CPU only):

metric        full DSL   compact   delta
warm avg      3.9s       1.7s      2.3x faster
cold #1       19.6s      10.1s     1.9x faster
5-msg total   35.3s      16.9s     2.1x faster
raw output    335B       98B       3.4x smaller

Quality on T1-T5: all 5 messages ingested correctly, 1 correct belief (fact:favorite_color=green), zero spurious beliefs from third-person observations (skill has explicit "I/my/me only" rule + negative example).

2. Persistent KV cache (kv_cache_path=...)

llama.cpp's save_state/load_state output pickled to disk together with a safety meta dict (skill fingerprint + model size + n_ctx + chat_format). A stale or corrupt cache silently falls back to a fresh warmup.
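
For illustration, a minimal sketch of the save side, assuming llama-cpp-python's Llama.save_state(); the helper name and exact meta keys below are assumptions, not the PR's code:

import hashlib
import os
import pickle

def save_kv_cache(llm, path, skill_text, model_path, n_ctx, chat_format):
    # Safety meta dict: anything that should invalidate the cached state.
    meta = {
        "skill_fingerprint": hashlib.sha256(skill_text.encode()).hexdigest(),
        "model_size": os.path.getsize(model_path),
        "n_ctx": n_ctx,
        "chat_format": chat_format,
    }
    # llama-cpp-python's save_state() returns a picklable LlamaState.
    with open(path, "wb") as f:
        pickle.dump({"meta": meta, "state": llm.save_state()}, f)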

Wins for cold-boot scenarios (serverless, CLI, dev iteration):

  • First ingest in fresh process: 2.0s (was 10.1s cold)
  • ~8s eliminated per process boot

Disk cost: ~406 MB for n_ctx=2048. Opt-in.

API

ing = BonsaiIngestor(
    model_path=".../Ternary-Bonsai-4B-TQ1_0.gguf",
    compact=True,
    kv_cache_path="/tmp/bonsai_kv.bin",
)
ing.warmup()
ing.save_kv_cache()      # after warmup
r = ing.ingest("text", msg_id="m:s1:0", session_id="s1")

compact=False keeps the full-DSL path unchanged (backward compat).

Tests

+23 unit tests (48 total on bonsai_ingestor): compact parser, dsl synthesis, KV cache load/save/stale/corrupt. Full suite: 1850 passed, 101 skipped.

🤖 Generated with Claude Code

…-agnostic)

Two CPU-agnostic speed levers on top of BonsaiIngestor. Both produce
measurable wall-clock wins without any machine-specific tuning.

1. Compact-output mode (compact=True)
   LLM emits ~30 tokens of ENTS/BELIEFS/RETRACTS instead of ~150 tokens
   of full DSL. Python synthesizes the DSL deterministically.

   - New skill: tools/skills/graphstore-bonsai-dsl-compact/SKILL.md
     (~620 source tokens, clear rules + negative examples so model does
     NOT promote third-person observations to first-person beliefs).
   - Parser: _parse_compact_output(cleaned) -> CompactTurn
   - Templates: _synthesize_dsl(turn, msg_id=..., session_id=..., role=..., text=...)
     Deterministic DSL builder. Same input always produces same output.
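
   A minimal sketch of the parser shape (the CompactTurn fields and the
   delimiter handling are assumptions; only the two signatures above
   come from the diff):

     from dataclasses import dataclass, field

     @dataclass
     class CompactTurn:
         ents: list[str] = field(default_factory=list)
         beliefs: list[str] = field(default_factory=list)
         retracts: list[str] = field(default_factory=list)

     def _parse_compact_output(cleaned: str) -> CompactTurn:
         turn = CompactTurn()
         for line in cleaned.splitlines():
             key, _, rest = line.partition(":")
             items = [s.strip() for s in rest.split(";") if s.strip()]
             key = key.strip().upper()      # case-insensitive, per the tests
             if key == "ENTS":
                 turn.ents += items
             elif key == "BELIEFS":
                 turn.beliefs += items
             elif key == "RETRACTS":
                 turn.retracts += items
             # unknown prefixes fall through and are ignored, per the tests
         return turn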

   Wins (4B TQ1_0, CPU only):
     warm avg:  3.9s  ->  1.7s   (2.3x faster)
     cold:     19.6s  -> 10.1s   (1.9x faster)
     5-msg:    35.3s  -> 16.9s   (2.1x faster)
     raw out:  335B   ->  98B    (3.4x smaller)

   Quality on T1-T5 smoke: all 5 messages ingested, 1 correct belief
   (fact:favorite_color=green), zero spurious beliefs, fact_id reuse
   working across the correction test.

   API shape:
     ing = BonsaiIngestor(model_path=..., compact=True, kv_cache_path=...)
     ing.ingest(text, msg_id="m:s1:0", session_id="s1")   # msg_id required
     # compact=False keeps the full-DSL path unchanged (backward compat)

2. Persistent KV cache (kv_cache_path=...)
   llama.cpp's save_state/load_state pickled to disk. Eliminates the
   ~10s cold penalty on every process restart (serverless, CLI
   one-shots, dev iteration).

   Workflow:
     run 1:  ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
             ing.warmup()
             ing.save_kv_cache()
     run 2:  ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
             ing.ingest("...")    # <- 2.0s instead of ~10s cold

   Safety: the cache file stores a meta dict alongside the state -
   skill fingerprint, model path+size, n_ctx, chat_format. On load,
   stale meta (different skill, different model, different context
   size) is rejected and the process warms fresh. Corrupt pickle or
   wrong-shape payloads also silently fall back to fresh warm.
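
   A sketch of that load-side check (helper name and the exact error
   set are illustrative, not the PR's code):

     import pickle

     def load_kv_cache(llm, path, expected_meta):
         try:
             with open(path, "rb") as f:
                 payload = pickle.load(f)
             if payload["meta"] != expected_meta:
                 return False                    # stale meta: warm fresh
             llm.load_state(payload["state"])    # llama-cpp-python API
             return True
         except (OSError, pickle.UnpicklingError, KeyError, TypeError):
             return False                        # missing/corrupt/wrong shape: warm fresh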

   Disk cost: ~406 MB for n_ctx=2048 on 4B TQ1_0. Opt-in; no-op when
   kv_cache_path is not set.

Skipped: `-march=native` llama.cpp rebuild. That would only optimize
this machine and could ship binaries that crash on CPUs without the
same ISA. Kept portable instead.

Tests: +23 unit tests (48 total on bonsai_ingestor)
  - compact parser: all-sections / none / missing / case-insensitive /
    fence tolerance / escaped quotes / unknown-prefix ignore
  - dsl synthesis: min-turn / entities + matching edges / dedupe /
    belief + retract pair / quote escaping / overall kind order
  - KV cache: no-op without path / no-op without llm / missing file /
    stale meta rejection / corrupt pickle / wrong shape / meta shape

Full suite: 1850 passed, 101 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar merged commit 480e4fd into main on Apr 20, 2026
4 checks passed
KailasMahavarkar deleted the feat/bonsai-compact-output branch on April 20, 2026 at 12:39