
feat(bonsai): compact-output mode + persistent KV cache #169

Merged
KailasMahavarkar merged 1 commit into main from feat/bonsai-compact-output on Apr 20, 2026

Conversation

@KailasMahavarkar (Contributor)

Two generic speed levers on top of BonsaiIngestor

Skipped the `-march=native` rebuild: it's machine-specific and would ship binaries that crash on CPUs missing the same ISA. Both levers stay CPU-agnostic.

1. Compact-output mode (compact=True)

LLM emits ~30 tokens of ENTS/BELIEFS/RETRACTS tags instead of ~150 tokens of full DSL. Python synthesizes DSL deterministically.
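
For scale, one compact turn might look like this (the exact tag syntax is illustrative, not copied from the skill file):

  ENTS: green
  BELIEFS: fact:favorite_color=green
  RETRACTS: (none)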

Wins (4B TQ1_0, CPU only):

metric        full DSL   compact   delta
warm avg      3.9s       1.7s      2.3x faster
cold #1       19.6s      10.1s     1.9x faster
5-msg total   35.3s      16.9s     2.1x faster
raw output    335B       98B       3.4x smaller

Quality on T1-T5: all 5 messages ingested correctly, 1 correct belief (fact:favorite_color=green), zero spurious beliefs from third-person observations (skill has explicit "I/my/me only" rule + negative example).

2. Persistent KV cache (kv_cache_path=...)

llama.cpp's save_state/load_state output pickled to disk together with a safety meta dict (skill fingerprint + model size + n_ctx + chat_format). A stale or corrupt cache silently falls back to a fresh warmup.
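
For illustration, a minimal sketch of the save side, assuming llama-cpp-python's Llama.save_state(); the helper name and exact meta keys below are assumptions, not the PR's code:

import hashlib
import os
import pickle

def save_kv_cache(llm, path, skill_text, model_path, n_ctx, chat_format):
    # Safety meta dict: anything that should invalidate the cached state.
    meta = {
        "skill_fingerprint": hashlib.sha256(skill_text.encode()).hexdigest(),
        "model_size": os.path.getsize(model_path),
        "n_ctx": n_ctx,
        "chat_format": chat_format,
    }
    # llama-cpp-python's save_state() returns a picklable LlamaState.
    with open(path, "wb") as f:
        pickle.dump({"meta": meta, "state": llm.save_state()}, f)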

Wins for cold-boot scenarios (serverless, CLI, dev iteration):

  • First ingest in fresh process: 2.0s (was 10.1s cold)
  • ~8s eliminated per process boot

Disk cost: ~406 MB for n_ctx=2048. Opt-in.

API

ing = BonsaiIngestor(
    model_path=".../Ternary-Bonsai-4B-TQ1_0.gguf",
    compact=True,
    kv_cache_path="/tmp/bonsai_kv.bin",
)
ing.warmup()
ing.save_kv_cache()      # after warmup
r = ing.ingest("text", msg_id="m:s1:0", session_id="s1")

compact=False keeps the full-DSL path unchanged (backward compat).

Tests

+23 unit tests (48 total on bonsai_ingestor): compact parser, dsl synthesis, KV cache load/save/stale/corrupt. Full suite: 1850 passed, 101 skipped.

🤖 Generated with Claude Code

…-agnostic)

Two CPU-agnostic speed levers on top of BonsaiIngestor. Both produce
measurable wall-clock wins without any machine-specific tuning.

1. Compact-output mode (compact=True)
   LLM emits ~30 tokens of ENTS/BELIEFS/RETRACTS instead of ~150 tokens
   of full DSL. Python synthesizes the DSL deterministically.

   - New skill: tools/skills/graphstore-bonsai-dsl-compact/SKILL.md
     (~620 source tokens, clear rules + negative examples so model does
     NOT promote third-person observations to first-person beliefs).
   - Parser: _parse_compact_output(cleaned) -> CompactTurn
   - Templates: _synthesize_dsl(turn, msg_id=..., session_id=..., role=..., text=...)
     Deterministic DSL builder. Same input always produces same output.
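
   A minimal sketch of the parser shape (the CompactTurn fields and the
   delimiter handling are assumptions; only the two signatures above
   come from the diff):

     from dataclasses import dataclass, field

     @dataclass
     class CompactTurn:
         ents: list[str] = field(default_factory=list)
         beliefs: list[str] = field(default_factory=list)
         retracts: list[str] = field(default_factory=list)

     def _parse_compact_output(cleaned: str) -> CompactTurn:
         turn = CompactTurn()
         for line in cleaned.splitlines():
             key, _, rest = line.partition(":")
             items = [s.strip() for s in rest.split(";") if s.strip()]
             key = key.strip().upper()      # case-insensitive, per the tests
             if key == "ENTS":
                 turn.ents += items
             elif key == "BELIEFS":
                 turn.beliefs += items
             elif key == "RETRACTS":
                 turn.retracts += items
             # unknown prefixes fall through and are ignored, per the tests
         return turn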

   Wins (4B TQ1_0, CPU only):
     warm avg:  3.9s  ->  1.7s   (2.3x faster)
     cold:     19.6s  -> 10.1s   (1.9x faster)
     5-msg:    35.3s  -> 16.9s   (2.1x faster)
     raw out:  335B   ->  98B    (3.4x smaller)

   Quality on T1-T5 smoke: all 5 messages ingested, 1 correct belief
   (fact:favorite_color=green), zero spurious beliefs, fact_id reuse
   working across the correction test.

   API shape:
     ing = BonsaiIngestor(model_path=..., compact=True, kv_cache_path=...)
     ing.ingest(text, msg_id="m:s1:0", session_id="s1")   # msg_id required
     # compact=False keeps the full-DSL path unchanged (backward compat)

2. Persistent KV cache (kv_cache_path=...)
   llama.cpp's save_state/load_state pickled to disk. Eliminates the
   ~10s cold penalty on every process restart (serverless, CLI
   one-shots, dev iteration).

   Workflow:
     run 1:  ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
             ing.warmup()
             ing.save_kv_cache()
     run 2:  ing = BonsaiIngestor(..., kv_cache_path="/tmp/bonsai_kv.bin")
             ing.ingest("...")    # <- 2.0s instead of ~10s cold

   Safety: the cache file stores a meta dict alongside the state -
   skill fingerprint, model path+size, n_ctx, chat_format. On load,
   stale meta (different skill, different model, different context
   size) is rejected and the process warms fresh. Corrupt pickle or
   wrong-shape payloads also silently fall back to fresh warm.
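
   A sketch of that load-side check (helper name and the exact error
   set are illustrative, not the PR's code):

     import pickle

     def load_kv_cache(llm, path, expected_meta):
         try:
             with open(path, "rb") as f:
                 payload = pickle.load(f)
             if payload["meta"] != expected_meta:
                 return False                    # stale meta: warm fresh
             llm.load_state(payload["state"])    # llama-cpp-python API
             return True
         except (OSError, pickle.UnpicklingError, KeyError, TypeError):
             return False                        # missing/corrupt/wrong shape: warm fresh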

   Disk cost: ~406 MB for n_ctx=2048 on 4B TQ1_0. Opt-in; no-op when
   kv_cache_path is not set.

Skipped: `-march=native` llama.cpp rebuild. That would only optimize
this machine and could ship binaries that crash on CPUs without the
same ISA. Kept portable instead.

Tests: +23 unit tests (48 total on bonsai_ingestor)
  - compact parser: all-sections / none / missing / case-insensitive /
    fence tolerance / escaped quotes / unknown-prefix ignore
  - dsl synthesis: min-turn / entities + matching edges / dedupe /
    belief + retract pair / quote escaping / overall kind order
  - KV cache: no-op without path / no-op without llm / missing file /
    stale meta rejection / corrupt pickle / wrong shape / meta shape

Full suite: 1850 passed, 101 skipped.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
KailasMahavarkar merged commit 480e4fd into main on Apr 20, 2026
4 checks passed
KailasMahavarkar deleted the feat/bonsai-compact-output branch on April 20, 2026 at 12:39