refactor(bonsai): English-verb DSL grammar + lite variant + persistent KV cache#170
Open
KailasMahavarkar wants to merge 3 commits intomainfrom
Open
refactor(bonsai): English-verb DSL grammar + lite variant + persistent KV cache#170KailasMahavarkar wants to merge 3 commits intomainfrom
KailasMahavarkar wants to merge 3 commits intomainfrom
Conversation
Before: U/F/D compact verbs for the 3 ingest paths, with an `!` escape hatch to pass any other DSL through verbatim. Model had to remember two tokenizations and the escape never compressed. After: one positional verb table covering the whole common DSL surface - ingest (U/F/D), edges (E), retrieval (RM/SM/LX/AQ), walks (RL/TR/AN/SG), sys/vault (SS/SC/SH/ST/SX/VS). Python expands each to the full DSL line. Every path hits the same ~3-5x output-token reduction now, not just ingest. Changes: - Replace `_parse_compact_output` with a dispatch table of verb handlers built from small factories (_h_upsert, _h_fact, _h_drop, _h_edge, _h_query, _h_walk, _h_plain). - Swap `CompactTurn.raw_dsl` for `CompactTurn.statements`: pre-rendered DSL lines for every non-ingest verb. `_synthesize_dsl` appends them verbatim after the message node + mention wiring + fact updates. - SKILL.md rewritten to v5: 16 verbs documented, examples per common path, ~900 tokens. - Unit tests: drop `!`-escape block, add per-verb coverage for edges, all 4 retrieval verbs, all 4 walks, all 6 sys/vault ops, and the aliased long forms (REMEMBER/SIMILAR/RECALL/TRAVERSE). Test results: 78/78 bonsai unit tests pass; full suite 1880 pass, 101 skip (unchanged).
The NL->DSL ingestor now covers 100% of the grammar.lark NL-addressable surface (94 rules) via English-keyword @-verbs. Short-code abbreviations (@U/@F/@RM/etc.) are gone - every verb is a readable DSL keyword (@upsert, @belief, @Remember, @snapshot, @checkpoint, @CRON_ADD, @EVOLVE_RULE, ...). Full dispatch has ~100 entries including grammar aliases (ASSERT->BELIEF, FORGET_NODE->FORGET, etc.). Two prompt variants now ship with the package: - bonsai_dsl_prompt.txt: full 94-verb surface, ~1700 tokens, n_ctx=4096 - bonsai_dsl_prompt_lite.txt: 16-verb ingest+retrieval subset, ~800 tokens, n_ctx=2048. Fewer competing verbs means the model picks correctly on conversational turns. Load time 19s -> 8s. n_ctx auto-picks smallest power-of-two that fits the loaded prompt + typical user-msg budget + max_output + headroom. Callers can still pin n_ctx explicitly. Parser rewritten as a factory dispatch table: - _h_slug / _h_topic / _h_walk / _h_pair / _h_query / _h_plain / _h_raw - Special handlers for update_node, merge, increment, propagate, describe, unregister, contradictions, cron_add, optimize, clear, wal, nodes, vault_triplet, snapshot (auto-timestamp fallback). Compact/raw-DSL mode removed. Single prompt-driven mode. `compact` kwarg and `_DEFAULT_COMPACT_*` symbols deleted. `CompactTurn` renamed `ParsedTurn`, `_parse_compact_output` -> `_parse_verb_output`, `_COMPACT_HANDLERS` -> `_VERB_HANDLERS`. Grammar bugs fixed in this pass: - @snapshot without name auto-fills a UTC timestamp (SNAPSHOT STRING is required by grammar; bare @ss was emitting invalid DSL). - @compact rewritten to SYS OPTIMIZE COMPACT (SYS COMPACT isn't a real grammar rule; SYS OPTIMIZE COMPACT is). Prompt file moved out of tools/skills/ into src/graphstore/ so it ships with the wheel. pyproject package-data now includes *.txt. Performance envelope measured on AMD 9700X / DDR5-5200: - Cold load: 8s (lite) / 19s (full) - Cold load with persistent kv_cache_path: 0.4s (19x faster) - Peak decode: 27-30 tok/s (memory-bandwidth bound at ~810 MB weight read per token for 4B TQ1_0) - Per-call wall: 0.3-2s; overall 15-20 tok/s Tests: 89 unit tests pass; 107 synthesized DSL templates parse clean against grammar.lark (verify_v6_templates.py check).
@retract LongMemEval smoke revealed two real drift patterns in the lite prompt: 1. Spurious @retract on unrelated turns: with KNOWN FACTS present, the model was emitting @retract + @belief even when the new turn was about a different topic entirely. The correction-flow example in the prompt was the attractor: model pattern-matched any new fact to a correction. 2. @recall misfire on personal-fact questions: "Which city did I move to last year?" emitted @recall location (wrong verb, bare anchor). Model thought the belief topic "location" was a valid walk anchor. Prompt changes: - VERB PICK RULE now distinguishes personal-fact questions ("Where did I ...?", "Which city did I ...?") which route to @answer, from named-entity connection questions which route to @recall. - Added explicit rule: walk/path verbs (@recall, @traverse, @Ancestors, @descendants, @subgraph, @path, @SHORTEST_PATH, @common) REQUIRE a prefixed anchor id (ent:X / fact:X / msg:X). Bare topic names like "location" are not valid anchors. - Added explicit rule: @retract only fires on correction trigger words ("actually", "not anymore", "changed to", "now prefer", "instead"). Unrelated new turns must NOT emit @retract even if related beliefs are in KNOWN FACTS. - New NEGATIVE example showing KNOWN FACTS [fact:location]="Seattle" plus unrelated turn "I bought a new guitar" -> only @belief purchase guitar. No retract. - Two new @answer examples for personal-fact questions: "Which city did I move to last year?" and "What is my favorite color?". Verified by re-running tools/scripts style LongMemEval smoke on the fixture: both drift cases now produce correct ops (@BELIEF-only for unrelated turns, @answer for personal-fact questions). Tests: 89/89 pass, no unit-test deltas (pure prompt change).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
n_ctxpicks the smallest power-of-two that fits the loaded prompt + user-msg budget + output + headroom. Callers can still pin explicitly.kv_cache_path. Cold start goes 8s -> 0.4s (19x faster) across process restarts. Meta-guarded so skill / model / n_ctx changes invalidate the cache automatically.What changed
_h_slug / _h_topic / _h_walk / _h_pair / _h_query / _h_plain / _h_raw) plus targeted specials for complex verbs.compactdual-mode removed; prompt-driven is the only mode.CompactTurn->ParsedTurn,_parse_compact_output->_parse_verb_output,_COMPACT_HANDLERS->_VERB_HANDLERS,_DEFAULT_COMPACT_*symbols deleted.tools/skills/graphstore-bonsai-dsl-compact/intosrc/graphstore/so they ship with the wheel.pyproject.tomlpackage-data now includes*.txt.@SNAPSHOTwith no name auto-fills a UTC timestamp (grammar requiresSNAPSHOT STRING).@COMPACTnow emitsSYS OPTIMIZE COMPACT(bareSYS COMPACTisn't a real rule).Performance (AMD 9700X, DDR5-5200, 4B TQ1_0)
@-prefix parser gate makes English drift /<think>leaks / fences inert at parser level (no DSL corruption possible)Test plan
pytest tests/test_bonsai_ingestor.py)grammar.larkGenerated with Claude Code.