feat(gardener/classifier): expand digest budget and clean noise; cover leaves + soft_links #343

@serenakeyitan

TL;DR

The tree-digest builder (src/products/gardener/engine/classifiers/tree-digest.ts) is operating with a budget calibrated for an older context window and leaves several easy quality wins on the table. Three concrete problems found while inspecting a real gardener-comment run against paperclipai/paperclip:

  1. Digest budget is 100KB and truncates silently. That is a small fraction of the 200K-token window Claude 4.5/4.6/4.7 actually give us.
  2. Noise pollutes the digest: .gardener-tree-cache/ auto-generated copies and drift/ placeholder NODE.md files consume budget for zero signal.
  3. Digest ignores leaf files and soft_links — classifier judges PRs against NODE.md one-liners only, misses the actual decisions.

Separately: there's no task-aware model selection, and users have no idea GARDENER_CLASSIFIER_MODEL exists.

Repro / evidence

Real run: first-tree gardener comment --pr 4368 --repo paperclipai/paperclip --tree-path ~/paperclip-tree, instrumented to dump the full prompt. Stats:

| section | bytes |
| --- | --- |
| system prompt | 1,469 |
| tree digest | 24,794 (138 NODE.md entries, ~half redundant .gardener-tree-cache + drift placeholders) |
| PR body | 8,023 |
| diff | 100,719 (under 200KB cap) |
| total prompt | ~135KB |

138 nodes used 25KB of a 100KB budget, but the useful signal was <50% of those nodes. The real paperclip-tree only has ~80 real NODE.md files; the rest were .gardener-tree-cache/ cache copies (entries like .gardener-tree-cache/adapters/claude-local/NODE.md) and drift/paperclip-e392f6b1/.../NODE.md stubs labeled "Auto-generated intermediate node for sync proposals".

I also verified the model claim separately: I ran each of claude-haiku-4-5 / sonnet-4-5 / sonnet-4-6 / opus-4-6 / opus-4-7 through claude -p and asked it to report its own context size. All report 200K tokens. So no model change buys extra budget — the win is purely from better use of the existing window.

Proposals

1. Grow DIGEST_BUDGET_BYTES and make it model-aware

Today: hard-coded 100KB. Suggested:

  • Default to 500KB, enough to cover realistic trees (up to ~2700 NODE.md) with leaves and soft_links.
  • Or derive from the selected model: digestCap = contextSize(model) / 4 (leaves room for PR body + diff + output).
  • Emit a stderr warning when the cap is reached. Today nodes are dropped silently, which is confusing when a node is mysteriously missing from the citations.
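
A minimal sketch of the model-derived cap plus the warning, not the actual tree-digest.ts code. The names DigestEntry, digestCapBytes, and buildDigest are hypothetical, and the 4-bytes-per-token conversion is a rough assumption needed because the context size is in tokens while the budget is in bytes.

```typescript
// Hypothetical shapes; the real builder in tree-digest.ts may differ.
interface DigestEntry {
  path: string;
  summary: string;
}

interface DigestResult {
  text: string;
  included: number;
  dropped: number;
}

// Assumption: ~4 bytes per token is a rule of thumb, not a measured figure.
const ASSUMED_BYTES_PER_TOKEN = 4;

// A quarter of the context window, converted from tokens to bytes.
function digestCapBytes(contextTokens: number): number {
  return Math.floor((contextTokens * ASSUMED_BYTES_PER_TOKEN) / 4);
}

// UTF-8 byte length, so the budget is counted in bytes rather than UTF-16 units.
function byteLength(s: string): number {
  return new TextEncoder().encode(s).length;
}

// Adds entries until the cap would be exceeded, then counts what was dropped
// and reports it through `warn` (console.error writes to stderr in Node).
function buildDigest(
  entries: DigestEntry[],
  capBytes: number,
  warn = (msg: string) => console.error(msg),
): DigestResult {
  const lines: string[] = [];
  let used = 0;
  let dropped = 0;
  for (const entry of entries) {
    const line = `${entry.path}: ${entry.summary}\n`;
    const size = byteLength(line);
    if (used + size > capBytes) {
      dropped++;
      continue;
    }
    lines.push(line);
    used += size;
  }
  if (dropped > 0) {
    warn(
      `tree digest: budget of ${capBytes} bytes reached; ` +
        `dropped ${dropped} of ${entries.length} nodes`,
    );
  }
  return { text: lines.join(""), included: lines.length, dropped };
}
```

Under that conversion assumption a 200K-token window yields a 200,000-byte cap, which is below the fixed 500KB option, so the two bullets above are genuinely different choices rather than equivalents.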

2. Extend SKIP_DIRS and filter auto-generated drift

  • Add .gardener-tree-cache to SKIP_DIRS.
  • Skip any NODE.md whose extracted summary is exactly "Auto-generated intermediate node for sync proposals" (these are gardener's own scaffolding from drift/<source-id>/.../NODE.md, not real decisions).

On the run above this would drop ~half the 138 entries and return the digest to a signal-dense state.
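
Roughly, the two filters could look like the sketch below. NodeSummary, EXTRA_SKIP_DIRS, and filterNoise are hypothetical names, and the existing contents of SKIP_DIRS are not reproduced here; only the new entry is shown.

```typescript
// Hypothetical shape for an already-extracted NODE.md entry.
interface NodeSummary {
  path: string; // tree-relative, forward slashes
  summary: string;
}

// Exact placeholder text quoted in this issue.
const PLACEHOLDER_SUMMARY = "Auto-generated intermediate node for sync proposals";

// Only the proposed addition; the real SKIP_DIRS has other entries.
const EXTRA_SKIP_DIRS: ReadonlySet<string> = new Set([".gardener-tree-cache"]);

// True when any path segment is a skipped directory name.
function inSkippedDir(relPath: string, skipDirs: ReadonlySet<string>): boolean {
  return relPath.split("/").some((segment) => skipDirs.has(segment));
}

// Exact match, as proposed, so a real node that merely mentions the phrase survives.
function isPlaceholder(summary: string): boolean {
  return summary === PLACEHOLDER_SUMMARY;
}

function filterNoise(
  nodes: NodeSummary[],
  skipDirs: ReadonlySet<string> = EXTRA_SKIP_DIRS,
): NodeSummary[] {
  return nodes.filter(
    (node) => !inSkippedDir(node.path, skipDirs) && !isPlaceholder(node.summary),
  );
}
```

Filtering on the summary text rather than on the drift/ prefix keeps any hand-written node under drift/ in the digest, at the cost of depending on the exact placeholder wording.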

3. Include leaves + soft_links in the digest

The richest signal is in leaf files (e.g. product/task-system/issue-blockers/issue-graph-liveness.md), not in the parent NODE.md section summary. Today the classifier never sees leaves, so it's judging a PR that touches a specific decision against a parent's one-line description.

Suggested heuristic: when a PR diff touches files whose paths overlap a NODE.md's domain path (e.g. PR touches server/src/issues/... → include product/task-system/** leaves), include those leaf files in the digest. Path-prefix match is cheap and local — no embedding needed.
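
One way the path-prefix heuristic might be sketched. This issue does not say where the link between a code path and a tree domain comes from, so the DomainMapping table below is purely an assumption for illustration, as are the function names.

```typescript
// Assumption: some table links code prefixes to tree domains. Where this
// comes from (frontmatter, config, convention) is not specified in the issue.
interface DomainMapping {
  codePrefix: string; // e.g. "server/src/issues/"
  treePrefix: string; // e.g. "product/task-system/"
}

// Tree domains whose code prefix matches at least one changed file.
function touchedDomains(changedFiles: string[], mappings: DomainMapping[]): string[] {
  const hit = new Set<string>();
  for (const mapping of mappings) {
    if (changedFiles.some((file) => file.startsWith(mapping.codePrefix))) {
      hit.add(mapping.treePrefix);
    }
  }
  return Array.from(hit);
}

// Leaf .md files (everything except NODE.md) under any touched domain.
function selectLeaves(
  changedFiles: string[],
  mappings: DomainMapping[],
  treeFiles: string[],
): string[] {
  const domains = touchedDomains(changedFiles, mappings);
  return treeFiles.filter(
    (path) =>
      path.endsWith(".md") &&
      path !== "NODE.md" &&
      !path.endsWith("/NODE.md") &&
      domains.some((domain) => path.startsWith(domain)),
  );
}
```

Both prefixes should end with a slash so that a prefix like server/src/issues does not also match server/src/issues-archive.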

Same argument for soft_links: if product/task-system/issue-links/NODE.md has soft_links: [product/agent-model/NODE.md], include the linked node's summary too. The classifier can then see the cross-domain relationship.
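
A minimal sketch of the soft_links expansion, assuming the frontmatter has already been parsed into a map keyed by node path. TreeNode and expandSoftLinks are hypothetical names, and following links one hop only is a choice made here, not something the issue specifies.

```typescript
// Hypothetical parsed node: summary plus the soft_links list from frontmatter.
interface TreeNode {
  summary: string;
  softLinks: string[];
}

// Returns the selected paths plus any nodes they soft-link to (one hop only),
// preserving order and skipping duplicates and links to unknown nodes.
function expandSoftLinks(selected: string[], nodes: Map<string, TreeNode>): string[] {
  const out = new Set<string>(selected);
  for (const path of selected) {
    const node = nodes.get(path);
    if (!node) continue;
    for (const link of node.softLinks) {
      if (nodes.has(link)) out.add(link);
    }
  }
  return Array.from(out);
}
```

Stopping at one hop keeps the added bytes bounded by the number of links on the selected nodes; a transitive walk would need its own budget check.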

4. Task-aware default model

Today DEFAULT_MODEL = "claude-haiku-4-5" across every classifier call. For gardener comment this is fine: high call rate, classification-only, and haiku handles it. But for gardener sync --open-issues and gardener draft-node, the LLM is generating tree-node bodies or reasoning about cross-domain drift, and haiku is underpowered for that work.

Side-by-side verified on live paperclip PRs:

  • PR #4368 (new adapter, should trigger "aligned with existing adapter pattern"), haiku-4-5: ALIGNED / low, zero cited nodes. Correct but shallow.
  • PR #4367 (queue-sweep governance change), sonnet-4-6 via GARDENER_CLASSIFIER_MODEL=claude-sonnet-4-6: NEEDS_REVIEW / medium, 4 cited nodes, flagged that the PR bundled an unrelated openclaw-gateway session-key change + pointed at UNHEALTHY_AGENT_STATUSES vs the canonical agent state machine.

Sonnet catches cross-domain signals haiku misses. Suggest:

  • gardener comment default: haiku-4-5 (cheap, frequent).
  • gardener sync / gardener draft-node default: sonnet-4-6 (quality-sensitive, rare).
  • Both remain overridable via GARDENER_CLASSIFIER_MODEL.
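
The defaults above could be sketched as a per-task table with the env override applied first. GardenerTask and selectModel are hypothetical names; the actual logic in select.ts may be structured differently. The env is passed in as a parameter so the function stays testable.

```typescript
// Hypothetical task names, taken from the commands mentioned in this issue.
type GardenerTask = "comment" | "sync" | "draft-node";

// Proposed per-task defaults.
const TASK_DEFAULT_MODEL: Record<GardenerTask, string> = {
  comment: "claude-haiku-4-5",
  sync: "claude-sonnet-4-6",
  "draft-node": "claude-sonnet-4-6",
};

// GARDENER_CLASSIFIER_MODEL wins when set to a non-empty value;
// otherwise the task default applies.
function selectModel(
  task: GardenerTask,
  env: Record<string, string | undefined>,
): string {
  const raw = env["GARDENER_CLASSIFIER_MODEL"];
  const override = raw === undefined ? "" : raw.trim();
  return override !== "" ? override : TASK_DEFAULT_MODEL[task];
}
```

In a Node entry point this would be called as selectModel("sync", process.env). Treating an empty or whitespace-only value as unset avoids sending a blank model name when the variable is exported but empty.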

5. Document GARDENER_CLASSIFIER_MODEL in onboarding

It exists in the code (select.ts:50) and in install-workflow.ts workflow comments, but it's not in skills/first-tree/references/onboarding.md. Users don't know they can upgrade the model for their tree without code changes. Suggest a one-line mention in Step 6 or the Pitfalls section.

Acceptance criteria

  • DIGEST_BUDGET_BYTES default raised (≥ 500KB) and/or derived from model context size.
  • Budget exhaustion emits a stderr warning that includes the number of nodes dropped.
  • .gardener-tree-cache in SKIP_DIRS; drift/ placeholder auto-generated nodes filtered out.
  • Digest optionally includes leaf .md files when diff paths overlap node domain.
  • Digest optionally includes soft-linked nodes' summaries when cited via soft_links frontmatter.
  • Default model is task-specific (haiku for comment, sonnet-4-6 for sync/draft-node) unless overridden.
  • GARDENER_CLASSIFIER_MODEL documented in onboarding.md.
  • Existing tests still pass; new tests for each heuristic (noise filter, leaf inclusion, soft_link inclusion, budget warning).

E2E test requirement

Before merging, verify:

  1. On paperclipai/paperclip, gardener comment --pr <open PR> runs end-to-end with the new digest builder; emitted comment cites real (non-hallucinated) paths.
  2. The .gardener-tree-cache / drift placeholders are absent from the posted citation list.
  3. Budget warning prints to stderr when cap is hit (can repro by lowering the cap in a test env).
  4. GARDENER_CLASSIFIER_MODEL=claude-sonnet-4-6 gardener comment ... runs with sonnet; matches the observed behavior on PR #4367 above.

Env / context

/cc @serenakeyitan
