feat(explore): Bash intercept, semantic output, robust citation resolution #8

Merged
HeinrichvH merged 1 commit into main from feat/explore-goose-recipe
Apr 22, 2026

@HeinrichvH
Owner

Fold `recipes/explore-goose/` into `recipes/explore/` as the single, concrete explorer recipe; the README documents the Goose+Devstral reference implementation and how to swap backends.

Intercept leading Bash search commands (`grep`/`egrep`/`fgrep`/`rg`/`find`/`fd`/`ag`/`ack`, optionally preceded by environment assignments) and route them to the explore worker. Piped filters such as `kubectl ... | grep foo` are correctly skipped, because the explorer can't reproduce the upstream command's output. `ls` is deliberately omitted: it is too often a one-shot sanity check, where routing burns latency for no context win. Session log analysis showed that this intercept covers ~9% of all Bash calls. These calls cluster into exploration episodes that collapse from ~3k tokens to a single ~250-token subagent report when the semantic answer suffices.
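The intercept rule can be sketched as a small classifier over the leading command word. This is a minimal sketch, not the hook's actual code: the name `should_intercept` and the use of `shlex` tokenization are assumptions; only the tool list and the skip-pipes and skip-`ls` behavior come from the PR text.

```python
import re
import shlex

# Leading commands routed to the explore worker (list taken from the PR text).
# `ls` is intentionally absent, per the PR's rationale.
SEARCH_TOOLS = frozenset({"grep", "egrep", "fgrep", "rg", "find", "fd", "ag", "ack"})

# An environment assignment prefix such as `LC_ALL=C` in `LC_ALL=C grep foo`.
ENV_ASSIGNMENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*=")


def should_intercept(command: str) -> bool:
    """Hypothetical matcher: True when the *leading* command word is a search tool.

    Only the first command is inspected, so `kubectl get pods | grep foo`
    is left alone: that grep filters kubectl's output, which the explorer
    cannot reproduce.
    """
    try:
        tokens = shlex.split(command)
    except ValueError:
        return False  # unbalanced quotes: let the real shell handle it
    for token in tokens:
        if ENV_ASSIGNMENT.match(token):
            continue  # skip `NAME=value` prefixes
        return token in SEARCH_TOOLS
    return False  # empty command, or assignments only
```

Tokenizing with `shlex` keeps quoted patterns such as `grep "a b" src/` intact, so the first non-assignment token really is the command name.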

Two mechanical robustness fixes in `explore.py` keep the subagent useful across repo layouts and cwd drift:

  • `resolve_cited_path`: when `repo_root / cited_path` doesn't exist, suffix-match against `git ls-files` and accept only a unique hit. This handles the common case where the subagent drops a monorepo prefix (citing `Core/Foo.cs` instead of `src/Core/Foo.cs`). Ambiguous matches (e.g. multiple `README.md` files) are rejected rather than silently picked.

  • `derive_repo_root`: absolute paths in the query are the authoritative scope signal. When Claude greps an absolute path outside cwd (e.g. another checkout from a different project directory), use that path's git toplevel, not the cwd-derived one. Fall back to cwd's toplevel only when the query contains no absolute paths.
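Both fixes could look roughly like the following. This is a sketch assuming `git` is on `PATH`; the two function names come from the PR, but their signatures, the `_git` and `suffix_match` helpers, the `ABS_PATH` heuristic, and returning `None` when no absolute path lies inside a repo are all hypothetical.

```python
import re
import subprocess
from pathlib import Path
from typing import Iterable, Optional

# Hypothetical heuristic: a "/" not preceded by a word char, ".", ":", "/" or "~"
# starts an absolute path, so relative paths, URLs and ~-paths are ignored.
ABS_PATH = re.compile(r"(?<![\w.:/~])/[^\s'\"]+")


def _git(cwd: Path, *args: str) -> Optional[str]:
    """Run a git command in `cwd`; return stdout, or None on any failure."""
    try:
        proc = subprocess.run(["git", "-C", str(cwd), *args], capture_output=True, text=True)
    except FileNotFoundError:  # git not installed
        return None
    return proc.stdout if proc.returncode == 0 else None


def suffix_match(cited: str, tracked: Iterable[str]) -> Optional[str]:
    """Return the single tracked path ending with `cited`, else None."""
    cited = cited.strip("/")
    # Match on a path-component boundary: Core/Foo.cs hits src/Core/Foo.cs,
    # but not src/MyCore/Foo.cs.
    hits = [p for p in tracked if p == cited or p.endswith("/" + cited)]
    return hits[0] if len(hits) == 1 else None  # no hit or ambiguous: reject


def resolve_cited_path(repo_root: Path, cited: str) -> Optional[Path]:
    direct = repo_root / cited
    if direct.exists():
        return direct
    tracked = _git(repo_root, "ls-files")
    if tracked is None:
        return None
    hit = suffix_match(cited, tracked.splitlines())
    return repo_root / hit if hit else None


def git_toplevel(path: Path) -> Optional[Path]:
    start = path if path.is_dir() else path.parent
    out = _git(start, "rev-parse", "--show-toplevel")
    return Path(out.strip()) if out else None


def derive_repo_root(query: str, cwd: Path) -> Optional[Path]:
    abs_paths = ABS_PATH.findall(query)
    if not abs_paths:
        return git_toplevel(cwd)  # no absolute path: cwd is the only scope signal
    for raw in abs_paths:  # absolute paths are authoritative, even outside cwd
        top = git_toplevel(Path(raw))
        if top is not None:
            return top
    return None  # assumption: absolute paths outside any repo yield no root
```

Splitting out `suffix_match` keeps the uniqueness rule testable without a repository. Paths with unusual characters may be quoted by `git ls-files`; a fuller version might use `ls-files -z`.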

Trim the Claude-facing response to keep context lean:

  • `_validation` block moves to stderr: it's debug telemetry Claude can't act on, and the operator still sees it in the transcript.
  • references whose citation already appears in `findings` are dropped (dedup); the `references` array is omitted entirely when it would be empty.
  • empty `findings` arrays are dropped.
  • `json.dumps` emits compact, single-line output, saving ~15-20% in whitespace.
  • `resolved_path` is attached to findings when suffix matching kicked in, so Claude can `Read` the real path without re-guessing the prefix.
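A minimal sketch of these trimming rules follows. It assumes a hypothetical response shape: a dict with `findings` and `references` lists whose entries carry a `citation` key, plus an optional `_validation` key. The real schema in `explore.py` may differ, and `separators=(",", ":")` is one assumed reading of "compact single-line output". `trim_response` is an illustrative name.

```python
import json
import sys
from typing import Any


def trim_response(report: dict[str, Any]) -> str:
    """Apply the trimming rules and return the compact JSON shown to Claude."""
    report = dict(report)  # shallow copy; the caller's dict is left untouched

    # Debug telemetry goes to stderr: the operator sees it, Claude does not.
    validation = report.pop("_validation", None)
    if validation is not None:
        print(json.dumps(validation), file=sys.stderr)

    findings = report.get("findings") or []
    cited = {f.get("citation") for f in findings}
    # Dedup: drop references whose citation already appears in a finding.
    references = [r for r in report.get("references") or [] if r.get("citation") not in cited]

    # Omit empty arrays entirely instead of emitting [].
    for key, items in (("findings", findings), ("references", references)):
        if items:
            report[key] = items
        else:
            report.pop(key, None)

    # No indentation and no spaces after separators: one compact line.
    return json.dumps(report, separators=(",", ":"))
```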

Semantic prompt changes in `goose-home/explore/recipe.yaml` push the subagent from enumerating occurrences ("used in X, used in Y, used in Z") to interpreting them ("sentinel identity for platform-level operations when caller context is missing, used in messaging, AI proxy, user mgmt"). Grouping rule: one finding per semantic role, not one per citation. The `references` array is now truly optional: populate it only when a related location is worth calling out.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
HeinrichvH merged commit f738f30 into main on Apr 22, 2026
3 checks passed
HeinrichvH deleted the feat/explore-goose-recipe branch on April 22, 2026 at 21:44