Replace auto-reindex hooks with staleness signals + background-bootstrap status#61
Replace auto-reindex hooks with staleness signals + background-bootstrap status#61elliottregan merged 17 commits intomainfrom
Conversation
The lefthook hooks that auto-reindexed on every commit, checkout, and merge are more costly than useful — typical commits touch a handful of files, making a full reindex wasteful. Staleness signals (next commits) will let agents reindex deliberately before high-stakes queries instead. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
search/corpus/staleness.go: CodeStaleness compares git ls-files hashes against qdrant; CommitsStaleness compares commit count against indexed point count. Both are cheap (sub-second on typical repos). search/status/: New package for tracking index run state. Writes .cspace/search-index-status.json atomically (write tmp + rename). Tracks per-corpus state (completed/failed/disabled), in-progress runs with throttled progress updates (~1/sec), and preserves prior corpus states across Writer instances. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- index.Run now accepts an optional StatusWriter interface to report start/progress/finish/fail lifecycle events to the status file. - Both CLI surfaces (cspace search, cspace-search) wire a status.Writer into every index run, including init loops which also mark disabled corpora. - `cspace search status` (and `cspace-search status`) reads the status file, checks staleness for code/commits corpora against qdrant, and renders a human-readable summary. --json flag for programmatic use. - Query paths in both CLIs and the MCP server now append staleness warnings to the response envelope when the index is out of date. - New `search_status` MCP tool returns the same structured status output so agents can check index health before high-stakes queries. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace auto-refresh documentation with the new manual workflow: check `cspace search status` or `search_status` MCP tool before high-stakes queries, reindex explicitly when stale, and surface failures before trusting results. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
✅ Deploy Preview for cspace-cli canceled.
|
✅ Deploy Preview for cspace-cli canceled.
|
Review: staleness signals + status fileGood structure and clean mapping of commits to tasks. Two blocking bugs before merge, plus some significant-cost concerns and minor polish. 🔴 Critical1. Data-loss race in In sw, _ := status.NewWriter(root)
for _, corpusID := range []string{"code", "commits", "context", "issues"} {
err := runSearchIndex(corpusID, true)
switch {
case errors.Is(err, config.ErrCorpusDisabled):
sw.DisableCorpus(corpusID)
...
Net effect with default config (code+commits enabled, context+issues disabled): after Fixes in order of preference:
Either path needs a regression test that exercises the full init sequence and asserts all four corpora have the expected state afterward. 2.
Fix: compare max indexed commit date to 🟠 Significant3. Staleness check is on the MCP search hot path, costs 100–500ms per query
Options:
My preference is the cache, with a test asserting the second call within the window doesn't touch git/qdrant. 4. The check only counts files whose current hash is absent from the index; it doesn't count ghost entries (files in the index whose paths are no longer tracked). After a significant delete, the index is stale in a way this signal won't surface. Cheap fix: after the 🟡 Minor
✅ Things I liked
🧪 Test gaps to cover before merge
SummaryShip-blockers: #1 (data loss) and #2 (permanent false positive). The rest are improvements. |
The lefthook rc: value was a raw shell command instead of a file path, so the generated hook script never added ~/go/bin to PATH — making golangci-lint and goimports invisible to the hooks. Fix by extracting the export into .lefthookrc and pointing rc: at it. Also: widen pre-commit lint from --new-from-rev=HEAD~1 to full ./..., add go vet as a parallel pre-commit step, make setup-hooks fail loudly when lefthook is missing, add a check-hooks warning to the build target, and document the hook setup in the README. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The lint job had an `if: github.event_name == 'pull_request'` guard that skipped it on pushes to main. The check job was missing fmt-check entirely. Both are now unconditional so main stays clean even when PRs are merged without full CI (e.g. admin merge). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- gofmt: remove trailing whitespace in struct tag alignment (internal/cli/search.go) - ineffassign: remove unused headHash variable; the function already validates git state via rev-list --count (search/corpus/staleness.go) - staticcheck SA1012: pass context.TODO() instead of nil to handleStatus in tests (search/mcp/server_test.go) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The init loop held an outer status.Writer whose in-memory snapshot went stale as runSearchIndex created inner writers. When the outer writer later called DisableCorpus, it flushed its stale view and wiped prior code/commits entries. Fix: move the disabled-state write into runSearchIndex itself. When config.Build returns ErrCorpusDisabled, the function creates a fresh single-use writer, records the disabled state, and returns the sentinel. The init loop needs no writer of its own. Mirrored the same change in cmd/cspace-search/main.go's runIndexCorpus. Added regression test: TestWriter_DisableCorpusDoesNotClobberExistingEntries exercises the full sequence (two completed corpora, then two disabled) and asserts all four entries survive with correct states and indexed counts. Fixes PR #61 review item #1. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CommitsStaleness compared rev-list --count against len(indexed), which with commits.limit=500 and a 1000-commit repo permanently reports "500 new commits" even right after indexing. The signal was always noise. Fix: compare HEAD's commit date against the max indexed commit date instead of comparing raw counts. Added MaxDateLister interface and MaxPayloadDate on the qdrant adapter — scrolls the "date" payload field once and returns the max. Falls back to HEAD hash lookup for listers that don't support the new interface. Tests: - Updated TestCommitsStaleness_UpToDate/NewCommits for date-based logic - Added TestCommitsStaleness_RespectsLimit: 5-commit repo, index has 2 points but maxDate matches HEAD — reports not stale (old logic would say "3 new commits") Fixes PR #61 review item #2. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
mcpAppendStalenessWarning ran CodeStaleness/CommitsStaleness on every MCP query — 100-500ms of git I/O + SHA256 + qdrant scroll per call. With 5+ queries per agent turn, that's seconds of invisible overhead. Fix: in-process cache keyed on (corpus, projectRoot) with 30s TTL. CodeStalenessCached / CommitsStalenessCached wrap the raw functions and hit the cache first. All query-path and status callers now use the cached variants. First query in a session pays the full cost; subsequent queries within the TTL window return instantly. Also reduced git ls-files timeout from 30s to 5s — the staleness check runs on the query hot path and should fail fast rather than hang. Tests: - TestStalenessCache_HitsWithinTTL: second call returns cached result without touching the lister (verified via call counter) - TestStalenessCache_ExpiresAfterTTL: after TTL, the lister is called again (uses 1ms TTL override for speed) Fixes PR #61 review items #3 and #7. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CodeStaleness only counted tracked files whose hash was absent from the index. It never counted ghost entries — files in the index whose content hash no longer appears among tracked files (deleted or renamed since last index). After a significant delete, the index was stale in a way the signal couldn't surface. Fix: track which indexed hashes are "seen" during the git ls-files walk. After the walk, count unseen hashes as ghosts. Report both sides of the symmetric difference: "N files changed, M deleted since last index". Test: TestCodeStaleness_DeletedFiles — 3-file repo, delete one, verify CodeStaleness reports stale with a deletion-aware reason. Fixes PR #61 review item #4. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
short-circuit DisableCorpus Four minor improvements rolled into one commit: #5 — Extract shared status.Compute(): both MCP handleStatus and CLI runSearchStatus had duplicate code building the per-corpus status shape. Now both call status.Compute(root, disabled, checker) and serialize the same ComputedStatus struct. MCP maps it to its own StatusOutput; CLI serializes directly. #6 — Wire flush error reporting: Writer.flush now writes errors to w.ErrLog (defaults to os.Stderr) instead of silently dropping them. Callers can override ErrLog in tests. Test: TestWriter_FlushReportsErrors verifies the error surface is hit when writing to a nonexistent path. #8 — Capture time.Now().UTC() once per mutator: each of StartCorpus, UpdateProgress, FinishCorpus, FailCorpus, DisableCorpus now calls time.Now().UTC() once at the top and reuses. Pure hygiene, no test. #9 — DisableCorpus short-circuit: if the corpus is already disabled, return without flushing. Avoids redundant I/O on re-runs. Test: TestWriter_DisableCorpusShortCircuit verifies file mod-time doesn't change on a second DisableCorpus call. Also added TestCompute_Basic for the new Compute function. Fixes PR #61 review items #5, #6, #8, #9. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Patterns behind the staleness-signal bugs, logged so future agents avoid the same footguns: 1. Writer instances must refresh state before writing OR be single-use 2. Don't compare counts when limits are in play — compare identities 3. Secondary work on query paths needs a latency budget 4. Delta detection needs symmetric difference, not one-sided membership 5. Don't use planet names for agent-spawned cspace instances Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Agents spawning cspace instances should use descriptive or numbered names (issue-<n>, cs-agent-<n>, short task labels) instead of planet names, which are reserved for the human-facing TUI's auto-assignment. Updated: - lib/skills/delegating-container-agents/SKILL.md: replaced mars examples with descriptive names, added naming note - lib/agents/coordinator.md: added naming guidance near cspace up usage - CLAUDE.md: added note in "Instance naming" key pattern Implements PR #61 Part C and matches context decision #5. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Auto-indexing at container boot blocked provisioning by minutes and wasted resources in the common case where search isn't queried that session. Most containers don't need a warm search index at boot. Added BootstrapSearch bool to provision.Params. Phase 16 now runs only when the flag is true: - cspace up: defaults to false; add --index flag for explicit opt-in - cspace advisor up (advisor.Launch): always true — advisors are session-continuous and search-heavy - cspace coordinate: always true — coordinators consult search indexes when routing work - TUI flows: false (interactive use, no overhead) Test: TestBootstrapSearchParam validates the Params field defaults and that Phases[15] is still "Bootstrapping search". Updated using-semantic-search skill to document the opt-in behavior. Logged decision: "Bootstrap search indexing is opt-in; advisors and coordinators get it by default". Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
New master switch: `enabled: true` at the top of search.yaml is required before any corpus indexes or responds. Prevents a freshly cloned cspace project from incidentally indexing node_modules on first container bootstrap (or on the first advisor / coordinator spawn, both of which opt into phase-16 via PR #61's role gating). Implementation: - `Enabled bool` on search/config.Config, `enabled: false` in embedded default.yaml. - `ErrSearchDisabled` sentinel in search/config/runtime.go, checked first in BuildWithConfig so every downstream call site (CLI query/index, MCP search_code/search_context/search_issues, init loops, phase-16 bootstrap) refuses uniformly. - `cspace search status` / search_status MCP tool: ComputedStatus grows an `Enabled` flag; when false, Corpora is empty and clients show a one-line "not configured" hint. - Init loops short-circuit with a single clear message instead of four per-corpus "disabled" lines. - init_template.yaml leads with the master switch commented with opt-in guidance. Tests updated to cover the new default and the opt-in path; the pre-existing MCP status handler tests now pass an explicit `Enabled: true` since the default is off. Also fixes a latent bug in CommitsStaleness's date comparison that surfaced during testing — time.Truncate(24*time.Hour) does not perform calendar-day truncation when a timezone offset is involved, so HEAD and max-indexed dates could disagree at UTC boundaries while formatting to the same string. Switched to ISO-date lexical compare which is timezone-agnostic and equivalent.
Summary
lefthook.yml— thepost-commit,post-checkout, andpost-mergehooks that firedcspace-search indexon every git event are gone. They were more costly than useful for typical commits.search/corpus/staleness.go) — cheap checks that compare git state against qdrant without re-embedding.CodeStalenessdiffsgit ls-fileshashes;CommitsStalenesscompares commit counts.search/status/) — atomic JSON status file at.cspace/search-index-status.jsontracking per-corpus state (completed/failed/disabled), in-progress runs with throttled progress, and timestamps.cspace search statusCLI command — human-readable and--jsonoutput showing all corpus states, staleness, and any running index. Added to bothcspace searchand standalonecspace-search.Envelope.Warningfield when the index is stale, across CLI and MCP surfaces.search_statusMCP tool — structured status output for agents to check index health before high-stakes queries.using-semantic-searchskill — replaced auto-refresh docs with the new manual workflow: check status, reindex when stale.Related decision:
.cspace/context/decisions/2026-04-22-scope-cspace-search-to-code-commits-close-docs-decisions-con.mdChecklist
lefthook.ymlsearch/corpus/staleness.go)search/status/)cspace search statusCLI command (both surfaces)search_statusMCP toolindex.Runusing-semantic-searchskillTest plan
go test ./search/...— all search package tests pass (corpus, status, mcp, index, query, config, cluster)go vet ./...— cleango build ./...— cleancspace-search statusshows correct output for never-indexed, disabled, completed, and stale corporacspace-search status --jsonemits valid structured JSONcspace-search index --corpus=codewrites status file with correct state and countcspace up(requires live container)search_statusMCP tool returns correct output viacspace search mcpstdioAdditional: hook/CI hardening
Audit findings
rc:config was broken (root cause Route implementer through supervisor and persist event log #1) —lefthook.ymlhadrc: export PATH="$HOME/go/bin:$PATH"which is a raw shell command, but lefthook'src:expects a file path. The generated hook script rendered this as[ -f export PATH="..." ] && . export PATH="..."— a file-existence test on a nonsensical filename. It never matched, so~/go/binwas never added toPATH, andgolangci-lint/goimportswere invisible to the hooks.--new-from-rev=HEAD~1was too narrow (root cause Docs: Scaffold Astro + Starlight project #2) — Pre-commit lint only flagged issues introduced since the prior commit. Pre-existing lint in touched files slipped through, and if a prior commit was made without hooks (due to root cause Route implementer through supervisor and persist event log #1), the delta was against an unchecked baseline.lefthook installwas never automated (root cause Docs: Scaffold Astro + Starlight site in docs/ #3) —make setup-hooksexisted but was not called bymake build,make test, or any commonly-run target. The README didn't mention it. Fresh-clone contributors had no hooks active.CI lint was PR-only (root cause Docs: Getting Started guides #4) — The
lintjob hadif: github.event_name == 'pull_request', so pushes tomain(e.g. admin merges) skipped lint entirely. Thecheckjob ranvetandtestbut notfmt-check, letting format regressions through.No
--no-verifyabuse found — No commits in history used--no-verify.No
core.hooksPathoverride — Git config was clean.Changes
lefthook.yml+.lefthookrc(new file):export PATH="$HOME/go/bin:$PATH"into.lefthookrc; pointrc:at the file so the hook script sources it correctly.--new-from-rev=HEAD~1to fullgolangci-lint run ./....vet-go(go vet ./...) as a parallel pre-commit step gated on*.gofiles.Makefile:setup-hooksnow fails loudly iflefthookis not in PATH (instead of silently succeeding via the fallback chain in the hook script).check-hookstarget warns to stderr if.git/hooks/pre-commitis missing; wired intobuildso contributors notice on first build..github/workflows/ci.yml:if: github.event_name == 'pull_request'from thelintjob — lint now runs on pushes tomaintoo.Format checkstep (make fmt-check) to thecheckjob.check,lint, andintegration-searchas required status checks in repo settings after merge.README.md:make install-toolsandmake setup-hooksto the Development section.Lint fixups (surfaced by the stricter pre-commit lint):
internal/cli/search.go— gofmt: removed trailing whitespace in struct tag alignment.search/corpus/staleness.go— ineffassign: removed unusedheadHashvariable (the function validates git state viarev-list --count).search/mcp/server_test.go— staticcheck SA1012: replacednilcontext withcontext.TODO()in test calls.🤖 Generated with Claude Code
Addressed review feedback
All items from the review comment are now fixed:
runSearchInit— outer writer clobbers inner writers94c8b6dCommitsStalenesspermanent false positive undercommits.limit3c576c90d6cd6cCodeStalenessmisses deleted-but-still-indexed ghost entries494828cstatus.Computec76d629Writer.flushswallows errors silently — addedErrLogcallbackc76d6290d6cd6ctime.Now().UTC()calls — capture once per mutatorc76d629DisableCorpusflushes unnecessarily — added no-op short-circuitc76d629New tests added
TestWriter_DisableCorpusDoesNotClobberExistingEntries— regression for Route implementer through supervisor and persist event log #1TestCommitsStaleness_RespectsLimit— regression for Docs: Scaffold Astro + Starlight project #2TestStalenessCache_HitsWithinTTL/TestStalenessCache_ExpiresAfterTTL— cache behavior for Docs: Scaffold Astro + Starlight site in docs/ #3TestCodeStaleness_DeletedFiles— ghost detection for Docs: Getting Started guides #4TestCompute_Basic— shared Compute function for Docs: CLI Reference #5TestWriter_FlushReportsErrors— error surface for Docs: Architecture & Concepts #6TestWriter_DisableCorpusShortCircuit— no-op behavior for issue-4: Getting Started guides #9Durable context decisions logged
5 pattern decisions logged to
.cspace/context/decisions/in4171564:Instance naming guidance
Updated
lib/skills/delegating-container-agents/SKILL.md,lib/agents/coordinator.md, andCLAUDE.mdto note that planet names are reserved for the TUI — agents should use descriptive names (issue-<n>,cs-agent-<n>, etc.) in1b3fcd0.Search bootstrap now opt-in
b1980b5— Phase 16 (search bootstrap) is now role-aware:cspace up: NO auto-index by default; pass--indexto opt incspace advisor up: auto-index (session-continuous, search-heavy)cspace coordinate: auto-index for the coordinator🤖 Generated with Claude Code