v8.0: Phases 15, 16, 19 — File Watcher, Embedding Cache, Plugin Updates#116

Merged
RichardHightower merged 46 commits into main from v8.0/phases-15-16-19 on Mar 12, 2026

@RichardHightower

Summary

Milestone v8.0 (Performance & DX) — ships Phases 15, 16, and 19 with 46 commits across 88 files (+13k/-1.7k lines).

Phase 15: File Watcher & Background Incremental Reindex

  • FileWatcherService using watchfiles for real-time filesystem monitoring
  • Background incremental reindex on file changes (debounced, dedup-aware)
  • CLI --watch flag for agent-brain index and folder management
  • Health endpoint reports watcher status

Phase 16: Embedding Cache (2-Layer: LRU Memory + aiosqlite Disk)

  • EmbeddingCacheService with in-memory LRU (1k entries) + persistent SQLite (WAL mode)
  • Cache-intercepted embed_text() and embed_texts() with batch put_many()
  • Provider/model change auto-wipe on startup
  • CLI cache status and cache clear commands
  • Event-loop starvation fixes: all CPU-heavy pipeline stages wrapped in asyncio.to_thread()
    • ChromaDB upsert, BM25 build, graph index build, content injection, document loading, chunking
  • 13/13 UAT tests passing

Phase 19: Plugin & Skill Updates for Embedding Cache

  • New agent-brain-cache slash command for cache management
  • Updated help text, API reference, and agent instructions
  • Cache-aware skill and plugin documentation

Infrastructure

  • UAT tester agent (uat-tester.md) with pre-granted permissions for E2E testing
  • Upgraded release agent (release_agent.md) with comprehensive permissions
  • Planning docs and research artifacts

Test plan

  • task before-push passes (894 server + 156 CLI tests)
  • Server coverage: 77%, CLI coverage: 60%
  • Phase 16 UAT: 13/13 tests passing (including cache clear < 10s during active indexing)
  • Phase 15 UAT: 14/15 tests passing (1 minor env-specific issue)
  • CI pipeline passes

🤖 Generated with Claude Code

RichardHightower and others added 30 commits March 6, 2026 12:11
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Research synthesis for Agent Brain v8.0 Performance & Developer Experience.
Covers stack (watchfiles, aiosqlite, cachetools), features (embedding cache,
query cache, file watcher, background incremental, UDS transport), architecture
(injection-first pattern, dual Uvicorn server), and 10 critical pitfalls.
SUMMARY.md includes 4-phase roadmap implications and confidence assessment.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…tegories)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phases 15-18: File Watcher → Embedding Cache → Query Cache → UDS Transport
28 requirements mapped across 4 phases with 100% coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…dates

Verified watchfiles 1.1.1 awatch() per-folder task pattern in project venv.
Confirmed anyio.Event stop_event works in asyncio context. Documented
DefaultFilter built-in exclusions, debounce millisecond conversion, and
backward-compatible FolderRecord/JobRecord extensions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…pdates

Two plans in 2 waves covering FileWatcherService, data model extensions,
CLI watch flags, job source tracking, and plugin documentation updates.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…emental updates

- Add watch_mode, watch_debounce_seconds, include_code fields to FolderRecord
- Backward-compatible _load_jsonl using .get() for missing watch fields
- Add source field to JobRecord, JobSummary, JobDetailResponse (default='manual')
- Add source parameter to enqueue_job() (default='manual')
- Add watch_mode, watch_debounce_seconds to IndexRequest
- Add watch_mode, watch_debounce_seconds to FolderInfo API response
- Add AGENT_BRAIN_WATCH_DEBOUNCE_SECONDS=30 to Settings
- 13 new tests covering all model changes and backward compat
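
A minimal sketch of the backward-compatible load described above, assuming one JSON object per line. The watch field names come from the commit message; `folder_path`, the default values, and the `load_jsonl` helper name are assumptions for illustration and may not match the project's `_load_jsonl`.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FolderRecord:
    # Watch field names follow the commit message; folder_path and all
    # defaults are assumptions made for this sketch.
    folder_path: str
    watch_mode: str = "off"
    watch_debounce_seconds: Optional[int] = None
    include_code: bool = False


def load_jsonl(text: str) -> List[FolderRecord]:
    """Parse one JSON object per line, tolerating records that predate the watch fields."""
    records = []
    for line in text.splitlines():
        if not line.strip():
            continue
        raw = json.loads(line)
        records.append(
            FolderRecord(
                folder_path=raw["folder_path"],
                # .get() supplies a default when an older record lacks the key.
                watch_mode=raw.get("watch_mode", "off"),
                watch_debounce_seconds=raw.get("watch_debounce_seconds"),
                include_code=raw.get("include_code", False),
            )
        )
    return records


legacy_and_new = "\n".join([
    '{"folder_path": "/docs"}',
    '{"folder_path": "/src", "watch_mode": "auto", "watch_debounce_seconds": 10, "include_code": true}',
])
records = load_jsonl(legacy_and_new)
```

The first record has no watch fields and still loads with the defaults, which is the property the backward-compat tests would need to check.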
…h status

- New FileWatcherService with per-folder asyncio tasks using watchfiles.awatch()
- AgentBrainWatchFilter extends DefaultFilter with dist/build/.next/coverage dirs
- start()/stop() lifecycle via anyio.Event for clean shutdown
- add_folder_watch()/remove_folder_watch() for dynamic registration
- _enqueue_for_folder() routes via job queue with source='auto', force=False
- Wired into lifespan: starts after JobWorker, stops before JobWorker
- /health/status includes file_watcher.running and file_watcher.watched_folders
- IndexingStatus model extended with file_watcher field
- 18 unit tests covering all service behaviors
- task before-push exits 0 (860 passed, 77% coverage)
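
To illustrate the debounced, per-folder enqueue idea, here is a standard-library sketch. It does not use watchfiles and is not the FileWatcherService code: the `DebouncedEnqueuer` class and its methods are hypothetical, and only the `source='auto', force=False` arguments are taken from the commit message.

```python
import asyncio


class DebouncedEnqueuer:
    """Collapse a burst of change events for a folder into one reindex job."""

    def __init__(self, debounce_seconds, enqueue):
        self._debounce = debounce_seconds
        self._enqueue = enqueue  # callable(folder, source=..., force=...)
        self._pending = {}  # folder -> timer task waiting out the quiet period

    def notify(self, folder):
        # A new event restarts the quiet period for that folder.
        task = self._pending.get(folder)
        if task is not None and not task.done():
            task.cancel()
        self._pending[folder] = asyncio.create_task(self._fire_later(folder))

    async def _fire_later(self, folder):
        await asyncio.sleep(self._debounce)
        self._pending.pop(folder, None)
        self._enqueue(folder, source="auto", force=False)


async def demo():
    jobs = []
    debouncer = DebouncedEnqueuer(
        0.2, lambda folder, source, force: jobs.append((folder, source, force))
    )
    for _ in range(5):  # five rapid file changes in the same folder
        debouncer.notify("/docs")
        await asyncio.sleep(0.01)
    await asyncio.sleep(0.5)  # let the quiet period elapse
    return jobs


jobs = asyncio.run(demo())
```

Five events inside the quiet period produce a single job, so the queue never holds duplicates for the same folder.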
… shipped

- 15-01-SUMMARY.md: documents FileWatcherService, model extensions, 31 tests
- STATE.md: advance to Plan 02, add v8.0 Phase 15 decisions
- ROADMAP.md: mark 15-01-PLAN.md complete [x]
- Add watch_mode/watch_debounce_seconds to JobRecord for job-level tracking
- Index router passes watch fields from IndexRequest to enqueue_job
- JobWorker._apply_watch_config() updates FolderRecord and notifies
  FileWatcherService after successful job completion
- CLI folders add: --watch auto/off and --debounce flags
- CLI folders list: Watch column showing auto/off per folder
- CLI jobs: Source column showing manual/auto per job
- CLI client: watch_mode and watch_debounce_seconds params on index()
- IndexingService passes include_code to folder_manager.add_folder()
- 10 server tests + 6 CLI tests for watch integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ction

- api_reference.md: folders add/list/remove examples, Watch column, Source
  column, File Watcher section with debounce and exclusion docs
- agent-brain-index.md: --watch and --debounce parameters, watch auto
  examples, file watching notes

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…ode integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…emental detection

_verify_collection_delta read job.eviction_summary which was only set
after verification passed — a catch-22 that prevented zero-change
incremental runs from passing. Now receives eviction_result directly
from the pipeline as a parameter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Round 2 after eviction_result verification fix. Blocker resolved.
Only remaining issue is #4 (Watch column not observable in live run
due to test environment having no indexable documents — code confirmed
correct via CLI unit tests).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…LOB)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Two plans in 2 waves covering ECACHE-01 through ECACHE-06:
- Plan 01 (Wave 1): EmbeddingCacheService, EmbeddingGenerator integration, API endpoints, settings
- Plan 02 (Wave 2): CLI cache commands, status display, health endpoint integration

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- Create EmbeddingCacheService with two-layer cache (OrderedDict LRU + aiosqlite WAL)
- Cache key: SHA-256(content):provider:model:dimensions (ECACHE-01)
- Provider fingerprint auto-wipe on mismatch at startup (ECACHE-04)
- get_batch() batch SQL lookup for embed_texts() efficiency
- float32 BLOB storage via struct.pack for ~12 KB/entry at 3072 dims
- Add EMBEDDING_CACHE_MAX_DISK_MB/MAX_MEM_ENTRIES/PERSIST_STATS to Settings
- Add embedding_cache subdirectory to SUBDIRECTORIES and resolve_storage_paths

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
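
The cache key layout and the float32 BLOB size quoted above can be checked with a short sketch. The helper names are hypothetical, and the little-endian byte order and the provider/model strings are assumptions chosen for the example.

```python
import hashlib
import struct


def cache_key(text, provider, model, dimensions):
    """Build a key of the form SHA-256(content):provider:model:dimensions."""
    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
    return f"{digest}:{provider}:{model}:{dimensions}"


def pack_embedding(vector):
    """Serialize floats as packed float32 values (4 bytes each, little-endian)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob):
    """Inverse of pack_embedding."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


key = cache_key("hello world", "openai", "text-embedding-3-large", 3072)
blob = pack_embedding([0.0] * 3072)
roundtrip = unpack_embedding(pack_embedding([0.5, -1.25, 2.0]))
```

At 3072 dimensions the BLOB is 3072 × 4 = 12,288 bytes, which matches the ~12 KB/entry figure. Because provider, model, and dimensions are part of the key, the same text embedded by a different model cannot collide with an older entry.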
…ests

- Intercept embed_text/embed_texts with lazy-import cache check (breaks circular import)
- embed_texts uses get_batch() for single-query batch lookup, calls provider only for misses
- embed_query gets caching for free via embed_text delegation
- Initialize EmbeddingCacheService in lifespan BEFORE IndexingService (ECACHE-07)
- Add _build_provider_fingerprint() helper to api/main.py
- Register cache_router at /index/cache with GET (status) and DELETE (clear) endpoints
- Add embedding_cache field to IndexingStatus; populate in health.py when entry_count > 0
- Write 22 unit tests: all 8 required test cases + coverage for stats/singleton
- Update test_storage_paths.py to include embedding_cache in expected keys
- task before-push: 893 passed, 23 skipped, 0 failed

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
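
A sketch of the batch interception pattern, assuming a cache that exposes `get_batch()` and `put()`. `DictCache`, `fake_provider`, and keying by raw text are simplifications for illustration; the commit messages describe hashed keys and a real provider call.

```python
import asyncio


class DictCache:
    """In-memory stand-in for the cache service, keyed by raw text for brevity."""

    def __init__(self):
        self.store = {}

    async def get_batch(self, keys):
        return {k: self.store[k] for k in keys if k in self.store}

    async def put(self, key, vector):
        self.store[key] = vector


async def embed_texts(texts, cache, provider_embed):
    found = await cache.get_batch(texts)  # one lookup for the whole batch
    # Only texts missing from the cache go to the provider (deduplicated, order kept).
    misses = list(dict.fromkeys(t for t in texts if t not in found))
    if misses:
        vectors = await provider_embed(misses)
        for text, vector in zip(misses, vectors):
            await cache.put(text, vector)
            found[text] = vector
    return [found[t] for t in texts]  # results in the caller's original order


provider_calls = []


async def fake_provider(batch):
    provider_calls.append(list(batch))
    return [[float(len(t))] for t in batch]


async def demo():
    cache = DictCache()
    await embed_texts(["a", "bb"], cache, fake_provider)
    return await embed_texts(["a", "ccc", "bb"], cache, fake_provider)


second = asyncio.run(demo())
```

On the second call only `"ccc"` reaches the provider, while the returned list still lines up with the input order.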
…red into EmbeddingGenerator

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add DocServeClient.cache_status() GET /index/cache/status
- Add DocServeClient.clear_cache() DELETE /index/cache
- Create commands/cache.py with cache_group (status + clear subcommands)
- cache status: Rich table with entry_count, hit_rate, hits, misses, mem_entries, size
- cache clear: confirmation prompt with entry count; --yes/-y to skip
- Register cache_group in commands/__init__.py and cli.py
- Update CLI help text to include Cache Commands section
…nd tests

- Add embedding_cache field to IndexingStatus dataclass in api_client.py
- Populate embedding_cache from /health/status response in DocServeClient.status()
- agent-brain status now shows embedding cache line: N entries, X% hit rate (H hits, M misses)
- status --json includes embedding_cache dict (or null for fresh installs)
- Fix test_status_json_output: set mock_status.embedding_cache=None (JSON serializable)
- New test_cache_command.py: 12 tests covering help, status, clear (confirmation + --yes)
- [Rule 1 - Bug] Fixed MagicMock JSON serialization error in existing test
- 16-02-SUMMARY.md: cache group commands + metrics in status + 12 tests
- STATE.md: Phase 16 plan 2/2 complete; decisions recorded
- 16-02-PLAN.md checkbox checked
- Phase 16 progress table: 2/2, Complete, 2026-03-10
Client was calling GET /index/cache/status but server mounts at
GET /index/cache/. Fixed URL to match server router.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
9/9 must-haves verified, all 6 ECACHE requirements satisfied.
Gap fix: CLI cache status endpoint URL corrected (7fea667).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…rompt default

- Add "source" key to metadata dict in document_loader.py load_from_folder()
  and load_single_file() so indexing_service manifest diffing can find file
  paths (was reading doc.metadata["source"] which was never set)
- Add trailing slash to DELETE /index/cache/ in api_client.py to prevent
  307 redirect that caused JSONDecodeError on cache clear
- Pass default=False to Rich Confirm.ask so prompt shows [y/N] instead of [y/n]
- Update 16-UAT.md with full round 2 results (6 pass, 7 issues)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…che routes

Fix 13: Belt-and-suspenders for metadata["source"] in indexing_service.py
  with fallback chain + invariant guard for documents > 0 with no paths.
Fix 13: Early return in query_service.py when index is empty (avoids 500).
Fix 7: Cache router rewritten with shared impl + no-slash aliases.
Fix 6: Confirm.ask default=False + regression test for prompt default.
Fix 11: Narrowly omit embedding_cache key when None in health endpoint.
Fix 10: Add --verbose/-v flag to status command for cache detail.

All 916 server + 156 CLI tests pass. task before-push clean.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
RichardHightower and others added 16 commits March 12, 2026 00:29
…locking

Fix 11: Remove response_model=IndexingStatus from /health/status route.
  Always return model_dump dict so the embedding_cache pop actually works
  (response_model was re-serializing through Pydantic, re-adding null).

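Fix 8: cache clear() no longer acquires self._lock. It uses its own DB
  connection with PRAGMA busy_timeout=5000, so it does not queue behind a
  long embedding write stream holding that lock. SQLite WAL mode still
  allows only one writer at a time (readers are not blocked), so
  busy_timeout lets clear() wait up to 5 s for the write lock.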

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
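
A sketch of the "own connection plus busy_timeout" approach, using the synchronous `sqlite3` module as a stand-in for aiosqlite. The `embeddings` table, its columns, and the `clear_cache` helper are assumptions for illustration.

```python
import os
import sqlite3
import tempfile


def clear_cache(db_path):
    """Delete all cached rows through a dedicated connection; return how many were removed."""
    conn = sqlite3.connect(db_path)
    try:
        # Wait up to 5 s for a lock instead of failing immediately.
        conn.execute("PRAGMA busy_timeout=5000")
        with conn:  # commits on success, rolls back on error
            count = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
            conn.execute("DELETE FROM embeddings")
        return count
    finally:
        conn.close()


db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
writer = sqlite3.connect(db_path)  # plays the role of the long-lived cache connection
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE embeddings (key TEXT PRIMARY KEY, vector BLOB NOT NULL)")
with writer:
    writer.executemany(
        "INSERT INTO embeddings VALUES (?, ?)",
        [("a", b"\x00"), ("b", b"\x01"), ("c", b"\x02")],
    )

deleted = clear_cache(db_path)
remaining = writer.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
writer.close()
```

Because the clear path owns its connection, it does not depend on any application-level lock held by the write path.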
Fix 10: /health/status now prefers storage_backend.get_count() over
  legacy vector_store, fixing total_chunks: 0 when the two Chroma
  instances diverge.

Fix 8: Add asyncio.sleep(0) yield every 10 docs in chunking loops
  (ContextAwareChunker.chunk_documents + code chunking loop) so HTTP
  requests aren't starved during long indexing runs.

Fix 11 test: Add integration test asserting /health/status omits
  embedding_cache key (not null) when cache is empty.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
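
The cooperative-yield fix can be sketched as follows. The `chunk_documents` function here is a hypothetical stand-in for the chunking loops, and the heartbeat task stands in for concurrent HTTP request handling.

```python
import asyncio


async def chunk_documents(docs, yield_every=10):
    """Chunk documents, handing control back to the event loop every N docs."""
    chunks = []
    for i, doc in enumerate(docs, start=1):
        chunks.extend(doc.split())  # stand-in for the real chunking work
        if i % yield_every == 0:
            await asyncio.sleep(0)  # lets other ready tasks run
    return chunks


async def demo():
    ticks = 0

    async def heartbeat():
        # Counts how often another task gets a turn on the event loop.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0)

    hb = asyncio.create_task(heartbeat())
    chunks = await chunk_documents(["alpha beta"] * 50)
    hb.cancel()
    return len(chunks), ticks


chunk_count, ticks = asyncio.run(demo())
```

Without the `asyncio.sleep(0)` call the heartbeat would never run while chunking is in progress. A yield only helps between documents, though: a single slow document still blocks the loop, which is what the later `asyncio.to_thread()` changes address.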
Fix 8 (round 6): The 29.4s cache clear timeout was caused by two
things compounding:

1. embed_texts() miss loop calls cache.put() per miss with no yield,
   starving HTTP request processing during large indexing runs.
   Fix: asyncio.sleep(0) every 10 cache writes.

2. clear() ran VACUUM inline, which can take 10-20s on a large DB
   and needs an exclusive lock.
   Fix: VACUUM scheduled as fire-and-forget background task via
   asyncio.create_task(), so DELETE + response returns immediately.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…veness

- Wrap document post-processing (stat, language detect) in asyncio.to_thread()
- Wrap chunk_single_document text splitting in asyncio.to_thread()
- Wrap chunk_code_document tree-sitter parsing in asyncio.to_thread()
- Replace fire-and-forget VACUUM with PRAGMA wal_checkpoint(TRUNCATE)
- Add put_many() batch cache writes (single transaction, one lock)
- Use put_many() in embed_texts() instead of per-miss put() loop

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
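
A sketch of batch writes in one transaction followed by a WAL checkpoint on clear, again using synchronous `sqlite3` in place of aiosqlite. The table layout and function bodies are assumptions; only the `put_many()` name and the `PRAGMA wal_checkpoint(TRUNCATE)` statement come from the commit message.

```python
import os
import sqlite3
import tempfile


def put_many(conn, items):
    """Write every (key, blob) pair in one transaction instead of one commit per entry."""
    with conn:
        conn.executemany(
            "INSERT OR REPLACE INTO embeddings (key, vector) VALUES (?, ?)",
            list(items.items()),
        )


def clear(conn):
    """Delete all rows, then checkpoint and truncate the WAL file."""
    with conn:
        conn.execute("DELETE FROM embeddings")
    # The pragma returns one row: (busy, wal_frames, checkpointed_frames).
    return conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()


db_path = os.path.join(tempfile.mkdtemp(), "cache.db")
conn = sqlite3.connect(db_path)
conn.execute("PRAGMA journal_mode=WAL")
conn.execute("CREATE TABLE embeddings (key TEXT PRIMARY KEY, vector BLOB NOT NULL)")

put_many(conn, {f"key-{i}": bytes([i]) for i in range(100)})
stored = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
busy, _, _ = clear(conn)
remaining = conn.execute("SELECT COUNT(*) FROM embeddings").fetchone()[0]
conn.close()
```

Unlike VACUUM, the checkpoint does not rebuild the database file. It flushes the WAL into the main file and resets the WAL to zero length, which is much cheaper on a large cache.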
…thread()

Root cause: previous fixes only wrapped document loading and chunking
in asyncio.to_thread(), but the pipeline's heaviest sync operations
were still blocking the event loop:
- ChromaDB collection.upsert() with up to 40k items per batch
- BM25Retriever.from_defaults() + persist() (tokenization + disk I/O)
- GraphIndexManager.build_from_documents() (entity extraction)
- ContentInjector.apply_to_chunks() (metadata enrichment)
- os.stat() during manifest save

All sync-heavy operations in the indexing pipeline now run in threads,
keeping the event loop free for concurrent HTTP requests.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
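
The general pattern looks like the sketch below. `build_index` is a hypothetical blocking function standing in for calls such as the ChromaDB upsert or BM25 build; the heartbeat task stands in for concurrent HTTP handling.

```python
import asyncio
import time


def build_index(items):
    """Stand-in for a blocking, synchronous pipeline stage."""
    time.sleep(0.2)
    return len(items)


async def demo():
    ticks = 0

    async def heartbeat():
        # Keeps ticking only if the event loop is not blocked.
        nonlocal ticks
        while True:
            ticks += 1
            await asyncio.sleep(0.01)

    hb = asyncio.create_task(heartbeat())
    # The blocking call runs in a worker thread; the coroutine just awaits its result.
    count = await asyncio.to_thread(build_index, list(range(1000)))
    hb.cancel()
    return count, ticks


count, ticks = asyncio.run(demo())
```

Calling `build_index(...)` directly inside the coroutine would freeze the heartbeat for the full 0.2 s. Note that `asyncio.to_thread()` requires Python 3.9 or later.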
Phase 16 (Embedding Cache) UAT fully passed after 8 rounds of fixes.
Key final fix: wrapping all CPU-heavy indexing pipeline stages in
asyncio.to_thread() to prevent event-loop starvation.

Also adds uat-tester agent and uat-testing skill for permission-free
UAT testing in future phases.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Phase 16 delivers a two-layer embedding cache:
- Layer 1: In-memory LRU (OrderedDict, 1000 entries)
- Layer 2: Persistent aiosqlite disk cache (WAL mode, 500MB limit)

Key features:
- Cache-intercepted embed_text/embed_texts with batch lookup
- Provider fingerprint auto-wipe on model change (ECACHE-04)
- CLI: agent-brain cache status, agent-brain cache clear --yes
- /index/cache/status and DELETE /index/cache/ API endpoints
- Health endpoint includes cache metrics when populated
- All CPU-heavy indexing stages run in asyncio.to_thread()
  for non-blocking API responsiveness during indexing

UAT: 13/13 tests passing across 8 rounds of validation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
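
A compact sketch of the two-layer lookup, assuming an OrderedDict LRU in front of a SQLite table. It uses the synchronous `sqlite3` module rather than aiosqlite, and the `TwoLayerCache` class, table name, and column names are hypothetical.

```python
import sqlite3
from collections import OrderedDict


class TwoLayerCache:
    """Layer 1: in-memory LRU. Layer 2: SQLite table that survives eviction."""

    def __init__(self, db_path=":memory:", max_mem_entries=1000):
        self._mem = OrderedDict()
        self._max = max_mem_entries
        self._db = sqlite3.connect(db_path)
        self._db.execute("PRAGMA journal_mode=WAL")  # takes effect for file-backed databases
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS embeddings (key TEXT PRIMARY KEY, vector BLOB NOT NULL)"
        )

    def _remember(self, key, blob):
        self._mem[key] = blob
        self._mem.move_to_end(key)  # mark as most recently used
        while len(self._mem) > self._max:
            self._mem.popitem(last=False)  # evict the least recently used entry

    def get(self, key):
        if key in self._mem:
            self._mem.move_to_end(key)
            return self._mem[key]
        row = self._db.execute(
            "SELECT vector FROM embeddings WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        self._remember(key, row[0])  # promote a disk hit into memory
        return row[0]

    def put(self, key, blob):
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO embeddings (key, vector) VALUES (?, ?)", (key, blob)
            )
        self._remember(key, blob)


cache = TwoLayerCache(max_mem_entries=2)
for name in ("a", "b", "c"):
    cache.put(name, name.upper().encode())

evicted_from_memory = "a" not in cache._mem
from_disk = cache.get("a")
missing = cache.get("zzz")
```

With a memory limit of two entries, `"a"` falls out of the LRU after the third write but is still served from the disk layer and promoted back into memory.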
…ference

- Create agent-brain-plugin/commands/agent-brain-cache.md with status and clear subcommands
- Add CACHE COMMANDS category to agent-brain-help.md display section
- Add agent-brain-cache row to Command Reference table in agent-brain-help.md
- Add GET /index/cache and DELETE /index/cache sections to api_reference.md with response schemas
- Add agent-brain cache status/clear commands to CLI Commands Reference section
- Add cache trigger phrases to using-agent-brain SKILL.md YAML description
- Add Cache Management section to using-agent-brain SKILL.md (when to check, when to clear)
- Add Cache Management to Contents table of contents
- Add cache performance trigger pattern to search-assistant.md
- Add cache performance check step (step 6) to search-assistant.md assistance flow
- Add EMBEDDING_CACHE_MAX_MEM_ENTRIES and EMBEDDING_CACHE_MAX_DISK_MB to configuring-agent-brain env vars table
- Add Embedding Cache Tuning note to configuring-agent-brain SKILL.md
…nagement

- 19-01-SUMMARY.md: 6 plugin files updated with cache management surface
- STATE.md: session progress recorded, plan advanced
- ROADMAP.md: Phase 19 marked complete (1/1 plans)
- REQUIREMENTS.md: XCUT-03 marked complete
- Expanded release_agent.md with comprehensive pre-granted permissions
  (build, quality gates, git, gh CLI, shell utilities)
- Added Phase 19 planning scaffold and research docs
- Added UAT test plans and phase 15 smoke validation plan

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@RichardHightower RichardHightower merged commit a34c769 into main Mar 12, 2026
3 checks passed
@RichardHightower RichardHightower deleted the v8.0/phases-15-16-19 branch March 12, 2026 23:03