
feat: add FTS5 full-text search index #5

Merged
andyhtran merged 1 commit into main from feat/fts5-search
Mar 15, 2026

@andyhtran (Owner)

Summary

  • Replace brute-force JSONL streaming search with a SQLite FTS5 index (15-72x faster, 76-90% less memory)
  • Add cct index sync/rebuild/status commands for index management
  • Add the --sort relevance option and the --sync flag to the search command
  • Incremental sync with change detection, corruption auto-recovery, and Porter stemming
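The PR describes `--sort relevance` only as match-density ordering that surfaces sessions where the term is central rather than incidental. The following is a minimal sketch, assuming density means matching messages divided by total indexed messages per session; the exact formula cct uses is not stated. `SessionHits`, `density`, and `sort_by_relevance` are hypothetical names, and Python is used purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class SessionHits:
    session_id: str
    matches: int         # indexed messages in this session that match the query
    total_messages: int  # all indexed messages in this session


def density(hit: SessionHits) -> float:
    """Fraction of the session's messages that match (0.0 for empty sessions)."""
    return hit.matches / hit.total_messages if hit.total_messages else 0.0


def sort_by_relevance(hits: list[SessionHits]) -> list[SessionHits]:
    """Densest sessions first; ties broken by the raw number of matches."""
    return sorted(hits, key=lambda h: (density(h), h.matches), reverse=True)


hits = [
    SessionHits("long-session", matches=6, total_messages=300),
    SessionHits("focused-session", matches=4, total_messages=10),
]
print([h.session_id for h in sort_by_relevance(hits)])  # ['focused-session', 'long-session']
```

Under this assumption, a long session that mentions a term a few times in passing ranks below a short session that is mostly about it, even though it has more raw matches.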

Details

The old search scanned all JSONL session files on every query (~2s for ~1200 sessions). The new approach builds an FTS5 index in the XDG cache directory (~/.cache/cct/index.db) with automatic incremental sync. Searches now complete in 50-140ms.
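The index lives under the XDG cache directory. The following sketch shows how `~/.cache/cct/index.db` can be resolved. It assumes cct honors `$XDG_CACHE_HOME` as the XDG Base Directory spec describes: use it when it is set to an absolute path, otherwise fall back to `~/.cache`. `index_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def index_path(app: str = "cct") -> Path:
    """Resolve <cache dir>/<app>/index.db per the XDG Base Directory spec.

    $XDG_CACHE_HOME is used only if set to an absolute path; an unset, empty,
    or relative value falls back to ~/.cache.
    """
    base = os.environ.get("XDG_CACHE_HOME", "")
    cache_dir = Path(base) if base and os.path.isabs(base) else Path.home() / ".cache"
    return cache_dir / app / "index.db"
```

SQLite does not create missing directories, so a caller would run `path.parent.mkdir(parents=True, exist_ok=True)` before opening the database.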

Key design decisions:

  • Contentless FTS5: stores byte offsets into the original JSONL files rather than duplicating message text, keeping the index compact
  • Streaming search preserved: the -s (single-session) flag still uses the old streaming path, since building an index for one file is unnecessary overhead
  • Substring fallback: compound terms like pre-commit, which FTS5 splits at punctuation, fall back to a substring scan when the FTS query returns zero results
  • 5-minute sync cache: skips the filesystem scan if the index was synced in the last 5 minutes; the --sync flag bypasses this
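The contentless-index and substring-fallback decisions above fit together roughly as follows. This is a hedged sketch, not cct's schema. The `messages`/`messages_fts` tables, the `{"text": ...}` line shape, and the phrase-quoting of the search term are all assumptions, and the sketch requires an SQLite build with FTS5 enabled. With `content=''`, the FTS5 table stores only tokens. Each hit's `rowid` therefore maps through a side table to a byte offset and length, and the message text is re-read from the JSONL file.

```python
import json
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS messages (
    id          INTEGER PRIMARY KEY,
    path        TEXT    NOT NULL,
    byte_offset INTEGER NOT NULL,
    byte_len    INTEGER NOT NULL
);
-- content='' makes the FTS5 table contentless: it stores tokens, not text.
CREATE VIRTUAL TABLE IF NOT EXISTS messages_fts
    USING fts5(text, content='', tokenize='porter unicode61');
"""


def index_file(db: sqlite3.Connection, path: Path) -> None:
    """Index each JSONL line, recording where it lives instead of its text."""
    offset = 0
    with path.open("rb") as f:
        for raw in f:
            if raw.strip():
                text = json.loads(raw).get("text", "")  # hypothetical line shape
                cur = db.execute(
                    "INSERT INTO messages (path, byte_offset, byte_len) VALUES (?, ?, ?)",
                    (str(path), offset, len(raw)),
                )
                # Reuse the messages id as the FTS rowid so hits map back to offsets.
                db.execute(
                    "INSERT INTO messages_fts (rowid, text) VALUES (?, ?)",
                    (cur.lastrowid, text),
                )
            offset += len(raw)


def read_text(path: str, offset: int, length: int) -> str:
    """Re-read a single message from its JSONL file via the stored offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        return json.loads(f.read(length)).get("text", "")


def search(db: sqlite3.Connection, term: str) -> list[str]:
    # Quote the term as an FTS5 phrase so punctuation is not parsed as query syntax.
    phrase = '"' + term.replace('"', '""') + '"'
    rows = db.execute(
        "SELECT m.path, m.byte_offset, m.byte_len FROM messages_fts "
        "JOIN messages AS m ON m.id = messages_fts.rowid "
        "WHERE messages_fts MATCH ? ORDER BY m.id",
        (phrase,),
    ).fetchall()
    if rows:
        return [read_text(*row) for row in rows]
    # Fallback: case-insensitive substring scan over every indexed message.
    all_rows = db.execute(
        "SELECT path, byte_offset, byte_len FROM messages ORDER BY id"
    ).fetchall()
    texts = (read_text(*row) for row in all_rows)
    return [t for t in texts if term.lower() in t.lower()]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        session = Path(tmp) / "session.jsonl"
        session.write_text(
            '{"text": "Fixed the debugging output"}\n'
            '{"text": "Added a precommit hook"}\n'
            '{"text": "Found a bug in the parser"}\n'
        )
        db = sqlite3.connect(":memory:")
        db.executescript(SCHEMA)
        index_file(db, session)
        print(search(db, "bug"))      # ['Found a bug in the parser'] (not "debugging")
        print(search(db, "parsers"))  # ['Found a bug in the parser'] (Porter stemming)
        print(search(db, "commit"))   # ['Added a precommit hook'] (substring fallback)
        db.close()
```

The only copy of each message stays in its JSONL file, so the index holds just tokens plus a few integers per message. The cost is one extra file read per displayed hit.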

The "recall gap" (matches the old substring search found but FTS5 does not) was investigated across sessions, and every such match turned out to be a false positive (e.g. "bug" matching inside "debug").
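The false-positive finding comes down to tokenization. The following rough illustration uses `re.findall(r"[^\W_]+", ...)` as a stand-in for FTS5's `unicode61` tokenizer. This is an approximation, since the real tokenizer also removes diacritics by default and is configurable. `substring_match`, `tokens`, and `token_match` are hypothetical names. The example also shows why a single-token lookup cannot find a hyphenated term like `pre-commit`: the tokenizer splits it into `pre` and `commit`.

```python
import re


def substring_match(term: str, text: str) -> bool:
    """Old behaviour: case-insensitive substring test."""
    return term.lower() in text.lower()


def tokens(text: str) -> list[str]:
    """Rough stand-in for FTS5's unicode61 tokenizer: runs of letters/digits."""
    return re.findall(r"[^\W_]+", text.lower())


def token_match(term: str, text: str) -> bool:
    """Word-boundary behaviour: the term must equal a whole token."""
    return term.lower() in tokens(text)


text = "Let me debug this"
print(substring_match("bug", text), token_match("bug", text))  # True False
print(tokens("Run the pre-commit hook"))  # ['run', 'the', 'pre', 'commit', 'hook']
```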

Test plan

  • Verified that `just ci` passes (format, lint, tests)
  • Smoke tested all CLI commands (list, search, info, stats, export, changelog, plans, resume, index sync/rebuild/status)
  • Verified incremental sync detects new/updated/deleted sessions correctly
  • Verified --sort relevance surfaces sessions where the term is central vs incidental
  • Verified project filter messages distinguish nonexistent project from no query matches
  • Verified search with the -s flag still uses the streaming path
  • Verified corruption recovery recreates the index from a corrupt DB file
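One way to approach corruption recovery is to check the file and, if SQLite rejects it, delete it and start fresh. The following is a minimal sketch, assuming a `PRAGMA quick_check` health test and removal of the `-wal`/`-shm` sidecar files. `open_index`, `_is_healthy`, and the one-table `SCHEMA` are hypothetical, and the PR does not say how cct detects corruption. Because the index is a cache derived from the JSONL files, discarding it loses nothing.

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical schema; a real index would also hold the FTS5 table.
SCHEMA = "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, path TEXT);"


def _is_healthy(path: Path) -> bool:
    """True if SQLite can read the file and PRAGMA quick_check reports 'ok'."""
    try:
        db = sqlite3.connect(path)
    except sqlite3.Error:
        return False
    try:
        return db.execute("PRAGMA quick_check").fetchone()[0] == "ok"
    except sqlite3.DatabaseError:
        return False  # e.g. "file is not a database"
    finally:
        db.close()


def open_index(path: Path) -> tuple[sqlite3.Connection, bool]:
    """Open the index, recreating it from scratch if the file is corrupt.

    Returns (connection, rebuilt); rebuilt=True means a full re-index is needed.
    """
    rebuilt = False
    if path.exists() and not _is_healthy(path):
        # Remove the DB and any WAL/shared-memory sidecars before recreating.
        for suffix in ("", "-wal", "-shm"):
            Path(str(path) + suffix).unlink(missing_ok=True)
        rebuilt = True
    path.parent.mkdir(parents=True, exist_ok=True)
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db, rebuilt


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        index = Path(tmp) / "index.db"
        index.write_bytes(b"not a sqlite file" * 64)  # simulate corruption
        db, rebuilt = open_index(index)
        print(rebuilt)  # True
        db.close()
```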

🤖 Generated with Claude Code

Replace brute-force JSONL streaming search with a SQLite FTS5 index
stored in the XDG cache directory. Searches are 15-72x faster with
76-90% less memory, and word-boundary matching eliminates false
positives from substring matching (e.g. "bug" inside "debug").

New capabilities:
- `cct index sync/rebuild/status` commands for index management
- `--sort relevance` option for match-density ordering
- `--sync` flag to force index refresh before searching
- Incremental sync with 5-minute cache and change detection
- Corruption auto-recovery (detects and rebuilds broken index)
- Porter stemmer for natural language matching

The old streaming search is preserved for single-session lookups (-s).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
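The 5-minute cache and change detection can be sketched as two small checks. First, skip the filesystem scan when the last sync is recent, unless `--sync` forces a refresh. Otherwise, compare per-file fingerprints against what the index recorded. This sketch assumes a `(mtime_ns, size)` fingerprint per JSONL file, since the PR does not state which signals cct compares. `needs_sync`, `plan_sync`, `scan`, and `SyncPlan` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path

SYNC_TTL_SECONDS = 5 * 60  # the 5-minute sync cache

Fingerprint = tuple[int, int]  # (mtime_ns, size): an assumed change signal


@dataclass
class SyncPlan:
    new: list[str]
    updated: list[str]
    deleted: list[str]


def needs_sync(last_sync: float, now: float, force: bool = False) -> bool:
    """Skip the filesystem scan if the last sync is under 5 minutes old."""
    return force or now - last_sync >= SYNC_TTL_SECONDS


def scan(root: Path) -> dict[str, Fingerprint]:
    """Fingerprint every session file under root."""
    result = {}
    for p in root.rglob("*.jsonl"):
        st = p.stat()
        result[str(p)] = (st.st_mtime_ns, st.st_size)
    return result


def plan_sync(indexed: dict[str, Fingerprint], on_disk: dict[str, Fingerprint]) -> SyncPlan:
    """Diff fingerprints stored in the index against the current filesystem."""
    return SyncPlan(
        new=sorted(p for p in on_disk if p not in indexed),
        updated=sorted(p for p in on_disk if p in indexed and on_disk[p] != indexed[p]),
        deleted=sorted(p for p in indexed if p not in on_disk),
    )
```

Under this scheme, new files are indexed, updated files are re-indexed, and deleted files have their rows removed. The 5-minute window accepts slightly stale results in exchange for skipping a directory walk over ~1200 sessions on every query.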
andyhtran merged commit c1ced8d into main on Mar 15, 2026
3 checks passed
andyhtran deleted the feat/fts5-search branch on March 15, 2026 at 09:41