
perf(bench): add fuzzy-cache and token-dedup search micro-benchmarks #284

Merged
mksglu merged 5 commits into mksglu:next from sebastianbreguel:docs/benchmark-search-perf
Apr 16, 2026

Conversation

@sebastianbreguel (Contributor) commented Apr 14, 2026

Summary

Adds two search-path micro-benchmarks to the existing tests/benchmark.ts — no new test file, no BENCHMARK.md changes. Addresses maintainer feedback on the earlier version of this PR.

What's measured

  • fuzzy-correct LRU cache: cold 1st call (full levenshtein scan over vocabulary) vs warm cache hit on a repeated typo lookup.
  • token dedup: FTS5 MATCH wall-clock for a 5x-duplicated token query vs the deduped 1-token baseline. Uses raw FTS5 (not the ContentStore path) to isolate the engine-side cost without hitting the already-deduped sanitize.
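The cold-vs-warm cache measurement can be sketched as below. This is a hypothetical illustration, not the PR's actual code: `fuzzyCorrect`, the vocabulary, and the Map-based cache are stand-ins for the real ContentStore LRU path, which this sketch does not depend on.

```typescript
// Sketch: cold call scans the whole vocabulary with Levenshtein distance;
// warm call is a single cache lookup. All names here are illustrative.

function levenshtein(a: string, b: string): number {
  // Single-row dynamic-programming edit distance.
  const dp: number[] = Array.from({ length: b.length + 1 }, (_, i) => i);
  for (let i = 1; i <= a.length; i++) {
    let prev = dp[0]; // dp[i-1][j-1]
    dp[0] = i;
    for (let j = 1; j <= b.length; j++) {
      const tmp = dp[j]; // dp[i-1][j]
      dp[j] = Math.min(
        dp[j] + 1,     // deletion
        dp[j - 1] + 1, // insertion
        prev + (a[i - 1] === b[j - 1] ? 0 : 1), // substitution
      );
      prev = tmp;
    }
  }
  return dp[b.length];
}

const vocabulary = ["benchmark", "search", "token", "cache", "document"];
const cache = new Map<string, string>();

function fuzzyCorrect(typo: string): string {
  const hit = cache.get(typo);
  if (hit !== undefined) return hit; // warm path: O(1) lookup
  // Cold path: full Levenshtein scan over the vocabulary.
  let best = typo;
  let bestDist = Infinity;
  for (const word of vocabulary) {
    const d = levenshtein(typo, word);
    if (d < bestDist) {
      bestDist = d;
      best = word;
    }
  }
  cache.set(typo, best);
  return best;
}

const t0 = performance.now();
const cold = fuzzyCorrect("serach"); // cold: scans every vocabulary word
const t1 = performance.now();
const warm = fuzzyCorrect("serach"); // warm: cache hit on the same typo
const t2 = performance.now();
console.log(cold, warm, (t1 - t0).toFixed(3), "ms vs", (t2 - t1).toFixed(3), "ms");
```

The benchmark reports both timings rather than a pass/fail, since the interesting number is the cold-to-warm ratio.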

Methodology

  • 5,000 seeded documents, 2,000 iterations per measurement.
  • Runs as a new section inside the existing benchmark harness: ./node_modules/.bin/tsx tests/benchmark.ts.
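The iteration scheme above (and the median-of-3 approach the maintainer credits later in the thread) implies a small timing harness. A minimal sketch, with hypothetical names — the actual helpers in tests/benchmark.ts may differ:

```typescript
// Illustrative harness shape: time N iterations of a workload, repeat the
// whole run a few times, and report the median to damp JIT/GC noise.

function median(xs: number[]): number {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;
}

function measure(fn: () => void, iterations: number): number {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  return performance.now() - start; // total wall-clock ms for the run
}

function bench(fn: () => void, iterations = 2000, runs = 3): number {
  return median(Array.from({ length: runs }, () => measure(fn, iterations)));
}

// Example: time a trivial workload at the PR's iteration count.
const ms = bench(() => Math.sqrt(12345), 2000);
console.log(`median of 3 runs: ${ms.toFixed(3)} ms`);
```

Taking the median rather than the mean keeps a single GC pause or JIT warm-up spike from skewing the reported number.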

Scope

  • Single concern: bench code only.
  • No runtime changes, no doc changes, no README/package.json changes.
  • No PR-number references anywhere (subject-named, not origin-named).

Test plan

  • ./node_modules/.bin/tsx tests/benchmark.ts — runs end-to-end, prints the new "Search Path Performance" section alongside the existing executor benches.
  • Rebased onto latest main; diff is one file, +103 lines.

@mksglu (Owner) commented Apr 16, 2026

Hey @sebastianbreguel — appreciate the benchmark work, but I need some changes before this can merge. A few things:

1. Docs should not reference PR numbers.
BENCHMARK.md is an official document that will outlive any PR. Lines like "fuzzyCorrect LRU cache (#274)" and "#276 (token dedup)" tie the docs to GitHub history. Describe the optimization by what it does, not by which PR introduced it. Future readers don't care about PR numbers — they care about what the system does now.

Same applies to the bench script comments: // Micro-benchmarks for the two search-perf PRs that landed on next: #274, #276 — remove PR references. Describe the benchmarks by their subject (fuzzy cache, token dedup), not their origin.

2. Don't create new test files.
Per CONTRIBUTING.md: "Do NOT create new test files. Add your tests to the existing file that covers the same domain." tests/bench-search.ts should live inside the existing tests/benchmark.ts or be added to tests/store.test.ts as a describe block. If it needs its own file, discuss with me first.

3. Scope: docs update should be holistic, not per-PR.
Updating BENCHMARK.md for just 2 optimizations creates fragmented docs. If you want to document search performance, document ALL search optimizations (FTS5 optimize, mmap_size, stopword filtering, RRF reranking, content-type boost) — not just the ones from your PRs. I'd rather have one comprehensive "Search Performance" section than incremental additions tied to individual contributions.

4. README + package.json changes are scope creep.
This PR is titled "docs(benchmark)" but also modifies README.md (Pi install instructions) and package.json (keywords). Those are separate concerns. Keep the PR focused on benchmarks only, or split it.

The benchmark numbers themselves are solid — the methodology (5,000 docs, 2,000 iterations, median of 3) is good. Just needs the framing fixes above.

To summarize what I need:

  • Remove all PR number references from docs and code comments
  • Move bench into existing test file or get approval for new file
  • Either expand to cover all search optimizations or keep it as a standalone bench script without BENCHMARK.md changes
  • Drop the README/package.json changes into a separate PR

Adds two search-path micro-benchmarks to tests/benchmark.ts:
- fuzzy-correct LRU cache (cold vs warm lookup latency)
- token dedup (FTS5 MATCH cost with/without duplicated query tokens)

Uses the real ContentStore for the cache path and raw FTS5 for the dedup
path (isolates engine-side cost without touching pre-deduped sanitize).

5000 seeded documents x 2000 iterations per measurement. No runtime
changes, bench-only.
@sebastianbreguel force-pushed the docs/benchmark-search-perf branch from f739371 to 51d5b71 on April 16, 2026 at 15:31
@sebastianbreguel changed the title from "docs(benchmark): search-perf section for #274 LRU + #276 dedup" to "perf(bench): add fuzzy-cache and token-dedup search micro-benchmarks" on Apr 16, 2026
@sebastianbreguel (Contributor, Author) commented

Hi @mksglu — thanks for the thorough feedback. Reworked the whole PR to address every point:

  1. No PR refs anywhere. Dropped #274/#276 from code comments and commit message. Sections are now named by what they measure (fuzzy-correct LRU cache, token dedup), not by origin.
  2. No new test file. tests/bench-search.ts is gone — its logic lives inside the existing tests/benchmark.ts as a new "Search Path Performance" section, following your CONTRIBUTING guidance.
  3. BENCHMARK.md reverted. Rather than fragment the docs with a 2-optimization Part 3, I dropped the BENCHMARK.md changes entirely (you offered that as an acceptable option). Happy to open a separate, comprehensive "Search Performance" doc PR later that covers FTS5 optimize, mmap_size, stopword filtering, RRF reranking, content-type boost, fuzzy cache, and token dedup together, if you want that.
  4. README/package.json gone. Those came in via an old rebase on top of the #283 Pi branch before it was squash-merged. Reset the branch onto current main, so now the diff is exactly one file (tests/benchmark.ts, +103).

Ready for another look whenever you have time.

@mksglu mksglu merged commit d0d71e3 into mksglu:next Apr 16, 2026
5 checks passed
