perf(bench): add fuzzy-cache and token-dedup search micro-benchmarks #284
|
Hey @sebastianbreguel, I appreciate the benchmark work, but I need some changes before this can merge. A few things:
1. Docs should not reference PR numbers. Same applies to the bench script comments:
2. Don't create new test files.
3. Scope: docs update should be holistic, not per-PR.
4. README + package.json changes are scope creep.
The benchmark numbers themselves are solid, and the methodology (5,000 docs, 2,000 iterations, median of 3) is good. It just needs the framing fixes above. To summarize what I need:
|
Adds two search-path micro-benchmarks to tests/benchmark.ts:
- fuzzy-correct LRU cache (cold vs warm lookup latency)
- token dedup (FTS5 MATCH cost with/without duplicated query tokens)
Uses the real ContentStore for the cache path and raw FTS5 for the dedup path (isolates engine-side cost without touching pre-deduped sanitize). 5000 seeded documents x 2000 iterations per measurement. No runtime changes, bench-only.
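For readers following the thread, here is a minimal sketch of what a cold vs warm fuzzy-correct measurement could look like. It is not the code in tests/benchmark.ts: `levenshtein`, `FuzzyCache`, `median`, and `timeMs` are hypothetical names, the vocabulary is synthetic, and the 5,000 / 2,000 / median-of-3 figures are only borrowed from the methodology quoted above.

```typescript
// Hypothetical sketch: none of these names come from the project's ContentStore.

// Classic two-row dynamic-programming edit distance.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Tiny LRU keyed by the misspelled word. A Map iterates in insertion order,
// so the first key is always the least recently used entry.
class FuzzyCache {
  private cache = new Map<string, string>();
  constructor(private vocabulary: string[], private capacity = 256) {}

  correct(word: string): string {
    const hit = this.cache.get(word);
    if (hit !== undefined) {
      // Warm path: re-insert to mark the entry as most recently used.
      this.cache.delete(word);
      this.cache.set(word, hit);
      return hit;
    }
    // Cold path: full scan over the vocabulary.
    let best = word;
    let bestDist = Infinity;
    for (const candidate of this.vocabulary) {
      const d = levenshtein(word, candidate);
      if (d < bestDist) {
        bestDist = d;
        best = candidate;
      }
    }
    if (this.cache.size >= this.capacity) {
      const oldest = this.cache.keys().next().value;
      if (oldest !== undefined) this.cache.delete(oldest);
    }
    this.cache.set(word, best);
    return best;
  }
}

function median(values: number[]): number {
  const sorted = [...values].sort((x, y) => x - y);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

function timeMs(fn: () => void): number {
  const start = performance.now();
  fn();
  return performance.now() - start;
}

// Assumption: a synthetic 5,000-word vocabulary stands in for the seeded corpus.
const vocabulary = Array.from({ length: 5000 }, (_, i) => `word${i}`);
const typo = "wrd42";
const coldRuns: number[] = [];
const warmRuns: number[] = [];
for (let run = 0; run < 3; run++) {
  const cache = new FuzzyCache(vocabulary); // fresh cache, so the first call is cold
  coldRuns.push(timeMs(() => cache.correct(typo)));
  warmRuns.push(
    timeMs(() => {
      for (let i = 0; i < 2000; i++) cache.correct(typo);
    }) / 2000,
  );
}
console.log(
  `cold: ${median(coldRuns).toFixed(3)} ms, warm: ${median(warmRuns).toFixed(6)} ms per lookup`,
);
```

Building a fresh cache per run keeps the cold number honest, since a shared cache would make every run after the first a warm hit.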
f739371 to 51d5b71
|
Hi @mksglu — thanks for the thorough feedback. Reworked the whole PR to address every point:
Ready for another look whenever you have time.
|
Summary
Adds two search-path micro-benchmarks to the existing tests/benchmark.ts: no new test file, no BENCHMARK.md changes. Addresses maintainer feedback on the earlier version of this PR.
What's measured
- fuzzy-correct LRU cache: cold 1st call (full Levenshtein scan over vocabulary) vs warm cache hit on a repeated typo lookup.
- token dedup: FTS5 MATCH wall-clock for a 5x-duplicated token query vs the deduped 1-token baseline. Uses raw FTS5 (not the ContentStore path) to isolate the engine-side cost without hitting the already-deduped sanitize.
Methodology
./node_modules/.bin/tsx tests/benchmark.ts.
Scope
Test plan
- ./node_modules/.bin/tsx tests/benchmark.ts: runs end-to-end, prints the new "Search Path Performance" section alongside the existing executor benches.
- main; diff is one file, +103 lines.
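To make the token-dedup comparison concrete, the two query shapes can be sketched as below. This is an illustration only: `dedupeTokens` and `toMatchExpr` are hypothetical helpers, not the project's sanitize step, and lowercasing before comparison is an assumption about how duplicates are detected. The quoting follows the FTS5 string syntax, where an embedded double quote is escaped by doubling it.

```typescript
// Hypothetical helpers: not taken from the project's sanitize code.

// Drop repeated query tokens, keeping the first occurrence of each.
// Assumption: tokens are compared case-insensitively.
function dedupeTokens(query: string): string[] {
  const seen = new Set<string>();
  const tokens: string[] = [];
  for (const raw of query.split(/\s+/)) {
    const token = raw.toLowerCase();
    if (token === "" || seen.has(token)) continue;
    seen.add(token);
    tokens.push(token);
  }
  return tokens;
}

// Build an FTS5 MATCH expression: each token becomes a quoted string and
// whitespace between strings acts as an implicit AND.
function toMatchExpr(tokens: string[]): string {
  return tokens.map((t) => `"${t.replace(/"/g, '""')}"`).join(" ");
}

// The two shapes being compared: a 5x-duplicated token vs the 1-token baseline.
const duplicatedQuery = Array.from({ length: 5 }, () => "cache").join(" ");
const duplicatedExpr = toMatchExpr(duplicatedQuery.split(" "));
const dedupedExpr = toMatchExpr(dedupeTokens(duplicatedQuery));

console.log(duplicatedExpr);
console.log(dedupedExpr);
```

Both expressions would be bound as the MATCH parameter of the same prepared statement, so any timing difference comes from the engine handling the repeated term rather than from query construction.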