distillery_search: project-filtered scores appear rescaled per scope; top hit always near 1.0 regardless of true similarity #370

@norrietaylor

Summary

When distillery_search is called with a project= filter, the returned score field no longer tracks cosine similarity. The top hit within the filtered scope is returned at score ≈ 1.0 even when its content is unrelated to the query. Unfiltered search on the same store returns the expected cosine distribution.

This makes the score field unreliable for any caller that thresholds relevance when a project filter is in play.

Repro (staging, 2026-04-18 UX test rerun)

Seeded six entries under two projects:

staging-uxtest-2026-04-18-projA: "projA entry 1: alpha", "projA entry 2: beta", "projA entry 3: gamma"
staging-uxtest-2026-04-18-projB: "projB entry 1: delta", "projB entry 2: epsilon", "projB entry 3: zeta"

1. Unfiltered search for "alpha" (correct behaviour)

score=1.000000  projA entry 1: alpha
score=0.935135  uxtest-a4 E1 ... placeholder alpha  (unrelated seed from A4)
score=0.151365  projA entry 2: beta
score=0.138322  projA entry 3: gamma
score=0.125687  projB entry 1: delta
score=0.101565  projB entry 3: zeta
score=0.090044  projB entry 2: epsilon
...

These are true cosine similarities: the exact match "alpha" scores 1.0, the A4 seed that also contains "alpha" scores 0.935, and the unrelated entries spread across 0.09–0.15. This is exactly what you'd expect.

2. Filtered search for "alpha", project=projA (looks fine)

score=1.000000  projA entry 1: alpha
score=0.015136  projA entry 2: beta
score=0.000000  projA entry 3: gamma

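Top hit 1.0, rest near 0. The ranking is still correct, but the scores for the non-matches collapse to effectively zero (beta drops from 0.151 unfiltered to 0.015, gamma from 0.138 to 0.000), so they are already not raw cosine values.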

3. Filtered search for "alpha", project=projB (the bug)

score=1.000000  projB entry 1: delta     ← this has no relation to "alpha"
score=0.491935  projB entry 3: zeta
score=0.000000  projB entry 2: epsilon

delta is returned at score=1.0 for the query alpha solely because it is the in-scope top hit. Its actual vector similarity (per the unfiltered run above) is 0.125687, so the score is silently rescaled to 1.0 inside the project filter.
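One hypothesis that fits most of these numbers is a min-max rescale applied to the in-scope result set. This is a guess from the outputs, not something confirmed in the store code, and `min_max_rescale` is a hypothetical helper written for illustration. Fed the raw scores from the unfiltered run, it reproduces the projA values (1.0, about 0.0151, 0.0) and the projB endpoints (1.0 and 0.0). It gives roughly 0.32 for zeta rather than the observed 0.491935, so the real rescaling may differ in detail or operate on different raw scores.

```python
def min_max_rescale(scores):
    """Hypothetical: rescale scores to [0, 1] using this list's own min and max."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        # Degenerate case: every score identical, so everything maps to 1.0.
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


# Raw cosine scores copied from the unfiltered run in this issue.
raw_proj_a = [1.000000, 0.151365, 0.138322]  # alpha, beta, gamma
raw_proj_b = [0.125687, 0.101565, 0.090044]  # delta, zeta, epsilon

rescaled_a = min_max_rescale(raw_proj_a)
rescaled_b = min_max_rescale(raw_proj_b)

print([round(s, 6) for s in rescaled_a])
print([round(s, 6) for s in rescaled_b])
```

Whatever the exact formula is, the endpoints show the problem: the best in-scope hit is pinned to 1.0 and the worst to 0.0, regardless of how close either is to the query.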

Impact

  • Any caller that compares scores to a threshold (e.g. "skip if similarity < 0.7") will accept irrelevant results under a project filter.
  • /pour, /investigate, and any ranking UI that displays "Relevance: X%" will show misleadingly high scores.
  • No way for the caller to tell whether a filtered-search 1.0 means "exact match" or "least-bad in this project".
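A minimal sketch of the first impact, using the scores from repro step 3. The result shape (dicts with `score` and `content`), the `relevant` helper, and the 0.7 cutoff are assumptions for illustration, not the actual distillery_search response format.

```python
THRESHOLD = 0.7  # hypothetical caller-side relevance cutoff


def relevant(results, threshold=THRESHOLD):
    """Keep only the results whose score meets the threshold."""
    return [r for r in results if r["score"] >= threshold]


# Scores as returned by the filtered search (project=projB, query "alpha").
filtered_proj_b = [
    {"score": 1.000000, "content": "projB entry 1: delta"},
    {"score": 0.491935, "content": "projB entry 3: zeta"},
    {"score": 0.000000, "content": "projB entry 2: epsilon"},
]

# The same entries with their raw cosine scores from the unfiltered run.
raw_proj_b = [
    {"score": 0.125687, "content": "projB entry 1: delta"},
    {"score": 0.101565, "content": "projB entry 3: zeta"},
    {"score": 0.090044, "content": "projB entry 2: epsilon"},
]

accepted_rescaled = relevant(filtered_proj_b)  # delta passes the cutoff
accepted_raw = relevant(raw_proj_b)            # nothing passes the cutoff

print([r["content"] for r in accepted_rescaled])
print([r["content"] for r in accepted_raw])
```

With rescaled scores the caller accepts delta as a relevant answer to "alpha". With raw scores the same cutoff correctly rejects everything in projB.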

Suggested fix

Return the raw cosine similarity in filtered search, same as unfiltered. If the feature is "rank results within the project", the ordering is already achieved by sorting on the raw score — no post-filter normalization is needed. Whatever is rescaling to [0,1] within the filtered result set should be removed.

If the intent is explicitly a normalized score, add a separate field (e.g. rank_score) rather than overloading score.
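A rough sketch of the suggested behaviour, assuming a simple in-memory list of hits. `Hit`, `filtered_search`, and the `with_rank_score` flag are hypothetical names, not the store's real API. The point is only that filtering and sorting on the raw score gives the in-project ranking, and any normalized value lives in its own field.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass
class Hit:
    content: str
    project: str
    score: float                        # raw cosine similarity, never rescaled
    rank_score: Optional[float] = None  # optional within-scope normalized value


def filtered_search(hits: List[Hit], project: str,
                    with_rank_score: bool = False) -> List[Hit]:
    """Filter by project and sort on the raw score; leave `score` untouched."""
    in_scope = sorted((h for h in hits if h.project == project),
                      key=lambda h: h.score, reverse=True)
    if not with_rank_score or not in_scope:
        return in_scope
    lo, hi = in_scope[-1].score, in_scope[0].score
    span = hi - lo
    # Min-max normalization goes into the separate field only.
    return [replace(h, rank_score=(h.score - lo) / span if span else 1.0)
            for h in in_scope]


# Raw scores for the query "alpha", taken from the unfiltered run.
hits = [
    Hit("projA entry 1: alpha", "projA", 1.000000),
    Hit("projA entry 2: beta", "projA", 0.151365),
    Hit("projA entry 3: gamma", "projA", 0.138322),
    Hit("projB entry 1: delta", "projB", 0.125687),
    Hit("projB entry 3: zeta", "projB", 0.101565),
    Hit("projB entry 2: epsilon", "projB", 0.090044),
]

plain = filtered_search(hits, "projB")
ranked = filtered_search(hits, "projB", with_rank_score=True)

for h in ranked:
    print(f"score={h.score:.6f}  rank_score={h.rank_score:.6f}  {h.content}")
```

Under this shape, a caller thresholding on `score` sees 0.125687 for delta and can reject it, while a ranking UI that wants a relative value can opt into `rank_score`.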

Related

  • Distinct from #170, "fix(store): RRF hybrid search score normalization clusters near 1.0" (closed). That issue was about hybrid BM25+vector scores clustering near 1.0 for dense result sets. This one is vector-only and project-filtered, and the score is 1.0 for a top hit whose raw similarity is only about 0.13.
  • Surfaced during the 2026-04-18 UX test rerun (A5 project scoping).
