feat(recall): add proof_count boost to combined scoring by Abdulkadirklc · Pull Request #821 · vectorize-io/hindsight

Abdulkadirklc · 2026-03-31T19:23:28Z

Observations with more supporting evidence now rank slightly higher in recall results. proof_count is threaded through the retrieval pipeline and applied as a multiplicative boost in reranking:

- types.py: add proof_count field to RetrievalResult
- retrieval.py: include proof_count in SELECT columns
- reranking.py: add log1p-normalized proof_count boost (alpha=0.1)

The boost uses the same multiplicative pattern as recency and temporal signals. proof_count=1 is neutral, proof_count=50 gives ~+5% boost. Non-observation fact types are unaffected (neutral 0.5).
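For readers following along, here is a minimal sketch of how a boost like this could be computed. It is not the code from reranking.py: the function names and the `SATURATION_COUNT` constant are hypothetical, and the sketch uses `math.log` rather than `log1p` so that a count of 1 lands exactly on the neutral value, in line with the later commits in this thread. Only alpha=0.1, the neutral value 0.5, and the "50 gives about +5%" target come from the description above.

```python
import math
from typing import Optional

PROOF_ALPHA = 0.1       # alpha from the PR description
SATURATION_COUNT = 50   # hypothetical: count at which the boost tops out


def proof_norm(proof_count: Optional[int]) -> float:
    """Map proof_count to [0, 1], where 0.5 means "no effect"."""
    if proof_count is None or proof_count < 1:
        # Non-observation facts (or missing data) stay neutral.
        return 0.5
    # log(1) == 0, so a single proof sits exactly at the neutral point.
    scaled = math.log(proof_count) / math.log(SATURATION_COUNT)
    # Clamp so very large counts cannot push the boost past alpha / 2.
    return max(0.0, min(1.0, 0.5 + 0.5 * scaled))


def proof_boost(proof_count: Optional[int], alpha: float = PROOF_ALPHA) -> float:
    """Multiplicative factor applied to the combined score."""
    return 1.0 + alpha * (proof_norm(proof_count) - 0.5)
```

Under these assumptions, `proof_boost(1)` and `proof_boost(None)` both return the neutral factor 1.0, and the factor grows with the logarithm of the count until it saturates at `1 + alpha / 2`.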


nicoloboschi · 2026-04-01T07:28:38Z

Nice approach! A couple of things to address:

- Missing tests: this needs unit tests for the new proof_count boost in apply_combined_scoring. Cover at least: neutral when proof_count is None, neutral when proof_count=1, increasing boost with higher counts, and clamping at 100+.
- Other retrieval paths: the proof_count column is only added to the semantic/BM25 query in retrieval.py. Graph retrieval (graph_retrieval.py) and temporal retrieval also build RetrievalResult objects. Those paths should populate proof_count too, otherwise graph/temporal-retrieved observations will always get a neutral boost even when they have proof evidence.
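The four requested cases could look roughly like the pytest-style sketch below. Since the signature of apply_combined_scoring is not shown in this thread, the tests target a hypothetical stand-in, `proof_boost`, with an assumed cap of 100 to match the "clamping at 100+" case. The real tests would call apply_combined_scoring with RetrievalResult objects instead.

```python
import math
from typing import Optional


def proof_boost(proof_count: Optional[int], alpha: float = 0.1, cap: int = 100) -> float:
    """Hypothetical stand-in for the proof_count part of apply_combined_scoring."""
    if proof_count is None or proof_count < 1:
        return 1.0  # neutral: no evidence count available
    # Zero-centred at count 1, saturating once the count reaches `cap`.
    norm = 0.5 + 0.5 * min(math.log(proof_count) / math.log(cap), 1.0)
    return 1.0 + alpha * (norm - 0.5)


def test_neutral_when_proof_count_is_none():
    assert proof_boost(None) == 1.0


def test_neutral_when_proof_count_is_one():
    assert proof_boost(1) == 1.0


def test_boost_increases_with_count():
    boosts = [proof_boost(n) for n in (1, 2, 10, 50, 100)]
    # Strictly increasing: sorted order is preserved and no two values tie.
    assert boosts == sorted(boosts)
    assert len(set(boosts)) == len(boosts)


def test_boost_is_clamped_at_cap():
    assert proof_boost(100) == proof_boost(1_000)
```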

nicoloboschi · 2026-04-01T07:28:39Z

The log1p(100) normalizer is hardcoded — any observation with proof_count > 100 gets clamped to the same max boost. This feels like a magic number that will silently break for banks with heavily-reinforced observations. I'd remove the hardcoded cap and let the normalization scale naturally, or at least make it configurable.


nicoloboschi

A few things to fix:

- Remove the BFSGraphRetriever class from graph_retrieval.py (looks like it slipped in from a merge conflict)
- Clamp proof_norm to [0, 1]. It is currently unbounded, so extreme proof counts can exceed the documented ±5% range
- Fix misleading comment in test_proof_count_neutral_at_one: it says log1p(1) but the code uses math.log


Abdulkadirklc added 4 commits April 1, 2026 12:58

fix(retrieval): Apply proof_count boost to graph and temporal retrieval, normalize scaling

24e7e6a

fix(retrieval): correct proof_norm math to zero-center at count 1

ba4021e

Merge branch 'main' into feature/proof-count-boost

ed4f994

fix(retrieval): Apply proof_count boost to link_expansion retrieval

3aac487

nicoloboschi reviewed Apr 2, 2026


fix: remove BFS zombie, clamp proof_norm to [0,1], fix test comment (log1p->math.log)

ad425c2

Abdulkadirklc force-pushed the feature/proof-count-boost branch from e94c225 to ad425c2 on April 2, 2026 16:44

Abdulkadirklc wants to merge 6 commits into vectorize-io:main from Abdulkadirklc:feature/proof-count-boost
