feat(recall): add proof_count boost to combined scoring #821
Abdulkadirklc wants to merge 6 commits into vectorize-io:main
Observations with more supporting evidence now rank slightly higher in recall results. proof_count is threaded through the retrieval pipeline and applied as a multiplicative boost in reranking:

- types.py: add proof_count field to RetrievalResult
- retrieval.py: include proof_count in SELECT columns
- reranking.py: add log1p-normalized proof_count boost (alpha=0.1)

The boost uses the same multiplicative pattern as recency and temporal signals. proof_count=1 is neutral, proof_count=50 gives ~+5% boost. Non-observation fact types are unaffected (neutral 0.5).
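A minimal sketch of how such a multiplicative boost could look, under stated assumptions rather than as the PR's actual code. The function names, the `"observation"` fact-type string, and the saturation point of 50 are hypothetical. The sketch uses `math.log` so that proof_count=1 lands exactly on the neutral value 0.5 and proof_count=50 gives the stated +5%; the normalization in reranking.py may differ.

```python
import math
from typing import Optional

ALPHA = 0.1      # boost strength, as stated in the PR description
NEUTRAL = 0.5    # neutral normalized value, as stated in the PR description
SATURATION = 50  # assumption: count at which the boost reaches its maximum


def proof_norm(fact_type: str, proof_count: Optional[int]) -> float:
    """Map proof_count to [0.5, 1.0]; anything without usable evidence stays neutral."""
    # Assumption: only facts typed "observation" carry a meaningful proof_count.
    if fact_type != "observation" or proof_count is None or proof_count < 1:
        return NEUTRAL
    # log(1) == 0, so a single proof is neutral; log(50)/log(50) == 1 saturates.
    scaled = math.log(proof_count) / math.log(SATURATION)
    return min(1.0, NEUTRAL + 0.5 * scaled)


def apply_proof_boost(score: float, fact_type: str, proof_count: Optional[int]) -> float:
    """Multiply the score by a factor in [1.0, 1.0 + ALPHA / 2]."""
    return score * (1.0 + ALPHA * (proof_norm(fact_type, proof_count) - NEUTRAL))
```

With these assumptions, a score of 1.0 stays at 1.0 for proof_count=1 or for any non-observation fact, and rises to about 1.05 for an observation with proof_count=50.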
Nice approach! A couple of things to address:
…al, normalize scaling
nicoloboschi left a comment
A few things to fix:
- Remove the `BFSGraphRetriever` class from `graph_retrieval.py` (looks like it slipped in from a merge conflict)
- Clamp `proof_norm` to `[0, 1]`: it is currently unbounded, so extreme proof counts can exceed the documented ±5% range
- Fix misleading comment in `test_proof_count_neutral_at_one`: it says `log1p(1)` but the code uses `math.log`
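To illustrate the clamping request, here is a small sketch comparing an unbounded and a clamped `proof_norm`. It is not the PR's code: the helper names and the saturation point of 50 are assumptions, and the `math.log` normalization is chosen only because the review mentions that call.

```python
import math

ALPHA = 0.1      # boost strength from the PR description
NEUTRAL = 0.5    # neutral normalized value from the PR description
SATURATION = 50  # assumption: count intended to map to the top of the range


def raw_proof_norm(proof_count: int) -> float:
    """Unbounded normalization: keeps growing for very large proof counts."""
    return NEUTRAL + 0.5 * math.log(max(proof_count, 1)) / math.log(SATURATION)


def clamped_proof_norm(proof_count: int) -> float:
    """Same normalization, clamped to [0, 1] as the review asks."""
    return min(1.0, max(0.0, raw_proof_norm(proof_count)))


def boost(norm: float) -> float:
    """Multiplicative factor applied to the combined score."""
    return 1.0 + ALPHA * (norm - NEUTRAL)
```

Under these assumptions, proof_count=2500 (which is 50 squared) yields a raw factor of about 1.10, double the documented limit, while the clamped version stays at about 1.05.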
e94c225 to ad425c2