feat: epistemic_type ranking + cold-start boost (#70, #71)#580
feat: epistemic_type ranking + cold-start boost (#70, #71)#580
Conversation
There was a problem hiding this comment.
Pull request overview
This PR implements two ranking improvements to multi_signal_rank in ranking.py: (1) epistemic type awareness that boosts/penalises articles based on whether their epistemic_type matches the detected query intent, and (2) a cold-start floor that prevents fresh high-confidence articles from being buried behind established ones. A new detect_query_intent function classifies queries as procedural, episodic, or general.
Changes:
ranking.py: Addeddetect_query_intent,_cold_start_floor, and updatedmulti_signal_rankwithquery_intentandcold_start_boostparametersretrieval.py: Wireddetect_query_intentinto the main_retrieve_syncretrieval pathtests/core/test_ranking_improvements.py: 29 new tests covering all new behaviours
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
src/valence/core/ranking.py |
Adds epistemic type detection, cold-start floor helper, and extends multi_signal_rank with two new optional parameters |
src/valence/core/retrieval.py |
Integrates detect_query_intent into the main retrieval path and passes query_intent to multi_signal_rank |
tests/core/test_ranking_improvements.py |
New test file with 29 tests covering detect_query_intent, epistemic ranking, cold-start boost, and combined scenarios |
Comments suppressed due to low confidence (1)
src/valence/core/ranking.py:337
- The
score_breakdowninexplainmode stores thefinal_score(which includes the epistemic multiplier and/or cold-start floor) as"final", but each signal's"contribution"still reflects only the pre-adjustment weighted score. When a query intent match multipliesfinal_scoreby 1.3 (or 0.85), or the cold-start floor raises it, the existing invariantsum(contributions) == finalis violated. The existing test intests/core/test_ranking.py(line 196–197) andtests/cli/test_cli.py(line 303–304) happen to pass only because the test articles are created withdays_ago=0andconfidence_overall=0.7(so the cold-start floor doesn't change anything) and have noepistemic_type. Any call tomulti_signal_rank(explain=True, query_intent="procedural")on a procedural article will produce a score_breakdown wheresum(contributions) != final. Consider addingepistemic_multiplierandcold_start_floorfields to the breakdown dict so callers can inspect and audit the full scoring.
if explain:
r["score_breakdown"] = {
"semantic": {
"value": semantic,
"weight": semantic_weight,
"contribution": semantic_weight * semantic,
},
"confidence": {
"value": confidence,
"weight": confidence_weight,
"contribution": confidence_weight * confidence,
},
"recency": {
"value": recency,
"weight": recency_weight,
"contribution": recency_weight * recency,
},
"final": final_score,
}
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| for r in all_results: | ||
| freshness_days = r.get("freshness", 90.0) | ||
| r["_original_created_at"] = r.get("created_at") | ||
| r["created_at"] = (now - timedelta(days=freshness_days)).isoformat() | ||
|
|
||
| query_intent = detect_query_intent(query) | ||
| ranked = multi_signal_rank( | ||
| all_results, | ||
| semantic_weight=0.50, | ||
| confidence_weight=0.35, | ||
| recency_weight=0.15, | ||
| query_intent=query_intent, | ||
| ) |
There was a problem hiding this comment.
The cold-start boost is silently broken in _retrieve_sync. Before calling multi_signal_rank, the code replaces each article's created_at with a fake date derived from freshness_days (line 551). Since _cold_start_floor reads created_at to determine article age, it will evaluate the synthetic date rather than the actual creation timestamp. A newly created article with freshness = 90.0 (the default for unknown age) would have its created_at shifted to 90 days ago, making it ineligible for the cold-start floor entirely.
The fix is to store the real created_at separately before the substitution (e.g. as _original_created_at, which already happens) and pass the real creation time into _cold_start_floor, or alternatively apply the cold-start evaluation after restoring created_at rather than during multi_signal_rank. As written, the cold-start boost works only when multi_signal_rank is called directly (e.g. in tests), but not via the main retrieve() code path.
| def test_general_default(self): | ||
| assert detect_query_intent("python async patterns") == "general" | ||
|
|
||
| def test_general_empty(self): |
There was a problem hiding this comment.
The test name test_general_empty is misleading — the query string "database indexing strategies" is not empty. The test is actually checking that a generic, non-procedural and non-episodic query returns "general". A more accurate name would be test_general_generic_query or test_general_non_matching_query.
| def test_general_empty(self): | |
| def test_general_generic_query(self): |
| decay_rate: float = 0.01, | ||
| min_confidence: float | None = None, | ||
| explain: bool = False, | ||
| query_intent: str | None = None, | ||
| cold_start_boost: bool = True, | ||
| ) -> list[dict]: | ||
| """Apply multi-signal ranking to query results. | ||
|
|
There was a problem hiding this comment.
The multi_signal_rank docstring on line 248 still documents the formula as only the three-signal weighted sum, but the function now also conditionally applies an epistemic type multiplier (×1.3 or ×0.85) and a cold-start score floor. The docstring should be updated to describe the extended formula including these adjustments.
| import pytest | ||
|
|
There was a problem hiding this comment.
The pytest module is imported on line 10 but never used in the test file — no pytest.mark, pytest.raises, pytest.fixture, or similar constructs are present. This unused import should be removed.
| import pytest |
| "what happened", | ||
| "when did", | ||
| ) | ||
| _EPISODIC_CONTAINS = ("session", "last time") |
There was a problem hiding this comment.
The term "session" in _EPISODIC_CONTAINS is a broad substring match that will produce false positives for legitimate procedural/semantic queries such as "configure session timeout", "manage user sessions", or "HTTP session handling". These have nothing to do with episodic (time-anchored) recollections. Consider using a more specific phrase like "last session" or "previous session" instead of the bare word "session".
| _EPISODIC_CONTAINS = ("session", "last time") | |
| _EPISODIC_CONTAINS = ("last session", "previous session", "last time") |
| # --------------------------------------------------------------------------- | ||
| # Query intent detection (#70) | ||
| # --------------------------------------------------------------------------- |
There was a problem hiding this comment.
The PR title and description reference issues #70 and #71, but the actual issues #70 and #71 are about "Implement propagate() API with restriction composition" and "Add strip_on_forward field redaction" respectively — both related to privacy primitives. They have no connection to epistemic type ranking or cold-start boost features being implemented here. The issue references are therefore incorrect.
| _COLD_START_WINDOW_HOURS = 48 | ||
| _COLD_START_FLOOR_FULL = 0.3 # floor at creation -> 24h | ||
| _COLD_START_FLOOR_HALF = 0.15 # floor at 24h -> 48h | ||
| _COLD_START_MIN_CONFIDENCE = 0.7 | ||
|
|
||
|
|
||
| def _cold_start_floor(article: dict, confidence: float) -> float | None: | ||
| """Return the cold-start score floor for a qualifying article, or None.""" | ||
| if confidence < _COLD_START_MIN_CONFIDENCE: | ||
| return None | ||
|
|
||
| created_at = article.get("created_at") | ||
| if created_at is None: | ||
| return None | ||
|
|
||
| if isinstance(created_at, str): | ||
| try: | ||
| created_at = datetime.fromisoformat(created_at.replace("Z", "+00:00")) | ||
| except (ValueError, TypeError): | ||
| return None | ||
|
|
||
| if isinstance(created_at, datetime): | ||
| if created_at.tzinfo is None: | ||
| created_at = created_at.replace(tzinfo=UTC) | ||
| now = datetime.now(UTC) | ||
| age_hours = (now - created_at).total_seconds() / 3600.0 | ||
| if age_hours <= 24: | ||
| return _COLD_START_FLOOR_FULL | ||
| elif age_hours <= 48: | ||
| return _COLD_START_FLOOR_HALF |
There was a problem hiding this comment.
The constant _COLD_START_WINDOW_HOURS = 48 is defined but never referenced in _cold_start_floor. The 24 and 48 hour thresholds are hardcoded as magic numbers directly in the function body. _COLD_START_WINDOW_HOURS should be used in the elif age_hours <= _COLD_START_WINDOW_HOURS comparison, and a companion _COLD_START_MID_WINDOW_HOURS = 24 constant (or similar) should guard the first branch — consistent with how _COLD_START_FLOOR_FULL, _COLD_START_FLOOR_HALF, and _COLD_START_MIN_CONFIDENCE are used.
8b11a7a to
9b88c14
Compare
- Add detect_query_intent() to classify queries as procedural/episodic/general - Apply 1.3x boost when article epistemic_type matches query intent - Apply 0.85x penalty when procedural↔episodic conflict - Add cold_start_boost: fresh articles (≤48h, confidence≥0.7) get score floor - 0–24h: floor = 0.30; 24–48h: floor = 0.15; >48h: no floor - Wire detect_query_intent into retrieval.py retrieve path - 29 tests covering all cases
9b88c14 to
6281fdf
Compare
Summary
Implements two related improvements to
multi_signal_rankinranking.py.#70 — Epistemic Type Awareness
detect_query_intent(query: str) -> strthat classifies queries as"procedural","episodic", or"general"based on prefix/keyword matching and date patternsquery_intent: str | None = Noneparameter tomulti_signal_rankepistemic_typematches query intent"general"intent or"semantic"epistemic_type → no adjustmentdetect_query_intentthroughretrieval.pyretrieve path#71 — Cold-Start Mitigation
cold_start_boost: bool = Trueparameter tomulti_signal_rankTests
29 new tests in
tests/core/test_ranking_improvements.pycovering all cases including combined scenarios.Checklist