Skip to content
This repository was archived by the owner on Mar 2, 2026. It is now read-only.

feat: epistemic_type ranking + cold-start boost (#70, #71)#580

Merged
zonk1024 merged 2 commits intomainfrom
ranking-improvements
Feb 28, 2026
Merged

feat: epistemic_type ranking + cold-start boost (#70, #71)#580
zonk1024 merged 2 commits intomainfrom
ranking-improvements

Conversation

@zonk1024
Copy link
Contributor

Summary

Implements two related improvements to multi_signal_rank in ranking.py.

#70 — Epistemic Type Awareness

  • Added detect_query_intent(query: str) -> str that classifies queries as "procedural", "episodic", or "general" based on prefix/keyword matching and date patterns
  • Added query_intent: str | None = None parameter to multi_signal_rank
  • Applies 1.3× multiplier when article epistemic_type matches query intent
  • Applies 0.85× penalty when there is a procedural↔episodic conflict
  • "general" intent or "semantic" epistemic_type → no adjustment
  • Wired detect_query_intent through retrieval.py retrieve path

#71 — Cold-Start Mitigation

  • Added cold_start_boost: bool = True parameter to multi_signal_rank
  • Fresh articles (created within 48h) with confidence ≥ 0.7 receive a score floor:
    • 0–24h: floor = 0.30
    • 24–48h: floor = 0.15
    • >48h: no floor
  • Prevents new high-quality articles from being buried by established articles with usage_score > 0

Tests

29 new tests in tests/core/test_ranking_improvements.py covering all cases including combined scenarios.

Checklist

  • Backward-compatible (new params have defaults)
  • All 1666 existing tests pass
  • ruff format + lint clean

Copilot AI review requested due to automatic review settings February 28, 2026 18:09
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR implements two ranking improvements to multi_signal_rank in ranking.py: (1) epistemic type awareness that boosts/penalises articles based on whether their epistemic_type matches the detected query intent, and (2) a cold-start floor that prevents fresh high-confidence articles from being buried behind established ones. A new detect_query_intent function classifies queries as procedural, episodic, or general.

Changes:

  • ranking.py: Added detect_query_intent, _cold_start_floor, and updated multi_signal_rank with query_intent and cold_start_boost parameters
  • retrieval.py: Wired detect_query_intent into the main _retrieve_sync retrieval path
  • tests/core/test_ranking_improvements.py: 29 new tests covering all new behaviours

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.

File Description
src/valence/core/ranking.py Adds epistemic type detection, cold-start floor helper, and extends multi_signal_rank with two new optional parameters
src/valence/core/retrieval.py Integrates detect_query_intent into the main retrieval path and passes query_intent to multi_signal_rank
tests/core/test_ranking_improvements.py New test file with 29 tests covering detect_query_intent, epistemic ranking, cold-start boost, and combined scenarios
Comments suppressed due to low confidence (1)

src/valence/core/ranking.py:337

  • The score_breakdown in explain mode stores the final_score (which includes the epistemic multiplier and/or cold-start floor) as "final", but each signal's "contribution" still reflects only the pre-adjustment weighted score. When a query intent match multiplies final_score by 1.3 (or 0.85), or the cold-start floor raises it, the existing invariant sum(contributions) == final is violated. The existing test in tests/core/test_ranking.py (line 196–197) and tests/cli/test_cli.py (line 303–304) happen to pass only because the test articles are created with days_ago=0 and confidence_overall=0.7 (so the cold-start floor doesn't change anything) and have no epistemic_type. Any call to multi_signal_rank(explain=True, query_intent="procedural") on a procedural article will produce a score_breakdown where sum(contributions) != final. Consider adding epistemic_multiplier and cold_start_floor fields to the breakdown dict so callers can inspect and audit the full scoring.
        if explain:
            r["score_breakdown"] = {
                "semantic": {
                    "value": semantic,
                    "weight": semantic_weight,
                    "contribution": semantic_weight * semantic,
                },
                "confidence": {
                    "value": confidence,
                    "weight": confidence_weight,
                    "contribution": confidence_weight * confidence,
                },
                "recency": {
                    "value": recency,
                    "weight": recency_weight,
                    "contribution": recency_weight * recency,
                },
                "final": final_score,
            }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 548 to 560
for r in all_results:
freshness_days = r.get("freshness", 90.0)
r["_original_created_at"] = r.get("created_at")
r["created_at"] = (now - timedelta(days=freshness_days)).isoformat()

query_intent = detect_query_intent(query)
ranked = multi_signal_rank(
all_results,
semantic_weight=0.50,
confidence_weight=0.35,
recency_weight=0.15,
query_intent=query_intent,
)
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cold-start boost is silently broken in _retrieve_sync. Before calling multi_signal_rank, the code replaces each article's created_at with a fake date derived from freshness_days (line 551). Since _cold_start_floor reads created_at to determine article age, it will evaluate the synthetic date rather than the actual creation timestamp. A newly created article with freshness = 90.0 (the default for unknown age) would have its created_at shifted to 90 days ago, making it ineligible for the cold-start floor entirely.

The fix is to store the real created_at separately before the substitution (e.g. as _original_created_at, which already happens) and pass the real creation time into _cold_start_floor, or alternatively apply the cold-start evaluation after restoring created_at rather than during multi_signal_rank. As written, the cold-start boost works only when multi_signal_rank is called directly (e.g. in tests), but not via the main retrieve() code path.

Copilot uses AI. Check for mistakes.
def test_general_default(self):
assert detect_query_intent("python async patterns") == "general"

def test_general_empty(self):
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test name test_general_empty is misleading — the query string "database indexing strategies" is not empty. The test is actually checking that a generic, non-procedural and non-episodic query returns "general". A more accurate name would be test_general_generic_query or test_general_non_matching_query.

Suggested change
def test_general_empty(self):
def test_general_generic_query(self):

Copilot uses AI. Check for mistakes.
Comment on lines 240 to 247
decay_rate: float = 0.01,
min_confidence: float | None = None,
explain: bool = False,
query_intent: str | None = None,
cold_start_boost: bool = True,
) -> list[dict]:
"""Apply multi-signal ranking to query results.

Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The multi_signal_rank docstring on line 248 still documents the formula as only the three-signal weighted sum, but the function now also conditionally applies an epistemic type multiplier (×1.3 or ×0.85) and a cold-start score floor. The docstring should be updated to describe the extended formula including these adjustments.

Copilot uses AI. Check for mistakes.
Comment on lines +10 to +11
import pytest

Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The pytest module is imported on line 10 but never used in the test file — no pytest.mark, pytest.raises, pytest.fixture, or similar constructs are present. This unused import should be removed.

Suggested change
import pytest

Copilot uses AI. Check for mistakes.
"what happened",
"when did",
)
_EPISODIC_CONTAINS = ("session", "last time")
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The term "session" in _EPISODIC_CONTAINS is a broad substring match that will produce false positives for legitimate procedural/semantic queries such as "configure session timeout", "manage user sessions", or "HTTP session handling". These have nothing to do with episodic (time-anchored) recollections. Consider using a more specific phrase like "last session" or "previous session" instead of the bare word "session".

Suggested change
_EPISODIC_CONTAINS = ("session", "last time")
_EPISODIC_CONTAINS = ("last session", "previous session", "last time")

Copilot uses AI. Check for mistakes.
Comment on lines +46 to +48
# ---------------------------------------------------------------------------
# Query intent detection (#70)
# ---------------------------------------------------------------------------
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR title and description reference issues #70 and #71, but the actual issues #70 and #71 are about "Implement propagate() API with restriction composition" and "Add strip_on_forward field redaction" respectively — both related to privacy primitives. They have no connection to epistemic type ranking or cold-start boost features being implemented here. The issue references are therefore incorrect.

Copilot uses AI. Check for mistakes.
Comment on lines +105 to +134
_COLD_START_WINDOW_HOURS = 48
_COLD_START_FLOOR_FULL = 0.3 # floor at creation -> 24h
_COLD_START_FLOOR_HALF = 0.15 # floor at 24h -> 48h
_COLD_START_MIN_CONFIDENCE = 0.7


def _cold_start_floor(article: dict, confidence: float) -> float | None:
"""Return the cold-start score floor for a qualifying article, or None."""
if confidence < _COLD_START_MIN_CONFIDENCE:
return None

created_at = article.get("created_at")
if created_at is None:
return None

if isinstance(created_at, str):
try:
created_at = datetime.fromisoformat(created_at.replace("Z", "+00:00"))
except (ValueError, TypeError):
return None

if isinstance(created_at, datetime):
if created_at.tzinfo is None:
created_at = created_at.replace(tzinfo=UTC)
now = datetime.now(UTC)
age_hours = (now - created_at).total_seconds() / 3600.0
if age_hours <= 24:
return _COLD_START_FLOOR_FULL
elif age_hours <= 48:
return _COLD_START_FLOOR_HALF
Copy link

Copilot AI Feb 28, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The constant _COLD_START_WINDOW_HOURS = 48 is defined but never referenced in _cold_start_floor. The 24 and 48 hour thresholds are hardcoded as magic numbers directly in the function body. _COLD_START_WINDOW_HOURS should be used in the elif age_hours <= _COLD_START_WINDOW_HOURS comparison, and a companion _COLD_START_MID_WINDOW_HOURS = 24 constant (or similar) should guard the first branch — consistent with how _COLD_START_FLOOR_FULL, _COLD_START_FLOOR_HALF, and _COLD_START_MIN_CONFIDENCE are used.

Copilot uses AI. Check for mistakes.
@zonk1024 zonk1024 force-pushed the ranking-improvements branch from 8b11a7a to 9b88c14 Compare February 28, 2026 18:41
- Add detect_query_intent() to classify queries as procedural/episodic/general
- Apply 1.3x boost when article epistemic_type matches query intent
- Apply 0.85x penalty when procedural↔episodic conflict
- Add cold_start_boost: fresh articles (≤48h, confidence≥0.7) get score floor
  - 0–24h: floor = 0.30; 24–48h: floor = 0.15; >48h: no floor
- Wire detect_query_intent into retrieval.py retrieve path
- 29 tests covering all cases
@zonk1024 zonk1024 force-pushed the ranking-improvements branch from 9b88c14 to 6281fdf Compare February 28, 2026 18:42
@zonk1024 zonk1024 merged commit d03546c into main Feb 28, 2026
3 checks passed
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants