Early stopping via score budget for candidate scoring #1096
Stop scoring candidates after consecutive low-scoring results, with adaptive patience that increases when a promising score is seen. Thresholds derived from the per-request threshold parameter. Refs #1011 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the multi-step patience/counter/flag logic with a single equation: budget = budget - 1 + score / (threshold/2). A score of threshold/2 breaks even; higher scores extend the search, lower scores drain the budget. Stops when budget is exhausted. Simpler (one accumulator, one equation, one setting) and better on production data: 27% savings with 3 missed results vs 6 with the patience approach, because the budget responds proportionally to score quality rather than using a binary boost flag. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
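The budget equation in the commit message above can be traced by hand with a small standalone sketch. The function name `budget_trace` and its defaults (threshold 0.7, budget 10) are illustrative assumptions, not code from this PR; only the update rule itself is taken from the commit message.

```python
from typing import Iterable, List


def budget_trace(
    scores: Iterable[float], threshold: float = 0.7, budget: float = 10.0
) -> List[float]:
    """Return the budget after each scored candidate, stopping once it is spent.

    Hypothetical helper for illustration; not part of yente.
    """
    break_even = threshold / 2  # a score at this level earns back exactly 1 token
    trace: List[float] = []
    for score in scores:
        # Each candidate costs 1 token and earns back score / (threshold/2).
        budget = budget - 1 + score / break_even
        trace.append(budget)
        if budget <= 0:
            break
    return trace
```

Under these assumptions, a run of candidates scoring exactly `threshold/2` leaves the budget unchanged, while a run of zero scores drains one token per candidate, so a budget of 10 allows exactly 10 scored candidates.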
Pull request overview
This PR introduces an early-stopping heuristic in the candidate scoring pipeline to reduce CPU spent scoring low-quality Elasticsearch candidates, controlled via a score “budget” and a new environment variable.
Changes:
- Add `YENTE_SCORE_EARLY_STOP_BUDGET` setting to control early-stopping aggressiveness.
- Implement budget-based early stopping in `score_results()` based on per-candidate score vs. request threshold.
- Add a design/analysis document describing the motivation, data, and heuristic options.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| yente/settings.py | Adds SCORE_EARLY_STOP_BUDGET env-configured setting for early stopping. |
| yente/scoring.py | Implements budget-based early stop logic inside score_results(). |
| plans/scoring-early-stopping.md | Documents research, rationale, and heuristic design notes for reducing scoring calls. |
     async def score_results(
         algorithm: Type[ScoringAlgorithm],
         entity: Entity,
         results: Iterable[Tuple[Entity, float]],
         threshold: float = settings.SCORE_THRESHOLD,
         cutoff: float = 0.0,
    -    limit: Optional[int] = None,
    +    limit: int = settings.MATCH_PAGE,
         config: ScoringConfig = ScoringConfig.defaults(),
     ) -> Tuple[int, List[ScoredEntityResponse]]:
         scored: List[ScoredEntityResponse] = []
         matches = 0
         budget = float(settings.SCORE_EARLY_STOP_BUDGET)
         tau = threshold * EARLY_STOP_BREAK_EVEN
         for rank, (result, index_score) in enumerate(results):
Early stopping changes core matching behavior and is hard to validate via the existing endpoint-level tests alone. Please add focused unit tests around score_results() (e.g., a stub algorithm returning a known score sequence) to assert (1) scoring stops early when budget is exhausted, and (2) top results are still returned/sorted correctly at different cutoff/threshold/limit combinations.
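One possible shape for such a test, as a rough sketch: it runs against a simplified stand-in rather than the real `score_results()`, whose full body is not shown in this thread. `score_with_budget`, `StubScorer`, and the test name are all hypothetical; a real test would call `score_results()` with a stub `ScoringAlgorithm` instead.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def score_with_budget(
    candidates: Sequence[str],
    scorer: Callable[[str], float],
    threshold: float,
    budget: float,
) -> List[Tuple[str, float]]:
    """Simplified stand-in for score_results(): score until the budget is spent."""
    tau = threshold / 2  # break-even score
    scored: List[Tuple[str, float]] = []
    for candidate in candidates:
        score = scorer(candidate)
        scored.append((candidate, score))
        budget = budget - 1 + score / tau
        if budget <= 0:
            break
    # Best score first, regardless of the order candidates arrived in.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


class StubScorer:
    """Returns a fixed score per candidate and counts how often it is called."""

    def __init__(self, scores: Dict[str, float]) -> None:
        self.scores = scores
        self.calls = 0

    def __call__(self, candidate: str) -> float:
        self.calls += 1
        return self.scores[candidate]


def test_stops_early_and_keeps_best_first() -> None:
    scores = {"x": 0.0, "a": 0.9, "b": 0.0, "c": 0.0, "d": 0.0, "e": 0.0, "f": 0.0}
    stub = StubScorer(scores)
    results = score_with_budget(list(scores), stub, threshold=0.5, budget=2.0)
    assert stub.calls == 6  # "f" is never scored
    assert results[0] == ("a", 0.9)  # the one strong match is sorted to the top
    assert "f" not in [name for name, _ in results]
```

Counting calls on the stub checks the early stop directly instead of inferring it from the output, and placing the strong candidate second in the input checks sorting independently of arrival order.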
…toff The continue on cutoff filtering was skipping the budget <= 0 check, so early stopping would almost never trigger when cutoff was set. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Disable early stopping (budget=inf) when tau would be non-positive, avoiding ZeroDivisionError on score/tau. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
limit is now typed as int, so the None guard was dead code. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rank is 0-based, so rank >= limit required limit+1 candidates. Use rank + 1 >= limit to match the intended "at least limit" semantics. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
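Two of the fixes above (the non-positive `tau` guard and the 0-based `rank` comparison) can be isolated as small helpers. This is a sketch under assumptions: the helper names are hypothetical, `break_even=0.5` is inferred from the `threshold/2` wording, and combining the budget check with `rank + 1 >= limit` in a single stop condition is one reading of the commit messages, not confirmed code.

```python
import math
from typing import Tuple


def effective_budget(
    threshold: float, configured: float, break_even: float = 0.5
) -> Tuple[float, float]:
    """Return (budget, tau); an infinite budget disables early stopping."""
    tau = threshold * break_even
    if tau <= 0:
        # score / tau would divide by zero (or flip sign), so never stop early.
        return math.inf, tau
    return float(configured), tau


def may_stop(rank: int, limit: int, budget: float) -> bool:
    """Assumed stop rule: budget spent AND at least `limit` candidates scored.

    `rank` comes from enumerate() and is 0-based, so the number of candidates
    scored so far is rank + 1.
    """
    return budget <= 0 and rank + 1 >= limit
```

With `limit=5`, `may_stop` first permits a stop at `rank == 4`, the fifth candidate; the earlier `rank >= limit` comparison would have required a sixth.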
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Looks very good. The only thought I had was that But yeah, that's the only original thought I had and I'm not really convinced it's a good one. But wanted to throw it out there.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Early stopping in `score_results()` using a score budget: each candidate costs 1 token, and its score earns back `score / (threshold/2)` tokens. When the budget is exhausted, stop scoring.
- `YENTE_SCORE_EARLY_STOP_BUDGET` (default 10, set high to disable)
- Thresholds derived from the per-request `threshold` parameter

The formula: `budget = budget - 1 + score / (threshold/2)`

A score of `threshold/2` (0.35 at default) breaks even. Higher scores extend the search; lower scores drain the budget. This naturally adapts to query quality: queries with real matches keep searching proportionally longer.

Refs #1011
Test plan
- `/match` output with and without early stopping on representative queries

🤖 Generated with Claude Code