Early stopping via score budget for candidate scoring#1096

Open
pudo wants to merge 8 commits into main from pudo/scoring-patience-cutoff

Conversation

@pudo
Member

@pudo pudo commented Apr 7, 2026

Summary

  • Add early stopping to score_results() using a score budget: each candidate costs 1 token, and its score earns back score / (threshold/2) tokens. When the budget is exhausted, stop scoring.
  • One env var: YENTE_SCORE_EARLY_STOP_BUDGET (default 10, set high to disable)
  • All other values derived from the per-request threshold parameter
  • On production data (418 queries): saves ~27% of scoring calls, misses 3 results (0.7%), all sub-threshold

The formula:

budget = budget - 1 + score / (threshold / 2)
stop when budget <= 0 and rank >= limit

A score of threshold/2 (0.35 at default) breaks even. Higher scores extend the search; lower scores drain the budget. This naturally adapts to query quality — queries with real matches keep searching proportionally longer.
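The arithmetic can be traced with a minimal sketch that follows the two-line formula above literally. The function name `count_scored` and the bare list of scores are hypothetical stand-ins (the real `score_results()` works on entities and a scoring algorithm), and the default threshold of 0.7 is inferred from the "0.35 at default" remark.

```python
from typing import Iterable


def count_scored(
    scores: Iterable[float],
    threshold: float = 0.7,  # assumed default, inferred from threshold/2 == 0.35
    budget: float = 10.0,    # mirrors the YENTE_SCORE_EARLY_STOP_BUDGET default
    limit: int = 5,          # hypothetical page size, for illustration only
) -> int:
    """Return how many candidates are scored before the budget runs out."""
    break_even = threshold / 2  # a score at this level earns back exactly 1 token
    scored = 0
    for rank, score in enumerate(scores):
        scored += 1
        # Each candidate costs 1 token and earns back score / break_even tokens.
        budget = budget - 1 + score / break_even
        # Stop condition as written in the summary; a later commit in this PR
        # changes the rank test to `rank + 1 >= limit`.
        if budget <= 0 and rank >= limit:
            break
    return scored


# Fifty zero-score candidates drain the budget after 10 calls.
print(count_scored([0.0] * 50))
# Two perfect-threshold scores (0.7) each add a net +1 token before the drain starts.
print(count_scored([0.7] * 2 + [0.0] * 50))
```

Under these assumptions a run of break-even scores (0.35) never exhausts the budget, so scoring continues to the end of the candidate list.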

Refs #1011

Test plan

  • Verify existing tests pass
  • Compare /match output with and without early stopping on representative queries
  • Monitor miss rate in production logs

🤖 Generated with Claude Code

pudo and others added 2 commits April 7, 2026 19:46
Stop scoring candidates after consecutive low-scoring results, with
adaptive patience that increases when a promising score is seen.
Thresholds derived from the per-request threshold parameter.

Refs #1011

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replace the multi-step patience/counter/flag logic with a single
equation: budget = budget - 1 + score / (threshold/2). A score of
threshold/2 breaks even; higher scores extend the search, lower
scores drain the budget. Stops when budget is exhausted.

Simpler (one accumulator, one equation, one setting) and better
on production data: 27% savings with 3 missed results vs 6 with
the patience approach, because the budget responds proportionally
to score quality rather than using a binary boost flag.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@pudo pudo changed the title from "Early stopping heuristic for candidate scoring" to "Early stopping via score budget for candidate scoring" Apr 8, 2026
@pudo pudo requested a review from Copilot April 8, 2026 15:10

Copilot AI left a comment

Pull request overview

This PR introduces an early-stopping heuristic in the candidate scoring pipeline to reduce CPU spent scoring low-quality Elasticsearch candidates, controlled via a score “budget” and a new environment variable.

Changes:

  • Add YENTE_SCORE_EARLY_STOP_BUDGET setting to control early-stopping aggressiveness.
  • Implement budget-based early stopping in score_results() based on per-candidate score vs. request threshold.
  • Add a design/analysis document describing the motivation, data, and heuristic options.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| yente/settings.py | Adds SCORE_EARLY_STOP_BUDGET env-configured setting for early stopping. |
| yente/scoring.py | Implements budget-based early stop logic inside score_results(). |
| plans/scoring-early-stopping.md | Documents research, rationale, and heuristic design notes for reducing scoring calls. |

Comment on lines 30 to 43
 async def score_results(
     algorithm: Type[ScoringAlgorithm],
     entity: Entity,
     results: Iterable[Tuple[Entity, float]],
     threshold: float = settings.SCORE_THRESHOLD,
     cutoff: float = 0.0,
-    limit: Optional[int] = None,
+    limit: int = settings.MATCH_PAGE,
     config: ScoringConfig = ScoringConfig.defaults(),
 ) -> Tuple[int, List[ScoredEntityResponse]]:
     scored: List[ScoredEntityResponse] = []
     matches = 0
+    budget = float(settings.SCORE_EARLY_STOP_BUDGET)
+    tau = threshold * EARLY_STOP_BREAK_EVEN
     for rank, (result, index_score) in enumerate(results):

Copilot AI Apr 8, 2026

Early stopping changes core matching behavior and is hard to validate via the existing endpoint-level tests alone. Please add focused unit tests around score_results() (e.g., a stub algorithm returning a known score sequence) to assert (1) scoring stops early when budget is exhausted, and (2) top results are still returned/sorted correctly at different cutoff/threshold/limit combinations.

pudo and others added 5 commits April 8, 2026 17:40
…toff

The continue on cutoff filtering was skipping the budget <= 0 check,
so early stopping would almost never trigger when cutoff was set.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Disable early stopping (budget=inf) when tau would be non-positive,
avoiding ZeroDivisionError on score/tau.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
limit is now typed as int, so the None guard was dead code.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
rank is 0-based, so rank >= limit required limit+1 candidates.
Use rank + 1 >= limit to match the intended "at least limit" semantics.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
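
The three fixes described in the commit messages above (checking the budget even for candidates dropped by the cutoff, disabling early stopping when tau is non-positive, and the 0-based rank comparison) can be illustrated together in one loop. This is a sketch under assumptions, not the code in yente/scoring.py: `score_loop`, the plain float scores, and the way the infinite budget skips the division are all hypothetical.

```python
import math
from typing import Iterable, List, Tuple


def score_loop(
    scores: Iterable[float],
    threshold: float,
    cutoff: float,
    limit: int,
    budget: float,
) -> Tuple[int, List[float]]:
    """Return (candidates scored, scores kept after the cutoff filter)."""
    tau = threshold / 2
    if tau <= 0:
        # Non-positive tau: disable early stopping instead of dividing by zero.
        budget = math.inf
    kept: List[float] = []
    seen = 0
    for rank, score in enumerate(scores):
        seen += 1
        if math.isfinite(budget):
            budget = budget - 1 + score / tau
        if score >= cutoff:
            kept.append(score)
        # The budget check runs for every candidate, including those filtered
        # out by the cutoff. rank is 0-based, so rank + 1 is the number of
        # candidates seen so far.
        if budget <= 0 and rank + 1 >= limit:
            break
    return seen, kept


# Budget is spent after 2 candidates, but scoring continues until `limit` (3) are seen.
print(score_loop([0.0] * 20, threshold=0.7, cutoff=0.5, limit=3, budget=2))
# A zero threshold would make tau zero; the guard keeps scoring all 20 candidates.
print(score_loop([0.0] * 20, threshold=0.0, cutoff=0.5, limit=3, budget=2))
```

Filtering with an `if` rather than a `continue` is one way to make sure the stop check cannot be skipped; the actual fix in the PR may be structured differently.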
@pudo pudo requested a review from leonhandreke April 8, 2026 15:52
@leonhandreke
Contributor

Looks very good. The only thought I had was that YENTE_SCORE_EARLY_STOP_BUDGET could be limit * 2 or something? Right now, it's a bit of an environment-level constant of how many results we actually expect to make sense IIUC. If limit is higher and the budget is exhausted, I understand we only continue until we've filled up our limit results and then exit.

But yeah, that's the only original thought I had and I'm not really convinced it's a good one. But wanted to throw it out there.
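
To make the trade-off in this suggestion concrete, here is a small sketch comparing the fixed default budget of 10 with a hypothetical `2 * limit` budget in the worst case, where every candidate scores 0.0. The helper `worst_case_scored` is illustrative only, and it assumes the `rank + 1 >= limit` stop condition from the later commits.

```python
def worst_case_scored(limit: int, budget: float) -> int:
    """Candidates scored when every score is 0.0, so each one drains 1 token."""
    rank = 0
    while True:
        budget -= 1  # cost of 1 token, nothing earned back
        if budget <= 0 and rank + 1 >= limit:
            return rank + 1
        rank += 1


for limit in (5, 50):
    fixed = worst_case_scored(limit, budget=10)
    scaled = worst_case_scored(limit, budget=2 * limit)
    print(f"limit={limit}: fixed budget scores {fixed}, 2*limit budget scores {scaled}")
```

With a fixed budget and a large limit, the limit alone decides when scoring stops, which matches the behaviour described in the comment above; a limit-scaled budget would keep some slack beyond the limit at the cost of more scoring calls.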

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

3 participants