contrib: add validation_report tool#1082

Open
leonhandreke wants to merge 1 commit into main from contrib/validation-report

Conversation

@leonhandreke
Contributor

@leonhandreke commented Mar 31, 2026

Adds contrib/validation_report/, a small tool to validate match quality against known-good fixture CSVs by calling a running yente instance over HTTP (rather than importing internals). It compares logic-v1 and logic-v2 side by side.

Two scripts:

  • generate_report_data.py: POSTs fixture records to /match/{dataset} in batches and writes the raw {score, match} results to JSON
  • render_report_data.py: reads the JSON, computes statistics (mean top score, match count, match percentage), and renders an HTML report via Jinja2 + markdown-it-py with GitHub Markdown CSS

Fixtures are copied from candidate_generation_benchmark (truncated to 1000 records each).
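As a rough sketch of the batching step in generate_report_data.py (the batch size and helper name here are assumptions for illustration, not taken from the script):

```python
def batch(records, size=50):
    """Split fixture records into chunks for batched POSTs to
    /match/{dataset}. The default size of 50 is a placeholder
    assumption; the real script's batch size isn't shown here."""
    for i in range(0, len(records), size):
        yield records[i:i + size]
```

Each chunk would then be sent as one /match request, keeping request counts low while staying under the API's payload limits.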

Example: report.html

Related: https://github.com/opensanctions/operations/issues/2203

🤖 Generated with Claude Code

Compares logic-v1 vs logic-v2 match quality against known-good fixture
CSVs by calling a running yente instance over HTTP. Two scripts:
- generate_report_data.py: batches fixture records through /match/{dataset}
  and writes raw {score, match} results per combo to a JSON file
- render_report_data.py: reads the JSON, computes stats, and renders an
  HTML report via Jinja2 + markdown-it-py with GitHub Markdown CSS

Fixtures copied from candidate_generation_benchmark (truncated to 1000
records each). Adds markdown-it-py to dev deps.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
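The statistics mentioned above could be computed along these lines (a sketch only; the record shape follows the raw {score, match} results described in the description, and the function name is hypothetical):

```python
def summarize(results):
    """Aggregate raw {"score": float, "match": bool} records into the
    report stats: mean top score, match count, and match percentage."""
    n = len(results)
    matches = sum(1 for r in results if r["match"])
    return {
        "mean_top_score": sum(r["score"] for r in results) / n if n else 0.0,
        "matches": matches,
        "match_pct": 100.0 * matches / n if n else 0.0,
    }
```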
@leonhandreke leonhandreke requested a review from jbothma March 31, 2026 14:52
@leonhandreke leonhandreke marked this pull request as ready for review March 31, 2026 14:54
@leonhandreke
Contributor Author

@jbothma Friedrich asked me to vibe something up for the linked issue in operations. Could you take a look at the example report to see if this makes sense to you? Should we put this in operations instead of the contrib/ graveyard in yente?

Contributor

@jbothma left a comment


I think all of these thoughts are bonus: this is already very informative, but I think we can be more precise and reproducible. Feel free to break these out into a new ticket or tell me why they're not applicable:

  • at some point the fixtures will be inconsistent with what's loaded in yente, right?
    • let's document how to update the fixtures, or how to load precisely the version of data that corresponds with the fixtures
    • Can this capture the versions of the relevant datasets actually loaded in the yente under test?
  • it'd be nice if the yente readme documented how to run this

It's nice that this is in the yente repo so people can reproduce it.

{% endfor %}
## Environment

**yente:** `{{ yente_git_version }}`
Contributor


Suggested change
**yente:** `{{ yente_git_version }}`
**yente git hash:** `{{ yente_git_version }}`

Contributor Author


It will be a tag if we're testing a clean tag, so "git version" will be better.

Comment on lines +99 to +103
if hits:
    top = hits[0]
    results.append({"score": top["score"], "match": top["match"]})
else:
    results.append({"score": 0.0, "match": False})
Contributor


Should the positives include entity IDs and confirm that precisely the entity we're expecting was one of the results considered a match?

If we're merely considering some "match" as a match, maybe that could be a note in the report somewhere under "methodology" or something so that code doesn't have to be read to discover that.
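A stricter positive check along the lines suggested here might look like the following (a sketch; the hit fields "id" and "match" are assumptions about the /match response shape, and the function name is hypothetical):

```python
def expected_entity_matched(hits, expected_id):
    """Return True only if the expected entity ID appears among the hits
    flagged as a match, rather than accepting any match at all."""
    return any(h.get("match") and h.get("id") == expected_id for h in hits)
```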
