Compares logic-v1 vs logic-v2 match quality against known-good fixture CSVs by calling a running yente instance over HTTP. Two scripts:

- generate_report_data.py: batches fixture records through /match/{dataset} and writes raw {score, match} results per combo to a JSON file
- render_report_data.py: reads the JSON, computes stats, and renders an HTML report via Jinja2 + markdown-it-py with GitHub Markdown CSS

Fixtures copied from candidate_generation_benchmark (truncated to 1000 records each). Adds markdown-it-py to dev deps.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@jbothma Friedrich asked me to vibe something up for the linked issue in operations. Could you take a look at the example report to see if this makes sense to you? Should we put this in operations instead of the …
jbothma left a comment:
I think all of these thoughts are bonus - this is already very informative, but I think we can be more precise and reproducible. Feel free to break out into a new ticket or tell me why it's not applicable:
- at some point the fixtures will be inconsistent with what's loaded in yente, right?
- let's document how to update the fixtures, or how to load precisely the version of data that corresponds with the fixtures
- Can this capture the versions of the relevant datasets actually loaded in the yente under test?
- it'd be nice if the yente readme documented how to run this
It's nice that this is in the yente repo so people can reproduce it.
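A sketch of how the report could record the dataset versions actually loaded in the yente under test, assuming the instance exposes a catalog endpoint listing datasets with versions (the `/catalog` path and the response shape are assumptions to verify against the deployment):

```python
import json
import urllib.request


def parse_catalog_versions(catalog: dict) -> dict:
    """Map dataset name -> version from a catalog payload shaped like
    {"datasets": [{"name": ..., "version": ...}, ...]} (assumed shape)."""
    return {d["name"]: d.get("version") for d in catalog.get("datasets", [])}


def loaded_dataset_versions(base_url: str = "http://localhost:8000") -> dict:
    """Fetch the catalog from the yente instance under test and extract
    per-dataset versions, so the rendered report can embed them."""
    with urllib.request.urlopen(f"{base_url}/catalog") as resp:
        return parse_catalog_versions(json.load(resp))
```

The extracted mapping could then be written into the JSON alongside the raw results and surfaced in the report's Environment section.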
```jinja
{% endfor %}

## Environment

**yente:** `{{ yente_git_version }}`
```
Suggested change:

```diff
- **yente:** `{{ yente_git_version }}`
+ **yente git hash:** `{{ yente_git_version }}`
```
It will be a tag if we're testing a clean tag, so "git version" will be better.
```python
if hits:
    top = hits[0]
    results.append({"score": top["score"], "match": top["match"]})
else:
    results.append({"score": 0.0, "match": False})
```
should the positives include entity IDs and confirm that precisely the entity we're expecting was one of the results considered a match?
If we're merely considering some "match" as a match, maybe that could be a note in the report somewhere under "methodology" or something so that code doesn't have to be read to discover that.
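A minimal sketch of the stricter check suggested here, assuming fixture rows carry the expected entity id and each hit in yente's `results` list includes an `id` field alongside `score` and `match` (the helper name and the `expected_matched` key are hypothetical):

```python
def evaluate_hits(expected_id: str, hits: list) -> dict:
    """Summarise one match response: keep the top score, whether anything
    matched (current behaviour), and whether the *expected* entity was
    among the hits flagged as a match (stricter check)."""
    matched_ids = {h["id"] for h in hits if h["match"]}
    return {
        "score": hits[0]["score"] if hits else 0.0,
        "match": bool(matched_ids),
        "expected_matched": expected_id in matched_ids,
    }
```

Reporting `expected_matched` separately would let the methodology note state plainly which definition of "match" each statistic uses.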
Adds `contrib/validation_report/`: a small tool to validate match quality against known-good fixture CSVs by calling a running yente instance over HTTP (rather than importing internals). Tests logic-v1 vs logic-v2 side-by-side.

Two scripts:

- `generate_report_data.py`: POSTs fixture records to `/match/{dataset}` in batches and writes raw `{score, match}` results to JSON
- `render_report_data.py`: reads the JSON, computes stats (mean top score, matches, %) and renders an HTML report via Jinja2 + markdown-it-py with GitHub Markdown CSS

Fixtures are copied from `candidate_generation_benchmark` (truncated to 1000 records each).

Example: report.html
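A sketch of the kind of HTTP call `generate_report_data.py` makes, assuming yente's batch match API takes `{"queries": {...}}` and returns `{"responses": {...}}`, and that the scoring algorithm is selectable via an `algorithm` query parameter (both assumptions to verify against the instance under test):

```python
import json
import urllib.request


def summarize_responses(payload: dict) -> dict:
    """Reduce a match payload to the top hit's {score, match} per query id,
    defaulting to {0.0, False} when a query returned no results."""
    out = {}
    for qid, response in payload.get("responses", {}).items():
        hits = response.get("results", [])
        if hits:
            out[qid] = {"score": hits[0]["score"], "match": hits[0]["match"]}
        else:
            out[qid] = {"score": 0.0, "match": False}
    return out


def match_batch(queries: dict, dataset: str = "default",
                algorithm: str = "logic-v1",
                base_url: str = "http://localhost:8000") -> dict:
    """POST one batch of entity queries to /match/{dataset}, e.g.
    queries = {"q1": {"schema": "Person", "properties": {"name": ["Jane Doe"]}}}."""
    req = urllib.request.Request(
        f"{base_url}/match/{dataset}?algorithm={algorithm}",
        data=json.dumps({"queries": queries}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return summarize_responses(json.load(resp))
```

Running the same batch once per algorithm gives the side-by-side logic-v1 vs logic-v2 numbers the report compares.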
Related: https://github.com/opensanctions/operations/issues/2203
🤖 Generated with Claude Code