Test: Assert specific evaluation scores for sample data #1249
musaqlain wants to merge 2 commits into weecology:main
Conversation
Thanks for the contribution. I would prefer a "benchmark" unit test that specifies the model via config as explicitly as possible (for example using a commit hash from Hugging Face, not just `revision=`). We already have this assertion in the unit test in the PR, which we could bump. Could you please also include the AI assistance declaration from the PR template? (You can see an example here.)
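A minimal sketch of the kind of pinning check the reviewer is asking for: verifying that a configured revision is a full, immutable commit hash rather than a branch or tag name. The helper name and the regex check are illustrative assumptions, not part of this repository.

```python
import re

def is_pinned_revision(revision: str) -> bool:
    """Return True only for a full 40-character hex commit hash.

    Branch or tag names like "main" are mutable, so benchmark scores
    pinned against them could silently drift; a full commit hash cannot.
    """
    return bool(re.fullmatch(r"[0-9a-fA-F]{40}", revision))
```

With a check like this in the test, passing `revision="main"` would fail fast instead of producing a benchmark that changes under you.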
Thanks for the guidance! I completely agree that separating the sanity checks from strict benchmarking is a better approach for the long-term maintenance of this package. I have now updated the test accordingly and also updated this PR's description. 👍
Description
Previously, `test_evaluate` only checked that execution completed without errors, but did not assert the quality of the results. This PR adds assertions to ensure the model produces consistent precision and recall scores on the sample data.
Changes Made
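The added assertions might look like the following sketch. The result keys (`box_precision`, `box_recall`) and the expected values are assumptions here, since the real numbers depend on the pinned model revision and the sample data:

```python
def assert_benchmark_scores(results, expected_precision, expected_recall, tol=0.02):
    """Fail if evaluation scores drift beyond a small tolerance.

    `results` is assumed to be a dict-like evaluation output with
    "box_precision" and "box_recall" keys; the expected values and the
    tolerance are placeholders for illustration. In a pytest suite,
    pytest.approx(expected, abs=tol) is the idiomatic equivalent.
    """
    assert abs(results["box_precision"] - expected_precision) <= tol
    assert abs(results["box_recall"] - expected_recall) <= tol

# Hypothetical scores, not the actual values from this repository:
assert_benchmark_scores(
    {"box_precision": 0.68, "box_recall": 0.72},
    expected_precision=0.68,
    expected_recall=0.72,
)
```

Using a tolerance rather than exact equality keeps the test stable across minor floating-point differences between platforms, while still catching real regressions.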
Testing
Ran `pytest tests/test_main.py::test_evaluate` locally.
Linked Issue
Closes #1233
AI-Assisted Development
I used AI for sanity-checking the code logic and for an assisted review to identify potential missing resets and schema constraints.