Pull requests: Aleph-Alpha-Research/eval-framework
Pull requests list
feat: Add the OLMES variant of the MBPP task
#186
opened Feb 26, 2026 by
tfburns
6 of 12 tasks
feat: add OLMES variant of HumanEval task
#185
opened Feb 26, 2026 by
tfburns
7 of 12 tasks
fix: OLMES matching effort (MC Task Suite)
#182
opened Feb 24, 2026 by
fsschneider
7 of 12 tasks
docs: Polishing the docs (changelog, _IDK variants, adding new benchmarks)
#178
opened Feb 11, 2026 by
fsschneider
5 of 9 tasks
chore(main): release 0.2.13
autorelease: pending
#172
opened Feb 4, 2026 by
github-actions[bot]
Update citation year and add version+author to README
#159
opened Jan 26, 2026 by
tfburns
1 task done
chore: Bump pyasn1 from 0.6.1 to 0.6.2 in the uv group across 1 directory
dependencies
python:uv
#157
opened Jan 16, 2026 by
dependabot[bot]
feat: add safe_metric_calculation decorator for LLM judge error handling
#154
opened Jan 13, 2026 by
AhmedHammam-AA
fix(main): duplicated task that are actually the same
#144
opened Jan 7, 2026 by
benureau
3 of 13 tasks
fix(wmt): use HuggingFace datasets instead of sacrebleu
#137
opened Dec 19, 2025 by
AhmedHammam-AA
Remove leading space in ground truth formatting
#129
opened Dec 10, 2025 by
SohirMaskey
3 of 13 tasks