Releases: Aleph-Alpha-Research/eval-framework

v0.2.12

04 Feb 13:13
5983e24

0.2.12 (2026-02-04)

Features

  • add "top_p" param to AlephAlphaAPIModel (#168) (e52c927)
  • Bump datasets to >=4.0.0 and remove all trust_remote_code references. (#158) (c383806)
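For readers unfamiliar with the "top_p" parameter, here is a minimal sketch of the general idea behind nucleus (top-p) filtering. It is an illustration only: the function name `top_p_filter` and the toy distribution are hypothetical, and nothing here reflects how AlephAlphaAPIModel or the Aleph Alpha API implement the parameter.

```python
def top_p_filter(probs: dict[str, float], top_p: float) -> dict[str, float]:
    """Keep the smallest set of most-likely tokens whose total mass reaches top_p.

    Hypothetical helper for illustration; returns the kept tokens renormalized
    so that their probabilities sum to 1.
    """
    if not 0.0 < top_p <= 1.0:
        raise ValueError("top_p must be in (0, 1]")

    # Rank tokens from most to least likely.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)

    kept: list[tuple[str, float]] = []
    cumulative = 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:
            break  # the nucleus is complete

    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


# Toy next-token distribution (made-up values).
toy = {"a": 0.5, "b": 0.3, "c": 0.2}
nucleus = top_p_filter(toy, top_p=0.7)
```

With the toy distribution, a lower `top_p` keeps fewer candidate tokens, which makes sampling more conservative; `top_p=1.0` keeps all of them.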

v0.2.11

30 Jan 13:30
5244b02

0.2.11 (2026-01-30)

Bug Fixes

  • Downloaded W&B artifacts are deleted too early (#163) (157d757)
  • use aleph-alpha-client concurrency limit and allow >100 concurrent requests (#166) (73b7d97)
  • VLLM tokenizer lazy initialization didn't work with W&B (#165) (f38de79)

v0.2.10

27 Jan 13:22
abc4aa6

0.2.10 (2026-01-27)

Bug Fixes

  • prefix dataset paths with hf user id for all tasks that did not have it before (#160) (d5dc178)

v0.2.9

15 Jan 13:28
7456916

0.2.9 (2026-01-15)

Features

Bug Fixes

  • docker push on release has one too many 'v's in the tag name (#153) (99e6096)

v0.2.8

09 Jan 15:44
c67338c

0.2.8 (2026-01-09)

Bug Fixes

  • normalize math reasoning (#148) (73a8843)
  • removed github token from release-please and update image links (#147) (74d59ea)

v0.2.7

08 Jan 12:41
635d208

0.2.7 (2026-01-08)

Features

  • add position randomization for LLM pairwise judges (#135) (e4ed3ec)
  • added automated documentation through CI and Sphinx (#127) (46ef6b3)
  • added badges to github readme to link pypi and docs pages (#139) (778bad2)
  • pass AA_TOKEN and AA_INFERENCE_ENDPOINT in the AA model constructor (#134) (93267b6)
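The position randomization item above addresses a known weakness of pairwise LLM judges: they may favor whichever answer is shown first. The following is a minimal sketch of the general technique, not the framework's actual implementation. All names (`judge_with_random_positions`, `longer_wins`, `always_first`) and the "first"/"second"/"tie" verdict convention are assumptions made for illustration.

```python
import random
from typing import Callable

# Hypothetical judge signature: takes the two responses in display order and
# returns "first", "second", or "tie".
Judge = Callable[[str, str], str]


def judge_with_random_positions(
    response_a: str, response_b: str, judge: Judge, rng: random.Random
) -> str:
    """Return "a", "b", or "tie", independent of the order shown to the judge."""
    # Flip a coin to decide which response is displayed first.
    swapped = rng.random() < 0.5
    first, second = (response_b, response_a) if swapped else (response_a, response_b)

    verdict = judge(first, second)
    if verdict == "tie":
        return "tie"

    # Map the positional verdict back to the original labels.
    picked_first = verdict == "first"
    if swapped:
        return "b" if picked_first else "a"
    return "a" if picked_first else "b"


def longer_wins(first: str, second: str) -> str:
    """Toy content-based judge: prefers the longer response."""
    if len(first) == len(second):
        return "tie"
    return "first" if len(first) > len(second) else "second"


def always_first(first: str, second: str) -> str:
    """Toy fully position-biased judge: always prefers the first slot."""
    return "first"


fair = [
    judge_with_random_positions("short", "a much longer answer", longer_wins, random.Random(s))
    for s in range(20)
]
biased = [
    judge_with_random_positions("x", "y", always_first, random.Random(s))
    for s in range(20)
]
```

A content-based judge yields the same winner whatever the display order, while a purely positional preference gets spread across both responses instead of systematically rewarding one of them.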

Bug Fixes

  • docs: resolve broken source links (#132) (c0e37b2)
  • release-please pushes docker to registry and triggers tests (#138) (d291bb4)

Documentation

  • added documentation for running tests and expected runtimes (#133) (77fd1d3)

0.2.6

15 Dec 10:55
d7958ba

What's Changed

New Contributors

Full Changelog: v0.2.5...v0.2.6

0.2.5

08 Dec 11:57
bfa479c

What's Changed

New Contributors

Full Changelog: v0.2.4...v0.2.5

0.2.4

26 Nov 14:46
91888b5

What's Changed

New Contributors

Full Changelog: v0.2.3...v0.2.4

0.2.3

14 Nov 09:37
14fff42

What's Changed

New Contributors

Full Changelog: v0.2.2...v0.2.3