* We added Reward and LLM-as-a-Judge to our task family
* Reward allows you to write a custom function that scores the prediction without requiring ground truth
* LLM-as-a-Judge allows you to delegate the scoring of a prediction to a judge LLM, optionally providing ground truth
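As a minimal sketch of the Reward idea, a custom scoring function can grade a prediction without any ground truth (the name and signature below are assumptions for illustration, not the actual interface):

```python
def brevity_reward(prediction: str) -> float:
    """Hypothetical reward function: score a prediction with no ground truth.

    Rewards concise answers; returns a value in (0, 1].
    """
    return 1.0 / (1.0 + len(prediction.split()))
```

Any callable with this shape could serve as a reward, e.g. format checks, length penalties, or self-consistency scores.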
* Changes to CAPO to make it applicable to the new tasks:
* CAPO now accepts the input parameter `check_fs_accuracy` (default `True`). For reward tasks accuracy cannot be evaluated, so the prediction of the `downstream_llm` is used as the few-shot target instead.
* CAPO also accepts `create_fs_reasoning` (default `True`): if set to `False`, the few-shot examples are just the input-output pairs from `df_few_shots`.
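Taken together, a call using the two new parameters might look like the following sketch (the constructor and surrounding arguments are assumptions; only `check_fs_accuracy` and `create_fs_reasoning` come from this change):

```python
# Hypothetical usage sketch, not the exact API.
optimizer = CAPO(
    task=reward_task,          # a Reward task has no ground truth
    check_fs_accuracy=False,   # so use the downstream LLM's prediction as few-shot target
    create_fs_reasoning=True,  # default: augment few-shot examples with reasoning
)
```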
* Introduces a tag-extraction function to centralize the repeated code for extractions like `<final_answer>5</final_answer>`
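A minimal sketch of such a helper (the name and exact behavior are assumptions, not the actual implementation):

```python
import re
from typing import Optional


def extract_tag(text: str, tag: str = "final_answer") -> Optional[str]:
    """Return the content of the last <tag>...</tag> pair in text, or None.

    Hypothetical helper illustrating the centralized extraction; taking the
    last match is robust against the LLM echoing the tag earlier in its output.
    """
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    return matches[-1].strip() if matches else None
```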
#### Further changes:
* We now utilize mypy for automated type checking
* Core functionality of the classification task has been moved to the base task to prevent code duplication across other tasks