| assert num_fewshot == 0, "Fewshot is not supported for BigCodeBench" | ||
| # Only the base BigCodeBench class disallows fewshot; subclasses (e.g. BigCodeBench_OLMES) may use it. | ||
| if self.__class__ is BigCodeBench and num_fewshot != 0: | ||
| raise ValueError("Fewshot is not supported for BigCodeBench; use BigCodeBench_OLMES for 3-shot.") |
Why should this be an error?
Changed the raise ValueError to a logger.warning that logs the requested value and resets to num_fewshot=0, which matches the existing behavior of our BigCodeBench task. Adding this here to avoid user confusion since, conversely, BigCodeBench_OLMES only runs with num_fewshot=3.
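A minimal sketch of the warn-and-reset behavior described in the reply above. The class bodies are hypothetical stand-ins (the real constructors take more parameters), and the exact warning text is an assumption:

```python
import logging

logger = logging.getLogger(__name__)


class BigCodeBench:
    """Hypothetical stand-in for the task class, reduced to the fewshot handling."""

    def __init__(self, num_fewshot: int = 0) -> None:
        # Only the base class forces zero-shot; subclasses keep the requested value.
        if self.__class__ is BigCodeBench and num_fewshot != 0:
            logger.warning(
                "Fewshot is not supported for BigCodeBench (requested num_fewshot=%d); "
                "resetting to num_fewshot=0.",
                num_fewshot,
            )
            num_fewshot = 0
        self.num_fewshot = num_fewshot


class BigCodeBench_OLMES(BigCodeBench):
    """Hypothetical stand-in for the subclass; it bypasses the base-class reset."""
```

With this shape, `BigCodeBench(num_fewshot=3)` logs a warning and ends up with `num_fewshot == 0`, while the subclass keeps whatever value it was given.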
| def _get_fewshot_target_text(self, item: dict[str, Any]) -> str: | ||
| # Match oe_eval doc_to_target for complete: canonical_solution + "\\n```" | ||
| target = item["canonical_solution"] | ||
| assert target is not None and isinstance(target, str) |
Ideally, raise a ValueError as asserts can be turned off globally
Replaced this with an explicit if not isinstance(target, str): raise ValueError(...).
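For reference, a sketch of what the explicit check could look like. It is written as a free function rather than a method, the error message is an assumption, and the fence suffix follows the code comment in the diff. The motivation is that `python -O` strips `assert` statements, whereas a raised `ValueError` always fires:

```python
from typing import Any

# Three backticks, built indirectly so this snippet can sit inside a fenced block.
FENCE = "`" * 3


def get_fewshot_target_text(item: dict[str, Any]) -> str:
    # Match the completion target described in the diff: canonical_solution + newline + fence.
    target = item["canonical_solution"]
    # isinstance() already rejects None, so no separate None check is needed.
    if not isinstance(target, str):
        raise ValueError(
            f"Expected 'canonical_solution' to be a str, got {type(target).__name__}"
        )
    return target + "\n" + FENCE
```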
| test_code = r""" | ||
| import unittest | ||
| class TestCases(unittest.TestCase): |
Would you be able to rename these and have some description of what they are actually testing? I'm not sure why these tests use unittest while the rest of the repo uses pytest.
Renamed these test methods to be more descriptive and added docstrings explaining that the unittest code in the test data strings reflects BigCodeBench's format, not our repo's test framework.
Those strings get passed to execute_python_code_with_tests(), which sends them to a Docker container where they are written to a file and run as a separate Python process. The import unittest happens inside the container's Python interpreter, not in the repo test runner's process.
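To illustrate the process separation, here is a rough sketch that swaps the Docker container for a local subprocess. It does not reproduce execute_python_code_with_tests(); the helper name `run_in_separate_process` and the sample test string are hypothetical:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Test data is just a string here; nothing in it is imported by the calling process.
TEST_CODE = r"""
import unittest

class TestCases(unittest.TestCase):
    def test_addition(self):
        self.assertEqual(1 + 1, 2)

if __name__ == "__main__":
    unittest.main()
"""


def run_in_separate_process(code: str, timeout: float = 30.0) -> subprocess.CompletedProcess:
    # Write the string to a file and run it with a fresh interpreter,
    # so its imports happen in the child process only.
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "test_snippet.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            [sys.executable, str(script)],
            capture_output=True,
            text=True,
            timeout=timeout,
        )


result = run_in_separate_process(TEST_CODE)
```

The pytest process only ever sees `TEST_CODE` as data and inspects the child's return code and captured output, which is why unittest inside the string does not conflict with the repo's own test framework.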
…r BigCodeBench_OLMES task
PR Checklist
/docs/).
What type of PR is this? (check all applicable)
Description
Adds a variant of the BigCodeBench task which mimics the OLMES implementation.
Added/updated tests?
have not been included