RACE Fine-Tuner

Fine-tunes a model that answers multiple-choice questions in the RACE reading comprehension dataset using OpenPipe.

Consists of three main files:

finetuner.py: API call for finetuning llama-8b on the RACE dataset using OpenPipe. There are three pre-done training sets, all taken from RACE:
- dataset_id_tiny: 10 train + 2 test, picked 12 entries by hand
- dataset_id_small: 878 train + 50 test, uniform sample ~1% of data
- dataset_id_all: all 92.8k entries
I have generated these on OpenPipe. If you want access to the datasets on OpenPipe, message me - I'd have to add you to the project in OpenPipe. (Can't add them to the repo - too large!)
tester.py: Tests accuracy of an OpenPipe model on the RACE dataset. To test functionality of the tester, chooses 100 random entries from the testing dataset, checks if model matches with answer.
race/openai_formatter.py: Formats entries in the RACE dataset as OpenAI chat completion objects. The resulting output file can be dropped directly into OpenPipe as a dataset.

To run finetuner and tester, set an environment variable OPENPIPE_API_KEY with your openpipe key, otherwise it'll ask you for one.

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
race		race
.gitignore		.gitignore
README.md		README.md
finetune.py		finetune.py
tester.py		tester.py

Provide feedback