CALAMITA Evaluation


Paper · Leaderboard

This repository contains scripts and utilities to run the CALAMITA benchmark and reproduce its results.

Getting started

Create an isolated Python environment and install the lm-eval-harness submodule. The submodule is a fork of the official lm-eval-harness that adds a few useful features, e.g., running tasks on local datasets, freeing VRAM after generation to make room for evaluation code that itself uses LLMs, and specifying custom aggregation functions. Then, install the requirements listed in requirements.txt.
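For example, a minimal environment with venv (conda or any other environment manager works just as well):

python -m venv .venv
source .venv/bin/activate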

pip install -e './lm-eval-harness[vllm]'
pip install -r requirements.txt
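
To sanity-check the installation, you can ask the harness to list the tasks it can see; this assumes the fork keeps the upstream lm_eval CLI entry point:

lm_eval --tasks list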

(optional) Prepare Files Locally

If your computing environment does not have internet access, use the two scripts bash/download_datasets.sh and bash/download_models.sh to download task data and models locally.
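Both scripts can be run from the repository root on a machine that does have internet access; the invocations below assume no extra arguments are needed (check each script for any paths or environment variables it expects):

bash bash/download_datasets.sh
bash bash/download_models.sh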

Important

Note that some CALAMITA 2024 tasks require private data that is not accessible online. For full reproducibility, get in touch with the task authors and request those files.

Running a model on a task

  1. Select a list of subtasks and put them into a txt file, one per line, e.g., tasks.txt (you can find an example file in the root directory; see also the sketch after step 2).
  2. Schedule the job through SLURM, e.g.,
sbatch ./bash/run_model meta-llama/Llama-3.1-70B-Instruct tasks.txt
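
For illustration, tasks.txt could look like the following; the subtask names here are hypothetical placeholders, so copy the real ones from the example file in the root directory:

placeholder-subtask-1
placeholder-subtask-2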
