Axolotl configs and Slurm helpers for training/evaluating Apertus-based Meditron models on CSCS.
- CSCS account with access to the storage paths referenced in the configs.
- Python environment described by your EDF file (see `ENV` below).
- Clone of the lm-evaluation-harness fork alongside this repo: `git clone https://github.com/Xkrilandar/lm-evaluation-harness` (one way to set it up is sketched below).
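A minimal sketch of that setup, assuming an editable install (an assumption; the eval scripts may instead put the clone on `PYTHONPATH`):

```bash
# Run from the parent directory of this repo so the fork sits alongside it.
git clone https://github.com/Xkrilandar/lm-evaluation-harness
# Assumption: an editable install exposes the `lm_eval` CLI and module.
pip install -e ./lm-evaluation-harness
```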
- Create a `.env` in the repo root with your paths and tokens (do not commit secrets):

  ```bash
  # Paths
  PROJECT_ROOT=/users/<user>/meditron-apertus
  STORAGE_ROOT=/capstor/store/cscs/swissai/a127
  USER_STORAGE=$STORAGE_ROOT/homes/<user>
  ENV=/users/<user>/.edf/apertus.toml

  # Auth
  WANDB_API_KEY=<wandb_token>
  HF_TOKEN=<hf_token>

  # Logging
  WANDB_PROJECT=<wandb-project>
  WANDB_ENTITY=<wandb-entity>
  ```

- Log in to CSCS and load your environment (quick commands live in `setup.md` and `script_login.bash`). One way the `.env` might be loaded is sketched after this list.
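The helper scripts read these values at runtime; a minimal sketch of one common loading pattern (an assumption, not necessarily what the helpers do):

```bash
# Assumption: export everything .env defines so child processes
# (sbatch jobs, axolotl, lm_eval) can see the variables.
set -a        # auto-export every variable assigned while this is active
source .env
set +a
echo "User storage: $USER_STORAGE"
```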
- Pick a config in `axolotl_config/` (e.g., `apertus-8b-only-mediset.yaml`, `apertus-8b-ablation-no-mediset.yaml`, `apertus-70b.yaml`).
- Submit via Slurm (the script self-submits and tails its logs):

  ```bash
  bash meditron_train.sh axolotl_config/apertus-8b-only-mediset.yaml
  ```

  The script:
  - injects your `.env` values into the template and writes `axolotl_config/config.yaml`,
  - submits itself with `sbatch -J <config-name> ...`,
  - tails `reports/R-<job>.<jobid>.err` once the log appears.
- Adjust the `SBATCH` resources at the top of `meditron_train.sh` if you need different GPUs or time limits. The self-submission pattern the script uses is sketched below.
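For orientation, here is a minimal sketch of that self-submitting pattern. It is not the actual `meditron_train.sh`: the `envsubst` templating, the `axolotl train` entry point, and the SBATCH values are assumptions; only the job-name, config-render, and log-tailing behavior come from the steps listed above.

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --time=12:00:00
#SBATCH --output=reports/R-%x.%j.out
#SBATCH --error=reports/R-%x.%j.err

CONFIG="$1"
NAME="$(basename "$CONFIG" .yaml)"

if [ -z "$SLURM_JOB_ID" ]; then
    # Outside Slurm: render the config, submit this same file, tail the log.
    set -a; source .env; set +a                        # load .env values
    envsubst < "$CONFIG" > axolotl_config/config.yaml  # inject them (assumed mechanism)
    JOBID=$(sbatch --parsable -J "$NAME" "$0" "$CONFIG")
    LOG="reports/R-$NAME.$JOBID.err"
    until [ -f "$LOG" ]; do sleep 5; done              # wait for the log to appear
    tail -f "$LOG"
else
    # Inside the allocation: run training (assumed axolotl entry point).
    axolotl train axolotl_config/config.yaml
fi
```

Because `sbatch -J "$NAME"` sets the job name and the `#SBATCH --error` pattern uses `%x.%j`, the error log resolves to `reports/R-<config-name>.<jobid>.err`, which is what gets tailed.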
- Model parallel run (uses torchrun/`parallelize`): `bash run_eval_model_parallelism.sh`
- Data parallel run (uses `accelerate launch`): `bash run_eval_data_parallelism.sh`
Both scripts expect:
- `SLURM_NNODES` set by the scheduler,
- lm-evaluation-harness installed from your local clone (`git clone https://github.com/Xkrilandar/lm-evaluation-harness`),
- `MODEL_PATH` pointing to the HF repo or local checkpoint you want to score.
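For reference, the two launch styles these scripts wrap look roughly like the following lm-evaluation-harness invocations; the task name and batch size are illustrative, not copied from the repo scripts:

```bash
# Model parallel (sketch): one process; `parallelize=True` shards the
# model across the visible GPUs via Hugging Face Accelerate.
lm_eval --model hf \
    --model_args "pretrained=$MODEL_PATH,parallelize=True" \
    --tasks medqa_4options \
    --batch_size auto

# Data parallel (sketch): `accelerate launch` runs one full model replica
# per GPU and splits the eval requests between them.
accelerate launch -m lm_eval --model hf \
    --model_args "pretrained=$MODEL_PATH" \
    --tasks medqa_4options \
    --batch_size auto
```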