First, install pixi, then run:

```shell
pixi install
```

To create / update pixi on the cluster, run:

```shell
python update_pixi.py --config-path configs --config-name tiny_remote
```

Requirements:
- Config file must specify `infrastructure.server` (target cluster)
- `infrastructure.slurm.script` must contain an `export PIXI_HOME=...` line
- Only affects the cluster specified in the config file
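Put together, the required fields might look like this in a config (a hypothetical fragment; the server name and path are placeholders):

```yaml
infrastructure:
  server: my-cluster          # target cluster
  slurm:
    script: |
      export PIXI_HOME=/path/to/pixi   # required line
```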
What it does:
- Copies local `pixi.toml` and `pixi.lock` to the remote cluster
- Runs `pixi install` on a compute node via SLURM (GPU params auto-removed)
- Archives old pixi files before installing the new environment
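The "GPU params auto-removed" step could be sketched as follows (a hypothetical helper, not the actual `update_pixi.py` implementation): GPU-related `#SBATCH` directives are stripped from the script before submission, since `pixi install` does not need a GPU node.

```python
def strip_gpu_params(sbatch_script: str) -> str:
    """Drop GPU-related #SBATCH directives (e.g. --gres=gpu:..., --gpus=...)
    so the install job can run on a plain compute node."""
    gpu_flags = ("--gres", "--gpus")
    kept = []
    for line in sbatch_script.splitlines():
        stripped = line.strip()
        if stripped.startswith("#SBATCH") and any(f in stripped for f in gpu_flags):
            continue  # skip GPU request lines
        kept.append(line)
    return "\n".join(kept)
```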
```shell
pixi shell
```

Run locally:

```shell
python main.py --config-path configs --config-name tiny_local
```

Run on the cluster:

```shell
python run_exp.py --config-path configs --config-name tiny_remote
```

Note: `run_exp.py` does not copy pixi files (`pixi.toml`, `pixi.lock`) to the cluster, to avoid inflating memory and file count in `$HOME`. Use `update_pixi.py` (see Setup > Remote) to update the pixi environment on the cluster first.
Uses Hydra for configuration management. Classes are instantiated via `_target_`:

```yaml
trainer:
  train_dataloader:
    _target_: src.core.datasets.get_dataloader
    dataset_path: /path/to/data
    sequence_length: 2048
```

- Why is the state of the model, optimizer, and scheduler separated from the other state parameters?
- We want to start the metric_logger as soon as possible; loading the model's distributed checkpoint forces us to create the model before loading its weights.
- How to load Llama weights? Set the following fields in a config:
```yaml
trainer:
  checkpoint:
    load:
      type: huggingface
      path: "meta-llama/Llama-3.2-1B"
      n_steps: 0
```
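Conceptually, the `_target_` mechanism used in the configs above resolves a dotted path and calls it with the remaining keys. A minimal stdlib-only sketch (Hydra's real `hydra.utils.instantiate` additionally handles recursion, interpolation, and partial instantiation):

```python
import importlib

def instantiate(cfg: dict):
    """Sketch of _target_-style instantiation: import the dotted path
    in _target_ and call it with the remaining config keys as kwargs."""
    module_path, _, attr_name = cfg["_target_"].rpartition(".")
    target = getattr(importlib.import_module(module_path), attr_name)
    kwargs = {k: v for k, v in cfg.items() if k != "_target_"}
    return target(**kwargs)

# e.g. build a collections.Counter from a config-like dict
counter = instantiate({"_target_": "collections.Counter", "a": 2, "b": 1})
```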