This repository contains model training and baselines for CellARC:
- Dataset repo: https://github.com/mireklzicar/cellarc
- Website: https://cellarc.mireklzicar.com/
| Project | Size | Training Mode | W&B URL |
|---|---|---|---|
| cellarc100k_50e_embedding_small | small | embedding | https://wandb.ai/lzicar/cellarc100k_50e_embedding_small |
| cellarc100k_50e_embedding_medium | medium | embedding | https://wandb.ai/lzicar/cellarc100k_50e_embedding_medium |
| cellarc100k_50e_embedding_large | large | embedding | https://wandb.ai/lzicar/cellarc100k_50e_embedding_large |
| cellarc100k_50e_incontext_small | small | incontext | https://wandb.ai/lzicar/cellarc100k_50e_incontext_small |
| cellarc100k_50e_incontext_medium | medium | incontext | https://wandb.ai/lzicar/cellarc100k_50e_incontext_medium |
| cellarc100k_50e_incontext_large | large | incontext | https://wandb.ai/lzicar/cellarc100k_50e_incontext_large |
- Use `python scripts/train.py --config-name train/default` with Hydra overrides for architecture/size/mode.
- Example (tiny_recursive large embedding with W&B logging):

  ```bash
  python scripts/train.py \
    --config-name train/default \
    model.architecture=tiny_recursive \
    model/size=large \
    training.mode=embedding \
    trainer.checkpoints.enabled=true \
    logging.wandb.enabled=true \
    logging.wandb.project=cellarc100k_50e_embedding_large \
    logging.wandb.group=mode_embedding \
    logging.wandb.name=tiny_recursive_large_embedding_single
  ```
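- To sweep several sizes in one invocation, Hydra's standard multirun mode can be used. The snippet below is a minimal sketch, assuming the training entry point is a regular Hydra app and nothing in the repo's launcher conflicts with multirun:

  ```bash
  # --multirun launches one job per value in the comma-separated sweep.
  python scripts/train.py --multirun \
    --config-name train/default \
    model.architecture=tiny_recursive \
    model/size=small,medium,large \
    training.mode=embedding
  ```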
- Smoke test: swap to the lightweight `train/smoke` config for a 5-step sanity check before long runs:

  ```bash
  python scripts/train.py \
    --config-name train/smoke \
    model.architecture=tiny_recursive \
    model/size=small \
    training.mode=incontext \
    trainer.checkpoints.enabled=false \
    logging.wandb.enabled=false
  ```
- `scripts/train_all_tmux.sh` splits independent runs across GPUs via tmux workers.
- Launch a curated subset on GPUs 0–3:

  ```bash
  bash scripts/train_all_tmux.sh --gpus 0,1,2,3 \
    --run transformer_act:large:embedding \
    --run tiny_recursive:large:embedding \
    --run hrm:large:embedding \
    --run transformer:large:incontext
  ```
- Status lives under `outputs/tmux_runs/<timestamp>`; attach to sessions with `tmux attach -t train_all_gpu0`.
- Smoke test: `bash scripts/train_all_smoke_test.sh small` runs every architecture in both modes with the `train/smoke` config (5 optimizer steps) before you queue the larger tmux batch; pass `small medium` to limit sizes.
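- A quick way to check on the workers from a plain shell (a sketch; session names follow the `train_all_gpu<N>` pattern mentioned above, and the exact contents of the status directory may differ):

  ```bash
  # List the tmux sessions started by the launcher.
  tmux ls | grep train_all

  # Show the most recently created status directory.
  ls -dt outputs/tmux_runs/*/ | head -n 1

  # Attach to the worker pinned to GPU 0 (detach again with Ctrl-b d).
  tmux attach -t train_all_gpu0
  ```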
- For data-parallel training of one embedding run across 4 GPUs, use `torchrun` (rank 0 handles logging):

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:true \
  torchrun --nproc-per-node=4 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 \
    -m scripts.train --config-name train/default \
    model.architecture=tiny_recursive \
    model/size=large \
    training.mode=embedding \
    data.batch_size=96 \
    trainer.gradient_accumulation=2 \
    trainer.checkpoints.enabled=true \
    logging.wandb.enabled=true \
    logging.wandb.project=cellarc100k_50e_embedding_large \
    logging.wandb.group=embedding_ddp \
    logging.wandb.name=tiny_recursive_large_embedding_ddp
  ```
- Per-rank batch size plus gradient accumulation controls the memory footprint; run `wandb login` once before launching.
- Smoke test: keep the same `torchrun` invocation but point it at `train/smoke` so the job exits after a handful of steps:

  ```bash
  PYTORCH_CUDA_ALLOC_CONF=expandable_segments:true \
  torchrun --nproc-per-node=2 --rdzv_backend=c10d --rdzv_endpoint=localhost:0 \
    -m scripts.train --config-name train/smoke \
    model.architecture=tiny_recursive \
    model/size=small \
    training.mode=embedding \
    logging.wandb.enabled=false \
    trainer.checkpoints.enabled=false
  ```
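- For reference, the effective optimizer batch of the 4-GPU command above works out as follows (a sketch, assuming `data.batch_size` is the per-rank batch, as the bullet above implies):

  ```bash
  # per-rank batch x world size x gradient accumulation steps
  # 96 x 4 x 2 = 768 samples per optimizer step
  echo $((96 * 4 * 2))
  ```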
- Run all bundled symbolic solvers with a single command; results land in `outputs/symbolic/`:

  ```bash
  bash scripts/run_symbolic_baselines.sh
  ```
- To smoke-test or target one solver, call the Hydra entry point directly and bound the episode count:

  ```bash
  python scripts/eval_symbolic.py \
    baseline.name=copycat \
    eval.max_episodes=16 \
    eval.progress_bar=true
  ```
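- To run a few solvers back to back with the same episode bound, a simple shell loop works (a sketch; `copycat` is the only solver name confirmed above, and the other entries are hypothetical placeholders, so check the baseline configs for the real names):

  ```bash
  # Replace the placeholder names with solvers that actually exist in the repo.
  for solver in copycat placeholder_solver_a placeholder_solver_b; do
    python scripts/eval_symbolic.py \
      baseline.name="$solver" \
      eval.max_episodes=16 \
      eval.progress_bar=true
  done
  ```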
- Evaluate GPT-based baselines (defaults to `gpt-5-2025-08-07`) over the 100-episode HF splits; ensure your OpenAI credentials are exported before running:

  ```bash
  bash scripts/run_gpt_eval.sh
  ```
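- Exporting credentials typically means setting the standard OpenAI environment variable (an assumption; if the script reads a different variable, follow its own documentation):

  ```bash
  # Standard variable read by OpenAI client libraries.
  export OPENAI_API_KEY="sk-..."
  bash scripts/run_gpt_eval.sh
  ```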
- Smoke test: limit to 10 episodes and write to a `_smoke` prediction log by toggling the environment flag:

  ```bash
  SMOKE_TEST=true bash scripts/run_gpt_eval.sh
  ```
- Batch multiple hosted LLMs via Hydra overrides (or reuse `scripts/run_llm_baselines.sh`) when you need to sweep model names, as sketched below.
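- One possible pattern for such a sweep (a sketch only; the entry point name `scripts/eval_llm.py` and the `llm.model` override key are hypothetical, so adapt them to whatever `scripts/run_llm_baselines.sh` actually calls):

  ```bash
  # Hypothetical sweep: substitute the real eval entry point and override key.
  for model in gpt-5-2025-08-07 hypothetical-model-b; do
    python scripts/eval_llm.py llm.model="$model"
  done
  ```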