Passive-to-active dense latent distillation of CanViT (arXiv:2603.22570) from DINOv3 (arXiv:2508.10104).
Originally designed to run on the Nibi SLURM cluster using its hosted ImageNet-21k winter21_whole replica.
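Dense latent distillation trains a student to reproduce the teacher's per-patch (dense) feature map, not just a single pooled embedding. The sketch below shows one generic form of such an objective: the mean cosine distance between student and teacher vectors at each patch position. The function names and the choice of cosine distance are assumptions made for illustration. This is not necessarily the loss CanViT uses; see the paper for the actual objective.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature dimensions must match")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm feature vector")
    return dot / (norm_a * norm_b)


def dense_distill_loss(student: Sequence[Vector], teacher: Sequence[Vector]) -> float:
    """Hypothetical dense distillation loss: mean (1 - cosine) over patches.

    `student` and `teacher` are per-patch feature maps flattened to
    [num_patches][dim]. The teacher (e.g. DINOv3) is treated as a fixed target.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("need the same, non-zero number of patches")
    total = sum(1.0 - cosine_similarity(s, t) for s, t in zip(student, teacher))
    return total / len(student)


if __name__ == "__main__":
    teacher = [[1.0, 0.0], [0.0, 1.0]]
    student = [[2.0, 0.0], [1.0, 0.0]]  # first patch aligned, second orthogonal
    print(dense_distill_loss(student, teacher))  # 0.5
```

Cosine distance ignores feature magnitude, so the first patch contributes zero loss even though its scale differs from the teacher's. A regression loss such as MSE would penalize that difference.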
cp .envrc.example .envrc && direnv allow
# Edit .envrc to adapt it to your environment. Ensure that HF_TOKEN, COMET_API_KEY, and COMET_WORKSPACE are set.
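Before submitting jobs, you may want to confirm that the variables named above actually reach the process environment (for example, that `direnv allow` took effect in the current shell). The following is a minimal sketch. The helper `missing_env_vars` is hypothetical and is not part of this repository.

```python
import os
from typing import Iterable, List, Mapping, Optional

# Variables that must be set in .envrc, per the setup step above.
REQUIRED_VARS = ("HF_TOKEN", "COMET_API_KEY", "COMET_WORKSPACE")


def missing_env_vars(
    names: Iterable[str] = REQUIRED_VARS,
    env: Optional[Mapping[str, str]] = None,
) -> List[str]:
    """Return the names that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name)]
```

Calling `missing_env_vars()` with no arguments checks the real environment and returns an empty list when everything is set. Passing an explicit `env` mapping makes the check easy to test.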
Export DINOv3 teacher features once:
uv run python scripts/build_shuffled_index.py \
  --image-root "$IN21K_IMAGE_DIR" --index-dir "$INDEX_DIR" --dataset in21k
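The feature export that follows runs as 100 SLURM array tasks (`--array=0-99`), with at most 20 running at once (`%20`). One common way to make such a split deterministic is to fix a seeded random permutation of the image list once, then give each task a contiguous slice of it. The sketch below illustrates only that idea. The functions `shuffled_index` and `shard_bounds`, the seed, and the slicing scheme are assumptions. They do not describe what `scripts/build_shuffled_index.py` or `slurm/export_features.sh` actually do. SLURM sets `SLURM_ARRAY_TASK_ID` in each array task's environment.

```python
import os
import random
from typing import List, Sequence, Tuple


def shuffled_index(paths: Sequence[str], seed: int = 0) -> List[str]:
    """Hypothetical: a reproducible permutation of the image paths."""
    order = sorted(paths)  # sort first so the result ignores filesystem listing order
    random.Random(seed).shuffle(order)
    return order


def shard_bounds(num_items: int, num_shards: int, shard_id: int) -> Tuple[int, int]:
    """Half-open [start, end) range for one shard; shard sizes differ by at most one."""
    if not 0 <= shard_id < num_shards:
        raise ValueError("shard_id out of range")
    base, extra = divmod(num_items, num_shards)
    start = shard_id * base + min(shard_id, extra)
    end = start + base + (1 if shard_id < extra else 0)
    return start, end


if __name__ == "__main__":
    # With --array=0-99, SLURM sets SLURM_ARRAY_TASK_ID to 0..99; default to 0 locally.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    index = shuffled_index([f"img_{i:04d}.jpg" for i in range(1050)], seed=0)
    start, end = shard_bounds(len(index), 100, task_id)
    print(f"task {task_id}: items {start}..{end - 1} ({end - start} images)")
```

Shuffling once, before sharding, keeps each task's slice a random sample of the whole dataset. Deriving slices from a fixed index means that a failed task can be resubmitted on its own and will process exactly the same images.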
sbatch --array=0-99%20 slurm/export_features.sh

Pretraining:
sbatch slurm/train.sbatch [--flag value ...]

Ablations:
bash slurm/ablations/baseline.sh
bash slurm/ablations/no-bptt.sh
# ...

Citation:

@article{berreby2026canvit,
title={CanViT: Toward Active-Vision Foundation Models},
author={Berreby, Yoha{\"i}-Eliel and Du, Sabrina and Durand, Audrey and Krishna, B. Suresh},
year={2026},
eprint={2603.22570},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2603.22570}
}

License: MIT. See LICENSE for details.