We propose DiffusionBlocks, a principled framework that partitions transformers into independently trainable blocks, reducing memory requirements proportionally while maintaining competitive performance across diverse architectures and tasks.
This is an official implementation of DiffusionBlocks on image classification using Vision Transformers (ViT).
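As a rough intuition for the block-wise idea, here is an illustrative sketch (not the repo's code; the actual per-block objective comes from the diffusion interpretation in the paper): each block is trained on its own, with its own optimizer state, so only one block's parameters and gradients need to be resident at a time.

```python
# Illustrative sketch of block-wise training (NOT the DiffusionBlocks
# objective): each "block" is optimized independently, so memory for
# gradients/optimizer state is needed for only one block at a time.
import random

random.seed(0)

def make_block():
    """A 'block' here is just a single scalar weight, for illustration."""
    return {"w": random.random()}

def train_block(block, steps=100, lr=0.1, target=1.0):
    """Gradient descent on (w - target)^2, touching only this block."""
    for _ in range(steps):
        grad = 2 * (block["w"] - target)  # d/dw (w - target)^2
        block["w"] -= lr * grad
    return block

# Partition the "network" into blocks and train them one at a time,
# instead of backpropagating through the whole stack end-to-end.
blocks = [make_block() for _ in range(3)]
for b in blocks:
    train_block(b)

print([round(b["w"], 3) for b in blocks])  # each weight converges to ~1.0
```

In the real framework, each transformer block gets its own training signal, which is what makes the independent (and therefore memory-proportional) training possible.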
Please install uv. Then, run:
```shell
# Install dependencies
uv sync

# Make sure to log in to Hugging Face and Weights & Biases
uv run huggingface-cli login
uv run wandb login
```

We conducted our experiments in the following environment: Python 3.12 and CUDA 12.2 on an H100 GPU.
Model checkpoints are saved in the `logs` folder.
Baseline (ViT):

```shell
uv run main.py train cifar100 --model_type vit
```

DiffusionBlocks:

```shell
uv run main.py train cifar100 --model_type dblock
```

NOTE: the total number of epochs in DiffusionBlocks is multiplied by the number of blocks to align the total number of iterations with the baseline, since one step in DiffusionBlocks corresponds to training a single block.
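As a back-of-the-envelope check of that epoch accounting (the variable names here are ours; the block count of 3 matches the warmup-step scaling in the scripts below):

```python
# Illustrative only: how the effective epoch count scales.
# One DiffusionBlocks optimization step updates a single block, so the
# epoch count is multiplied by the number of blocks to keep the total
# number of per-block iterations equal to the baseline's.
num_blocks = 3          # matches the "* 3" warmup-step scaling below
baseline_epochs = 1000  # epochs used in the baseline run

dblock_epochs = baseline_epochs * num_blocks
print(dblock_epochs)  # -> 3000
```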
Details
In the base setting, we do not rely on techniques such as heavy data augmentation. If you want to see the performance with heavy data augmentation and a learning-rate scheduler, run the following:
Baseline (ViT):

```shell
BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=3900
MODEL_TYPE="vit"
srun uv run main.py train cifar100 \
  --model_type $MODEL_TYPE \
  --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
  --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
  --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
  --add_rand_aug
```

DiffusionBlocks:
```shell
BATCH_SIZE=128
EPOCHS=1000
POSTFIX="-rand-augment"
WARMUP_STEPS=$((3900 * 3)) # 3 indicates the number of blocks
MODEL_TYPE="dblock"
srun uv run main.py train cifar100 \
  --model_type $MODEL_TYPE \
  --batch_size $BATCH_SIZE --num_epochs $EPOCHS --postfix=$POSTFIX \
  --scheduler_type cosine_with_min_lr --num_warmup_steps $WARMUP_STEPS --lr 5e-4 \
  --scheduler_specific_kwargs '{"min_lr": 5e-5}' \
  --add_rand_aug
```

To evaluate a trained checkpoint, run:

Baseline (ViT):
```shell
CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type vit --ckpt_path $CKPT_PATH
```

DiffusionBlocks:

```shell
CKPT_PATH="logs/path-to-last.ckpt"
uv run main.py test cifar100 --model_type dblock --ckpt_path $CKPT_PATH
```

The Vision Transformer implementation in vit.py is based on HuggingFace Transformers, and the EDM implementation is based on Stability-AI/generative-models.
We are grateful for their work.
To cite our work, please use the following BibTeX:
```bibtex
@inproceedings{shing2026diffusionblocks,
  title     = {DiffusionBlocks: Block-wise Neural Network Training via Diffusion Interpretation},
  author    = {Makoto Shing and Masanori Koyama and Takuya Akiba},
  booktitle = {The Fourteenth International Conference on Learning Representations},
  year      = {2026},
  url       = {https://openreview.net/forum?id=pwVSmK71cS}
}
```