This repository contains the official PyTorch implementation of the paper Distilling Dataset into Neural Field (ICLR 2025).
Donghyeok Shin, HeeSun Bae, Gyuwon Sim, Wanmo Kang, and Il-Chul Moon
Abstract: Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs the corresponding quantity, DDiF effectively preserves the information and easily generates various shapes of data. We theoretically confirm that DDiF exhibits greater expressiveness than some previous literature when the utilized budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel.
Create a new virtual environment and install the required dependencies using the requirements.txt file:
pip install -r requirements.txt
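For example, a minimal environment setup using Python's built-in venv (any environment manager works equally well; the environment name below is only an illustration):

python -m venv ddif_env
source ddif_env/bin/activate
pip install -r requirements.txt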
DDiF adopts SIREN as the default synthetic neural field. The main hyperparameters of DDiF are as follows:
- dim_in: Input dimension (n)
- num_layers: Number of layers in the neural field (L)
- layer_size: Width of layers in the neural field (d)
- dim_out: Output dimension (m)
- w0_initial: Scaling parameter for the first layer in SIREN
- w0: Scaling parameter for subsequent layers in SIREN
- lr_nf: Learning rate for the neural field
- epochs_init: Epochs for warm-up training
- lr_nf_init: Learning rate for warm-up training
Detailed values for these hyperparameters can be found in our paper or hyper_params.py.
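To make the roles of these hyperparameters concrete, below is a minimal, hypothetical PyTorch sketch of a SIREN-style synthetic neural field and how an image is decoded from coordinates; class and argument names are illustrative and may differ from the actual implementation in this repository.

import torch
import torch.nn as nn

class Sine(nn.Module):
    # Sinusoidal activation used by SIREN, scaled by w0
    def __init__(self, w0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

class SyntheticNeuralField(nn.Module):
    # dim_in (n), dim_out (m), num_layers (L), layer_size (d) follow the notation above
    def __init__(self, dim_in, dim_out, num_layers, layer_size, w0_initial=30.0, w0=1.0):
        super().__init__()
        layers = []
        for i in range(num_layers):
            in_dim = dim_in if i == 0 else layer_size
            layers += [nn.Linear(in_dim, layer_size), Sine(w0_initial if i == 0 else w0)]
        layers.append(nn.Linear(layer_size, dim_out))  # final linear layer maps to the output quantity
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        # coords: (num_points, dim_in), e.g. normalized (x, y) pixel coordinates
        return self.net(coords)

# Decode a 32x32 single-channel image from one synthetic neural field (sizes are hypothetical)
field = SyntheticNeuralField(dim_in=2, dim_out=1, num_layers=3, layer_size=32)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32), torch.linspace(-1, 1, 32), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
image = field(coords).reshape(32, 32)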
For other hyperparameters, we follow the default settings of each dataset distillation objective.
Please refer to the provided bash scripts for detailed arguments when running experiments.
- Run the following command with the appropriate distillation loss (DC/DM/TM):
- For TM, please run run_buffer.sh to generate expert trajectories before distillation.
cd {DISTILLATION_LOSS}/scripts
bash run_DDiF.sh
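For example, with the trajectory matching (TM) objective, the sequence of commands would look like the following (an illustration assuming run_buffer.sh resides in TM/scripts; see the scripts for dataset-specific arguments):

cd TM/scripts
bash run_buffer.sh   # generate expert trajectories first (TM only)
bash run_DDiF.sh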
- We build upon the video distillation code from Dancing with Still Images (see the acknowledgements below).
- Please prepare the UCF101 dataset.
- Run the following command:
cd Video/scripts
bash run_DDiF.sh
cd 3D_Voxel/scripts
bash run_DDiF#{DISTILLATION_LOSS}.sh
If you find the code useful for your research, please consider citing our paper.
@inproceedings{shin2025distilling,
title={Distilling Dataset into Neural Field},
author={Donghyeok Shin and HeeSun Bae and Gyuwon Sim and Wanmo Kang and Il-chul Moon},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=nCrJD7qPJN}
}

This work is heavily built upon the code from:
- Dataset condensation with gradient matching, Paper, Code
- Dataset condensation with distribution matching, Paper, Code
- Dataset distillation by matching training trajectories, Paper, Code
- Frequency Domain-based Dataset Distillation, Paper, Code
- Pytorch implementation of SIREN, Code
- Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement, Paper, Code
