This repository contains the official PyTorch implementation of the paper Distilling Dataset into Neural Field (ICLR 2025).
Donghyeok Shin, HeeSun Bae, Gyuwon Sim, Wanmo Kang, and Il-Chul Moon
Abstract: Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution by compressing the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages the neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs the corresponding quantity, DDiF effectively preserves the information and easily generates various shapes of data. We theoretically confirm that DDiF exhibits greater expressiveness than some previous literature when the utilized budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to include video, audio, and 3D voxel.
Create a new virtual environment and install the required dependencies using the requirements.txt file:
pip install -r requirements.txt
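For example, a minimal environment setup using Python's built-in venv (any environment manager works equally well; the environment name below is only an illustration):

python -m venv ddif_env
source ddif_env/bin/activate
pip install -r requirements.txt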
DDiF adopts SIREN as the default synthetic neural field. The main hyperparameters of DDiF are as follows:
- dim_in: Input dimension (n)
- num_layers: Number of layers in the neural field (L)
- layer_size: Width of layers in the neural field (d)
- dim_out: Output dimension (m)
- w0_initial: Scaling parameter for the first layer in SIREN
- w0: Scaling parameter for subsequent layers in SIREN
- lr_nf: Learning rate for the neural field
- epochs_init: Epochs for warm-up training
- lr_nf_init: Learning rate for warm-up training
Detailed values for these hyperparameters can be found in our paper or hyper_params.py.
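To make the roles of these hyperparameters concrete, below is a minimal, hypothetical PyTorch sketch of a SIREN-style synthetic neural field and how an image is decoded from coordinates; class and argument names are illustrative and may differ from the actual implementation in this repository.

import torch
import torch.nn as nn

class Sine(nn.Module):
    # Sinusoidal activation used by SIREN, scaled by w0
    def __init__(self, w0):
        super().__init__()
        self.w0 = w0

    def forward(self, x):
        return torch.sin(self.w0 * x)

class SyntheticNeuralField(nn.Module):
    # dim_in (n), dim_out (m), num_layers (L), layer_size (d) follow the notation above
    def __init__(self, dim_in, dim_out, num_layers, layer_size, w0_initial=30.0, w0=1.0):
        super().__init__()
        layers = []
        for i in range(num_layers):
            in_dim = dim_in if i == 0 else layer_size
            layers += [nn.Linear(in_dim, layer_size), Sine(w0_initial if i == 0 else w0)]
        layers.append(nn.Linear(layer_size, dim_out))  # final linear layer maps to the output quantity
        self.net = nn.Sequential(*layers)

    def forward(self, coords):
        # coords: (num_points, dim_in), e.g. normalized (x, y) pixel coordinates
        return self.net(coords)

# Decode a 32x32 single-channel image from one synthetic neural field (sizes are hypothetical)
field = SyntheticNeuralField(dim_in=2, dim_out=1, num_layers=3, layer_size=32)
ys, xs = torch.meshgrid(torch.linspace(-1, 1, 32), torch.linspace(-1, 1, 32), indexing="ij")
coords = torch.stack([xs, ys], dim=-1).reshape(-1, 2)
image = field(coords).reshape(32, 32)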
For other hyperparameters, we follow the default settings of each dataset distillation objective.
Please refer to the provided bash scripts for detailed arguments when running experiments.
- Run the following command with the appropriate distillation loss (DC/DM/TM):
- For TM, please run run_buffer.sh to generate expert trajectories before distillation.
cd {DISTILLATION_LOSS}/scripts
bash run_DDiF.sh
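For example, with the trajectory matching (TM) objective, the sequence of commands would look like the following (an illustration assuming run_buffer.sh resides in TM/scripts; see the scripts for dataset-specific arguments):

cd TM/scripts
bash run_buffer.sh   # generate expert trajectories first (TM only)
bash run_DDiF.sh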
- We build upon the video distillation code from Dancing with Still Images (see the acknowledgements below).
- Please prepare the UCF101 dataset.
- Run the following command:
cd Video/scripts
bash run_DDiF.sh
cd 3D_Voxel/scripts
bash run_DDiF#{DISTILLATION_LOSS}.sh
If you find the code useful for your research, please consider citing our paper.
@inproceedings{shin2025distilling,
title={Distilling Dataset into Neural Field},
author={Donghyeok Shin and HeeSun Bae and Gyuwon Sim and Wanmo Kang and Il-chul Moon},
booktitle={The Thirteenth International Conference on Learning Representations},
year={2025},
url={https://openreview.net/forum?id=nCrJD7qPJN}
}

This work is heavily built upon the code from:
- Dataset condensation with gradient matching, Paper, Code
- Dataset condensation with distribution matching, Paper, Code
- Dataset distillation by matching training trajectories, Paper, Code
- Frequency Domain-based Dataset Distillation, Paper, Code
- Pytorch implementation of SIREN, Code
- Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement, Paper, Code
