Distilling Dataset into Neural Field (DDiF) [ICLR 2025]

This repository contains the official PyTorch implementation of the ICLR 2025 paper *Distilling Dataset into Neural Field*.

Donghyeok Shin, HeeSun Bae, Gyuwon Sim, Wanmo Kang, and Il-Chul Moon

Overview

Teaser image

**Abstract** Utilizing a large-scale dataset is essential for training high-performance deep learning models, but it also comes with substantial computation and storage costs. To overcome these challenges, dataset distillation has emerged as a promising solution: it compresses the large-scale dataset into a smaller synthetic dataset that retains the essential information needed for training. This paper proposes a novel parameterization framework for dataset distillation, coined Distilling Dataset into Neural Field (DDiF), which leverages a neural field to store the necessary information of the large-scale dataset. Due to the unique nature of the neural field, which takes coordinates as input and outputs the corresponding quantities, DDiF effectively preserves the information and easily generates data of various shapes. We theoretically confirm that DDiF exhibits greater expressiveness than several previous parameterizations when the budget for a single synthetic instance is the same. Through extensive experiments, we demonstrate that DDiF achieves superior performance on several benchmark datasets, extending beyond the image domain to video, audio, and 3D voxel data.
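To make the decoding idea concrete, here is a minimal sketch (not code from this repository; `coord_grid` and the stand-in network are illustrative) of how a single small network, queried on normalized coordinate grids, can be decoded into data of different resolutions while its storage budget stays fixed at its parameter count:

```python
import torch
import torch.nn as nn

def coord_grid(*sizes):
    """Normalized coordinate grid in [-1, 1]^n, flattened to (prod(sizes), n)."""
    axes = [torch.linspace(-1.0, 1.0, s) for s in sizes]
    mesh = torch.meshgrid(*axes, indexing="ij")
    return torch.stack(mesh, dim=-1).reshape(-1, len(sizes))

# A tiny stand-in field: n input coordinates -> m output channels.
field = nn.Sequential(nn.Linear(2, 32), nn.Tanh(), nn.Linear(32, 3))

# The same field can be decoded at any spatial resolution.
img_lo = field(coord_grid(16, 16)).reshape(16, 16, 3)
img_hi = field(coord_grid(64, 64)).reshape(64, 64, 3)

# The storage budget is the field's parameter count,
# independent of the decoded size.
budget = sum(p.numel() for p in field.parameters())
```

The same pattern extends to other domains by changing the number of coordinate axes, e.g. `coord_grid(T, H, W)` for video frames or `coord_grid(D, H, W)` for 3D voxels.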

Getting Started

Create a new virtual environment and install the required dependencies using the requirements.txt file:

```shell
pip install -r requirements.txt
```

Usage

DDiF adopts SIREN as the default synthetic neural field. The main hyperparameters of DDiF are as follows:

  • dim_in : Input dimension (n)
  • num_layers : Number of layers in the neural field (L)
  • layer_size : Width of layers in the neural field (d)
  • dim_out : Output dimension (m)
  • w0_initial : Scaling parameter for the first layer in SIREN
  • w0 : Scaling parameter for subsequent layers in SIREN
  • lr_nf : Learning rate for the neural field
  • epochs_init : Epochs for warm-up training
  • lr_nf_init : Learning rate for warm-up training

Detailed values for these hyperparameters can be found in our paper or in hyper_params.py. For all other hyperparameters, we follow the default settings of each dataset distillation objective. Please refer to the provided bash scripts for the detailed arguments used when running experiments.
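As a reference for how the hyperparameters above fit together, the following is a self-contained SIREN sketch following the initialization of Sitzmann et al. (2020). It is illustrative rather than the repository's actual code: class names are made up, and the exact convention for `num_layers` (whether it counts the output head) is an assumption, so check hyper_params.py against the paper's notation.

```python
import math
import torch
import torch.nn as nn

class SirenLayer(nn.Module):
    """One sine layer, y = sin(w0 * (Wx + b)), with SIREN weight init."""
    def __init__(self, dim_in, dim_out, w0=30.0, is_first=False):
        super().__init__()
        self.linear = nn.Linear(dim_in, dim_out)
        self.w0 = w0
        # First layer uses 1/dim_in; later layers sqrt(6/dim_in)/w0.
        bound = 1.0 / dim_in if is_first else math.sqrt(6.0 / dim_in) / w0
        nn.init.uniform_(self.linear.weight, -bound, bound)

    def forward(self, x):
        return torch.sin(self.w0 * self.linear(x))

class Siren(nn.Module):
    """num_layers sine layers of width layer_size, then a linear head to dim_out."""
    def __init__(self, dim_in, dim_out, layer_size, num_layers,
                 w0=30.0, w0_initial=30.0):
        super().__init__()
        layers = [SirenLayer(dim_in, layer_size, w0_initial, is_first=True)]
        for _ in range(num_layers - 1):
            layers.append(SirenLayer(layer_size, layer_size, w0))
        self.net = nn.Sequential(*layers)
        self.head = nn.Linear(layer_size, dim_out)

    def forward(self, coords):
        return self.head(self.net(coords))

# e.g. a synthetic field mapping 2-D image coordinates to RGB values
field = Siren(dim_in=2, dim_out=3, layer_size=64, num_layers=3)
```

During distillation, the field's parameters would be the optimization variables, e.g. updated with `torch.optim.Adam(field.parameters(), lr=lr_nf)`.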

Image domain

  • Run the following commands with the appropriate distillation loss (DC/DM/TM).
  • For TM, first run run_buffer.sh to generate expert trajectories before distillation.

```shell
cd {DISTILLATION_LOSS}/scripts
bash run_DDiF.sh
```

Video domain

```shell
cd Video/scripts
bash run_DDiF.sh
```

3D Voxel domain

```shell
cd 3D_Voxel/scripts
bash run_DDiF#{DISTILLATION_LOSS}.sh
```

Citation

If you find the code useful for your research, please consider citing our paper.

```bibtex
@inproceedings{shin2025distilling,
    title={Distilling Dataset into Neural Field},
    author={Donghyeok Shin and HeeSun Bae and Gyuwon Sim and Wanmo Kang and Il-chul Moon},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=nCrJD7qPJN}
}
```

This work is heavily built upon the code from:

  • Dataset condensation with gradient matching, Paper, Code
  • Dataset condensation with distribution matching, Paper, Code
  • Dataset distillation by matching training trajectories, Paper, Code
  • Frequency Domain-based Dataset Distillation, Paper, Code
  • Pytorch implementation of SIREN, Code
  • Dancing with Still Images: Video Distillation via Static-Dynamic Disentanglement, Paper, Code
