This repository contains code for EDM2SE, a diffusion-based speech enhancement model with a magnitude-preserving network architecture.
This code accompanies the paper "Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture" (ICASSP 2026).
If you use this code or build upon it in your work, please see the Citation section below.
The codebase is adapted from the official EDM2 implementation: https://github.com/NVlabs/edm2
We recommend using a dedicated conda environment:

```
conda create --name edm2se python=3.10
conda activate edm2se
pip install -r requirements.txt
```

Download the checkpoint for EDM2SE trained on VoiceBank-DEMAND (VB-DMD):
To run inference, enhance a directory of noisy WAV files with:

```
python generate.py \
    --net /path/to/checkpoint.ckpt \
    --test_dir /path/to/noisy_dir \
    --out_dir /path/to/enhanced_dir
```

Arguments:

- `--net`: Path to the pretrained EDM2SE checkpoint.
- `--test_dir`: Directory containing noisy input WAV files.
- `--out_dir`: Output directory for enhanced WAV files.
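When enhancing several test sets, the command above can be wrapped in a small driver script. This is a minimal sketch assuming the command-line interface shown above; the helper names and paths are hypothetical, not part of the repository:

```python
import subprocess

def build_generate_cmd(net, test_dir, out_dir):
    """Assemble the generate.py command line (all paths are placeholders)."""
    return [
        "python", "generate.py",
        "--net", str(net),
        "--test_dir", str(test_dir),
        "--out_dir", str(out_dir),
    ]

def enhance(net, test_dir, out_dir):
    """Run generate.py on one noisy directory, raising on failure."""
    subprocess.run(build_generate_cmd(net, test_dir, out_dir), check=True)
```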
To compute PESQ and SI-SDR between enhanced and clean reference signals:
```
python calculate_metrics.py \
    --proc_dir /path/to/enhanced_wavs \
    --target_dir /path/to/clean_wavs \
    --results_dir /path/to/results \
    --name run_name
```

Arguments:

- `--proc_dir`: Directory containing enhanced WAV files.
- `--target_dir`: Directory containing clean reference WAV files.
- `--results_dir`: Output directory for CSV metric files.
- `--name`: Run identifier used for naming result files.
The script saves a CSV file and prints mean scores to the terminal.
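For sanity-checking individual file pairs, SI-SDR can also be computed directly. This is a minimal sketch of the standard scale-invariant SDR definition, not the implementation used by calculate_metrics.py:

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between an estimate and a clean reference.

    Projects the reference onto the estimate to find the optimally scaled
    target, then measures the energy ratio of target to residual noise.
    """
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Optimal scaling factor of the reference signal
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    noise = estimate - target
    return 10.0 * np.log10((np.sum(target**2) + eps) / (np.sum(noise**2) + eps))
```

Higher values indicate better enhancement; a perfect reconstruction (up to scale) gives a very large score.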
- 01/31/2026 --- Training code release
- 01/26/2026 --- Initial inference code release
The training code implements the standard EDM2SE recipe described in the paper.
Training is launched with distributed data parallelism via torchrun. For example, to train EDM2SE on 2 GPUs:
```
torchrun --standalone --nproc_per_node=2 train.py \
    --run_dir=/path/to/run_dir \
    --data=/path/to/dataset
```

Arguments:

- `--nproc_per_node=2`: Number of processes to launch per node (usually equal to the number of GPUs).
- `--run_dir`: Directory where training outputs (checkpoints, logs, and configuration files) are saved.
- `--data`: Path to the training dataset directory.
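Under torchrun, each process sees a distinct shard of the training data. As a rough sketch of the idea (a hypothetical stand-in mirroring PyTorch's DistributedSampler round-robin split, not this repository's data pipeline), each rank takes every `world_size`-th index of a deterministically shuffled list:

```python
import random

def shard_indices(num_items, rank, world_size, seed=0):
    """Deterministically shuffle item indices, then split them across ranks.

    All ranks use the same seed, so they agree on the shuffled order and
    end up with disjoint shards that together cover the whole dataset.
    """
    rng = random.Random(seed)
    indices = list(range(num_items))
    rng.shuffle(indices)
    # Rank r takes positions r, r + world_size, r + 2 * world_size, ...
    return indices[rank::world_size]
```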
To compute sigma_x and sigma_n for the model, run:
```
python training/dataset_stats.py \
    --data /path/to/data_dir
```

Make sure the project root is added to PYTHONPATH before running the script.
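sigma_x and sigma_n are global scale statistics of the data. A minimal sketch of how a pooled standard deviation could be accumulated over many waveforms in one streaming pass (illustrative only; the actual dataset_stats.py may differ):

```python
import numpy as np

def global_sigma(arrays):
    """Standard deviation pooled over all samples in all input arrays.

    Accumulates count, sum, and sum of squares so arbitrarily many files
    can be processed without concatenating them in memory.
    """
    n = 0
    total = 0.0
    total_sq = 0.0
    for a in arrays:
        a = np.asarray(a, dtype=np.float64)
        n += a.size
        total += a.sum()
        total_sq += np.square(a).sum()
    mean = total / n
    return np.sqrt(total_sq / n - mean**2)
```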
If you use this code or pretrained model in your work, please cite our ICASSP 2026 paper:
Richter, Julius, Danilo De Oliveira, and Timo Gerkmann. "Do We Need EMA for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture." IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2026.
```bibtex
@inproceedings{richter2026edm2se,
  title={Do We Need {EMA} for Diffusion-Based Speech Enhancement? Toward a Magnitude-Preserving Network Architecture},
  author={Richter, Julius and de Oliveira, Danilo and Gerkmann, Timo},
  booktitle={IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)},
  year={2026}
}
```