A lightweight U-Net Mask-Based speech denoising model

simonriou/DenoiseNet


DenoiseNet: A U-Net Mask-Based Speech Denoising Model

Overview

A demo of the model's performance is available at the docs site.

  • Model: DenoiseNet, a U-Net mask estimator trained on log-magnitude STFT features with multi-objective losses (BCE on IBM, L1 on linear and mel magnitudes, waveform L1).
  • Training corpus: ~2.5 hours of English speech mixed with babble noise at controlled SNR.
  • Inference: streaming-ready; runs in (near) real time on a standard laptop CPU.
  • Results: significant SNR improvement on the test set, with audible subjective quality gains; near state-of-the-art among lightweight denoising models.
  • Training script: src/training/train.py.
  • Inference script: src/inference/inference.py.
  • Configuration centralised in src/utils/constants.py.
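The multi-objective loss described above can be sketched as a weighted sum of the four terms. The weight names echo LAMBDA, GAMMA, ALPHA, and ZETA from src/utils/constants.py, but the pairing of each weight with a specific term is an assumption here — see src/training/train.py for the actual combination:

```python
import torch
import torch.nn.functional as F

def combined_loss(mask_logits, ibm, mag_pred, mag_clean,
                  mel_pred, mel_clean, wav_pred, wav_clean,
                  lam=1.0, gamma=1.0, alpha=1.0, zeta=1.0):
    """Illustrative multi-objective loss; the weight-to-term pairing is hypothetical."""
    bce = F.binary_cross_entropy_with_logits(mask_logits, ibm)  # predicted mask vs. ideal binary mask
    l1_linear = F.l1_loss(mag_pred, mag_clean)                  # linear-magnitude L1
    l1_mel = F.l1_loss(mel_pred, mel_clean)                     # mel-magnitude L1
    l1_wave = F.l1_loss(wav_pred, wav_clean)                    # waveform L1
    return lam * bce + gamma * l1_linear + alpha * l1_mel + zeta * l1_wave
```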

Repository Layout

Environment Setup

  • Recommended: Python 3.10+ with torch, torchaudio, speechbrain, numpy, scipy, tqdm, and matplotlib (optional, for debugging plots).
  • Example (from repo root):
    • python -m venv .venv && source .venv/bin/activate
    • pip install torch torchaudio speechbrain numpy scipy tqdm matplotlib

Why Run as Modules from src

  • Imports use package-style paths (e.g., from utils.constants import * in src/training/train.py). Running as a module from inside src ensures Python resolves these packages without manual PYTHONPATH edits.
  • Commands (run from the src directory):
    • Training: python -m training.train
    • Inference: python -m inference.inference
  • If you prefer running from the repo root, set PYTHONPATH=src (e.g., PYTHONPATH=src python -m training.train).

Data Preparation

Configuring Hyperparameters

  • Edit src/utils/constants.py to change:
    • Data paths (ROOT, CLEAN_DIR, NOISE_DIR, test directories).
    • STFT params (N_FFT, HOP_LENGTH, WIN_LENGTH, N_MELS).
    • Training params (EPOCHS, BATCH_SIZE, LEARNING_RATE, loss weights LAMBDA, GAMMA, OMEGA, ZETA, mel weight ALPHA).
    • Phase reconstruction (PHASE_MODE in {raw, GL, vocoder} and GL_ITERS).
    • Logging/output toggles (SAVE_DENOISED, SAVE_NOISY) and model selection (MODEL_NAME).
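As a concrete picture, a minimal src/utils/constants.py covering the names listed above might look like the following — the constant names come from this README, but every value here is illustrative, not the repository's defaults:

```python
# Hypothetical excerpt of src/utils/constants.py -- names from the README,
# values illustrative only.
N_FFT = 512
HOP_LENGTH = 128
WIN_LENGTH = 512
N_MELS = 80

EPOCHS = 50
BATCH_SIZE = 16
LEARNING_RATE = 1e-3

PHASE_MODE = "GL"            # one of "raw", "GL", "vocoder"
GL_ITERS = 32                # Griffin-Lim iterations when PHASE_MODE == "GL"

SAVE_DENOISED = True
SAVE_NOISY = False
MODEL_NAME = "waveform-3.pth"
```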

Training Procedure

Inference Pipeline

  • Working directory: src (module mode).
  • Ensure MODEL_NAME in src/utils/constants.py points to a weight file in data/models (e.g., waveform-3.pth).
  • Ensure test .pt speech files exist in data/test/speech; use the conversion script if starting from .wav.
  • Command: python -m inference.inference
  • Internals (see src/inference/inference.py):
    • Loads SpeechNoiseDataset in test mode (adds filenames), batch size 1 with padding.
    • Predicts mask, reconstructs magnitude; phase via PHASE_MODE (raw uses mixture phase and torch.istft, GL uses Griffin-Lim, vocoder placeholder not implemented).
    • Saves enhanced (and optionally noisy) audio to data/test/enhanced and logs SNR per file to experiments/logs/<MODEL_NAME>/inference_snr_log.csv.
    • Reports per-file and average inference time.

Reproducibility Notes

  • Randomness: validation split uses a fixed seed (42); other loaders follow PyTorch defaults (set torch.manual_seed externally if stricter determinism is required).
  • Data: training uses the provided .pt tensors; ensure any new data follows the same int16 tensor convention and sample rate (SAMPLE_RATE in constants).
  • Hyperparameters and architecture: fully specified in src/utils/constants.py and src/models/DenoiseUNet.py.
  • Artifacts: checkpoints and logs are versioned by session/model names; retain these along with the exact constants.py snapshot to reproduce results.
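For stricter determinism than the defaults, the external seeding mentioned above can be centralised in a small helper — a sketch only; set_determinism is not a function provided by this repo:

```python
import random

import numpy as np
import torch

def set_determinism(seed=42):
    """Seed the common RNG sources; 42 matches the validation-split seed."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Prefer deterministic kernels where available; warn instead of erroring
    # on ops that have no deterministic implementation.
    torch.use_deterministic_algorithms(True, warn_only=True)
```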

Troubleshooting

  • Import errors (e.g., No module named utils): run commands from src or set PYTHONPATH=src.
  • Missing data: verify .pt files in data/train/speech and data/train/noise; for test, populate data/test/raw and reconvert.
  • Phase mode errors: PHASE_MODE='vocoder' is not implemented; use raw or GL.

Citing

If you build on this work, please cite the repository and describe DenoiseNet as “a U-Net mask-based speech denoising model trained with combined BCE, linear/mel L1, and waveform losses.”
