This repository contains the code for the paper titled "Synergistic audio pre-processing and neural architecture design maximizes performance".
Python version: 3.10.12
To set up the environment, run the following commands:
```bash
python3 -m venv .venv
source .venv/bin/activate
pip install nni torch torchvision torchaudio pytorch_lightning fcwt matplotlib wget
```

The datasets are downloaded automatically the first time the run_experiment.py script is run on a specific dataset.
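For example, the Speech Commands download on a first run boils down to a torchaudio call along these lines (a minimal sketch; the `./data` root is a placeholder, and the actual download logic lives in run_experiment.py):

```python
import torchaudio

# download=True fetches and extracts the archive on first use;
# later runs reuse the cached copy under the given root directory.
train_set = torchaudio.datasets.SPEECHCOMMANDS("./data", download=True)

waveform, sample_rate, label, *_ = train_set[0]
print(sample_rate, label)  # 16000 and a keyword such as "backward"
```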
Here, we trained MobileNetV2, MobileNetV3-Small, and MobileNetV3-Large together with a fixed preprocessing (N_FFT = 25 ms, HOP_LENGTH = 10 ms, N_MELS = 64); a sketch of the equivalent transform is shown after the command below:
```bash
python baselines.py --dataset [speech_commands, vocal_sound, spoken100] --model [mobilenetv2, mobilenetv3small, mobilenetv3large]
```
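For reference, at the 16 kHz sampling rate of Speech Commands those window settings translate into the following torchaudio transform (an illustrative sketch; the actual transform is defined in baselines.py):

```python
import torch
import torchaudio

SAMPLE_RATE = 16_000                    # Speech Commands audio is 16 kHz
N_FFT = int(0.025 * SAMPLE_RATE)        # 25 ms window -> 400 samples
HOP_LENGTH = int(0.010 * SAMPLE_RATE)   # 10 ms hop    -> 160 samples
N_MELS = 64

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=SAMPLE_RATE,
    n_fft=N_FFT,
    hop_length=HOP_LENGTH,
    n_mels=N_MELS,
)

waveform = torch.randn(1, SAMPLE_RATE)  # one second of dummy audio
spec = mel(waveform)
print(spec.shape)                       # torch.Size([1, 64, 101])
```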
BC-ResNet-8 is a state-of-the-art keyword spotting architecture:

```bash
python sota_baselines.py --dataset speech_commands --model bcresnet8
```

EfficientNet-B0 is a general-purpose efficient CNN baseline:

```bash
python sota_baselines.py --dataset [vocal_sound, spoken100] --model efficientnet-b0
```

Results are saved to `results/baselines/{dataset}/{model}/` with:
- `model.pth` - best model checkpoint
- `val_accs.csv` - validation accuracies per epoch
- `best_val_accuracy.txt` - best validation accuracy
- `test_accuracy.txt` - final test accuracy
- `test_accuracies.json` - summary with mean/std across 5 seeds
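To inspect a finished run, the summary file can be loaded like this (a small sketch; the path is an example, and the JSON keys are whatever the training scripts write):

```python
import json
from pathlib import Path

# Example path; substitute the dataset/model combination you trained.
path = Path("results/baselines/speech_commands/mobilenetv2/test_accuracies.json")

with path.open() as f:
    summary = json.load(f)

# Prints the stored summary, e.g. the mean/std test accuracy across 5 seeds.
print(json.dumps(summary, indent=2))
```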
To reproduce our results, you can execute the following steps:
To run the OptModel experiment, use the following command:
```bash
python run_experiment.py --experiment 1 --dataset [speech_commands, vocal_sound, spoken100]
```

To run the OptPre experiment, use the following command:

```bash
python run_experiment.py --experiment 2 --dataset [speech_commands, vocal_sound, spoken100] --model [mobilenetv2, mobilenetv3small, mobilenetv3large]
```

To run the OptBoth experiment, use the following command:

```bash
python run_experiment.py --experiment 3 --dataset [speech_commands, vocal_sound, spoken100]
```
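The experiment numbers map onto what is optimized: experiment 1 searches the architecture (OptModel), experiment 2 searches the preprocessing for the fixed model passed via `--model` (OptPre), and experiment 3 searches both jointly (OptBoth). Since the search is driven by NNI, the OptPre space can be pictured as a standard NNI search space; the sketch below is purely illustrative (the parameter names and ranges are assumptions, and the real space is defined in run_experiment.py):

```python
# Hypothetical NNI-style search space over the preprocessing knobs.
# Illustrative only; the actual space is defined in run_experiment.py.
search_space = {
    "n_fft_ms": {"_type": "choice", "_value": [10, 25, 40]},
    "hop_length_ms": {"_type": "choice", "_value": [5, 10, 20]},
    "n_mels": {"_type": "choice", "_value": [32, 64, 128]},
}
```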