A machine learning project that trains a CNN to classify environmental sounds from the UrbanSound8K dataset using mel spectrograms. This version uses four classes: siren, dog_bark, drilling, and engine_idling.
The extracted dataset lives under `data/UrbanSound8K/`: audio is in `audio/fold1` … `audio/fold10`, and metadata is in `metadata/UrbanSound8K.csv`. The CSV lists every clip with columns such as `slice_file_name`, `fold`, start/end time, and `class` (e.g. siren, dog_bark, drilling, engine_idling), so all the data and labels are visible in one place.
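For example, the four training classes can be selected from the metadata with pandas. The rows below are illustrative stand-ins for the real CSV (column names match, values are made up for the sketch):

```python
import pandas as pd

# Illustrative rows standing in for metadata/UrbanSound8K.csv
meta = pd.DataFrame({
    "slice_file_name": ["62048-3-0-3.wav", "100263-2-0-117.wav", "102305-6-0-0.wav"],
    "fold": [8, 5, 2],
    "class": ["dog_bark", "children_playing", "siren"],
})

# The four classes this project trains on
CLASSES = ["siren", "dog_bark", "drilling", "engine_idling"]

# Keep only rows whose class is one of the four
subset = meta[meta["class"].isin(CLASSES)]
print(subset["class"].tolist())  # ['dog_bark', 'siren']
```

The full path to each clip follows from the `fold` and `slice_file_name` columns, e.g. `audio/fold8/62048-3-0-3.wav`.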
Example clip from the dataset: data/UrbanSound8K/audio/fold8/62048-3-0-3.wav.
```
UrbanSoundAI/
├── assets/                 # Screenshots for README (drag & drop here)
├── data/
│   └── UrbanSound8K/       # Dataset (audio/fold1..fold10, metadata/UrbanSound8K.csv)
├── extract_dataset.py      # Extract dataset from .tar.gz / .gz download
├── outputs/
│   ├── models/             # Saved trained models (.pt)
│   └── figures/            # Saved spectrogram plots
├── src/
│   ├── __init__.py
│   ├── config.py           # Paths, classes, spectrogram & training settings
│   ├── dataset.py          # Load CSV, filter classes, mel spectrograms, PyTorch Dataset
│   ├── model.py            # CNN definition
│   ├── train.py            # Train CNN, print accuracy, save model
│   ├── predict.py          # Predict class for a .wav file
│   └── visualize_sample.py # Display mel spectrogram of a WAV file
├── requirements.txt
└── README.md
```
- Python: Use Python 3.8 or newer.
- Dataset (for training):
  - If you have the downloaded .gz / .tar.gz archive, extract it into the project:

    ```
    python extract_dataset.py path/to/UrbanSound8K.tar.gz
    ```

    (Use the path where your download is, e.g. `C:\Users\You\Downloads\UrbanSound8K.tar.gz`.)
  - Or place an already-extracted dataset under `UrbanSoundAI/data/UrbanSound8K/`, with `audio/fold1`…`audio/fold10` and `metadata/UrbanSound8K.csv`.
- Install dependencies:

  ```
  cd UrbanSoundAI
  pip install -r requirements.txt
  ```
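The extraction step can be approximated with the standard library's `tarfile` module. This is a sketch of the idea only; the actual behavior of `extract_dataset.py` may differ:

```python
import tarfile
from pathlib import Path

def extract(archive: str, dest: str = "data") -> None:
    """Unpack a .tar.gz archive into `dest` so that
    data/UrbanSound8K/ appears (sketch, not the real script)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```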
```
python train.py
```

Optional arguments:

- `--model-name urbansound_cnn.pt` – output filename (default: `urbansound_cnn.pt`)
- `--epochs 25` – number of epochs
- `--batch-size 32` – batch size
The script loads the CSV, filters to the four classes, builds mel spectrograms, splits into train/test, trains the CNN, prints training progress and final test accuracy, and saves the model to outputs/models/.
```
python predict.py path/to/audio.wav
```

Options:

- `--model path/to/model.pt` – path to saved model (default: `outputs/models/urbansound_cnn.pt`)
- `--show-probs` – print probability for each of the four classes
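The core of the prediction step can be sketched as follows. The `predict_class` helper and its arguments are hypothetical (the README does not show `predict.py`'s internals); it assumes a trained model and a `(1, 128, 173)` mel-spectrogram tensor:

```python
import torch

CLASSES = ["siren", "dog_bark", "drilling", "engine_idling"]

def predict_class(model, mel: torch.Tensor) -> str:
    """Hypothetical inference step: map a (1, 128, 173) mel
    spectrogram tensor to the most likely class name."""
    model.eval()
    with torch.no_grad():
        logits = model(mel.unsqueeze(0))       # add batch dim
        probs = torch.softmax(logits, dim=1)   # --show-probs would print these
    return CLASSES[int(probs.argmax())]
```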
```
python visualize_sample.py path/to/audio.wav
```

Options:

- `--title "My title"` – plot title
- `--save path/to/figure.png` – save figure (default: `outputs/figures/sample_spectrogram.png`)
- `--no-show` – only save, do not open the plot window
- PyTorch – CNN and training
- librosa – load audio and compute mel spectrograms
- matplotlib – spectrogram visualization
- pandas – read UrbanSound8K metadata CSV
- scikit-learn – train/test split
A small CNN takes mel spectrogram inputs of shape (1, n_mels, time_steps), i.e. 128 mel bins by 173 time steps. It uses three conv blocks (each with BatchNorm, ReLU, MaxPool, and Dropout), global average pooling, and two fully connected layers that output four class logits.
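The architecture above can be sketched in PyTorch. Channel counts, dropout rate, and layer names are illustrative, not the exact `src/model.py` definitions:

```python
import torch
import torch.nn as nn

class UrbanSoundCNN(nn.Module):
    """Sketch of the described CNN: three conv blocks, global
    average pooling, two fully connected layers, four logits."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout(0.25),
            )
        self.features = nn.Sequential(block(1, 16), block(16, 32), block(32, 64))
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        x = self.features(x)         # (B, 64, n_mels/8, time/8)
        x = self.pool(x).flatten(1)  # (B, 64)
        return self.classifier(x)    # (B, n_classes) logits

# One dummy batch of mel spectrograms: (batch, 1, 128 mel bins, 173 frames)
logits = UrbanSoundCNN()(torch.randn(2, 1, 128, 173))
print(logits.shape)  # torch.Size([2, 4])
```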
UrbanSound8K has its own license terms; ensure you comply with them when using the dataset.