A machine learning project that trains a CNN to classify environmental sounds from the UrbanSound8K dataset using mel spectrograms. This version uses four classes: siren, dog_bark, drilling, and engine_idling.
The extracted dataset lives under `data/UrbanSound8K/`: audio is in `audio/fold1` … `audio/fold10`, and metadata is in `metadata/UrbanSound8K.csv`. The CSV lists every clip with columns such as `slice_file_name`, `fold`, start/end time, and `class` (e.g. siren, dog_bark, drilling, engine_idling), so all the data and labels are visible in one place.
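For example, the four training classes can be selected from the metadata with pandas. The rows below are illustrative stand-ins for the real CSV (column names match, values are made up for the sketch):

```python
import pandas as pd

# Illustrative rows standing in for metadata/UrbanSound8K.csv
meta = pd.DataFrame({
    "slice_file_name": ["62048-3-0-3.wav", "100263-2-0-117.wav", "102305-6-0-0.wav"],
    "fold": [8, 5, 2],
    "class": ["dog_bark", "children_playing", "siren"],
})

# The four classes this project trains on
CLASSES = ["siren", "dog_bark", "drilling", "engine_idling"]

# Keep only rows whose class is one of the four
subset = meta[meta["class"].isin(CLASSES)]
print(subset["class"].tolist())  # ['dog_bark', 'siren']
```

The full path to each clip follows from the `fold` and `slice_file_name` columns, e.g. `audio/fold8/62048-3-0-3.wav`.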
Example clip from the dataset: data/UrbanSound8K/audio/fold8/62048-3-0-3.wav.
```
UrbanSoundAI/
├── assets/                 # Screenshots for README (drag & drop here)
├── data/
│   └── UrbanSound8K/       # Dataset (audio/fold1..fold10, metadata/UrbanSound8K.csv)
├── extract_dataset.py      # Extract dataset from .tar.gz / .gz download
├── outputs/
│   ├── models/             # Saved trained models (.pt)
│   └── figures/            # Saved spectrogram plots
├── src/
│   ├── __init__.py
│   ├── config.py           # Paths, classes, spectrogram & training settings
│   ├── dataset.py          # Load CSV, filter classes, mel spectrograms, PyTorch Dataset
│   ├── model.py            # CNN definition
│   ├── train.py            # Train CNN, print accuracy, save model
│   ├── predict.py          # Predict class for a .wav file
│   └── visualize_sample.py # Display mel spectrogram of a WAV file
├── requirements.txt
└── README.md
```
- Python: Use Python 3.8 or newer.
- Dataset (for training):
  - If you have the downloaded .gz / .tar.gz archive, extract it into the project:

    ```
    python extract_dataset.py path/to/UrbanSound8K.tar.gz
    ```

    (Use the path where your download is, e.g. `C:\Users\You\Downloads\UrbanSound8K.tar.gz`.)
  - Or place an already-extracted dataset under `UrbanSoundAI/data/UrbanSound8K/`, with `audio/fold1`…`audio/fold10` and `metadata/UrbanSound8K.csv`.
- Install dependencies:

  ```
  cd UrbanSoundAI
  pip install -r requirements.txt
  ```
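The extraction step can be approximated with the standard library's `tarfile` module. This is a sketch of the idea only; the actual behavior of `extract_dataset.py` may differ:

```python
import tarfile
from pathlib import Path

def extract(archive: str, dest: str = "data") -> None:
    """Unpack a .tar.gz archive into `dest` so that
    data/UrbanSound8K/ appears (sketch, not the real script)."""
    Path(dest).mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive, "r:gz") as tar:
        tar.extractall(dest)
```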
```
python train.py
```

Optional arguments:

- `--model-name urbansound_cnn.pt` – output filename (default: `urbansound_cnn.pt`)
- `--epochs 25` – number of epochs
- `--batch-size 32` – batch size
The script loads the CSV, filters to the four classes, builds mel spectrograms, splits into train/test, trains the CNN, prints training progress and final test accuracy, and saves the model to outputs/models/.
```
python predict.py path/to/audio.wav
```

Options:

- `--model path/to/model.pt` – path to saved model (default: `outputs/models/urbansound_cnn.pt`)
- `--show-probs` – print probability for each of the four classes
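The core of the prediction step can be sketched as follows. The `predict_class` helper and its arguments are hypothetical (the README does not show `predict.py`'s internals); it assumes a trained model and a `(1, 128, 173)` mel-spectrogram tensor:

```python
import torch

CLASSES = ["siren", "dog_bark", "drilling", "engine_idling"]

def predict_class(model, mel: torch.Tensor) -> str:
    """Hypothetical inference step: map a (1, 128, 173) mel
    spectrogram tensor to the most likely class name."""
    model.eval()
    with torch.no_grad():
        logits = model(mel.unsqueeze(0))       # add batch dim
        probs = torch.softmax(logits, dim=1)   # --show-probs would print these
    return CLASSES[int(probs.argmax())]
```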
```
python visualize_sample.py path/to/audio.wav
```

Options:

- `--title "My title"` – plot title
- `--save path/to/figure.png` – save figure (default: `outputs/figures/sample_spectrogram.png`)
- `--no-show` – only save, do not open the plot window
- PyTorch – CNN and training
- librosa – load audio and compute mel spectrograms
- matplotlib – spectrogram visualization
- pandas – read UrbanSound8K metadata CSV
- scikit-learn – train/test split
A small CNN takes mel spectrogram inputs of shape (1, n_mels, time_steps), i.e. 128 mel bins by 173 time steps. It uses three conv blocks (each with BatchNorm, ReLU, MaxPool, and Dropout), global average pooling, and two fully connected layers that output four class logits.
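The architecture above can be sketched in PyTorch. Channel counts, dropout rate, and layer names are illustrative, not the exact `src/model.py` definitions:

```python
import torch
import torch.nn as nn

class UrbanSoundCNN(nn.Module):
    """Sketch of the described CNN: three conv blocks, global
    average pooling, two fully connected layers, four logits."""
    def __init__(self, n_classes: int = 4):
        super().__init__()
        def block(cin, cout):
            return nn.Sequential(
                nn.Conv2d(cin, cout, kernel_size=3, padding=1),
                nn.BatchNorm2d(cout),
                nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout(0.25),
            )
        self.features = nn.Sequential(block(1, 16), block(16, 32), block(32, 64))
        self.pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        self.classifier = nn.Sequential(
            nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, n_classes)
        )

    def forward(self, x):
        x = self.features(x)         # (B, 64, n_mels/8, time/8)
        x = self.pool(x).flatten(1)  # (B, 64)
        return self.classifier(x)    # (B, n_classes) logits

# One dummy batch of mel spectrograms: (batch, 1, 128 mel bins, 173 frames)
logits = UrbanSoundCNN()(torch.randn(2, 1, 128, 173))
print(logits.shape)  # torch.Size([2, 4])
```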
UrbanSound8K has its own license terms; ensure you comply with them when using the dataset.