Audio Multimodal Tagging & Preference Modeling

This repository contains code for an academic project that extracts audio features (MFCC, STFT, DWT), converts human survey responses into vectorized labels, and trains neural models (CNN / CNN→Transformer / multimodal attention) to predict listeners' yes/no song attributes and favorites. The code was written and run as part of a class project and uses a saved tensor dataset (data_tensors.pth) for model training and evaluation.

IMPORTANT: the audio files used to produce data_tensors.pth are not included in this repository due to copyright and file size limits. See the "Preparing audio data" section below for how to supply your own audio files so you can reproduce preprocessing and training.

At-a-glance

  • Primary tasks implemented

    • Convert survey responses CSV (responses.csv) into structured JSON (output.json) — in analysis.py.
    • Extract MFCC / STFT / Wavelet (DWT) features from audio and build normalized tensors — in dataSets.py.
    • Train and evaluate neural models using the saved tensors (data_tensors.pth) — in mfcc_rnn.py, multimodal.py, and related scripts.
    • Visualize example features (visualizeData.py).
  • Example artifacts generated by the pipeline

    • output.json — JSON array produced from responses.csv (survey → labels).
    • data_tensors.pth — a PyTorch file containing preprocessed tensors for MFCC, STFT, and DWT plus train/test labels (see the inspection sketch after this list).
    • Figures printed/shown by visualizeData.py (MFCC / DWT / STFT examples).
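For orientation, here is a minimal sketch of inspecting the saved tensor file. It assumes every entry in the file is a tensor (the expected key names are listed under "Expected outputs" below):

import torch

# Load the preprocessed dataset and print each tensor's shape.
data = torch.load('data_tensors.pth')
for key, tensor in data.items():
    print(key, tuple(tensor.shape))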

Repository structure (key files)

  • analysis.py — CSV → JSON conversion and vector-encoding logic used to create data.json / output.json from responses.csv (a hypothetical sketch of this step follows the list).
  • dataSets.py — feature extraction (MFCC, STFT, DWT), normalization, and conversion to PyTorch tensors; saves data_tensors.pth.
  • mfcc.py, mfcc_rnn.py — model architectures and training loops that use the MFCC tensors.
  • multimodal.py — multimodal model combining DWT, STFT, and MFCC branches + attention and training loop.
  • visualizeData.py — quick plotting script to inspect MFCC / DWT / STFT tensors from data_tensors.pth.
  • dwt.py, stft.py — helper/feature code (if present) referenced by feature extraction functions.
  • data.json — mapping of inputs (song identifiers) to label vectors (used by dataSets.py).
  • dummy_clustered_data.json — extra data used as augmentation/extra training examples in dataSets.py.
  • responses.csv — raw survey CSV used to build output.json (included in repo).
  • data_tensors.pth — saved tensors (already present in this repo) so training/evaluation can run without rerunning feature extraction.
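Since the real csv_to_json lives in analysis.py and its label-encoding rules are project-specific, the following is only a hypothetical sketch of the CSV → JSON step (the pandas usage and record layout are assumptions):

import json
import pandas as pd

# Hypothetical sketch: read the survey CSV and dump it as a JSON array.
# analysis.py additionally encodes answers into label vectors.
def csv_to_json(csv_path, json_path):
    rows = pd.read_csv(csv_path).to_dict(orient='records')
    with open(json_path, 'w') as f:
        json.dump(rows, f, indent=2)

csv_to_json('responses.csv', 'output.json')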

Requirements

  • Python 3.8–3.11 recommended (tested on macOS).
  • The project uses roughly the following Python packages:
    • torch (PyTorch)
    • torchaudio (optional; audio loading here is done with librosa)
    • librosa
    • numpy
    • scipy
    • scikit-learn
    • matplotlib
    • pywt (PyWavelets)
    • pandas

You can install the common dependencies with pip:

python3 -m pip install torch librosa numpy scipy scikit-learn matplotlib PyWavelets pandas

If you have a CUDA-enabled GPU and a compatible PyTorch build, install torch according to PyTorch's instructions for your CUDA version.

Preparing audio data (important)

The feature extraction code in dataSets.py expects audio files (MP3) to be available in ../Input Songs/ relative to dataSets.py. The functions call librosa.load('../Input Songs/' + song_path + '.mp3', ...) so the project expects a directory structure like:

<project-root>/Code/dataSets.py
<project-root>/Input Songs/<song-id>.mp3
  • The song_path values come from data.json / output.json. Ensure each input value in data.json matches an audio filename (without the .mp3 extension); a sanity-check sketch follows this list.
  • The audio files used when this project was run are not included here because they are copyrighted and large.
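Before running extraction, a quick check that every listed song has a matching MP3 can save debugging time. This sketch assumes data.json is a JSON object keyed by song identifier (adjust the iteration if your copy stores a list of records) and that it is run from the directory containing dataSets.py:

import json
import os

# Assumption: data.json maps song identifiers to label vectors.
with open('data.json') as f:
    labels = json.load(f)

for song_id in labels:
    path = os.path.join('..', 'Input Songs', song_id + '.mp3')
    if not os.path.exists(path):
        print('missing:', path)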

If you want to reproduce the preprocessing and training steps:

  1. Create an Input Songs folder at the project root, one directory above dataSets.py (or adjust dataSets.py to point at your audio path).
  2. Place MP3 files named exactly as the input fields in data.json (e.g., MySong.mp3).
  3. Optionally, open dataSets.py and confirm that max_pad_len and other parameters suit your audio durations.

Note: If you don't have the original audio files, a quick way to exercise the training code is to use the included data_tensors.pth (already in the repo). That file contains precomputed tensors and labels so you can run training and visualization without audio files.
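For reference, the per-song loading step looks roughly like the following. This is a sketch only: n_mfcc and max_pad_len are illustrative values, not necessarily the ones dataSets.py uses.

import librosa
import numpy as np

song_path = 'MySong'  # an input value from data.json (no .mp3 extension)
y, sr = librosa.load('../Input Songs/' + song_path + '.mp3')

# Extract MFCCs, then pad/trim the time axis to a fixed length so all
# songs produce tensors of the same shape.
mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13)
max_pad_len = 400  # illustrative; see dataSets.py for the real value
if mfcc.shape[1] < max_pad_len:
    mfcc = np.pad(mfcc, ((0, 0), (0, max_pad_len - mfcc.shape[1])))
else:
    mfcc = mfcc[:, :max_pad_len]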

Quick run guide (minimal reproducible steps)

  1. (Optional) Convert survey CSV to JSON (if you edited responses.csv):
python3 analysis.py
# This runs `csv_to_json('responses.csv', 'output.json')` by default.
  2. (Optional) Regenerate the feature tensors from audio:
  • Add your MP3 files to ../Input Songs/ (see the previous section) and make sure data.json lists those inputs.
  • Run feature extraction to create data_tensors.pth:
python3 dataSets.py
# This script will read `data.json` and `dummy_clustered_data.json`, extract MFCC/DWT/STFT, normalize and save `data_tensors.pth`.
  3. Train or evaluate models using the pre-saved tensors (a generic loading/training sketch follows this list):
  • Train MFCC-based model (example):
python3 mfcc_rnn.py
# Uses ./data_tensors.pth to load X_train_mfcc, X_test_mfcc, y_train, y_test
  • Train the multimodal model (DWT + STFT + MFCC):
python3 multimodal.py
# Uses ./data_tensors.pth to load multi-modal tensors and trains the MultiModalAttentionModel
  • Run the simpler CNN training example (if present):
python3 mfcc.py
  4. Visualize example features:
python3 visualizeData.py
# This loads ./data_tensors.pth and shows example MFCC / DWT / STFT plots.
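As referenced in step 3, the training scripts follow a common pattern: load tensors from data_tensors.pth, then run a standard PyTorch loop. The skeleton below is generic and assumes y_train is a 2-D multilabel tensor; the placeholder linear model and hyperparameters are illustrative, while the real architectures live in mfcc_rnn.py and multimodal.py.

import torch
import torch.nn as nn

data = torch.load('data_tensors.pth')
X_train, y_train = data['X_train_mfcc'].float(), data['y_train'].float()

# Placeholder model: flatten features and map to one logit per label.
n_features = X_train[0].numel()
model = nn.Sequential(nn.Flatten(), nn.Linear(n_features, y_train.shape[1]))
criterion = nn.BCEWithLogitsLoss()  # multilabel yes/no targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

for epoch in range(5):
    optimizer.zero_grad()
    loss = criterion(model(X_train), y_train)
    loss.backward()
    optimizer.step()
    print(f'Epoch [{epoch + 1}/5], Loss: {loss.item():.3f}')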

Expected outputs and how to verify

  • data_tensors.pth — after running dataSets.py, check that this file exists and contains keys like X_train_mfcc, X_test_mfcc, X_train_dwt, X_train_stft, y_train, y_test, etc.
  • Training scripts will print epoch-by-epoch loss and a final Hamming accuracy (or a similar metric; a sketch of the computation follows this list). Look for printed lines like:
Epoch [x/y], Loss: 0.xxx
Test Hamming Accuracy: 0.zzzz
  • visualizeData.py will open Matplotlib figures showing MFCC / DWT / STFT examples.
  • analysis.py will write output.json when run on responses.csv.
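As mentioned above, the reported metric is Hamming accuracy: the fraction of individual labels predicted correctly across all songs and attributes. A sketch of that computation (the scripts may threshold or round slightly differently):

import torch

def hamming_accuracy(logits, targets):
    # Threshold sigmoid outputs at 0.5, then count matching labels.
    preds = (torch.sigmoid(logits) > 0.5).float()
    return (preds == targets).float().mean().item()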

Notes, tips & caveats

  • The dataSets.py code currently uses ../Input Songs/ + <song>.mp3. If your audio files are elsewhere, either move them or edit dataSets.py accordingly.
  • The code uses librosa for audio loading and feature extraction; different librosa versions may produce slightly different results. If reproducibility matters, pin librosa (and the other packages) to the versions you used; the snippet after this list prints them.
  • For quick experimentation you can skip extraction and use the included data_tensors.pth file.
  • The model code is intentionally compact and experimental (educational project). It contains TODOs and places where hyperparameters / regularization can be tuned.
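To record the environment for a reproducible run, a snippet like this prints the versions of the key libraries:

import librosa
import numpy
import pywt
import sklearn
import torch

# Print each library's version so a run can be reproduced later.
for mod in (torch, librosa, numpy, sklearn, pywt):
    print(mod.__name__, mod.__version__)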

Troubleshooting

  • If you see errors loading data_tensors.pth, confirm you are in the repository root and the file path ./data_tensors.pth is correct.
  • If librosa raises an error loading MP3 files, ensure ffmpeg or audioread backends are available on your system (install ffmpeg via Homebrew on macOS: brew install ffmpeg).
  • If GPU/CPU errors occur when loading/saving tensors, confirm torch version and device availability. The code uses CPU tensors by default but will run on GPU if tensors and models are moved to CUDA (not done by default in these scripts).
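If you do want to try a GPU, the standard PyTorch pattern applies. The snippet below only selects a device; the commented lines show where a training script would move its (already-defined) model and tensors:

import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
print('Using device:', device)

# Inside a training script, move the model and tensors over:
# model = model.to(device)
# X_train, y_train = X_train.to(device), y_train.to(device)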

Small reproducibility checklist

  • Install dependencies
  • Place your MP3s in Input Songs/ (or use data_tensors.pth provided)
  • Run python3 analysis.py if you changed responses.csv
  • Run python3 dataSets.py to generate data_tensors.pth (skip if using the provided data_tensors.pth)
  • Train models: python3 mfcc_rnn.py or python3 multimodal.py
  • Visualize with python3 visualizeData.py

Contact & license

This code is part of a class project. If you have questions about reproducing experiments or the input data, please open an issue or contact the project author (add contact details here).


Note: this README was generated to document the current codebase. The included data_tensors.pth allows running and testing models without the original audio files (which are not included for copyright and size reasons).
