A full theoretical explanation of the project can be found here.
This repository implements a lightweight bag-of-audio pipeline for identifying musical instruments. Raw WAV clips are converted to MFCC frames, vector-quantized into a codebook (k-means), collapsed into bag-of-audio histograms, and finally scored with a Laplace-smoothed multinomial Naive Bayes classifier. The defaults target the provided metadata files (Metadata_Train.csv, Metadata_Test.csv) whose rows follow filename,class.
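The core representation is simple enough to sketch directly. The snippet below is illustrative only; the function name and exact distance handling are assumptions, not the repository's API:

```python
import numpy as np

def bag_of_audio(mfcc_frames: np.ndarray, codebook: np.ndarray) -> np.ndarray:
    """Map (num_frames, n_mfcc) MFCCs to a k-bin histogram of nearest codewords."""
    # Squared Euclidean distance from every frame to every codeword -> (num_frames, k)
    d2 = ((mfcc_frames[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    assignments = d2.argmin(axis=1)  # index of the nearest codeword per frame
    return np.bincount(assignments, minlength=len(codebook))  # bag-of-audio counts
```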
- Create and activate a virtual environment (Python 3.10+ works; examples below use 3.13).
```bash
python3.13 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
```

- Install runtime dependencies.

```bash
python -m pip install -r requirements.txt
```

The pipeline relies on `librosa`, `numpy`, `pandas`, and `scikit-learn`. Librosa may require system packages such as `libsndfile` or `ffmpeg`, depending on your OS.
This code fits any dataset with whatever instruments are provided in the metadata. I used this data set in my experiments.
The scripts assume the following structure (feel free to adjust paths via CLI flags or Make variables):
```
data/
  Raw/
    Train/              # WAV files referenced by Metadata_Train.csv
    Test_submission/    # WAV files referenced by Metadata_Test.csv (or a submission set)
    Metadata_Train.csv  # filename,class header
    Metadata_Test.csv
  Processed/
    Features/           # MFCC numpy arrays (*.npy)
    Features_hist/      # Bag-of-audio histograms
models/
  meta.json             # Aggregated hyperparameters
  codebook.npz          # K-means codebook + normalization stats
  nb_counts.npz         # Trained NB counts and priors
```
Example metadata row:
```
filename,class
Sound_Guitar_0001.wav,Sound_Guitar
```
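To sanity-check the metadata before running anything, a quick pandas look (assuming the `filename,class` header shown above) could be:

```python
import pandas as pd

meta = pd.read_csv("data/Raw/Metadata_Train.csv")  # expects filename,class columns
print(meta["class"].value_counts())                # clips available per instrument
```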
The easiest path is to let make orchestrate every stage (defaults mirror the paths shown above):
```bash
make pipeline   # runs features -> codebook -> histograms -> nb
make evaluate   # trains if needed, then scores Metadata_Test.csv
```

To run the steps manually:
- Extract MFCC features
```bash
python src/extract_features.py --data_root data/Raw/Train --metadata data/Raw/Metadata_Train.csv --out_dir data/Processed/Features
```
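Under the hood this stage is essentially a librosa MFCC call. A minimal equivalent sketch follows; the `n_mfcc=13` default is an assumption, and the real parameters are recorded in `models/meta.json`:

```python
import librosa
import numpy as np

def wav_to_mfcc(path: str, n_mfcc: int = 13) -> np.ndarray:
    """Load a WAV and return MFCC frames shaped (num_frames, n_mfcc)."""
    y, sr = librosa.load(path, sr=None, mono=True)           # keep native sample rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)   # (n_mfcc, num_frames)
    return mfcc.T  # transpose to match the (num_frames, n_mfcc) layout saved to *.npy
```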
- Fit the vector-quantizer (codebook)
```bash
python src/vq.py --feat_dir data/Processed/Features --model_dir models --k 128 --max_frames_per_file 1000
```
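A plausible sketch of what fitting the codebook involves, assuming scikit-learn's `MiniBatchKMeans` and the z-score normalization implied by the `mean`/`std` stats stored in `codebook.npz` (function and variable names are illustrative):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def fit_codebook(frame_stacks, k=128, max_frames_per_file=1000, seed=0):
    """Fit a k-means codebook on z-scored MFCC frames.
    frame_stacks: iterable of (num_frames, n_mfcc) arrays, one per file."""
    rng = np.random.default_rng(seed)
    # Cap frames per file, mirroring the --max_frames_per_file flag
    subsampled = [f[rng.choice(len(f), min(len(f), max_frames_per_file), replace=False)]
                  for f in frame_stacks]
    X = np.vstack(subsampled)
    mean, std = X.mean(axis=0), X.std(axis=0) + 1e-8   # normalization stats to save
    km = MiniBatchKMeans(n_clusters=k, random_state=seed, n_init="auto")
    km.fit((X - mean) / std)
    return km.cluster_centers_, mean, std
```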
- Convert features to bag-of-audio histograms
```bash
python src/quantize.py --feature_dir data/Processed/Features --models_dir models --out_dir data/Processed/Features_hist --metric euclidean
```
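The `--metric` flag suggests the assignment distance is configurable. A sketch using `sklearn.metrics.pairwise_distances`, which accepts the same metric names, is shown below; this is an assumed implementation, not necessarily the script's:

```python
import numpy as np
from sklearn.metrics import pairwise_distances

def quantize_file(frames, codebook, mean, std, metric="euclidean"):
    """Assign each normalized frame to its nearest codeword under the chosen
    metric, then count assignments into a k-bin bag-of-audio histogram."""
    z = (frames - mean) / std                            # reuse training normalization
    d = pairwise_distances(z, codebook, metric=metric)   # (num_frames, k)
    return np.bincount(d.argmin(axis=1), minlength=len(codebook))
```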
- Train the multinomial Naive Bayes classifier
```bash
python src/train_nb.py --metadata data/Raw/Metadata_Train.csv --hists_dir data/Processed/Features_hist --models_dir models --alpha 5.0
```
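Training reduces to summing per-class codeword counts and applying Laplace smoothing. A self-contained sketch of that computation (array names are illustrative, analogous to what `nb_counts.npz` is described as storing):

```python
import numpy as np

def train_nb(hists, labels, alpha=5.0):
    """Laplace-smoothed multinomial NB from bag-of-audio histograms.
    hists: (num_files, k) count matrix; labels: (num_files,) ints in 0..C-1."""
    classes = np.unique(labels)
    # Per-class total count of each codeword -> (C, k)
    counts = np.vstack([hists[labels == c].sum(axis=0) for c in classes])
    smoothed = counts + alpha                          # Laplace smoothing
    log_like = np.log(smoothed) - np.log(smoothed.sum(axis=1, keepdims=True))
    log_prior = np.log(np.bincount(labels) / len(labels))
    return log_like, log_prior
```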
Each script exposes additional CLI arguments (-h) for fine-tuning frame limits, distance metrics, alpha, and file locations.
- Classify a single WAV

```bash
python src/predict.py --wav path/to/file.wav --models_dir models --topk 4
```
The script prints the predicted instrument plus the top-k candidates with log-scores and normalized probabilities.
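The top-k output follows from the multinomial NB decision rule, score(c) = log P(c) + sum_j hist_j * log P(j | c). An illustrative scoring sketch, using the arrays from the training sketch above:

```python
import numpy as np

def score(hist, log_like, log_prior, topk=4):
    """Rank classes for one histogram; returns (class_id, log_score, prob) tuples."""
    log_scores = log_prior + hist @ log_like.T      # joint log-score per class
    probs = np.exp(log_scores - log_scores.max())   # stable softmax
    probs /= probs.sum()                            # normalized probabilities
    order = np.argsort(log_scores)[::-1][:topk]
    return [(int(c), float(log_scores[c]), float(probs[c])) for c in order]
```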
- Evaluate a labeled set

```bash
python src/predict.py --metadata data/Raw/Metadata_Test.csv --data_root data/Raw/Test_submission --models_dir models --dump_preds reports/test_preds.csv
```
In evaluation mode you receive accuracy, confusion matrix, per-class precision/recall, and (optionally) a CSV of per-file predictions.
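The reported metrics map directly onto scikit-learn helpers. A toy example (these labels are made up purely for illustration):

```python
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix

# Illustrative labels; the real script collects these over Metadata_Test.csv.
y_true = ["Guitar", "Guitar", "Drum", "Violin"]
y_pred = ["Guitar", "Piano", "Drum", "Violin"]

print("accuracy:", accuracy_score(y_true, y_pred))
print(confusion_matrix(y_true, y_pred))                        # rows=true, cols=pred
print(classification_report(y_true, y_pred, zero_division=0))  # per-class precision/recall
```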
- `data/Processed/Features/*.npy`: MFCC frames per recording, saved as `(num_frames, n_mfcc)`.
- `data/Processed/Features_hist/*.npy`: Histogram counts of codebook assignments for each file.
- `models/codebook.npz`: Contains the codebook, feature dimension, and normalization statistics (`mean`, `std`, `k`).
- `models/nb_counts.npz`: Stores per-class counts, overall counts, and learned priors for inference.
- `models/meta.json`: Consolidates MFCC parameters, VQ settings, label names, and Naive Bayes hyperparameters; the prediction script reads this file to stay in sync with the training configuration.
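To inspect these artifacts interactively (the exact key names inside each archive are assumptions based on the descriptions above):

```python
import json
import numpy as np

codebook = np.load("models/codebook.npz")
print(codebook.files)          # lists the arrays stored in the archive
with open("models/meta.json") as f:
    print(json.load(f))        # training-time MFCC, VQ, and NB settings
```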
Using the Euclidean metric with the default settings:

```
accuracy: 0.8750
confusion (rows=true, cols=pred):
        Guitar  Drum  Piano  Violin
Guitar      17     0      3       0
Drum         1    19      0       0
Piano        2     1     16       1
Violin       2     0      0      18
```