Skip to content

Segment-based gesture classification using ResNeXt-101 features and lightweight MLPs, trained on the Jester dataset. This project explores frame sampling strategies, MLP architectures, and segment counts to balance accuracy with computational efficiency. Includes training code, evaluation notebooks, logs, metrics, and full experimental report.

Notifications You must be signed in to change notification settings

Reymer249/Computer-Vision_Gesture-Classification

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

17 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Gesture Classification using ResNeXt-101 Features

This repository contains the implementation of a gesture classification pipeline based on segment-based frame sampling and pretrained ResNeXt-101 feature extraction, followed by training lightweight MLP classifiers.
The work was done as part of a Computer Vision assignment at Leiden University.

For a detailed explanation of the methodology, experiments, and results, please see the report.pdf.


📖 Project Overview

  • Task: Hand gesture classification using the Jester dataset.
  • Approach:
    • Videos are divided into temporal segments.
    • A single frame is sampled from each segment.
    • Features are extracted using a ResNeXt-101 CNN pretrained on ImageNet.
    • Features are fed into a Multi-Layer Perceptron (MLP) for classification.
  • Key Findings:
    • Smaller MLPs perform better than larger ones.
    • Equidistant (first frame) sampling outperforms random uniform sampling.
    • Reducing segments from 8 → 4 halves training time with minimal accuracy loss.

🧪 Models & Experiments

Six models were trained, varying in MLP architecture, frame sampling, and number of segments.

Model Params (MLP) Frame Selection Segments Test Accuracy Test Loss
1 75.5M Random (uniform) 8 0.499 1.708
2 257.3M Random (uniform) 8 0.456 1.783
3 8.4M Random (uniform) 8 0.531 1.742
4 8.4M First frame 8 0.546 1.796
5 4.2M First frame 4 0.537 1.714
6 2.1M First frame 2 0.433 1.995

Plots of training time, accuracy, and loss can be found in the plots directory.


📂 Repository Structure

  • helpers_scripts/
    • split_train_val.py – Script to split dataset into training and validation sets
    • test.ipynb – Notebook for evaluating trained models
  • logs/
    • train_model1.log – Training log for Model 1
    • train_model2.log – Training log for Model 2
    • train_model3.log – Training log for Model 3
    • train_model4.log – Training log for Model 4
    • train_model5.log – Training log for Model 5
    • train_model6.log – Training log for Model 6
  • metrics/
    • model_1/
      • accuracies.pkl – Accuracy values across training
      • losses.pkl – Loss values across training
    • model_2/model_6/ – Same structure as model_1/
    • test_acc.pkl – Final test accuracies
    • test_loss.pkl – Final test losses
    • time_taken.pkl – Training times (approx., extracted from logs)
  • models_training_code/
    • model1.py – Training script for Model 1
    • model2.py ... model6.py – Training scripts for Models 2-6
    • model_1.ipynb – Notebook version of Model 1 training
    • model_2.ipynbmodel_6.ipynb – Notebooks for Models 2–6
  • plots/
    • Plots.ipynb – Notebook to generate result plots
    • accuracies_plot.png – Accuracy curves
    • losses_plot.png – Loss curves
    • time_bar.png – Training time comparison
  • splits/
    • jester-v1-labels.csv – Gesture class labels
    • jester-v1-train.csv – Original training split (v1)
    • jester-v1-validation.csv – Original validation split (v1)
    • train.csv – Training set
    • val.csv – Validation set
    • test.csv – Test set
  • report.pdf – Full project report
  • README.md – Project documentation (this file)

⚙️ Requirements

  • Python 3.8+
  • PyTorch
  • torchvision
  • pandas, numpy, matplotlib
  • scikit-learn

Install dependencies:

pip install -r requirements.txt

About

Segment-based gesture classification using ResNeXt-101 features and lightweight MLPs, trained on the Jester dataset. This project explores frame sampling strategies, MLP architectures, and segment counts to balance accuracy with computational efficiency. Includes training code, evaluation notebooks, logs, metrics, and full experimental report.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published