This repository contains the implementation of a gesture classification pipeline based on segment-based frame sampling and pretrained ResNeXt-101 feature extraction, followed by training lightweight MLP classifiers.
The work was done as part of a Computer Vision assignment at Leiden University.
For a detailed explanation of the methodology, experiments, and results, see `report.pdf`.
- Task: Hand gesture classification using the Jester dataset.
- Approach:
  - Videos are divided into temporal segments.
  - A single frame is sampled from each segment.
  - Features are extracted using a ResNeXt-101 CNN pretrained on ImageNet.
  - The features are fed into a Multi-Layer Perceptron (MLP) for classification.
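- Key Findings:
  - Smaller MLPs perform better than larger ones: with random sampling and 8 segments, test accuracy falls from 0.531 (8.4M parameters) to 0.499 (75.5M) and 0.456 (257.3M).
  - Equidistant sampling (first frame of each segment) slightly outperforms random uniform sampling in test accuracy (0.546 vs. 0.531, same 8.4M MLP and 8 segments).
  - Reducing the number of segments from 8 to 4 halves training time with minimal accuracy loss (0.546 → 0.537); with only 2 segments, accuracy drops to 0.433.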
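The segment-based sampling described above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the repository's code: the function name `segment_frame_indices` and the integer-division segment boundaries are assumptions. A video of `num_frames` frames is split into `num_segments` roughly equal segments, and one frame index is taken per segment, either the segment's first frame (equidistant sampling) or a frame drawn uniformly at random inside it.

```python
import random
from typing import List, Optional


def segment_frame_indices(
    num_frames: int,
    num_segments: int,
    mode: str = "first",
    rng: Optional[random.Random] = None,
) -> List[int]:
    """Return one frame index per temporal segment (hypothetical helper).

    mode="first":  first frame of each segment (equidistant sampling)
    mode="random": a frame drawn uniformly at random within each segment
    """
    if num_segments <= 0 or num_frames < num_segments:
        raise ValueError("need at least one frame per segment")
    if mode not in ("first", "random"):
        raise ValueError(f"unknown mode: {mode!r}")
    if rng is None:
        rng = random.Random()

    indices = []
    for k in range(num_segments):
        # Segment k covers frames [start, end); integer division spreads
        # leftover frames across segments, and each segment has >= 1 frame.
        start = k * num_frames // num_segments
        end = (k + 1) * num_frames // num_segments
        if mode == "first":
            indices.append(start)
        else:
            indices.append(rng.randrange(start, end))
    return indices


print(segment_frame_indices(36, 8, mode="first"))
# [0, 4, 9, 13, 18, 22, 27, 31]
print(segment_frame_indices(36, 8, mode="random", rng=random.Random(0)))
```

With `mode="first"` the same evenly spaced frames are selected on every call, whereas `mode="random"` can select a different frame from each segment on every call.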
Six models were trained, varying in MLP architecture, frame sampling, and number of segments.
| Model | Params (MLP) | Frame Selection | Segments | Test Accuracy | Test Loss |
|---|---|---|---|---|---|
| 1 | 75.5M | Random (uniform) | 8 | 0.499 | 1.708 |
| 2 | 257.3M | Random (uniform) | 8 | 0.456 | 1.783 |
| 3 | 8.4M | Random (uniform) | 8 | 0.531 | 1.742 |
| 4 | 8.4M | First frame | 8 | 0.546 | 1.796 |
| 5 | 4.2M | First frame | 4 | 0.537 | 1.714 |
| 6 | 2.1M | First frame | 2 | 0.433 | 1.995 |
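The MLP parameter counts in the table can be sanity-checked with some arithmetic. The sketch below rests on an assumption that the README does not state: the 2048-dimensional ResNeXt-101 features of all segments are concatenated into one input vector, followed by a single hidden layer of 512 units (a hypothetical width) and a 27-way output layer (Jester has 27 gesture classes). Under that assumption the counts come out at the 8.4M, 4.2M and 2.1M figures for Models 3 to 6, and the first layer's size scales linearly with the number of segments.

```python
from typing import Sequence

FEATURE_DIM = 2048  # size of ResNeXt-101's pooled feature vector
NUM_CLASSES = 27    # number of Jester gesture classes


def mlp_param_count(layer_sizes: Sequence[int]) -> int:
    """Weights plus biases of a stack of fully connected layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


def concat_mlp_params(num_segments: int, hidden: Sequence[int]) -> int:
    # Assumption: per-segment features are concatenated before the MLP.
    input_dim = num_segments * FEATURE_DIM
    return mlp_param_count([input_dim, *hidden, NUM_CLASSES])


for segments in (8, 4, 2):
    params = concat_mlp_params(segments, [512])  # 512 is a hypothetical width
    print(f"{segments} segments: {params / 1e6:.1f}M parameters")
# 8 segments: 8.4M parameters
# 4 segments: 4.2M parameters
# 2 segments: 2.1M parameters
```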
Plots of training time, accuracy, and loss can be found in the `plots/` directory.
Repository structure:

- `helpers_scripts/`
  - `split_train_val.py` – Script to split the dataset into training and validation sets
  - `test.ipynb` – Notebook for evaluating trained models
- `logs/`
  - `train_model1.log` – Training log for Model 1
  - `train_model2.log` – Training log for Model 2
  - `train_model3.log` – Training log for Model 3
  - `train_model4.log` – Training log for Model 4
  - `train_model5.log` – Training log for Model 5
  - `train_model6.log` – Training log for Model 6
- `metrics/`
  - `model_1/`
    - `accuracies.pkl` – Accuracy values across training
    - `losses.pkl` – Loss values across training
  - `model_2/` … `model_6/` – Same structure as `model_1/`
  - `test_acc.pkl` – Final test accuracies
  - `test_loss.pkl` – Final test losses
  - `time_taken.pkl` – Training times (approximate, extracted from the logs)
- `models_training_code/`
  - `model1.py` – Training script for Model 1
  - `model2.py` ... `model6.py` – Training scripts for Models 2–6
  - `model_1.ipynb` – Notebook version of Model 1 training
  - `model_2.ipynb` … `model_6.ipynb` – Notebooks for Models 2–6
- `plots/`
  - `Plots.ipynb` – Notebook to generate the result plots
  - `accuracies_plot.png` – Accuracy curves
  - `losses_plot.png` – Loss curves
  - `time_bar.png` – Training time comparison
- `splits/`
  - `jester-v1-labels.csv` – Gesture class labels
  - `jester-v1-train.csv` – Original training split (v1)
  - `jester-v1-validation.csv` – Original validation split (v1)
  - `train.csv` – Training set
  - `val.csv` – Validation set
  - `test.csv` – Test set
- `report.pdf` – Full project report
- `README.md` – Project documentation (this file)
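As an illustration of what a train/validation split such as the one produced by `split_train_val.py` can look like, here is a per-class (stratified) random split in plain Python. This is a hypothetical sketch, not the script's actual logic: the function name `stratified_split`, the `(video_id, label)` row format, the 10% validation fraction and the seed are all assumptions. Splitting within each label keeps the class proportions of the validation set close to those of the training set.

```python
import random
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Row = Tuple[str, str]  # (video_id, label), an assumed row format


def stratified_split(
    rows: Sequence[Row], val_fraction: float = 0.1, seed: int = 42
) -> Tuple[List[Row], List[Row]]:
    """Split rows into train/val while keeping each label's proportion."""
    if not 0.0 < val_fraction < 1.0:
        raise ValueError("val_fraction must be in (0, 1)")

    # Group rows by their gesture label.
    by_label: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_label[row[1]].append(row)

    rng = random.Random(seed)  # fixed seed -> reproducible split
    train: List[Row] = []
    val: List[Row] = []
    for label in sorted(by_label):  # sorted so the RNG sequence is deterministic
        group = list(by_label[label])
        rng.shuffle(group)
        n_val = round(len(group) * val_fraction)
        val.extend(group[:n_val])
        train.extend(group[n_val:])
    return train, val
```

scikit-learn, which is already a dependency, offers the same idea through `sklearn.model_selection.train_test_split(rows, test_size=0.1, stratify=labels, random_state=42)`.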
Requirements:

- Python 3.8+
- PyTorch
- torchvision
- pandas, numpy, matplotlib
- scikit-learn
Install the dependencies:

```bash
pip install -r requirements.txt
```