texttechnologylab/SimpleActionGestureModel

🖐️ Hands-On Example: Action Gesture Classification

Overview

This repository provides a hands-on example of classifying hand gestures based on hand pose data extracted from video frames. It was originally developed as part of the ESSLLI 2025 lecture: 👉 https://github.com/aluecking/ESSLLI2025

The example focuses on human–object interaction gestures, demonstrating a lightweight, end-to-end pipeline — from data preparation to live action recognition.

📂 Dataset

A small subset of the Moments in Time dataset is used, limited to the following action classes:

  • cycling
  • running
  • drinking
  • eating

The extracted hand pose data (computed via MMPose) is included in the repository under:

data/results_hands

🧩 Feature Extraction

We compute a set of interpretable, low-dimensional features from hand pose data (see HandPoseFeatureGenerator.py). These features are designed to capture finger extension, pinch behavior, and hand configuration, which help distinguish between actions such as eating and drinking.

Note: The HandPoseFeatureGenerator class was initially generated using Claude.ai for demonstration purposes.

| Feature Name | Description |
| --- | --- |
| thumb_extension | Normalized distance from thumb base to tip (0 = curled, 1 = fully extended) |
| index_extension | Normalized distance from index base to tip |
| middle_extension | Normalized distance from middle finger base to tip |
| ring_extension | Normalized distance from ring finger base to tip |
| pinky_extension | Normalized distance from pinky base to tip |
| fingers_extended_count | Number of fingers considered “extended” (extension > 0.5) |
| avg_finger_extension | Average of all finger extension ratios |
| pinch_distance | Euclidean distance between thumb and index fingertips (in pixels) |
| is_pinching | Binary indicator (1 if pinch distance < 30 px, else 0) |
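As a rough illustration, the extension and pinch features described above could be computed along these lines. This is a minimal sketch assuming the standard 21-keypoint hand layout used by MMPose-style models (wrist at index 0, thumb tip at 4, index fingertip at 8); `finger_extension` and `pinch_features` are hypothetical names, not the repository's actual API:

```python
import numpy as np

def finger_extension(kpts, base, tip, joints):
    """Ratio of the straight base-to-tip distance to the summed segment
    lengths along the finger: ~0 when curled, ~1 when fully extended.
    kpts: (21, 2) array of pixel coordinates; joints: keypoint indices
    from base to tip along the finger."""
    straight = np.linalg.norm(kpts[tip] - kpts[base])
    segments = sum(np.linalg.norm(kpts[a] - kpts[b])
                   for a, b in zip(joints, joints[1:]))
    return straight / segments if segments > 0 else 0.0

def pinch_features(kpts, threshold=30.0):
    """pinch_distance in pixels (thumb tip vs. index tip) and the
    is_pinching indicator (1 if closer than the threshold)."""
    d = np.linalg.norm(kpts[4] - kpts[8])
    return d, int(d < threshold)
```

Normalizing by the summed segment lengths keeps the feature scale-invariant, so hands at different distances from the camera produce comparable values.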

🧠 Model

This experiment uses a simple neural classification pipeline based on scikit-learn:

StandardScaler → MLPClassifier
  • StandardScaler: normalizes features to zero mean and unit variance, so that all features contribute on a comparable scale.

  • MLPClassifier: a lightweight feedforward neural network that learns nonlinear relationships between hand pose features and gesture labels.
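A minimal sketch of such a pipeline in scikit-learn (the hyperparameters and the random toy data below are illustrative, not the repository's actual settings):

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import train_test_split

# Toy stand-in data: 9 pose features per sample, 4 action classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 9))
y = rng.integers(0, 4, size=200)

# StandardScaler -> MLPClassifier, chained so that scaling statistics
# are fit on the training split only.
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(32,), max_iter=300, random_state=0),
)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf.fit(X_tr, y_tr)
print(f"test accuracy: {clf.score(X_te, y_te):.2f}")
```

Wrapping both steps in a single pipeline object also means the exact same scaling is applied automatically at prediction time, e.g. in the live classifier.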

⚙️ Setup Instructions

Create and activate the environment

conda create -n ubtt python=3.8
conda activate ubtt

Install dependencies

conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 cpuonly -c pytorch

pip install -U openmim
mim install mmengine
pip install "mmcv==2.1.0"
mim install "mmdet==3.2.0"
mim install "mmpose==1.3.2"

pip install -r requirements.txt

📜 Script Descriptions

handy_on.py

Main entry point for the hands-on example. Runs the full pipeline:

  • Prepare the dataset
  • Extract hand pose data
  • Compute pose-based features
  • Train and evaluate a gesture classifier

HandPoseFeatureGenerator.py

Defines the feature extraction class that converts raw hand pose keypoints into numerical descriptors for classification.

SimpleLiveActionClassifier.py

Implements a real-time gesture recognizer using webcam input and the trained model.

📖 Citation

TODO
