
HomE: A Homogeneous Ensemble Framework for Dynamic Hand Gesture Recognition

📄 Paper: IEEE Xplore Link
✍️ Authors: Haochen Xia, Aamir Hasan, Haonan Chen, Katherine Driggs-Campbell

Introduction

We propose HomE, a homogeneous ensemble framework that improves the performance and robustness of hand gesture recognition (HGR) models. HomE partitions gesture classes into smaller, more coherent subsets based on critical features uncovered in parallel by unsupervised clustering and by an LLM-driven semantic sampler, and trains a dedicated expert learner for each subset. A separate router learner then routes incoming samples to the most relevant expert learner, while the expert routing module fuses the outputs of all expert learners into the final classification. Extensive experiments on the NVGestures, DHG-14, and SHREC'17 datasets show that our method not only enhances accuracy and robustness over single-network baselines but also makes these base models more competitive with state-of-the-art approaches, all without altering their underlying architectures.
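
As a minimal sketch of this inference flow (using hypothetical model callables rather than the repository's actual classes), routing a sample and mapping the expert's subset-local prediction back to a global class looks like:

```python
import numpy as np

def home_predict(x, router, experts, subsets):
    """Route one sample to its expert and map the expert's subset-local
    prediction back to a global class index (illustrative sketch only)."""
    # The router learner scores each class subset; pick the best expert.
    subset_idx = int(np.argmax(router(x)))
    # The chosen expert learner classifies within its own subset.
    local_idx = int(np.argmax(experts[subset_idx](x)))
    # Translate the subset-local index back to the original class label.
    return subsets[subset_idx][local_idx]
```

In the full framework the expert routing module fuses all experts' outputs rather than committing to a single expert, but the mapping between subset-local and global labels works the same way.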

Prerequisites

Packages you need:

  • Python 3.9.12
  • PyTorch v1.10.1

Run the following command to install the remaining dependencies:

pip install -r requirements.txt

Datasets

The three datasets we use are publicly available.

If you cannot open the links for DHG-14 and SHREC'17, you can visit this repo to download the datasets from Google Drive.

NVGesture

Download the NvGesture dataset and extract the NvGesture directory to ./dataset/Nvidia; we suggest creating a soft link to the downloaded dataset instead of copying it. Split the dataset using nvidia_dataset_split.py. By default the script uses the RGB data. This step either generates point cloud sequences from the depth videos or extracts the RGB videos, and saves the processed videos in ./dataset/Nvidia/Processed.

#!/bin/bash
cd dataset
python nvidia_dataset_split.py

If you are processing depth data, you will also have to run the following right after the previous commands. Each video generates 32 × 512 points, and the generated point clouds occupy about 11 GB.

#!/bin/bash
python nvidia_process.py
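
The depth-to-point-cloud step can be pictured as back-projecting each depth frame through a pinhole camera model and sampling a fixed number of points per frame. The sketch below is illustrative only; the camera intrinsics (fx, fy, cx, cy) are placeholders, not the values nvidia_process.py actually uses.

```python
import numpy as np

def depth_frame_to_points(depth, fx, fy, cx, cy, n_points=512, seed=0):
    """Back-project one depth frame (H x W) to 3-D points and sample
    n_points of them; a 32-frame video would then yield 32 x 512 points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth.reshape(-1)
    valid = z > 0                                    # ignore missing depth
    x = (u.reshape(-1)[valid] - cx) * z[valid] / fx  # pinhole back-projection
    y = (v.reshape(-1)[valid] - cy) * z[valid] / fy
    pts = np.stack([x, y, z[valid]], axis=1)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(pts), size=n_points, replace=len(pts) < n_points)
    return pts[idx]                                  # shape (n_points, 3)
```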

Training

NVGesture

  1. Get class subsets by sampling:
cd NVGesture
python main.py --hypes hyperparameters/NVGestures/train.json --sampling --num_subclasses 3 --subclass_type "clustering" --autoencoder_type "encoding" --clustering_type "agg_hierarch" --resume "models/nv_depth.pth" 

The meanings of the parameters are:

  • hypes: the path to the hyperparameter configuration for the base model
  • num_subclasses: the maximum number of classes in each cluster/class subset
  • subclass_type: the type of sampling
  • resume: the path to the encoder used to compress the input data for clustering (this can be the pretrained base model)

Note: the class subsets are printed after "check result cluster:"; they are not the final output classes.
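
For intuition, the clustering-based sampler can be sketched as running agglomerative hierarchical clustering on per-class feature embeddings produced by the --resume encoder. This is a simplified illustration, not the repository's exact implementation; in particular, real clusters may exceed the num_subclasses cap, which the script handles separately.

```python
import math
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def cluster_class_subsets(class_embeddings, max_per_subset):
    """Group classes into subsets by agglomerative clustering of their
    feature embeddings (a sketch of subclass_type "clustering" with
    clustering_type "agg_hierarch")."""
    n_classes = len(class_embeddings)
    # Enough clusters so subsets can, on average, respect the size cap.
    n_subsets = math.ceil(n_classes / max_per_subset)
    labels = AgglomerativeClustering(n_clusters=n_subsets).fit_predict(class_embeddings)
    return [sorted(np.where(labels == k)[0].tolist()) for k in range(n_subsets)]
```
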
Note: for the LLM-driven semantic sampler, you are free to use the following prompt, replacing the variables with your own values:

You are a gesture–taxonomy assistant.

Task
----
Group the gesture classes below into exactly {N_SUBSETS} **non‑overlapping subsets** that share the same {CRITICAL_FEATURE}.
Return your answer **only** as a Python‑style list‑of‑lists of 0‑based class indices, with no extra text.

Rules
1. Classes in the *same* subset must share a clearly common {CRITICAL_FEATURE}.
2. Classes whose {CRITICAL_FEATURE}s differ in an essential way must be in *different* subsets.
3. Balance requirement:
   – Let S be the size of the largest subset and s the size of the smallest.
   – Ensure **S − s ≤ 2** (i.e., subset sizes differ by at most two).
4. Produce exactly {N_SUBSETS} subsets; every class index must appear in one and only one subset.
5. Preserve input ordering inside each subset (smaller index comes first).
6. Output format example (for illustration only):  
   
[[0, 3, 7], [1, 2, 5], [4, 6]]


Gesture classes (index ▸ name)
------------------------------
{CLASS_LIST}

**Return only the grouped index subsets.**  

Although the LLM‑driven semantic sampler’s output can vary, the classes within each subset will still share common features inferred from the semantic meaning of their names.
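
Filling the template and validating the model's reply can be sketched as below. The placeholder names mirror the prompt's {N_SUBSETS}, {CRITICAL_FEATURE}, and {CLASS_LIST} variables; the parsing helper is hypothetical, not part of the repository.

```python
import ast

def fill_sampler_prompt(template, class_names, n_subsets, critical_feature):
    """Substitute {N_SUBSETS}, {CRITICAL_FEATURE} and {CLASS_LIST} in the
    prompt template with concrete values."""
    class_list = "\n".join(f"{i} ▸ {name}" for i, name in enumerate(class_names))
    return (template.replace("{N_SUBSETS}", str(n_subsets))
                    .replace("{CRITICAL_FEATURE}", critical_feature)
                    .replace("{CLASS_LIST}", class_list))

def parse_sampler_reply(reply, n_classes):
    """Parse the list-of-lists reply and check that it is a partition of
    all class indices, as the prompt's rule 4 requires."""
    subsets = ast.literal_eval(reply.strip())
    flat = sorted(i for s in subsets for i in s)
    if flat != list(range(n_classes)):
        raise ValueError("reply is not a partition of all class indices")
    return subsets
```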

  2. Train router learner:
python main.py \
    --hypes hyperparameters/NVGestures/train_decision_maker.json \
    --decision_maker \
    --subclasses "[[1, 19, 5, 7, 15], [0, 2, 3, 4, 6], [8, 9, 10, 11, 12], [13, 14, 16, 17, 18], [20, 21, 22, 23, 24]]" \
    --val_interval 1 \
    --decision_maker_lr 2.5e-5 \
    --decision_maker_scheduler "[100, 175]" \
    --resume "./models/nv_depth.pth"

The subclasses parameter should be replaced with your class subsets, which can be generated either by clustering sampling or by the LLM-driven semantic sampler. You are free to adjust other parameters such as the learning rate or scheduler steps.
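
Conceptually, the router learner is trained to predict which class subset each sample belongs to, so its training targets are subset indices rather than the original class labels. A minimal sketch of that mapping, using the subsets from the command above (illustrative only, not the repository's code):

```python
def router_targets(labels, subsets):
    """Map each global class label to the index of the subset that
    contains it; these subset indices are the router learner's
    training targets."""
    class_to_subset = {c: k for k, subset in enumerate(subsets) for c in subset}
    return [class_to_subset[y] for y in labels]
```

For example, with the subsets above, classes 1 and 19 both map to subset 0, so samples of either class train the router toward the same expert.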

  3. Train expert learner:
python main.py \
    --hypes hyperparameters/NVGestures/train_subclassifiers.json \
    --subclassifier \
    --subclassifier_class "[1, 19, 5, 7, 15]" \
    --val_interval 1 \
    --resume "./models/nv_depth.pth" 

The subclassifier_class is one class subset, since each run trains a single expert learner. You are supposed to train expert learners for all class subsets so that they can later be combined into expert routing modules.
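
Each expert learner only sees its own subset's classes, so global labels must be remapped to subset-local indices before training. A sketch, assuming the local index is simply the class's position in the subclassifier_class list (the repository may index differently):

```python
def expert_targets(labels, subclass):
    """Remap global class labels to positions inside one expert's
    subset, e.g. with subclass [1, 19, 5, 7, 15] global class 5
    becomes local class 2."""
    local = {c: i for i, c in enumerate(subclass)}
    return [local[y] for y in labels]
```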

  4. Run single/multiple expert routing modules:
python main.py --hypes hyperparameters/NVGestures/home.json --random_forest --sampling_result []

The hypes file is the configuration for HomE, which should contain the paths of all router learners and expert learners; hyperparameters/NVGestures/home_sample.json and hyperparameters/NVGestures/home.json are two examples. This script merges all expert routing modules and reports the final prediction accuracy.
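
One simple way to merge the modules' predictions is to average the per-class probability vectors each module produces and take the argmax. This sketch only illustrates the idea; the actual fusion (including the --random_forest option) may combine or weight modules differently.

```python
import numpy as np

def merge_modules(module_probs):
    """Fuse per-class probability vectors from several expert routing
    modules by averaging them, then return the predicted class index."""
    return int(np.argmax(np.mean(np.asarray(module_probs), axis=0)))
```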

DHG-14 and SHREC'17

  1. After downloading the datasets, set the path to your downloaded dataset folder in /util/DHG_parse_data.py (line 2) or /util/SHREC_parse_data.py (line 5).
  2. Get class subsets by sampling:
python get_sampling_DHG.py --num_subclasses 3 --subclass_type clustering --compressor_path path/to/compressor

or

python get_sampling.py --num_subclasses 3 --subclass_type clustering --compressor_path path/to/compressor

  3. Train router learner:
python train_decision_maker_DHG.py --num_subclasses 3 --finetune --sampling_result "CLASS_SUBSETS" --epochs 100

or

python train_decision_maker.py --num_subclasses 3 --finetune --sampling_result "CLASS_SUBSETS" --epochs 100

The variable CLASS_SUBSETS should be replaced by the actual class subsets.

  4. Train expert learner:
python train_subclassifier_DHG.py --num_subclasses 3 --finetune --sampling_result "CLASS_SUBSETS" --epochs 100

or

python train_subclassifier.py --num_subclasses 3 --finetune --sampling_result "CLASS_SUBSETS" --epochs 100

For DHG-14 and SHREC'17, all expert learners can be trained with a single command given the CLASS_SUBSETS you have.

  5. Run single/multiple expert routing modules:
python fuse_DHG.py --num_subclasses 3 --sampling_result "CLASS_SUBSETS" --config "./config/home.json" --compressor_path path/to/compressor

or

python fuse.py --num_subclasses 3 --sampling_result "CLASS_SUBSETS" --config "./config/home.json" --compressor_path path/to/compressor

compressor_path is needed for evaluation, i.e., for comparing the accuracy and robustness improvements of HomE against the base model.

Note: for the pretrained base models, you can either train them yourself or obtain them from TransformerBasedGestureRecognition and DG‑STA.

Acknowledgements

This work was supported in part by the University of Illinois Urbana-Champaign.
We thank the authors of TransformerBasedGestureRecognition and DG-STA for providing their codebases.

References

This implementation uses prior open‑source gesture‑recognition repositories as backbones:

@inproceedings{d2020transformer,
  title     = {A Transformer-Based Network for Dynamic Hand Gesture Recognition},
  author    = {D'Eusanio, Andrea and Simoni, Alessandro and Pini, Stefano and Borghi, Guido and Vezzani, Roberto and Cucchiara, Rita},
  booktitle = {Proc. 3DV},
  year      = {2020}
}

@inproceedings{chenBMVC19dynamic,
  author    = {Chen, Yuxiao and Zhao, Long and Peng, Xi and Yuan, Jianbo and Metaxas, Dimitris N.},
  title     = {Construct Dynamic Graphs for Hand Gesture Recognition via Spatial-Temporal Attention},
  booktitle = {BMVC},
  year      = {2019}
}

Citation

If you use this code or find our work helpful, please cite:

@inproceedings{fg2025home,
  author    = {Haochen Xia and Aamir Hasan and Haonan Chen and Katherine Driggs-Campbell},
  title     = {HomE: A Homogeneous Ensemble Framework for Dynamic Hand Gesture Recognition},
  booktitle = {Proceedings of the IEEE International Conference on Automatic Face and Gesture Recognition (FG)},
  year      = {2025},
  organization = {IEEE}
}
