This repository contains a hand gesture recognition system that classifies hand gestures (numbers 0–5) using a convolutional neural network (CNN) implemented in PyTorch and real-time webcam input processed with OpenCV. The project includes scripts for data capture, model training, and real-time inference, along with a pre-trained model.
- Captures hand gesture images using a webcam (`capture-images.py`).
- Trains a CNN model on a custom dataset (`train_model.ipynb`).
- Performs real-time gesture recognition with webcam input (`pytorch-opencv.py`).
- Uses a three-way dataset split (train/validation/test) for robust evaluation.
- Includes data augmentation and early stopping for improved training.
- Includes a pre-trained model (`model.pth`) for immediate inference.
- `train_model.ipynb`: Jupyter Notebook to train the CNN model on a custom dataset.
- `capture-images.py`: Script to capture hand gesture images via webcam.
- `pytorch-opencv.py`: Script for real-time gesture recognition using the trained model.
- `model.pth`: Pre-trained model weights for the CNN (6 classes, 32x32 grayscale input).
Note: The `captured_dataset` folder (containing training/test images) is not included. You must create this folder and populate it with your own images (see Data Collection).
- Python: 3.8 or higher
- Libraries: `torch`, `torchvision`, `opencv-python`, `numpy`, `pillow`
- Hardware:
- Webcam for data capture and real-time inference.
- Optional: GPU for faster training (CPU works but is slower).
Install dependencies:

```
pip install torch torchvision opencv-python numpy pillow
```
- Clone the Repository:

  ```
  git clone https://github.com/your-username/FingerDetectionOpenCV.git
  cd FingerDetectionOpenCV
  ```
- Create Dataset Folder:
  - Create a folder named `captured_dataset` with subfolders `0`, `1`, `2`, `3`, `4`, `5` (one for each gesture class):

    ```
    mkdir -p captured_dataset/{0,1,2,3,4,5}
    ```
- Capture Images:
  - Run `capture-images.py` to collect hand gesture images using your webcam:

    ```
    python capture-images.py
    ```
- Instructions:
- Place your hand in the green ROI box displayed on the webcam feed.
- Press keys `0`–`5` to save images to the corresponding class folder (e.g., `captured_dataset/0` for gesture 0).
- Press `p` to preview the last captured image (32x32 grayscale).
- Press `q` to quit.
- Aim for 100–500 images per class for balanced training data.
- Ensure varied lighting, backgrounds, and hand positions for robustness.
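The exact preprocessing lives in `capture-images.py`, but given the 32x32 grayscale format described above, a helper along these lines (the name `preprocess_roi` is illustrative, not taken from the script) would produce what training expects:

```python
import numpy as np
from PIL import Image

def preprocess_roi(roi_rgb: np.ndarray, size: int = 32) -> np.ndarray:
    """Convert an RGB ROI (H, W, 3 uint8) into the 32x32 grayscale
    format used by the dataset. Returns a (size, size) uint8 array."""
    img = Image.fromarray(roi_rgb).convert("L")    # drop color channels
    img = img.resize((size, size), Image.BILINEAR)  # downscale the ROI
    return np.asarray(img, dtype=np.uint8)
```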
- Verify Dataset:
- Check that `captured_dataset` contains subfolders `0` to `5`, each with images (32x32 grayscale PNGs).
- Example command to count images per class:

  ```
  for dir in captured_dataset/[0-5]; do echo "$dir: $(ls $dir | wc -l) images"; done
  ```
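On Windows, or anywhere without a POSIX shell, a small Python equivalent can do the same count (folder layout assumed as above):

```python
from pathlib import Path

def count_images(dataset_dir: str = "captured_dataset") -> dict:
    """Return {class_folder_name: file_count} for each class subfolder."""
    root = Path(dataset_dir)
    return {
        d.name: sum(1 for f in d.iterdir() if f.is_file())
        for d in sorted(root.iterdir())
        if d.is_dir()
    }
```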
- Run the Training Script:
- Open `train_model.ipynb` in a Jupyter Notebook environment (e.g., VS Code, JupyterLab, or Google Colab).
- Execute all cells to:
  - Split `captured_dataset` into `train` (64%), `validation` (16%), and `test` (20%) sets.
  - Train a `ConvNeuralNetwork` model with data augmentation and early stopping.
  - Save the best model to `model.pth`.
- Alternatively, convert the notebook to a `.py` file and run:

  ```
  python train_model.py
  ```
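The notebook's exact split code is not reproduced here, but a 64/16/20 split with `torch.utils.data.random_split` would look roughly like this (a sketch under those proportions; the seed value is arbitrary):

```python
import torch
from torch.utils.data import random_split

def split_dataset(dataset, seed: int = 42):
    """Split a dataset 64% / 16% / 20% into train / validation / test,
    matching the proportions described for train_model.ipynb."""
    n = len(dataset)
    n_train = int(0.64 * n)
    n_val = int(0.16 * n)
    n_test = n - n_train - n_val  # remainder, so the three parts sum to n
    gen = torch.Generator().manual_seed(seed)  # reproducible shuffling
    return random_split(dataset, [n_train, n_val, n_test], generator=gen)
```

In the notebook this would be applied to something like a `torchvision.datasets.ImageFolder` built from `captured_dataset`.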
- Training Details:
- Model: Convolutional Neural Network (CNN) with two convolutional layers and two fully connected layers.
- Input: 32x32 grayscale images.
- Classes: 6 (gestures 0–5).
- Optimizer: Adam (learning rate 0.001).
- Loss: Cross-entropy.
- Evaluation: Validation set for early stopping, test set for final accuracy.
- Expected test accuracy: >80% with sufficient, diverse data.
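The authoritative definition of the model lives in `train_model.ipynb`; based on the details above (two convolutional layers, two fully connected layers, 1x32x32 input, 6 classes), its shape is roughly the following — the channel widths and hidden size here are illustrative guesses, not the notebook's actual values:

```python
import torch
import torch.nn as nn

class ConvNeuralNetwork(nn.Module):
    """Sketch of the described architecture: two conv layers,
    two fully connected layers, 32x32 grayscale in, 6 classes out."""
    def __init__(self, num_classes: int = 6):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),   # -> 16x32x32
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 16x16x16
            nn.Conv2d(16, 32, kernel_size=3, padding=1),  # -> 32x16x16
            nn.ReLU(),
            nn.MaxPool2d(2),                              # -> 32x8x8
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 8 * 8, 128),
            nn.ReLU(),
            nn.Linear(128, num_classes),  # raw logits; loss applies softmax
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```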
- Output:
- The script prints training loss, validation accuracy, and final test accuracy.
- The trained model is saved as `model.pth`.
- Run the Inference Script:
- Use `pytorch-opencv.py` to perform real-time gesture recognition with your webcam:

  ```
  python pytorch-opencv.py
  ```
- Instructions:
- Place your hand in the green ROI box.
- The script displays the predicted gesture (0–5) and confidence score.
- Green text indicates high confidence (>0.7); orange indicates lower confidence.
- Press `q` to quit.
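The confidence/color logic described above presumably reduces to a softmax over the model's raw class scores followed by the 0.7 threshold; a minimal sketch (function and constant names are illustrative, not taken from `pytorch-opencv.py`):

```python
import math

GREEN = (0, 255, 0)     # OpenCV BGR color for high confidence
ORANGE = (0, 165, 255)  # OpenCV BGR color for lower confidence

def predict_with_confidence(logits):
    """Softmax the raw class scores and return
    (predicted_class, confidence, display_color)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]  # subtract max for stability
    total = sum(exps)
    probs = [e / total for e in exps]
    cls = probs.index(max(probs))
    conf = probs[cls]
    color = GREEN if conf > 0.7 else ORANGE
    return cls, conf, color
```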
- Using the Pre-Trained Model:
- The included `model.pth` can be used for inference without retraining.
- Ensure `pytorch-opencv.py` is in the same directory as `model.pth`.
- Dataset Quality: For best results, collect diverse images (different lighting, backgrounds, hand positions). Ensure each class has a similar number of images.
- Overfitting: If test accuracy is much lower than validation accuracy, add more data or increase augmentation (e.g., random flips).
- Performance: Training on CPU is slow; use a GPU if available.
- Debugging: If accuracy is low, check dataset balance or inspect misclassifications using a confusion matrix (see `train_model.ipynb` comments for code).
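For the debugging tip above, a confusion matrix needs no extra libraries (the notebook's own version may differ):

```python
def confusion_matrix(y_true, y_pred, num_classes: int = 6):
    """Count label/prediction pairs: rows = true class, cols = predicted."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix
```

Large off-diagonal entries show which gestures get confused: e.g., row 2, column 3 counts gesture-2 images that were predicted as gesture 3.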
Contributions are welcome! Please:
- Fork the repository.
- Create a feature branch (`git checkout -b feature/YourFeature`).
- Commit changes (`git commit -m "Add YourFeature"`).
- Push to the branch (`git push origin feature/YourFeature`).
- Open a pull request.
This project is licensed under the MIT License. See the LICENSE file for details.