Open-source, real-time hand gesture control for all major operating systems using MediaPipe and OpenCV. The system reads frames from a webcam, classifies hand gestures using a MediaPipe GestureRecognizer model running in live-stream mode, and routes recognized gestures to modular action handlers. Up to two hands are tracked simultaneously, with priority given to two-hand gestures over single-hand ones.
|
Single Hand
|
Double Hand
|
|
Motion Gesture
|
A webcam feed is captured frame by frame using OpenCV. Each frame is converted from BGR to RGB and passed asynchronously to MediaPipe's GestureRecognizer, which runs on an internal thread. Results are stored in a thread-safe slot and consumed on the main loop. The latest available result is passed to a GestureRouter, which checks every registered gesture module in priority order. If a module's matches() function returns true and its cooldown has elapsed, the module's action() is executed. An overlay is drawn on the current frame showing the detected gesture label and the last fired action, then displayed in an OpenCV window.
The system uses LIVE_STREAM mode from the MediaPipe Tasks API, which means recognize_async() returns immediately and results arrive via a callback on a separate thread. A threading.Lock is used to safely pass results from the callback thread to the main loop.
open-gestures/
├── main.py Entry point. Runs the camera loop, recognizer, and router.
├── models/
│ └── gesture_recognizer.task MediaPipe model file.
├── core/
│ └── cooldown.py Cooldown manager preventing gesture actions from firing too rapidly.
│ └── router.py Router for linking actions with gestures
└── gestures/
├── static/ Gesture modules that match on a single static hand pose.
└── motion/ Gesture modules that match on motion-based or temporal gestures. (Work In Progress)
Each file under gestures/static/ and gestures/motion/ is a self-contained Python module. Modules ending in _2 represent two-hand gestures and are loaded with higher priority than single-hand (_1) modules.
Every gesture module must expose two things:
GESTURE_NAME: str # Unique identifier used as the cooldown key and display label.
def matches(result) -> bool:
# Receives a MediaPipe GestureRecognizerResult.
# Return True if this gesture should fire given the current result.Modules that do not expose all two attributes are skipped at load time with a warning.
Every action module must expose four things:
ACTION_NAME: str # Unique identifier used as the cooldown key and display label.
ACTION_DESCRIPTION: str # Unique description shown in the GUI.
ACTION_ID: str # Unique ID used by the router to connect gestures with the action.
def execute(self) -> None:
# Action logic here.Modules that do not expose all four attributes are skipped at load time with a warning.
Operating System: Linux / Windows / macOS
Python: 3.9 or later is recommended for compatibility with the MediaPipe Tasks API.
Hardware: An accessible webcam.
Dependencies:
| Package | Purpose |
|---|---|
mediapipe |
Hand landmark detection and gesture classification via the Tasks API |
opencv-python |
Webcam capture, frame processing, and display window |
numpy |
Implicit dependency of both mediapipe and opencv |
1. Clone the repository
git clone https://github.com/GodKode69/open-gestures.git
cd open-gestures2. Create and activate a virtual environment (recommended)
On Linux / macOS:
python3 -m venv env
source env/bin/activateOn Windows:
python -m venv env
env\Scripts\activate3. Install Python dependencies
pip install mediapipe opencv-python numpy4. Download the MediaPipe model
The gesture recognizer model must be downloaded separately and placed at models/gesture_recognizer.task. You can obtain it from Google's MediaPipe model page.
python main.pyThe application will open a window titled Open-Gestures showing the gestures with their bound actions, with the option to switch actions and also to reset them to defaults. A preview option is also available to preview your webcam with gesture and action detection.
The Cooldown class in core/cooldown.py prevents a gesture action from firing multiple times in rapid succession. Each gesture module is gated by its GESTURE_NAME key. When a gesture fires, its timestamp is recorded and subsequent triggers are blocked until the cooldown period elapses. This prevents unintended repeated actions from a held or slowly-changing pose.
The recognizer is configured with the following defaults in main.py:
| Parameter | Value |
|---|---|
num_hands |
2 |
min_hand_detection_confidence |
0.5 |
min_hand_presence_confidence |
0.5 |
min_tracking_confidence |
0.5 |
| Camera resolution | 640 × 480 |
| Camera target FPS | 30 |
These values can be adjusted directly in main.py inside the build_recognizer function and the cap.set(...) calls in main().
Python — Primary implementation language.
MediaPipe (Tasks API) — Google's framework for on-device ML pipelines. Specifically, mediapipe.tasks.python.vision.GestureRecognizer is used in LIVE_STREAM mode with an asynchronous result callback. The underlying model classifies hand gestures from landmark data.
MediaPipe GestureRecognizer model (gesture_recognizer.task) — A float16 TFLite model that performs hand detection, landmark estimation, and gesture classification in a single pipeline. Downloaded separately from Google's model storage.
OpenCV (cv2) — Used for webcam capture via VideoCapture, frame color conversion, mirror flipping, text and rectangle overlays, and the display window.
Python threading module — Used to safely transfer GestureRecognizerResult objects from MediaPipe's internal callback thread to the main loop via a threading.Lock.
Python importlib and pkgutil — Used for dynamic discovery and loading of gesture modules at runtime without any hard-coded imports.
Contributions are welcome! If you'd like to add new gesture modules, action handlers, or platform support, feel free to open a pull request on the GitHub repository.
All rights are reserved by the author and the contributors.