Source code, data, and results: Google Drive
The objective of this project was to develop a computer vision system capable of analyzing the state of a Monopoly game from video footage. As defined in Milestone 1, the system focuses on tracking specific game elements and detecting key gameplay events.
Key Game Elements: The following physical objects were identified as targets for detection:
- Pawns: The player markers moving around the board.
- Dice: The source of randomness determining movement.
- Cards: Chance/Community Chest cards or Property deeds appearing on the board.
Key Gameplay Events: The system aimed to recognize the following logic events based on visual data:
- Dice Roll: Detect when dice are thrown and come to rest.
- Pawn Movement: Tracking the displacement of a player's piece.
- Card Interaction: Detecting the drawing or placing of a card.
- Property Development: Detecting the placement of houses or hotels on properties.
- Cash Exchange: Detecting the transfer of money between players or with the bank.
The system was tested on a custom dataset consisting of video recordings of the Monopoly board. The data was categorized into three difficulty levels: Easy, Medium, and Difficult.
- Characteristics: All datasets, including the "Easy" set, presented significant challenges from the outset. The camera angle introduced a curved perspective (lens distortion and oblique viewing angle), and variable lighting conditions were present throughout the recordings.
Easy Set: 3 x ~1min .MOV files of gameplay with minimal external interference.
Medium Set: 3 x ~1min .MOV files of faster gameplay with more occlusions.
Difficult Set: 3 x ~1min .MOV files containing extreme lighting variations and rapid movement.
3. Methodology and Techniques
To process the video frames, we utilized the OpenCV library. The pipeline involved several preprocessing steps shared across different detection modules, including downsizing (for performance), morphological operations (dilation/erosion to reduce noise), color masking, and Canny edge detection.
The specific techniques applied to each element were as follows:
Board Detection:
- Technique: SIFT (Scale-Invariant Feature Transform).
- Process: We extracted SIFT keypoints from a reference image of the board and matched them with the current video frame. Using the matching points, we calculated a Homography matrix to warp the perspective.
- Intermediate Result: A "top-down" (bird's-eye view) representation of the board, isolated from the background.
Dice Detection:
- Technique: SIFT combined with Area Filtering.
- Process: SIFT descriptors were used to identify the textures of the dice. We applied area filtering to discard false positives that did not match the expected size of a die.
- Intermediate Result: Bounding boxes drawn around the dice on the warped board.
Pawn Detection:
- Technique: Motion Detection (Background Subtraction).
- Process: We utilized moving-object detection (akin to cv2.createBackgroundSubtractorMOG2 or frame differencing), which isolates pixels that change between frames.
- Refinement: Results were filtered by contour area to distinguish pawns from hand movements or lighting flicker.
- Intermediate Result: Binary masks highlighting moving regions, subsequently converted to tracking centroids.
Card Detection:
- Technique: Template Matching (cv2.matchTemplate).
- Process: The system compares a sliding window of the input frame against reference images of the cards.
- Refinement: Preprocessing via edge detection helped emphasize the high-contrast graphics of the cards.
- Intermediate Result: Heatmaps indicating high-correlation regions corresponding to specific cards, around which rectangular bounding boxes were drawn.
| Dataset | Board Detection | Card Detection | Pawn Detection | Dice Detection |
|---|---|---|---|---|
| Easy | Excellent. High stability. | Excellent. High accuracy. | Low/Moderate. Frequent loss of tracking. | Low/Moderate. False positives and negatives common. |
| Medium | Excellent. High stability. | Excellent. High accuracy. | Low/Moderate. Frequent loss of tracking. | Low/Moderate. False positives and negatives common. |
| Difficult | Good. | Good. | Poor. Lighting triggers false motion. | Poor. High failure rate. |
The results of the project present a mixed success rate, largely dependent on the distinct characteristics of the objects being tracked.
Successes:
- Board and Card Detection: The use of SIFT for board localization and Template Matching for cards proved highly effective; both benefit from the static, planar nature of the printed graphics. The board detection successfully corrected the challenging curved perspective, providing a solid foundation for the rest of the pipeline.
Challenges and Failures:
- Pawns and Dice: The detection of dynamic 3D objects (pawns and dice) proved unsatisfactory.
- Motion Detection (Pawns): Relying on motion detection for pawns proved brittle. The variable lighting conditions often triggered false motion alerts, while the pawns frequently "disappeared" when they stopped moving.
- SIFT (Dice): While SIFT is powerful for rich textures, dice faces often lack enough unique features when blurred by motion or viewed from steep angles, leading to lost tracking.
- Environmental Factors: The "Easy" dataset was deceptively difficult due to the perspective distortion and lighting shifts. While our preprocessing (morphology, color masking) mitigated some issues, classical computer vision techniques struggled to maintain state consistency compared to deep learning alternatives (like YOLO).
Conclusion: While the system successfully understands the game environment (board and cards), the tracking of player actions (pawns and dice) requires more robust feature extraction or the implementation of neural networks to handle the variability in lighting and perspective.
- OpenCV Documentation. "Template Matching," "Background Subtraction," and "Feature Matching." Available online.