Sudoku Solver is an end-to-end computer vision + machine learning system that extracts, validates, and solves Sudoku puzzles from real-world images captured using a mobile camera.
The project focuses on robust digit extraction under noisy visual conditions, ML inference integration, and system-level tradeoffs between on-device and server-side processing.
Camera-captured Sudoku images suffer from perspective distortion, glare, and uneven lighting, all of which add noise to digit recognition. Fully automated pipelines frequently fail because a digit is misclassified or hallucinated in an empty cell.
This project addresses the problem using a hybrid ML + algorithmic approach:
- Computer vision and CNN-based digit recognition for extraction
- Human-in-the-loop correction for noisy predictions
- Deterministic backtracking solver for correctness guarantees
- Image Preprocessing: Grayscale conversion, adaptive thresholding, contour detection (OpenCV)
- Perspective Transformation: Grid isolation and normalization
- Cell Extraction: 81 individual cell crops
- Digit Recognition: CNN trained on a digit dataset, exported as TensorFlow Lite
- Post-processing: Empty-cell filtering and grid validation
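
The geometry behind the perspective and cell-extraction steps can be sketched in plain Python. This is a minimal illustration, not the project's code: `order_corners` and `cell_boxes` are hypothetical helper names, and the sketch assumes the grid contour has already been reduced to four corner points and warped to a square image. The sum/difference corner heuristic is a common one, but it breaks down when the grid is rotated close to 45 degrees.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]          # (x, y) in image coordinates, y grows downward
Box = Tuple[int, int, int, int]      # (left, top, right, bottom), right/bottom exclusive


def order_corners(pts: Sequence[Point]) -> List[Point]:
    """Order four corners as top-left, top-right, bottom-right, bottom-left."""
    if len(pts) != 4:
        raise ValueError("expected exactly four corner points")
    tl = min(pts, key=lambda p: p[0] + p[1])  # smallest x + y
    br = max(pts, key=lambda p: p[0] + p[1])  # largest x + y
    tr = max(pts, key=lambda p: p[0] - p[1])  # largest x - y
    bl = min(pts, key=lambda p: p[0] - p[1])  # smallest x - y
    return [tl, tr, br, bl]


def cell_boxes(side: int) -> List[Box]:
    """Split a side x side warped grid into 81 crop boxes in row-major order."""
    # Edges are derived from the full side length so rounding does not accumulate.
    edges = [i * side // 9 for i in range(10)]
    return [
        (edges[c], edges[r], edges[c + 1], edges[r + 1])
        for r in range(9)
        for c in range(9)
    ]
```

The ordered corners are what a perspective warp would take as its source points, and each box can then be used to slice one cell out of the warped image.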
The digit recognition model is optimized for fast inference and deployed as a TFLite artifact for mobile-friendly execution.
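
Whatever the model returns per cell still has to be turned into a grid. The sketch below shows one way the empty-cell filtering step might look, assuming each cell yields a `(digit, confidence)` pair. The function name `build_grid` and the `0.8` threshold are illustrative assumptions, not values taken from this project.

```python
from typing import List, Tuple

Grid = List[List[int]]  # 9x9, 0 marks an empty cell


def build_grid(predictions: List[Tuple[int, float]], min_conf: float = 0.8) -> Grid:
    """Turn 81 row-major (digit, confidence) pairs into a 9x9 grid."""
    if len(predictions) != 81:
        raise ValueError("expected 81 cell predictions")
    # Low-confidence or out-of-range predictions are treated as empty cells,
    # which limits hallucinated digits at the cost of a few missed ones.
    cells = [
        digit if 1 <= digit <= 9 and conf >= min_conf else 0
        for digit, conf in predictions
    ]
    return [cells[r * 9:(r + 1) * 9] for r in range(9)]
```

A stricter threshold leaves more cells for the user to fill in by hand, while a looser one lets more false positives through, so the value is worth tuning against real captures.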
- Server-side CV Processing: OpenCV-based preprocessing and digit extraction hosted on a Flask API (AWS EC2) to handle computationally heavy image operations.
- On-device Solving: Sudoku solving implemented locally using an optimized backtracking algorithm to avoid unnecessary network calls and ensure low-latency responses.
- Human-in-the-loop Correction: Because real-world OCR is noisy (glare, screen capture artifacts), the extracted grid is editable so the user can correct it before solving.
This mirrors real-world ML systems where model predictions are probabilistic and require validation or correction layers.
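
For reference, a backtracking solver of the kind described above can be sketched as follows. This is a Python illustration of the technique rather than the app's on-device implementation. Picking the empty cell with the fewest candidates first is one common optimization and is an assumption here, as are the names `candidates` and `solve`.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # 9x9, 0 marks an empty cell


def candidates(grid: Grid, r: int, c: int) -> List[int]:
    """Digits that fit at (r, c) without repeating in its row, column, or box."""
    used = set(grid[r]) | {grid[i][c] for i in range(9)}
    br, bc = 3 * (r // 3), 3 * (c // 3)
    used |= {grid[i][j] for i in range(br, br + 3) for j in range(bc, bc + 3)}
    return [d for d in range(1, 10) if d not in used]


def solve(grid: Grid) -> bool:
    """Fill the grid in place; return False if the empty cells cannot be filled.

    Assumes the given clues do not already conflict with each other.
    """
    best: Optional[Tuple[int, int]] = None
    best_opts: List[int] = []
    for r in range(9):
        for c in range(9):
            if grid[r][c] == 0:
                opts = candidates(grid, r, c)
                if best is None or len(opts) < len(best_opts):
                    best, best_opts = (r, c), opts
    if best is None:
        return True  # no empty cells left, so the grid is complete
    r, c = best
    for d in best_opts:
        grid[r][c] = d
        if solve(grid):
            return True
    grid[r][c] = 0  # undo and backtrack
    return False
```

Because the search only places digits that respect the rules, any grid it completes is a valid solution, which is where the correctness guarantee comes from. It does not check the initial clues, so a rule check on the corrected grid should run first.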
- False positives caused by glare and screen refresh artifacts
- Digit hallucination in empty cells
- Invalid grids rejected using Sudoku rule validation
Invalid predictions are surfaced to the user rather than silently solved, prioritizing correctness over blind automation.
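
A rule check of this kind can be sketched in a few lines. The function below is a hypothetical example, and `find_conflicts` is not a name from this codebase. It returns the positions of conflicting cells rather than a bare pass/fail, so a UI could highlight them for the user.

```python
from collections import Counter
from typing import List, Tuple

Grid = List[List[int]]  # 9x9, 0 marks an empty cell


def find_conflicts(grid: Grid) -> List[Tuple[int, int]]:
    """Return sorted (row, col) positions of digits repeated in a row, column, or box."""
    units = []
    for i in range(9):
        units.append([(i, c) for c in range(9)])  # row i
        units.append([(r, i) for r in range(9)])  # column i
    for br in range(0, 9, 3):
        for bc in range(0, 9, 3):
            units.append([(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)])

    bad = set()
    for unit in units:
        # Count only filled cells; empty cells can never conflict.
        counts = Counter(grid[r][c] for r, c in unit if grid[r][c] != 0)
        bad.update((r, c) for r, c in unit if counts[grid[r][c]] > 1)
    return sorted(bad)
```

An empty result means the grid breaks no Sudoku rule. It does not prove the puzzle is solvable, so the solver can still report failure afterwards.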
- Mobile: Android (Java)
- Backend: Python, Flask
- Computer Vision: OpenCV
- Machine Learning: TensorFlow, TensorFlow Lite
- Cloud: AWS EC2
- app/: Android application (camera capture, UI, solver integration)
- server/: Flask API for CV preprocessing and digit inference
- model/: CNN training scripts and exported TFLite model
A full end-to-end demo of image capture, digit extraction, correction, and solving: