A Python application that uses classical computer vision and machine learning to recognize and solve handwritten mathematical equations.
- Preprocesses images of handwritten equations with Otsu binarization, contour-based segmentation, and size normalization
- Implements HOG (Histogram of Oriented Gradients) and pixel intensity features
- Uses K-Nearest Neighbors (KNN) and Support Vector Machine (SVM) with RBF kernel for classification
- Segments and recognizes individual digits and operators
- Solves linear equations and arithmetic expressions
- Visualizes the recognition process
- Clone this repository:
    git clone <repository-url>
    cd handwritten-equation-solver
- Install the required dependencies:
    pip install -r requirements.txt
- Place your handwritten equation image in the project directory.
- Update image_path in equation_solver.py to point to your image.
- Run the solver:
    python equation_solver.py
For an image named equation.png containing a handwritten equation like "2x + 5 = 15", the output will be:
Solution: x = 5
- Input Image Handling: Accepts grayscale or color images of handwritten equations
- Binarization: Converts the image to binary using Otsu's thresholding
- Character Segmentation:
  - Applies contour detection to isolate individual characters
  - Performs size normalization while maintaining aspect ratio
  - Centers characters in a 32x32 pixel canvas for consistent processing
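The binarization step can be illustrated with a NumPy-only version of Otsu's method. This is a minimal sketch, not the project's code: it assumes an 8-bit grayscale array with dark ink on a light background, and the names `otsu_threshold` and `binarize` are hypothetical.

```python
import numpy as np

def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold (0-255) that maximizes the between-class variance."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    omega = np.cumsum(prob)                      # weight of the class "pixel <= t"
    mu = np.cumsum(prob * np.arange(256))        # cumulative intensity mean up to t
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b = (mu[-1] * omega - mu) ** 2 / denom
    sigma_b[denom == 0] = 0.0                    # undefined when one class is empty
    return int(np.argmax(sigma_b))

def binarize(gray: np.ndarray) -> np.ndarray:
    """Mark ink as 1 and paper as 0 (assumes ink is darker than paper)."""
    return (gray <= otsu_threshold(gray)).astype(np.uint8)

# Toy image: left half dark "ink" (50), right half light "paper" (200)
gray = np.full((8, 8), 200, dtype=np.uint8)
gray[:, :4] = 50
binary = binarize(gray)
```

For color input, a conversion to grayscale would have to happen before this step.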
- HOG Features:
  - Computes gradient magnitude and orientation using Sobel operators
  - Divides the image into 8x8 pixel cells
  - Calculates 9-bin histograms of gradient orientations per cell
  - Normalizes histograms in 2x2 cell blocks using L2-Hys normalization
  - Results in a 324-dimensional feature vector (9 blocks × 4 cells × 9 bins; 2x2-cell blocks at a one-cell stride give 3 × 3 = 9 block positions)
- Pixel Intensity Features:
  - Flattens the 32x32 normalized image into a 1024-dimensional vector
  - Normalizes pixel values to [0, 1] range
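The feature sizes quoted above can be checked with a few lines of arithmetic. The sketch below assumes the 2x2-cell blocks slide one cell at a time (the stride is not stated here, but it is the setting that yields 324) and that the two feature sets are simply concatenated; `hog_length` is a hypothetical helper.

```python
def hog_length(image_size=32, cell=8, block=2, bins=9, stride=1):
    """Length of a HOG vector for a square image (stride measured in cells)."""
    cells_per_side = image_size // cell                        # 32 // 8 = 4
    blocks_per_side = (cells_per_side - block) // stride + 1   # (4 - 2) // 1 + 1 = 3
    return blocks_per_side ** 2 * block ** 2 * bins            # 9 * 4 * 9

hog_dim = hog_length()
pixel_dim = 32 * 32
print(hog_dim, pixel_dim, hog_dim + pixel_dim)  # 324 1024 1348
```

With non-overlapping blocks (`stride=2`) the same image would give only 144 values.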
The system employs two complementary models:
- K-Nearest Neighbors (KNN):
  - Non-parametric method for classification
  - Uses k=3 neighbors with uniform weights
  - Effective for capturing local patterns in handwritten digits
- Support Vector Machine with RBF Kernel (SVM-RBF):
  - Implements a non-linear decision boundary
  - Uses Radial Basis Function kernel for better separability
  - Optimized hyperparameters for character recognition
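To make the two models concrete, here is a small NumPy sketch of a k=3 majority vote with uniform weights, plus the RBF kernel that an SVM of this kind relies on. It is illustrative only: the function names, the `gamma` value, and the toy 2-D data are assumptions, and a real SVM also needs a training procedure that is not shown.

```python
from collections import Counter

import numpy as np

def knn_predict(train_x, train_y, query, k=3):
    """Majority vote among the k nearest training vectors (uniform weights)."""
    dists = np.linalg.norm(train_x - query, axis=1)   # Euclidean distance to each sample
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

def rbf_kernel(a, b, gamma=0.5):
    """K(a, b) = exp(-gamma * ||a - b||^2): 1 for identical inputs, near 0 for distant ones."""
    diff = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.exp(-gamma * np.dot(diff, diff)))

# Toy 2-D "feature vectors" for two character classes
train_x = np.array([[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]], dtype=float)
train_y = ["1", "1", "1", "+", "+", "+"]
print(knn_predict(train_x, train_y, np.array([0.2, 0.3])))   # 1
print(knn_predict(train_x, train_y, np.array([5.5, 5.2])))   # +
```

In the real pipeline the inputs would be the combined HOG and pixel feature vectors rather than 2-D points.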
- Character Recognition:
  - Combines HOG and pixel features into a single feature vector
  - Uses the trained models to predict individual characters
  - Applies post-processing to improve recognition accuracy
- Equation Solving:
  - Parses the sequence of recognized characters
  - Validates mathematical expressions
  - Solves linear equations and arithmetic expressions
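The solving step can be sketched with the standard library alone. The project lists SymPy as a dependency, so this is a simplified stand-in rather than the actual implementation. The name `solve_equation`, the character whitelist, and the restriction to a single unknown `x` are assumptions.

```python
import re
from fractions import Fraction

ALLOWED = re.compile(r"[0-9x+\-*/()= ]+")

def solve_equation(text: str) -> Fraction:
    """Solve a linear equation in x, or evaluate a plain arithmetic expression."""
    if not ALLOWED.fullmatch(text) or "**" in text or "//" in text or text.count("=") > 1:
        raise ValueError(f"unsupported expression: {text!r}")
    # Make implicit multiplication explicit: "2x" -> "2*x", "3(x - 1)" -> "3*(x - 1)"
    text = re.sub(r"(?<=[0-9x)])\s*(?=[x(])", "*", text)
    # Wrap integer literals in Fraction so that division stays exact
    text = re.sub(r"\d+", r'F("\g<0>")', text)

    def evaluate(expr: str, x: int) -> Fraction:
        try:
            # Input was whitelisted above; builtins are disabled as an extra guard
            return eval(expr.strip(), {"__builtins__": {}}, {"F": Fraction, "x": Fraction(x)})
        except SyntaxError as exc:
            raise ValueError(f"malformed expression: {expr.strip()!r}") from exc

    if "=" not in text:
        if "x" in text:
            raise ValueError("an expression with x needs an '=' to be solved")
        return evaluate(text, 0)

    lhs, rhs = text.split("=")

    def residual(x: int) -> Fraction:
        return evaluate(lhs, x) - evaluate(rhs, x)   # a*x + b for a linear equation

    b = residual(0)
    a = residual(1) - b
    if residual(2) != 2 * a + b:                     # spot-check linearity at x = 2
        raise ValueError("equation is not linear in x")
    if a == 0:
        raise ValueError("equation has no unique solution")
    return -b / a

print(solve_equation("2x + 5 = 15"))   # 5
print(solve_equation("3 + 4 * 2"))     # 11
```

Sampling the residual at two points recovers the slope and intercept without a symbolic parser. A symbolic library would handle more than one unknown and higher-degree equations.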
- Python 3.7+
- OpenCV
- NumPy
- scikit-image
- TensorFlow
- SymPy
- Matplotlib
- Recognition is most reliable when characters are clear and well separated
- Works best with equations written on a clean, high-contrast background
- Performance may vary with different writing styles and equation complexity
- For improved accuracy, consider training on a larger dataset of handwritten equations
- Input Normalization:
    def resize_keep_aspect(img, target=32):
        # Resizes image while maintaining aspect ratio
        # Pads with zeros to reach target dimensions
        ...
- HOG Descriptor:
    def hog_descriptor(img32: np.ndarray) -> np.ndarray:
        # Implements HOG feature extraction
        # Returns a 324-dimensional feature vector
        ...
- Intensity Features:
    def intensity_features(img32: np.ndarray) -> np.ndarray:
        # Extracts normalized pixel intensities
        # Returns a 1024-dimensional feature vector
        ...
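The three stubs above could be filled in roughly as follows. This is a NumPy-only sketch under stated assumptions, not the repository's implementation. Resizing uses nearest-neighbour sampling instead of a library call. The HOG variant assigns each pixel to a single orientation bin (no interpolation) and slides 2x2-cell blocks one cell at a time. Pixel values are assumed to be 8-bit (0 to 255).

```python
import numpy as np

def resize_keep_aspect(img: np.ndarray, target: int = 32) -> np.ndarray:
    """Scale the longer side to `target`, then zero-pad to a centered square."""
    h, w = img.shape
    scale = target / max(h, w)
    new_h, new_w = max(1, round(h * scale)), max(1, round(w * scale))
    # Nearest-neighbour sampling; a library resize would normally be used here
    rows = np.minimum((np.arange(new_h) / scale).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / scale).astype(int), w - 1)
    canvas = np.zeros((target, target), dtype=img.dtype)
    top, left = (target - new_h) // 2, (target - new_w) // 2
    canvas[top:top + new_h, left:left + new_w] = img[rows][:, cols]
    return canvas

def hog_descriptor(img32: np.ndarray, cell: int = 8, bins: int = 9) -> np.ndarray:
    """Simplified HOG: Sobel gradients, hard-binned cell histograms, L2-Hys blocks."""
    p = np.pad(img32.astype(np.float64), 1, mode="edge")
    # 3x3 Sobel responses written as sums of shifted slices
    gx = (p[:-2, 2:] + 2 * p[1:-1, 2:] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[1:-1, :-2] + p[2:, :-2])
    gy = (p[2:, :-2] + 2 * p[2:, 1:-1] + p[2:, 2:]) - (p[:-2, :-2] + 2 * p[:-2, 1:-1] + p[:-2, 2:])
    mag = np.hypot(gx, gy)
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0           # unsigned orientation
    bin_idx = (ang / (180.0 / bins)).astype(int) % bins
    n = img32.shape[0] // cell                              # 4 cells per side
    hist = np.zeros((n, n, bins))
    for i in range(n):
        for j in range(n):
            sl = (slice(i * cell, (i + 1) * cell), slice(j * cell, (j + 1) * cell))
            hist[i, j] = np.bincount(bin_idx[sl].ravel(), weights=mag[sl].ravel(), minlength=bins)
    blocks, eps = [], 1e-6
    for i in range(n - 1):                                  # 2x2-cell blocks, one-cell stride
        for j in range(n - 1):
            v = hist[i:i + 2, j:j + 2].ravel()
            v = v / np.sqrt(np.sum(v ** 2) + eps ** 2)      # L2 normalize
            v = np.minimum(v, 0.2)                          # clip (the "Hys" step)
            blocks.append(v / np.sqrt(np.sum(v ** 2) + eps ** 2))
    return np.concatenate(blocks)

def intensity_features(img32: np.ndarray) -> np.ndarray:
    """Flatten to 1024 values and scale 8-bit intensities to [0, 1]."""
    return img32.astype(np.float64).ravel() / 255.0

glyph = np.zeros((20, 10), dtype=np.uint8)
glyph[2:18, 4:6] = 255                                      # a crude vertical stroke, like "1"
img32 = resize_keep_aspect(glyph)
features = np.concatenate([hog_descriptor(img32), intensity_features(img32)])
print(img32.shape, features.shape)                          # (32, 32) (1348,)
```

The final lines show one plausible way to build the combined vector: plain concatenation of the 324 HOG values and the 1024 pixel values.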
- The models are trained on a dataset of handwritten digits and operators
- Feature vectors are normalized before training
- Cross-validation is used to optimize model parameters
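The notes above do not say which normalization or cross-validation scheme is used. As one plausible reading, the sketch below standardizes features with statistics taken from the training split only and generates shuffled k-fold splits. The names `standardize` and `k_fold_indices`, the five folds, and the z-score scaling are all assumptions.

```python
import numpy as np

def standardize(train_x, val_x, eps=1e-8):
    """Zero-mean, unit-variance scaling using training statistics only."""
    mean = train_x.mean(axis=0)
    std = train_x.std(axis=0) + eps       # eps avoids division by zero for constant features
    return (train_x - mean) / std, (val_x - mean) / std

def k_fold_indices(n_samples, n_folds=5, seed=0):
    """Yield (train_idx, val_idx) pairs; each sample is used for validation exactly once."""
    order = np.random.default_rng(seed).permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for i in range(n_folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        yield train_idx, folds[i]

# Random stand-in data with the combined feature length (324 + 1024)
x = np.random.default_rng(1).normal(size=(23, 1348))
for train_idx, val_idx in k_fold_indices(len(x)):
    train_x, val_x = standardize(x[train_idx], x[val_idx])
    # a candidate setting (for example a value of k) would be fitted and scored here
```

Computing the mean and standard deviation inside each fold keeps validation samples from leaking into the scaling.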
- Expand character set to include more mathematical symbols
- Implement equation structure analysis for better parsing
- Add support for multi-line equations
- Incorporate deep learning models for improved recognition accuracy
A deep learning project for computer vision tasks using Convolutional Neural Networks (CNN).
This project implements various CNN architectures for image classification, object detection, or other computer vision tasks. It's designed to be modular and easy to extend for different use cases.
- Support for multiple CNN architectures
- Data preprocessing and augmentation
- Model training and evaluation
- Pretrained model support (e.g., ResNet, VGG, etc.)
- Visualization tools for model performance
- Clone the repository:
    git clone https://github.com/yourusername/cnn-cv-project.git
    cd cnn-cv-project
- Create and activate a virtual environment (recommended):
    python -m venv venv
    .\venv\Scripts\activate     # On Windows
    source venv/bin/activate    # On Linux/Mac
- Install the required packages:
    pip install -r requirements.txt
- Prepare your dataset in the appropriate directory structure
- Configure the model parameters in config.py (if available)
- Run the training script:
    python train.py
- Evaluate the model:
    python evaluate.py
The model is built using PyTorch and features a custom CNN architecture with residual connections for improved training stability and performance.
- Input Layer
  - Input shape: 1x64x64 (grayscale images)
  - Normalization: Pixel values scaled to [0, 1]
- Convolutional Blocks
  - Block 1:
    - Conv2D: 32 filters (3x3), padding=1
    - Batch Normalization
    - ReLU activation
    - Dropout (0.2)
    - MaxPool2D (2x2)
  - Block 2:
    - Conv2D: 64 filters (3x3), padding=1
    - Batch Normalization
    - ReLU activation
    - Dropout (0.3)
    - MaxPool2D (2x2)
  - Block 3:
    - Conv2D: 128 filters (3x3), padding=1
    - Batch Normalization
    - ReLU activation
    - Dropout (0.4)
    - MaxPool2D (2x2)
- Residual Blocks
  - Two residual blocks with skip connections
  - Each block contains:
    - Conv2D (3x3) → BatchNorm → ReLU → Conv2D (3x3) → BatchNorm
    - Skip connection adds input to output
    - Final ReLU activation
- Classification Head
  - Global Average Pooling
  - Fully Connected Layer: 128 → 256 units
  - ReLU activation
  - Dropout (0.5)
  - Output Layer: 256 → num_classes
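The layer list above maps fairly directly onto PyTorch modules. The sketch below is one possible reading, not the repository's model. The class names `SketchCNN` and `ResidualBlock` are hypothetical. The residual blocks are assumed to run on the 128-channel output of Block 3 with padding=1, so the skip connection can add tensors of the same shape.

```python
import torch
from torch import nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU -> Conv -> BN, then add the input and apply ReLU."""
    def __init__(self, channels: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
        )

    def forward(self, x):
        return torch.relu(self.body(x) + x)

def conv_block(c_in: int, c_out: int, p_drop: float) -> nn.Sequential:
    """Conv(3x3) -> BN -> ReLU -> Dropout -> MaxPool(2x2); halves height and width."""
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out),
        nn.ReLU(inplace=True),
        nn.Dropout(p_drop),
        nn.MaxPool2d(2),
    )

class SketchCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            conv_block(1, 32, 0.2),     # 1x64x64  -> 32x32x32
            conv_block(32, 64, 0.3),    # 32x32x32 -> 64x16x16
            conv_block(64, 128, 0.4),   # 64x16x16 -> 128x8x8
            ResidualBlock(128),
            ResidualBlock(128),
        )
        self.head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),    # global average pooling: 128x8x8 -> 128x1x1
            nn.Flatten(),
            nn.Linear(128, 256),
            nn.ReLU(inplace=True),
            nn.Dropout(0.5),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.head(self.features(x))

model = SketchCNN(num_classes=10).eval()
with torch.no_grad():
    logits = model(torch.zeros(2, 1, 64, 64))   # batch of two blank grayscale images
```

The shape comments show why the first fully connected layer takes 128 inputs: three 2x2 poolings reduce 64x64 to 8x8, and global average pooling then collapses each of the 128 channels to a single value.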
- Residual Connections: Help mitigate vanishing gradient problem
- Batch Normalization: Improves training stability
- Progressive Dropout: Increasing dropout rates in deeper layers
- Global Average Pooling: Reduces parameters before final classification
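The effect of global average pooling on parameter count can be shown with simple arithmetic. The sketch assumes the 128x8x8 feature map that results from a 64x64 input after three 2x2 poolings, and compares the first fully connected layer with and without pooling. `linear_params` is a hypothetical helper.

```python
def linear_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

channels, height, width, hidden = 128, 8, 8, 256

flatten_params = linear_params(channels * height * width, hidden)  # input is all 8192 activations
gap_params = linear_params(channels, hidden)                       # input is one value per channel

print(flatten_params, gap_params)  # 2097408 33024
```

Under these assumptions the pooled head needs about 64 times fewer parameters in that layer.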
cnn-cv-project/
├── data/ # Dataset directory
├── models/ # Model definitions
├── utils/ # Utility scripts
├── config.py # Configuration file
├── train.py # Training script
├── evaluate.py # Evaluation script
├── requirements.txt # Dependencies
└── README.md # This file
- Python 3.8+
- PyTorch or TensorFlow
- Other dependencies listed in requirements.txt