A lightweight Optical Character Recognition (OCR) engine built from the ground up using only NumPy. This project was created to dive into the fundamentals of Artificial Intelligence without relying on heavy frameworks like PyTorch or TensorFlow.
This implementation focuses on the classic MNIST dataset (handwritten digits). It uses a custom-built neural network architecture with a Softmax activation layer for multi-class classification.
Key features:
- Zero deep learning frameworks: Pure mathematical implementation using NumPy.
- Softmax integration: For robust probability distribution over the 10 digit classes.
- Fast environment: Managed with uv for ultra-fast dependency resolution.
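As an illustration of the Softmax step mentioned above, here is a minimal NumPy sketch. It is not taken from this repository: the function name `softmax` and the example `logits` values are assumptions made for the example. It subtracts the row maximum before exponentiating, a common trick to avoid overflow.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    # Subtracting the max does not change the result but keeps exp() from overflowing.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

# Hypothetical raw scores for one image, one per digit class 0-9.
logits = np.array([1.0, 2.0, 0.5, 0.1, 3.0, 0.0, -1.0, 0.2, 0.3, 1.5])
probs = softmax(logits)

# The predicted digit is the class with the highest probability.
predicted_digit = int(np.argmax(probs))
```

The output sums to 1 across the 10 classes, so each entry can be read as the model's confidence for that digit.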
This project uses uv for Python package management.
# Clone the repository
git clone https://github.com/kylianmthr/OCR/
cd OCR
# Install dependencies and set up the environment
make install
You can train the model using the raw MNIST binary files:
python main.py --train <path_to_train_images> <path_to_train_labels>
Alternatively, if configured in your Makefile:
make train
To run the OCR on a specific image, ensure your input is a 28x28 PNG file.
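To make the 28x28 requirement concrete, the sketch below shows one way a decoded image could be checked and turned into a network input. This is an assumption for illustration, not the project's actual preprocessing: the helper `prepare_input`, the scaling to [0, 1], and the flattening to 784 values are hypothetical, and a synthetic array stands in for a decoded PNG.

```python
import numpy as np

def prepare_input(pixels: np.ndarray) -> np.ndarray:
    """Validate a 28x28 grayscale image and flatten it to a (784,) float vector in [0, 1]."""
    if pixels.shape != (28, 28):
        raise ValueError(f"expected a 28x28 image, got shape {pixels.shape}")
    # Assumed convention: 8-bit pixel values (0-255) scaled to the range [0, 1].
    return pixels.astype(np.float64).reshape(-1) / 255.0

# Synthetic stand-in for a decoded PNG: black background with a white vertical stroke.
image = np.zeros((28, 28), dtype=np.uint8)
image[4:24, 13:15] = 255

vector = prepare_input(image)
```

MNIST digits are light strokes on a dark background, so an input drawn dark-on-light would likely need to be inverted first.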
make exec
or
python main.py --exec <path_to_weights.npy> <path_to_image.png>
Note
This project was developed for educational purposes to understand the "math behind the magic." While functional, the accuracy is not meant to compete with SOTA (State-Of-The-Art) convolutional models, but rather to demonstrate the feasibility of a "from scratch" approach.
- Add convolutional layers (CNN), also built from scratch.
- Implement data augmentation to improve accuracy.
- Add a visualization tool for the weight matrices.
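As a possible starting point for the data augmentation item, here is a minimal NumPy sketch that shifts a digit by a few pixels. It is only an illustration under stated assumptions: the helper `shift_image`, the shift range of ±2 pixels, and the synthetic test image are hypothetical and not part of this project.

```python
import numpy as np

def shift_image(image: np.ndarray, dy: int, dx: int) -> np.ndarray:
    """Shift a 2D image by (dy, dx) pixels, filling vacated pixels with zeros."""
    h, w = image.shape
    if abs(dy) >= h or abs(dx) >= w:
        raise ValueError("shift must be smaller than the image size")
    shifted = np.zeros_like(image)
    # Source and destination windows that stay inside the image bounds.
    src_rows = slice(max(0, -dy), min(h, h - dy))
    dst_rows = slice(max(0, dy), min(h, h + dy))
    src_cols = slice(max(0, -dx), min(w, w - dx))
    dst_cols = slice(max(0, dx), min(w, w + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted

# Synthetic 28x28 "digit": a white block on a black background.
image = np.zeros((28, 28), dtype=np.uint8)
image[10:18, 12:16] = 255

# Draw a random shift of up to 2 pixels in each direction (upper bound is exclusive).
rng = np.random.default_rng(0)
dy, dx = rng.integers(-2, 3, size=2)
augmented = shift_image(image, int(dy), int(dx))
```

Small shifts keep the label unchanged while exposing the network to digits that are not perfectly centered.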