SmartOrderReader is a Python tool that processes images of invoices, order confirmations, and receipts. It uses OCR (Optical Character Recognition) to extract the order number and the order or creation date from each image. The program then organizes the extracted data into a table and saves it as a CSV file that can be opened in Excel or Google Sheets.
- ✅ Extract Order Number/Invoice Number/Order ID from images
- ✅ Extract Order Date/Invoice Date/Creation Date from images
- ✅ Support for multiple image formats (JPG, PNG, BMP, TIFF)
- ✅ Process multiple images in a single run
- ✅ Support for both English and Urdu text
- ✅ Clean table output with professional formatting
- ✅ CSV export for easy Excel/Google Sheets integration
- ✅ Automatic image preprocessing for better OCR accuracy
- ✅ Handles various date formats (MM/DD/YYYY, YYYY-MM-DD, Month DD, YYYY)
- ✅ Smart pattern matching for different invoice formats
- Python 3.7 or higher
- Tesseract OCR engine
Ubuntu/Debian:
sudo apt-get update
sudo apt-get install tesseract-ocr
# For Urdu support (optional):
sudo apt-get install tesseract-ocr-urd
macOS:
brew install tesseract
# For Urdu support (optional):
brew install tesseract-lang
Windows:
- Download and install from: https://github.com/UB-Mannheim/tesseract/wiki
- Add Tesseract to your system PATH
- Clone the repository:
git clone https://github.com/Next-GenDeveloper/SmartOrderReader.git
cd SmartOrderReader
- Install Python dependencies:
pip install -r requirements.txt
Process one or more images:
python order_reader.py invoice1.jpg invoice2.png receipt.jpg
Specify custom output file:
python order_reader.py *.jpg --output my_orders.csv
Process images with Urdu text support:
python order_reader.py invoice.png --lang eng+urd
Process all images in current directory:
python order_reader.py *.jpg *.png
usage: order_reader.py [-h] [-o OUTPUT] [-l LANG] images [images ...]
SmartOrderReader - Extract order data from invoice/order images
positional arguments:
images Image files to process (supports jpg, png, jpeg, bmp, tiff)
options:
-h, --help show this help message and exit
-o OUTPUT, --output OUTPUT
Output CSV filename (default: order_data.csv)
-l LANG, --lang LANG OCR language (default: eng, use eng+urd for English+Urdu)
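The help text above is consistent with an `argparse` parser along the following lines. This is a minimal sketch reconstructed from the help output only, not the project's actual source; the function name `build_parser` is hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented command-line options."""
    parser = argparse.ArgumentParser(
        prog="order_reader.py",
        description="SmartOrderReader - Extract order data from invoice/order images",
    )
    # One or more image paths are required.
    parser.add_argument(
        "images",
        nargs="+",
        help="Image files to process (supports jpg, png, jpeg, bmp, tiff)",
    )
    parser.add_argument(
        "-o", "--output",
        default="order_data.csv",
        help="Output CSV filename (default: order_data.csv)",
    )
    parser.add_argument(
        "-l", "--lang",
        default="eng",
        help="OCR language (default: eng, use eng+urd for English+Urdu)",
    )
    return parser
```

Calling `build_parser().parse_args(["invoice.png", "--lang", "eng+urd"])` yields the image list, the default output name `order_data.csv`, and the language string `eng+urd`.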
The tool provides two types of output: a formatted table printed to the console, and a CSV file.
================================================================================
EXTRACTED ORDER DATA
================================================================================
+-------------------+----------------+------------------+
| Image File Name | Order Number | Order Date |
+===================+================+==================+
| invoice1.jpg | ABC12345 | 12/25/2023 |
+-------------------+----------------+------------------+
| invoice2.png | INV-2024-001 | January 15, 2024 |
+-------------------+----------------+------------------+
Image File Name,Order Number,Order Date
invoice1.jpg,ABC12345,12/25/2023
invoice2.png,INV-2024-001,"January 15, 2024"
The CSV file is automatically saved and can be directly imported into Excel or Google Sheets.
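A date such as `January 15, 2024` contains a comma, so it has to be quoted to stay in a single CSV column. The sketch below shows how Python's standard `csv` module handles this when writing and reading the three-column layout. The helper names `rows_to_csv` and `csv_to_rows` are hypothetical, and whether the tool itself uses the `csv` module is an assumption.

```python
import csv
import io

HEADER = ["Image File Name", "Order Number", "Order Date"]


def rows_to_csv(rows):
    """Serialize (file name, order number, order date) rows to CSV text."""
    buffer = io.StringIO()
    # The default quoting mode quotes only fields that need it,
    # such as a date containing a comma.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buffer.getvalue()


def csv_to_rows(text):
    """Parse CSV text back into a list of dictionaries keyed by header."""
    return list(csv.DictReader(io.StringIO(text)))


sample = rows_to_csv([
    ("invoice1.jpg", "ABC12345", "12/25/2023"),
    ("invoice2.png", "INV-2024-001", "January 15, 2024"),
])
print(sample)
```

Reading the text back with `csv_to_rows(sample)` returns the date as one value, which is also how Excel and Google Sheets interpret a quoted field.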
- Order #, Order No., Order Number
- Invoice #, Invoice No., Invoice Number
- Order ID
- PO # (Purchase Order)
- Reference #, Ref #
- Order Date, Invoice Date
- Date, Issue Date
- Created On, Transaction Date
- Creation Date
- MM/DD/YYYY or DD/MM/YYYY (e.g., 12/25/2023)
- YYYY-MM-DD (e.g., 2023-12-25)
- Month DD, YYYY (e.g., December 25, 2023)
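The date formats listed above can be recognized with `datetime.strptime`, as in the sketch below. It is an illustration only: the function name `parse_order_date` and the order in which formats are tried are assumptions, not the tool's documented behavior. Note that a string like `01/02/2023` is valid both as MM/DD/YYYY and as DD/MM/YYYY; this sketch tries the month-first reading first.

```python
from datetime import date, datetime
from typing import Optional

# Formats are tried in this order; the first one that parses wins.
DATE_FORMATS = (
    "%m/%d/%Y",   # 12/25/2023
    "%d/%m/%Y",   # 25/12/2023
    "%Y-%m-%d",   # 2023-12-25
    "%B %d, %Y",  # December 25, 2023 (English month names, default locale)
)


def parse_order_date(text: str) -> Optional[date]:
    """Return a date for the first matching format, or None if none match."""
    cleaned = text.strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date()
        except ValueError:
            continue
    return None
```

For example, `parse_order_date("December 25, 2023")` and `parse_order_date("2023-12-25")` both return the same `date` object, which makes it possible to normalize the Order Date column if desired.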
- Image Preprocessing: Images are converted to grayscale and optimized for better OCR accuracy
- OCR Text Extraction: Tesseract OCR extracts all text from the image
- Pattern Matching: Smart regex patterns identify order numbers and dates
- Data Extraction: Most relevant information is extracted based on priority
- Output Generation: Results are formatted as table and CSV
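The pattern matching and priority-based extraction steps can be sketched as follows. The regular expressions and the priority order (order labels first, then invoice, PO, and reference labels) are assumptions chosen for illustration; the tool's actual patterns may differ. The input is assumed to be plain text already produced by the OCR step.

```python
import re
from typing import Optional

# Hypothetical label patterns, listed from highest to lowest priority.
ORDER_LABELS = [
    r"\border\s*(?:number|no\b\.?|#|id\b)",
    r"\binvoice\s*(?:number|no\b\.?|#)",
    r"\bpo\s*#",
    r"\bref(?:erence)?\s*#",
]

# After the label: optional colon or dash, then the identifier itself.
VALUE = r"\s*[:\-]?\s*([A-Z0-9][A-Z0-9\-]*)"


def find_order_number(ocr_text: str) -> Optional[str]:
    """Return the value following the highest-priority label found in the text."""
    for label in ORDER_LABELS:
        match = re.search(label + VALUE, ocr_text, re.IGNORECASE)
        if match:
            return match.group(1)
    return None
```

With the text `"Invoice No: INV-2024-001\nOrder #: ABC12345"`, the function returns `ABC12345` because the order label outranks the invoice label, even though the invoice label appears first in the text.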
- Make sure Tesseract OCR is installed and in your system PATH
- On Windows, you may need to set the path manually in the script
- Ensure the image is clear and readable
- Try increasing image resolution
- Check if the text is in a supported language
- Make sure the image quality is good
- Verify that order number/date labels are clearly visible
- Try different image preprocessing settings
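To check whether Tesseract is reachable from your system PATH, a short standard-library script such as the one below can help. The function name `find_tesseract` is hypothetical. The commented line at the end assumes the script uses the `pytesseract` wrapper, which the project does not state explicitly, and the Windows path shown is only the usual default install location.

```python
import shutil
from typing import Optional


def find_tesseract() -> Optional[str]:
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


path = find_tesseract()
if path is None:
    print("Tesseract not found on PATH; install it or add its folder to PATH.")
else:
    print(f"Tesseract found at {path}")

# If the script relies on pytesseract (an assumption), the executable can be
# set manually on Windows, for example:
# pytesseract.pytesseract.tesseract_cmd = r"C:\Program Files\Tesseract-OCR\tesseract.exe"
```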
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
Created by Next-GenDeveloper
- Tesseract OCR for text recognition
- OpenCV for image preprocessing
- Python community for excellent libraries