This repo turns document images (like receipts or invoices) into clean, structured JSON and optional visuals for your frontend. It combines PaddleX for text extraction, Table Transformer for layout detection, and LLaMA2 for smart header matching.
- Upload & Prep
  - Drop in images or PDFs.
  - Auto-convert to JPG if needed.
  - Quick format and error checks.
- Table Processing
  - OCR: PaddleOCR grabs text and bounding boxes.
  - Structure Detection: Table Transformer finds rows, columns and cells.
- Smart Parsing
  - LLaMA2 links headers to your target fields (think “fee” vs. “price”).
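The upload step mentions quick format and error checks. One way to sketch such a check, assuming it inspects file signatures rather than trusting extensions, is to compare the first bytes of each upload against known magic numbers. `detect_format` and `check_upload` are hypothetical helpers, not code from this repo, and the accepted formats (JPG, PNG, PDF) are an assumption.

```python
from pathlib import Path
from typing import Optional

# Leading "magic" bytes per format. The accepted set is an illustrative assumption.
MAGIC_BYTES = {
    b"\xff\xd8\xff": "jpg",
    b"\x89PNG\r\n\x1a\n": "png",
    b"%PDF-": "pdf",
}


def detect_format(header: bytes) -> Optional[str]:
    """Return the format whose signature starts `header`, or None if unknown."""
    for signature, name in MAGIC_BYTES.items():
        if header.startswith(signature):
            return name
    return None


def check_upload(path: Path) -> str:
    """Hypothetical pre-OCR check: reject empty or unsupported files early."""
    with path.open("rb") as fh:
        header = fh.read(16)  # the longest signature above is 8 bytes
    if not header:
        raise ValueError(f"{path} is empty")
    fmt = detect_format(header)
    if fmt is None:
        raise ValueError(f"{path} is not a JPG, PNG or PDF file")
    return fmt
```

A PDF detected this way would then go on to the JPG conversion step; checking signatures catches renamed or truncated files that an extension check would let through.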
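The two Table Processing steps produce separate outputs (OCR words with boxes, and cell boxes) that have to be merged before parsing. A minimal sketch of one common approach, assigning each word to the cell it overlaps most, is below. It assumes the OCR boxes have already been reduced to axis-aligned `(x1, y1, x2, y2)` rectangles and that the detected cells are indexed by `(row, column)`; `Word`, `overlap_area` and `assign_words_to_cells` are hypothetical names, not functions from this repo, PaddleOCR or Table Transformer.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), pixel coordinates
CellKey = Tuple[int, int]                # (row index, column index)


@dataclass
class Word:
    """One OCR token; a hypothetical container, not a PaddleOCR type."""
    text: str
    box: Box


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-aligned boxes (0 if they don't touch)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return max(width, 0.0) * max(height, 0.0)


def assign_words_to_cells(words: List[Word], cells: Dict[CellKey, Box]) -> Dict[CellKey, str]:
    """Place each word in the cell it overlaps most, then join each cell's words."""
    buckets: Dict[CellKey, List[Word]] = {key: [] for key in cells}
    for word in words:
        best_key: Optional[CellKey] = None
        best_area = 0.0
        for key, cell_box in cells.items():
            area = overlap_area(word.box, cell_box)
            if area > best_area:
                best_key, best_area = key, area
        if best_key is not None:  # words that touch no cell are dropped
            buckets[best_key].append(word)
    # Order words left to right; this assumes each cell holds a single text line.
    return {
        key: " ".join(w.text for w in sorted(ws, key=lambda w: w.box[0]))
        for key, ws in buckets.items()
    }
```

Using the largest overlap rather than the word's centre point is more forgiving when a word box slightly crosses a column line.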
Output: a JSON file containing only the fields you need (currently service_date, item_code, unit_price, quantity, and gst).
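Once header matching has produced a mapping from document headers to target fields, building the output is a rename-and-filter step. The sketch below assumes rows arrive as dicts keyed by the original headers and that the LLM's answer has already been parsed into a plain `header_map` dict; `to_output_records` and the `header_map` format are hypothetical, while the field names are the ones listed above.

```python
from typing import Dict, List, Optional

# Target fields currently listed in this README.
TARGET_FIELDS = ["service_date", "item_code", "unit_price", "quantity", "gst"]


def to_output_records(
    rows: List[Dict[str, str]], header_map: Dict[str, str]
) -> List[Dict[str, Optional[str]]]:
    """Rename matched columns to target fields and drop all other columns.

    header_map (hypothetical format) maps a header as printed on the document,
    e.g. "Fee", to a target field such as "unit_price".
    """
    records = []
    for row in rows:
        # Start with every target field unset so all records share the same keys.
        record: Dict[str, Optional[str]] = {field: None for field in TARGET_FIELDS}
        for header, value in row.items():
            field = header_map.get(header.strip())
            if field in record:  # unmatched headers (field is None) are ignored
                record[field] = value
        records.append(record)
    return records
```

Fields with no matching column stay `None`, which `json.dumps` writes as `null`, so every record has the same keys. Dropping them instead is equally valid if the frontend prefers sparse objects.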
`cd Invoice_service/`

Directory layout:

```
.
├── bin
├── image_analysis
│   ├── config
│   ├── models      # (place the 2 required models)
│   ├── schemas
│   ├── servces
│   └── views
│       └── html
└── table_detection
    ├── config
    ├── detr
    │   ├── d2
    │   │   ├── configs
    │   │   └── detr
    │   ├── datasets
    │   ├── models
    │   └── util
    └── src
```

1. Download `new_model_with_header.pt` from Google Drive: https://drive.google.com/file/d/1MBOiAizY6_4m8py8ziaS5k0Q67Dw3fg4/view?usp=sharing
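2. Download the Table Transformer detection checkpoint: `wget https://huggingface.co/bsmock/tatr-pubtables1m-v1.0/resolve/main/pubtables1m_detection_detr_r18.pth`

Place both model files in `image_analysis/models/`.

Under `Invoice_service/`, build and start the services:

```
sudo docker compose up -d --build
```

Note: The UI service is exposed on port 8080; you can reach it through SSH local port forwarding.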
- input processing
  - handle bad examples
  - handle different input file types
  - training, validation and test set partitioning
- model training
  - training scripts and configurations
  - training logs
- other
  - Ollama prompt tests
  - PaddleX and PaddleOCR configuration tests
  - OCR token formatter
  - output CSV similarity tests (CER, WER, cosine similarity)
  - column header handler
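The CSV similarity tests name three metrics. A minimal pure-Python sketch of them is below, assuming both CSVs have been flattened to strings beforehand and that cosine similarity is computed over bag-of-words counts (the repo may use a different vector representation, such as embeddings). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance; insertions, deletions and substitutions each cost 1."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete r
                curr[j - 1] + 1,         # insert h
                prev[j - 1] + (r != h),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: character edits divided by reference length."""
    return edit_distance(ref, hyp) / max(len(ref), 1)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: word edits divided by the number of reference words."""
    ref_words = ref.split()
    return edit_distance(ref_words, hyp.split()) / max(len(ref_words), 1)


def cosine_similarity(ref: str, hyp: str) -> float:
    """Cosine of the angle between bag-of-words count vectors (1.0 = same proportions)."""
    a, b = Counter(ref.split()), Counter(hyp.split())
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

CER and WER penalise OCR mistakes at different granularities (one wrong digit in `40.00` is a small CER hit but a whole word for WER), while the bag-of-words cosine ignores word order entirely, so the three together give a fuller picture than any one alone.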
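For the training, validation and test partition, a seeded shuffle followed by slicing is one simple approach. This sketch assumes an 80/10/10 split and a seed of 42, both arbitrary choices; `split_dataset` is a hypothetical name, not a script from this repo.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(
    items: Sequence[T], train: float = 0.8, val: float = 0.1, seed: int = 42
) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once with a fixed seed, then slice into train/val/test.

    Whatever remains after the train and val fractions becomes the test set.
    """
    if train < 0 or val < 0 or train + val > 1:
        raise ValueError("fractions must be non-negative and sum to at most 1")
    shuffled = list(items)               # copy so the caller's sequence is untouched
    random.Random(seed).shuffle(shuffled)  # local RNG: doesn't disturb global state
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )
```

Fixing the seed keeps the partition reproducible across training runs, which matters when comparing training logs from different configurations.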
You can refer to main.ipynb to explore some intermediate steps.