AI Agent for OCR is a full‑stack Django application that extracts clean, editable text from images and PDFs with multi‑language support. It blends classic OCR (Tesseract) with modern LLM tooling to improve text quality, add metadata, and streamline export and review workflows. An admin panel provides visibility into OCR requests, feedback analytics, and system health.
Use cases include digitizing documents, processing forms, multilingual text extraction, and building searchable archives.
- Multi‑input support: images (PNG/JPG) and PDFs
- Multi‑language OCR: English, Hindi, Gujarati, Sanskrit (extensible)
- Smart text cleanup and formatting (LLM‑assisted pipeline ready)
- Export: TXT, DOCX, and PDF options
- Admin Panel: Dashboard, OCR Logs, User Feedback, Settings
- Feedback analytics: average accuracy and experience scores
- User session tracking and audit logs
- Pluggable OCR backends (Tesseract by default; API hooks for cloud OCR)
- Rate limiting, basic logging, and history retention

Users can upload images or PDFs and choose the OCR language.

Extracted text preview with copy/download options.

Converts uploaded images into editable, searchable PDF files.

- Backend: Python, Django 4.x
- Database: MySQL
- OCR: Tesseract OCR (pytesseract) with language packs (eng, hin, guj, san)
- Frontend: HTML, CSS, JavaScript (no SPA framework required)
- Optional AI/LLM: OpenAI API / LangChain (placeholders; integrate as needed)
- Utilities: Pillow, PyPDF2/pdfplumber, python‑docx, reportlab
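
As a rough illustration of how the language packs listed above might be selected at runtime, the sketch below maps display names to Tesseract codes and joins them with `+`, which is the form Tesseract accepts for multi-language recognition. `LANGUAGE_CODES`, `build_lang_param`, and `extract_text` are hypothetical names chosen for this example; the corresponding `.traineddata` files are assumed to be installed.

```python
from typing import Iterable, List

# Assumed mapping from the languages named in this README to
# Tesseract language-pack codes.
LANGUAGE_CODES = {
    "English": "eng",
    "Hindi": "hin",
    "Gujarati": "guj",
    "Sanskrit": "san",
}


def build_lang_param(languages: Iterable[str]) -> str:
    """Join one or more language names into Tesseract's 'eng+hin' form."""
    codes: List[str] = []
    for name in languages:
        if name not in LANGUAGE_CODES:
            raise ValueError(f"Unsupported OCR language: {name!r}")
        code = LANGUAGE_CODES[name]
        if code not in codes:  # keep order, drop duplicates
            codes.append(code)
    if not codes:
        raise ValueError("At least one OCR language is required")
    return "+".join(codes)


def extract_text(image_path: str, languages: Iterable[str]) -> str:
    """Run Tesseract on one image file with the selected languages."""
    # Lazy imports: the helper above stays usable without Tesseract installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as image:
        return pytesseract.image_to_string(image, lang=build_lang_param(languages))
```

For example, `build_lang_param(["English", "Hindi"])` returns `"eng+hin"`.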
- Input → Validation → OCR (Tesseract) → Post‑processing (cleanup/LLM) → Output
- Django views handle uploads and orchestration
- MySQL stores users, OCR logs, and feedback
- Admin Panel surfaces metrics and history
Client (Web UI)
↓ upload
Django View
↓ route
OCR Service (Tesseract)
↓ text
Post‑processor (LLM / cleanup)
↓ formatted text
Storage (DB + Files) → Admin Panel / Downloads
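
The flow above (validation, OCR, post-processing) can be sketched as a few plain functions. This is a minimal illustration under stated assumptions, not the project's implementation: the function names, the 10 MB size limit, the `.jpeg` extension, and the regex-based cleanup rules are all hypothetical, and the OCR step is passed in as a callable so that any backend could be used.

```python
import re
from pathlib import Path
from typing import Callable

# PNG/JPG/PDF come from the feature list; ".jpeg" is an assumed alias.
ALLOWED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".pdf"}


def validate_upload(filename: str, size_bytes: int,
                    max_bytes: int = 10 * 1024 * 1024) -> None:
    """Reject unsupported file types and empty or oversized uploads."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        raise ValueError(f"Unsupported file type: {ext or '(none)'}")
    if size_bytes <= 0 or size_bytes > max_bytes:
        raise ValueError("File is empty or exceeds the size limit")


def clean_text(raw: str) -> str:
    """Rule-based cleanup; an LLM step could replace or follow this."""
    text = raw.replace("\r\n", "\n").replace("\r", "\n")
    # Heuristic: rejoin words split by a hyphen at a line break.
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", text)
    text = re.sub(r"[ \t]+", " ", text)      # collapse runs of spaces/tabs
    text = re.sub(r" ?\n ?", "\n", text)     # trim spaces around newlines
    text = re.sub(r"\n{3,}", "\n\n", text)   # keep at most one blank line
    return text.strip()


def process_upload(filename: str, data: bytes,
                   ocr: Callable[[bytes, str], str], lang: str = "eng") -> str:
    """Validate the upload, run the supplied OCR callable, clean the result."""
    validate_upload(filename, len(data))
    raw = ocr(data, lang)
    return clean_text(raw)
```

A Django view would typically call something like `process_upload` with the uploaded file's name and bytes, then store the result and render the preview; keeping the pipeline free of Django imports makes each stage easy to unit test.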
- Add GPU‑accelerated OCR and table extraction
- Integrate cloud OCR (Google Vision, Azure, AWS Textract)
- Add LLM‑based post‑correction and summarization
- Role‑based access control and audit trails
- Advanced analytics and export to CSV/Parquet
This project is licensed under the MIT License. See the LICENSE file for details.
- GitHub: https://github.com/Buildwith.18
- Email: buildwith.18@gmail.com

---

If this project helps you, consider ⭐ starring the repository!
