OCR Document Extraction & Verification Application

A complete web application for extracting structured information from document images using OCR and verifying user-entered data against extracted values.

Architecture

Frontend: HTML, CSS, JavaScript (vanilla)
Backend: Spring Boot (Java) REST API
OCR Engine: Python + Microsoft TrOCR (PyTorch)
Communication: Spring Boot executes Python scripts locally via ProcessBuilder

Project Structure

```
OCR_PROJECT/
├── backend/                          # Spring Boot application
│   ├── src/main/java/com/ocr/
│   │   ├── OcrApplication.java       # Main Spring Boot app
│   │   ├── config/
│   │   │   └── PythonConfig.java     # Python execution config
│   │   ├── controller/
│   │   │   └── OcrController.java    # REST endpoints
│   │   ├── service/
│   │   │   ├── OcrService.java       # OCR extraction service
│   │   │   └── VerificationService.java  # Data verification service
│   │   └── dto/                      # Data Transfer Objects
│   ├── src/main/resources/
│   │   └── application.properties    # Application configuration
│   └── pom.xml                       # Maven dependencies
│
├── python-ocr/                       # Python OCR module
│   ├── ocr_processor.py              # TrOCR model wrapper
│   ├── field_extractor.py            # Regex-based field extraction
│   └── requirements.txt              # Python dependencies
│
└── frontend/                         # Frontend web application
    ├── index.html                    # Main HTML page
    ├── style.css                     # Styling
    └── app.js                        # JavaScript logic
```

Prerequisites

Required Software

Java Development Kit (JDK) 17 or higher
- Download from: https://adoptium.net/
- Verify: java -version
Maven 3.6+
- Download from: https://maven.apache.org/download.cgi
- Verify: mvn -version
Python 3.8+
- Download from: https://www.python.org/downloads/
- Verify: python --version or python3 --version
pip (Python package manager)
- Usually comes with Python
- Verify: pip --version

System Requirements

RAM: Minimum 4GB (8GB+ recommended for TrOCR model)
Storage: ~2GB free space for Python dependencies and models
OS: Windows, Linux, or macOS

Installation & Setup

Step 1: Install Python Dependencies

Navigate to the python-ocr directory:
```
cd python-ocr
```
Create a virtual environment (recommended):
```
python -m venv venv
```
Activate virtual environment:
- Windows:
```
venv\Scripts\activate
```
- Linux/macOS:
```
source venv/bin/activate
```
Install Python dependencies:
```
pip install -r requirements.txt
```
Note: PyTorch and the TrOCR model together take roughly 1.5GB. This step installs PyTorch and the other Python packages; the TrOCR model weights are downloaded separately on the first OCR run.

Test the Python OCR script:
```
python ocr_processor.py <path_to_test_image>
```

Step 2: Configure Spring Boot Application

From the project root, navigate to the backend directory:
```
cd backend
```
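Edit src/main/resources/application.properties:
- Set python.executable to the interpreter that has the Python dependencies installed: python3 if that is your Python command, or the virtual environment's interpreter (python-ocr/venv/Scripts/python.exe on Windows, python-ocr/venv/bin/python on Linux/macOS) if you installed them into the venv from Step 1. An absolute path is the safest choice.
- Verify that python.ocr.script points to the OCR script relative to the project root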
Build the Spring Boot application:
```
mvn clean install
```
This will download all Maven dependencies and compile the project.

Step 3: Set Up Frontend

The frontend files are static HTML/CSS/JS files. No build process required.

Important: Update the API URL in frontend/app.js if your backend runs on a different port:

```
const API_BASE_URL = 'http://localhost:8080/api';
```

Running the Application

Step 1: Start Spring Boot Backend

Navigate to backend directory:
```
cd backend
```

Run the Spring Boot application:
```
mvn spring-boot:run
```
Or run the JAR file built by mvn clean install:
```
java -jar target/ocr-application-1.0.0.jar
```

Verify the backend is running:
- Open browser: http://localhost:8080
- You should see a Spring Boot error page (by default the "Whitelabel Error Page"); this is expected because there is no root endpoint
- Check the logs for: "Started OcrApplication"

Step 2: Serve Frontend

You can serve the frontend in several ways:

Option A: Using Python HTTP Server (simplest)

```
cd frontend
python -m http.server 8000
```

Option B: Using Node.js http-server

```
npx http-server frontend -p 8000
```

Option C: Using any web server

Copy frontend/ contents to your web server directory
Ensure CORS is enabled (backend already allows all origins)

Step 3: Access Application

Open browser: http://localhost:8000
Upload a document image (JPG, PNG, or PDF)
Click "Extract Information"
Review and edit extracted fields
Click "Verify Information" to see comparison results

API Endpoints

POST /api/extract

Extract text and structured fields from uploaded image/PDF.

Request:

Method: POST
Content-Type: multipart/form-data
Body: file (image or PDF file)

Response:

```
{
  "success": true,
  "message": "OCR extraction successful",
  "rawText": "Full extracted text...",
  "extractedFields": {
    "name": "John Doe",
    "dob": "01/15/1990",
    "id_number": "AB123456789",
    "address": "123 Main Street, New York, NY 10001"
  }
}
```

POST /api/verify

Verify user-entered form data against OCR extracted data.

Request:

Method: POST
Content-Type: multipart/form-data
Body:
- file: Original uploaded image file
- formData: JSON string containing form data map

Response:

```
{
  "success": true,
  "message": "Verification completed",
  "fieldResults": {
    "name": {
      "match": true,
      "confidence": 0.95,
      "extractedValue": "John Doe",
      "userValue": "John Doe"
    },
    "dob": {
      "match": false,
      "confidence": 0.72,
      "extractedValue": "01/15/1990",
      "userValue": "01/16/1990"
    }
  },
  "overallConfidence": 0.835
}
```
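The sample response is consistent with a simple rule: a field matches when its confidence meets the threshold (0.85 in VerificationService.java, see Customizing Verification Threshold), and overallConfidence is the mean of the per-field confidences, (0.95 + 0.72) / 2 = 0.835. The Python sketch below reproduces only that arithmetic. How the backend computes each per-field confidence is not documented here, so it is taken as an input, and the mean-based aggregation and the `summarize` helper are assumptions inferred from the sample numbers.

```python
from typing import Dict, Tuple

# Threshold copied from the VerificationService.java line in this README.
MATCH_THRESHOLD = 0.85


def summarize(confidences: Dict[str, float],
              threshold: float = MATCH_THRESHOLD) -> Tuple[Dict[str, bool], float]:
    """Hypothetical helper: derive per-field match flags and an overall score.

    Assumes overallConfidence is the arithmetic mean of the per-field
    confidences; this matches the sample response but is not confirmed
    by the backend source.
    """
    matches = {field: conf >= threshold for field, conf in confidences.items()}
    overall = sum(confidences.values()) / len(confidences) if confidences else 0.0
    return matches, overall


matches, overall = summarize({"name": 0.95, "dob": 0.72})
print(matches)            # {'name': True, 'dob': False}
print(round(overall, 3))  # 0.835
```

Lowering the threshold to 0.70 in this sketch would flip dob to a match, which is the same trade-off the Java threshold controls.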

Configuration

Backend Configuration (`application.properties`)

```
# Python executable path
python.executable=python

# Path to OCR script (relative to project root)
python.ocr.script=python-ocr/ocr_processor.py

# Timeout for Python script execution (milliseconds)
python.ocr.timeout=30000

# File upload size limit
spring.servlet.multipart.max-file-size=10MB
```
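The three python.* properties describe a single command line: an interpreter, a script, and an image path, bounded by a timeout. The Python sketch below reproduces that invocation outside Spring Boot, which can help isolate failures the backend reports as a non-zero exit code. It assumes the script takes the image path as its only argument (as in the manual test command `python ocr_processor.py <path_to_test_image>`) and writes its result to stdout; `run_ocr_script` is a hypothetical helper, not part of the project.

```python
import subprocess


def run_ocr_script(executable: str, script: str, image_path: str,
                   timeout_ms: int = 30000) -> str:
    """Hypothetical helper mirroring python.executable, python.ocr.script,
    and python.ocr.timeout. Returns the script's stdout.

    Assumes the script accepts the image path as its single argument and
    prints its result to stdout.
    """
    try:
        completed = subprocess.run(
            [executable, script, image_path],
            capture_output=True,
            text=True,
            timeout=timeout_ms / 1000,  # subprocess.run expects seconds
        )
    except subprocess.TimeoutExpired as exc:
        raise RuntimeError(f"OCR script timed out after {timeout_ms} ms") from exc
    if completed.returncode != 0:
        # Include stderr so a missing package or a bad path is visible.
        raise RuntimeError(
            f"Python OCR script failed with exit code: {completed.returncode}\n"
            f"{completed.stderr}"
        )
    return completed.stdout
```

For example, `run_ocr_script("python-ocr/venv/bin/python", "python-ocr/ocr_processor.py", "sample.png")`, run from the project root with a hypothetical sample.png, exercises the same interpreter and script path the backend is configured to use.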

Frontend Configuration (`app.js`)

```
// Backend API URL
const API_BASE_URL = 'http://localhost:8080/api';
```

Troubleshooting

Python OCR Script Not Found

Error: Python OCR script failed with exit code: 1

Solution:

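Verify the Python script path in application.properties
Verify that python.executable points to the interpreter where requirements.txt was installed; exit code 1 usually means the script started but raised an error, such as a missing package (a script file Python cannot open typically yields exit code 2)
Ensure the script is executable: chmod +x python-ocr/ocr_processor.py (Linux/macOS)
Test the script manually: python python-ocr/ocr_processor.py <image_path>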

TrOCR Model Download Issues

Error: Error loading TrOCR model

Solution:

Ensure internet connection for first-time model download
Model will be cached in ~/.cache/huggingface/ after first download
Check available disk space (model is ~500MB)

Port Already in Use

Error: Port 8080 is already in use

Solution:

Change port in application.properties: server.port=8081
Update frontend API_BASE_URL accordingly
Or stop the process using port 8080

CORS Errors

Error: Access to fetch at 'http://localhost:8080/api/extract' from origin 'http://localhost:8000' has been blocked by CORS policy

Solution:

Backend already allows all origins via @CrossOrigin(origins = "*")
If issue persists, verify frontend is accessing correct backend URL

File Upload Size Limit

Error: Maximum upload size exceeded

Solution:

Increase the limits in application.properties:
```
spring.servlet.multipart.max-file-size=20MB
spring.servlet.multipart.max-request-size=20MB
```

Field Extraction Patterns

field_extractor.py uses regex patterns to extract:

Name: Full names (2+ words, capitalized)
DOB: Dates in MM/DD/YYYY, DD/MM/YYYY, or YYYY-MM-DD format
ID Number: Alphanumeric IDs (6+ characters)
Address: Street addresses with city/state/zip

You can customize patterns in python-ocr/field_extractor.py for your specific document types.
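The real patterns live in python-ocr/field_extractor.py and are not reproduced in this README. As a rough illustration only, the sketch below shows label-free regexes for the four rules listed above. The `FIELD_PATTERNS` dictionary, the `extract_fields` function, the "at least one digit" requirement for IDs, and the US-style address shape are all assumptions chosen for the example.

```python
import re
from typing import Dict, Optional

# Illustrative patterns only; the project's own are in field_extractor.py.
FIELD_PATTERNS: Dict[str, re.Pattern] = {
    # Two or more capitalized words separated by single spaces.
    "name": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),
    # MM/DD/YYYY or DD/MM/YYYY (textually identical), or YYYY-MM-DD.
    "dob": re.compile(r"\b(?:\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2})\b"),
    # 6+ uppercase letters/digits containing at least one digit (assumed rule).
    "id_number": re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,}\b"),
    # "<number> <street>, <city>, <ST> <5-digit zip>" (assumed US format).
    "address": re.compile(r"\b\d+ [A-Za-z ]+, [A-Za-z ]+, [A-Z]{2} \d{5}\b"),
}


def extract_fields(raw_text: str) -> Dict[str, Optional[str]]:
    """Return the first match for each field, or None if nothing matched."""
    fields: Dict[str, Optional[str]] = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(raw_text)
        fields[field] = match.group(0) if match else None
    return fields


sample = ("Name: John Doe\nDOB: 01/15/1990\nID: AB123456789\n"
          "Address: 123 Main Street, New York, NY 10001")
print(extract_fields(sample))
```

Because MM/DD/YYYY and DD/MM/YYYY cannot be told apart from the text alone, an extractor needs a document-specific convention to interpret the date. Under this layout, a new field (for instance a hypothetical expiry date) would be one more FIELD_PATTERNS entry plus the matching input in index.html.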

Performance Notes

First OCR run: Slow (~10-30 seconds) due to model loading
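Subsequent runs: Faster (~2-5 seconds) once the model weights are cached locally. Because the backend launches the script through ProcessBuilder, each request likely starts a new Python process that reloads the model from that cache rather than keeping it in memory.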
GPU acceleration: Automatically used if CUDA is available
Memory usage: ~2-3GB RAM for TrOCR model

Development

Adding New Fields

Update field_extractor.py with new regex patterns
Update frontend form (index.html) with new input fields
Update DTOs if needed

Customizing Verification Threshold

Edit VerificationService.java:

```
boolean match = confidence >= 0.85; // Change threshold here
```

License

This project uses open-source libraries:

Spring Boot (Apache License 2.0)
TrOCR (MIT License)
PyTorch (BSD-style License)

Support

For issues or questions:

Check logs in Spring Boot console
Verify Python script runs independently
Test API endpoints with Postman/curl
Check browser console for frontend errors
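For a quick check without Postman or curl, the standard-library Python sketch below posts to POST /api/verify using the request shape described under API Endpoints: a multipart/form-data body with a `file` part and a `formData` part holding a JSON string. The helper names `build_multipart` and `verify`, and the 60-second client timeout, are assumptions; the default base URL is the README's API_BASE_URL.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from typing import Dict, Tuple


def build_multipart(fields: Dict[str, str], file_field: str,
                    file_path: str) -> Tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the body bytes and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    ctype = mimetypes.guess_type(file_path)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
            f"{value}\r\n".encode("utf-8")
        )
    with open(file_path, "rb") as fh:
        file_bytes = fh.read()
    parts.append(
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{file_field}"; '
        f'filename="{os.path.basename(file_path)}"\r\n'
        f"Content-Type: {ctype}\r\n\r\n".encode("utf-8")
        + file_bytes + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def verify(image_path: str, form_data: Dict[str, str],
           base_url: str = "http://localhost:8080/api") -> dict:
    """POST the image and the user's form values to /api/verify."""
    body, content_type = build_multipart(
        {"formData": json.dumps(form_data)}, "file", image_path)
    request = urllib.request.Request(
        f"{base_url}/verify", data=body, method="POST",
        headers={"Content-Type": content_type})
    # Generous client timeout: the first OCR run can take up to ~30 s.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.loads(response.read().decode("utf-8"))
```

For example, `verify("id_card.png", {"name": "John Doe", "dob": "01/15/1990"})`, with a hypothetical id_card.png, should return a dictionary shaped like the /api/verify response above. This assumes the controller reads formData as a plain string part, as the endpoint description suggests.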

Next Steps

Add PDF text extraction support
Implement batch processing
Add database storage for extracted data
Enhance field extraction with ML models
Add user authentication
