Partha0003/OCR_PROJECT

OCR Document Extraction & Verification Application

A complete web application for extracting structured information from document images using OCR and verifying user-entered data against extracted values.

Architecture

  • Frontend: HTML, CSS, JavaScript (vanilla)
  • Backend: Spring Boot (Java) REST API
  • OCR Engine: Python + Microsoft TrOCR (PyTorch)
  • Communication: Spring Boot executes Python scripts locally via ProcessBuilder

Project Structure

OCR_PROJECT/
├── backend/                          # Spring Boot application
│   ├── src/main/java/com/ocr/
│   │   ├── OcrApplication.java       # Main Spring Boot app
│   │   ├── config/
│   │   │   └── PythonConfig.java     # Python execution config
│   │   ├── controller/
│   │   │   └── OcrController.java    # REST endpoints
│   │   ├── service/
│   │   │   ├── OcrService.java       # OCR extraction service
│   │   │   └── VerificationService.java  # Data verification service
│   │   └── dto/                      # Data Transfer Objects
│   ├── src/main/resources/
│   │   └── application.properties    # Application configuration
│   └── pom.xml                       # Maven dependencies
│
├── python-ocr/                       # Python OCR module
│   ├── ocr_processor.py             # TrOCR model wrapper
│   ├── field_extractor.py            # Regex-based field extraction
│   └── requirements.txt              # Python dependencies
│
└── frontend/                         # Frontend web application
    ├── index.html                    # Main HTML page
    ├── style.css                     # Styling
    └── app.js                        # JavaScript logic

Prerequisites

Required Software

  1. Java Development Kit (JDK) 17 or higher

  2. Maven 3.6+

  3. Python 3.8+

  4. pip (Python package manager)

    • Usually comes with Python
    • Verify: pip --version

System Requirements

  • RAM: Minimum 4GB (8GB+ recommended for TrOCR model)
  • Storage: ~2GB free space for Python dependencies and models
  • OS: Windows, Linux, or macOS

Installation & Setup

Step 1: Install Python Dependencies

  1. Navigate to the python-ocr directory:

    cd python-ocr
  2. Create a virtual environment (recommended):

    python -m venv venv
  3. Activate virtual environment:

    • Windows:
      venv\Scripts\activate
    • Linux/macOS:
      source venv/bin/activate
  4. Install Python dependencies:

    pip install -r requirements.txt

    Note: This installs PyTorch and the other Python dependencies (~1.5GB). The TrOCR model weights are not part of this step; they are downloaded separately the first time the OCR script runs.

  5. Test Python OCR script:

    python ocr_processor.py <path_to_test_image>

Step 2: Configure Spring Boot Application

  1. From the project root, navigate to the backend directory:

    cd backend
  2. Edit src/main/resources/application.properties:

    • Update python.executable if your Python command is python3 instead of python
    • Verify python.ocr.script path is correct relative to project root
  3. Build the Spring Boot application:

    mvn clean install

    This will download all Maven dependencies and compile the project.

Step 3: Setup Frontend

The frontend files are static HTML/CSS/JS files. No build process required.

Important: Update the API URL in frontend/app.js if your backend runs on a different port:

const API_BASE_URL = 'http://localhost:8080/api';

Running the Application

Step 1: Start Spring Boot Backend

  1. From the project root, navigate to the backend directory:

    cd backend
  2. Run Spring Boot application:

    mvn spring-boot:run

    Or run the JAR file:

    java -jar target/ocr-application-1.0.0.jar
  3. Verify backend is running:

    • Open browser: http://localhost:8080
    • You should see a Spring Boot error page (expected, as there's no root endpoint)
    • Check logs for: "Started OcrApplication"

Step 2: Serve Frontend

You can serve the frontend in several ways:

Option A: Using Python HTTP Server (simplest)

cd frontend
python -m http.server 8000

Option B: Using Node.js http-server

npx http-server frontend -p 8000

Option C: Using any web server

  • Copy frontend/ contents to your web server directory
  • Ensure CORS is enabled (backend already allows all origins)

Step 3: Access Application

  1. Open browser: http://localhost:8000
  2. Upload a document file (JPG or PNG image, or PDF)
  3. Click "Extract Information"
  4. Review and edit extracted fields
  5. Click "Verify Information" to see comparison results

API Endpoints

POST /api/extract

Extract text and structured fields from uploaded image/PDF.

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body: file (image or PDF file)

Response:

{
  "success": true,
  "message": "OCR extraction successful",
  "rawText": "Full extracted text...",
  "extractedFields": {
    "name": "John Doe",
    "dob": "01/15/1990",
    "id_number": "AB123456789",
    "address": "123 Main Street, New York, NY 10001"
  }
}
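
As a rough illustration of the request described above, the following Python sketch builds the multipart/form-data body by hand using only the standard library. The helper names build_multipart and extract are hypothetical and not part of this project; the base URL and the "file" field name are taken from this README.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def build_multipart(field_name, filename, content, content_type="application/octet-stream"):
    """Encode one file as a multipart/form-data body.

    Returns (body_bytes, content_type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract(image_path, base_url="http://localhost:8080/api"):
    """POST the file to /extract and return the decoded JSON response."""
    with open(image_path, "rb") as f:
        content = f.read()
    # Guess the MIME type from the file extension, falling back to a generic type.
    guessed, _ = mimetypes.guess_type(image_path)
    body, header = build_multipart(
        "file", os.path.basename(image_path), content, guessed or "application/octet-stream"
    )
    request = urllib.request.Request(
        f"{base_url}/extract",
        data=body,
        headers={"Content-Type": header},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the backend running, extract("sample.png")["extractedFields"] should yield a dictionary shaped like the response shown above.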

POST /api/verify

Verify user-entered form data against OCR extracted data.

Request:

  • Method: POST
  • Content-Type: multipart/form-data
  • Body:
    • file: Original uploaded image file
    • formData: JSON string containing form data map

Response:

{
  "success": true,
  "message": "Verification completed",
  "fieldResults": {
    "name": {
      "match": true,
      "confidence": 0.95,
      "extractedValue": "John Doe",
      "userValue": "John Doe"
    },
    "dob": {
      "match": false,
      "confidence": 0.72,
      "extractedValue": "01/15/1990",
      "userValue": "01/16/1990"
    }
  },
  "overallConfidence": 0.835
}
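
In the sample response, overallConfidence (0.835) equals the arithmetic mean of the two field confidences (0.95 and 0.72). The Python sketch below reproduces that calculation. It assumes averaging is what the service does, which the sample values suggest but this README does not state, and overall_confidence is a hypothetical name.

```python
from statistics import mean


def overall_confidence(field_results):
    """Average the per-field confidence scores; return 0.0 when there are no fields."""
    scores = [result["confidence"] for result in field_results.values()]
    return mean(scores) if scores else 0.0


field_results = {
    "name": {"match": True, "confidence": 0.95},
    "dob": {"match": False, "confidence": 0.72},
}
print(round(overall_confidence(field_results), 3))  # 0.835
```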

Configuration

Backend Configuration (application.properties)

# Python executable path
python.executable=python

# Path to OCR script (relative to project root)
python.ocr.script=python-ocr/ocr_processor.py

# Timeout for Python script execution (milliseconds)
python.ocr.timeout=30000

# File upload size limit
spring.servlet.multipart.max-file-size=10MB
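
The backend launches the script through Java's ProcessBuilder. As a hedged illustration of how the three python.* settings combine, here is the same pattern in Python. run_ocr_script is a hypothetical helper and not the project's actual code. Note the unit conversion: the property is in milliseconds, while subprocess.run takes seconds.

```python
import subprocess


def run_ocr_script(executable, script, image_path, timeout_ms):
    """Run the OCR script as a child process and return its standard output.

    Raises RuntimeError on a non-zero exit code and
    subprocess.TimeoutExpired if the script runs longer than timeout_ms.
    """
    result = subprocess.run(
        [executable, script, image_path],
        capture_output=True,
        text=True,
        timeout=timeout_ms / 1000,  # convert milliseconds to seconds
    )
    if result.returncode != 0:
        raise RuntimeError(
            f"Python OCR script failed with exit code: {result.returncode}\n{result.stderr}"
        )
    return result.stdout
```

With the defaults above, a call would look like run_ocr_script("python", "python-ocr/ocr_processor.py", "sample.png", 30000), run from the project root.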

Frontend Configuration (app.js)

// Backend API URL
const API_BASE_URL = 'http://localhost:8080/api';

Troubleshooting

Python OCR Script Not Found

Error: Python OCR script failed with exit code: 1

Solution:

  1. Verify Python script path in application.properties
  2. If the script is invoked directly rather than through the Python executable, ensure it is executable: chmod +x python-ocr/ocr_processor.py (Linux/macOS)
  3. Test script manually: python python-ocr/ocr_processor.py <image_path>

TrOCR Model Download Issues

Error: Error loading TrOCR model

Solution:

  1. Ensure internet connection for first-time model download
  2. Model will be cached in ~/.cache/huggingface/ after first download
  3. Check available disk space (model is ~500MB)

Port Already in Use

Error: Port 8080 is already in use

Solution:

  1. Change port in application.properties: server.port=8081
  2. Update frontend API_BASE_URL accordingly
  3. Or stop the process using port 8080

CORS Errors

Error: Access to fetch at 'http://localhost:8080/api/extract' from origin 'http://localhost:8000' has been blocked by CORS policy

Solution:

  • Backend already allows all origins via @CrossOrigin(origins = "*")
  • If issue persists, verify frontend is accessing correct backend URL

File Upload Size Limit

Error: Maximum upload size exceeded

Solution:

  • Increase limit in application.properties:
    spring.servlet.multipart.max-file-size=20MB
    spring.servlet.multipart.max-request-size=20MB

Field Extraction Patterns

The field_extractor.py uses regex patterns to extract:

  • Name: Full names (2+ words, capitalized)
  • DOB: Dates in MM/DD/YYYY, DD/MM/YYYY, or YYYY-MM-DD format
  • ID Number: Alphanumeric IDs (6+ characters)
  • Address: Street addresses with city/state/zip

You can customize patterns in python-ocr/field_extractor.py for your specific document types.
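
To make the idea concrete, here is a minimal sketch of regex-based extraction for three of the fields. These patterns are illustrative assumptions written for this README, not the ones shipped in field_extractor.py. The extract_fields name and the requirement that an ID contain at least one digit are likewise hypothetical.

```python
import re

# Two or more capitalized words separated by single spaces, e.g. "John Doe".
NAME_PATTERN = re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b")
# NN/NN/YYYY (covers MM/DD/YYYY and DD/MM/YYYY) or YYYY-MM-DD.
DOB_PATTERN = re.compile(r"\b(?:\d{2}/\d{2}/\d{4}|\d{4}-\d{2}-\d{2})\b")
# Six or more uppercase letters/digits, containing at least one digit.
ID_PATTERN = re.compile(r"\b(?=[A-Z0-9]*\d)[A-Z0-9]{6,}\b")


def extract_fields(text):
    """Return the first match for each field, or None when nothing matches."""
    def first(pattern):
        match = pattern.search(text)
        return match.group(0) if match else None

    return {
        "name": first(NAME_PATTERN),
        "dob": first(DOB_PATTERN),
        "id_number": first(ID_PATTERN),
    }


sample = "Name: John Doe\nDOB: 01/15/1990\nID No: AB123456789"
print(extract_fields(sample))
```

A slash-separated date cannot be classified as MM/DD/YYYY or DD/MM/YYYY from the pattern alone, so documents that mix both conventions need additional context to disambiguate.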

Performance Notes

  • First OCR run: Slow (~10-30 seconds) due to model loading
  • Subsequent runs: Faster (~2-5 seconds) as model stays in memory
  • GPU acceleration: Automatically used if CUDA is available
  • Memory usage: ~2-3GB RAM for TrOCR model

Development

Adding New Fields

  1. Update field_extractor.py with new regex patterns
  2. Update frontend form (index.html) with new input fields
  3. Update DTOs if needed

Customizing Verification Threshold

Edit VerificationService.java:

boolean match = confidence >= 0.85; // Change threshold here
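
To show how the threshold interacts with a similarity score, the Python sketch below uses difflib.SequenceMatcher as a stand-in metric. The similarity measure actually used in VerificationService.java is not documented here, so scores from this sketch will not necessarily equal the ones the service reports. compare_field is a hypothetical name.

```python
from difflib import SequenceMatcher

MATCH_THRESHOLD = 0.85  # same default as the Java snippet above


def compare_field(extracted, user_value, threshold=MATCH_THRESHOLD):
    """Score similarity in [0, 1] after trimming whitespace and lowercasing."""
    a = extracted.strip().lower()
    b = user_value.strip().lower()
    confidence = SequenceMatcher(None, a, b).ratio()
    return {"match": confidence >= threshold, "confidence": confidence}
```

Raising the threshold makes verification stricter: a near miss that passes at 0.85 can fail at 0.95, at the cost of rejecting more values that differ only by OCR noise.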

License

This project uses open-source libraries:

  • Spring Boot (Apache License 2.0)
  • TrOCR (MIT License)
  • PyTorch (BSD-style License)

Support

For issues or questions:

  1. Check logs in Spring Boot console
  2. Verify Python script runs independently
  3. Test API endpoints with Postman/curl
  4. Check browser console for frontend errors

Next Steps

  • Add PDF text extraction support
  • Implement batch processing
  • Add database storage for extracted data
  • Enhance field extraction with ML models
  • Add user authentication
