Extract structured data from PDF documents using Google Gemini AI.
This API service uses Google's multimodal Gemini model to extract structured data from PDF documents without requiring predefined schemas. Gemini analyzes both the text and the visual elements of a PDF and returns the extracted content as structured JSON.
Key features:
- Simple PDF upload and processing
- Dynamic data extraction without predefined schemas
- Uses Gemini's visual understanding capabilities
- Modern, user-friendly frontend interface
The repository is organized as follows:

```text
pdf-to-structured-data/
├── app/
│ ├── api/
│ │ ├── routes/
│ │ │ └── pdf_routes.py # API endpoints for PDF processing
│ ├── services/
│ │ └── gemini_service.py # Gemini API integration
│ ├── utils/
│ │ └── file_utils.py # File handling utilities
│ ├── config.py # Configuration settings
│ └── main.py # FastAPI application entry point
├── frontend/
│ ├── index.html # Frontend UI
│ ├── scripts.js # Frontend JavaScript
│ └── README.md # Frontend documentation
├── temp_uploads/ # Temporary storage for PDFs
├── requirements.txt
└── README.md
```
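The tree lists `file_utils.py` and a `temp_uploads/` directory but does not say how uploads are stored. The following is one plausible approach, not the project's code: it assumes each PDF is written under a random UUID name (matching the `file_id` format in the upload response example), and the size limit and the helper names `save_pdf` and `resolve_pdf` are hypothetical.

```python
import uuid
from pathlib import Path

# Hypothetical defaults; the real values presumably live in app/config.py.
UPLOAD_DIR = Path("temp_uploads")
MAX_SIZE_BYTES = 20 * 1024 * 1024  # assumed limit, not taken from the project


def save_pdf(data: bytes, original_name: str, upload_dir: Path = UPLOAD_DIR) -> dict:
    """Store an uploaded PDF under a random UUID name.

    Returns a dict shaped like the upload endpoint's JSON response.
    """
    if not original_name.lower().endswith(".pdf"):
        raise ValueError("only .pdf files are accepted")
    if not data.startswith(b"%PDF-"):  # PDF files begin with this magic header
        raise ValueError("file does not look like a PDF")
    if len(data) > MAX_SIZE_BYTES:
        raise ValueError("file too large")
    upload_dir.mkdir(parents=True, exist_ok=True)
    file_id = f"{uuid.uuid4()}.pdf"
    (upload_dir / file_id).write_bytes(data)
    return {"file_id": file_id, "filename": original_name, "size_bytes": len(data)}


def resolve_pdf(file_id: str, upload_dir: Path = UPLOAD_DIR) -> Path:
    """Map a file_id back to its stored path, rejecting anything else."""
    stem, _, ext = file_id.partition(".")
    # uuid.UUID raises ValueError for malformed IDs, which also blocks
    # path-traversal attempts such as "../secret.pdf".
    if ext != "pdf" or str(uuid.UUID(stem)) != stem:
        raise ValueError("invalid file_id")
    path = upload_dir / file_id
    if not path.exists():
        raise FileNotFoundError(file_id)
    return path
```

Validating the ID as a canonical UUID before touching the filesystem keeps the extract endpoint from reading arbitrary paths supplied by a client.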
How it works:
- Upload: PDF documents are uploaded through the API or the frontend
- Process: the PDF is sent to Gemini together with an extraction prompt
- Extract: Gemini analyzes the document and extracts structured data
- Return: the structured JSON data is returned to the client
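The Process and Extract steps can be sketched as two small helpers. Everything here is an assumption for illustration: the prompt wording, the names `build_prompt` and `parse_model_json`, and the fence handling, which only reflects that chat models frequently wrap JSON replies in a Markdown code fence. The Gemini call itself is omitted.

```python
import json
import re
from typing import Optional

# Hypothetical prompt wording, not the project's actual prompt.
BASE_PROMPT = (
    "Extract all meaningful information from this PDF, including tables and "
    "visual elements, and return it as a single JSON object with descriptive "
    "snake_case keys. Return only the JSON."
)

FENCE = "`" * 3  # three backticks, the Markdown code-fence marker


def build_prompt(document_type: Optional[str] = None) -> str:
    """Append the optional document_type hint to the base prompt."""
    if document_type:
        return f"{BASE_PROMPT} Hint: the document type is '{document_type}'."
    return BASE_PROMPT


# Matches a whole reply wrapped in a fence, with or without a "json" tag.
_FENCED = re.compile("^" + FENCE + r"(?:json)?\s*(.*?)\s*" + FENCE + "$", re.DOTALL)


def parse_model_json(text: str) -> dict:
    """Convert the model's text reply into a dict, tolerating a code fence."""
    cleaned = text.strip()
    match = _FENCED.match(cleaned)
    if match:
        cleaned = match.group(1)
    data = json.loads(cleaned)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Because no schema is imposed, the only structural check here is that the reply is a JSON object; the keys themselves are left to the model.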
Prerequisites:
- Python 3.9+
- A Google Gemini API key
Installation:
- Clone the repository:
```bash
git clone https://github.com/SlyyCooper/pdf-to-structured-data.git
cd pdf-to-structured-data
```
- Install dependencies:
```bash
pip install -r requirements.txt
```
- Create a `.env` file with your Gemini API key:
```text
GOOGLE_API_KEY=your_api_key_here
```
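To see what reading the `.env` file involves, here is a stdlib-only loader. It is a hypothetical stand-in: the project's `config.py` may well use a library such as python-dotenv instead, and `load_env_file` and `get_api_key` are illustrative names.

```python
import os
from pathlib import Path


def load_env_file(path: str = ".env") -> None:
    """Copy KEY=VALUE lines from a .env file into os.environ.

    Ignores blank lines and # comments, strips matching surrounding quotes,
    and never overrides variables already set in the environment.
    """
    env_path = Path(path)
    if not env_path.is_file():
        return
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key, value = key.strip(), value.strip()
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        os.environ.setdefault(key, value)


def get_api_key() -> str:
    """Return GOOGLE_API_KEY or fail early with a readable message."""
    load_env_file()
    key = os.environ.get("GOOGLE_API_KEY")
    if not key:
        raise RuntimeError("GOOGLE_API_KEY is not set; add it to .env")
    return key
```

Failing at startup when the key is missing gives a clearer error than a rejected request to Gemini later.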
Run the API server:
```bash
uvicorn app.main:app --reload
```
The API will be available at http://localhost:8000.
To use the frontend, open `frontend/index.html` in a web browser, or serve it with a static file server. For example, using Python's built-in HTTP server:
```bash
cd frontend
python -m http.server 8080
```
Then open http://localhost:8080 in your browser.
API endpoints:

`POST /api/v1/upload`
- Request: multipart form with the PDF in a `file` field
- Response: JSON containing the `file_id`, the original `filename`, and `size_bytes`

`POST /api/v1/extract/{file_id}`
- Path parameter `file_id`: ID of the uploaded file, as returned by the upload endpoint
- Request body (optional): JSON object with a `document_type` hint about the kind of document (e.g. `"invoice"`)
- Response: JSON with the extracted structured data

`GET /api/v1/health`
- Response: API health status
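For scripting against these endpoints without extra dependencies, requests can be built with Python's `urllib.request`. This is a sketch assuming the default base URL `http://localhost:8000/api/v1`; the helper names are hypothetical, and sending an empty JSON object when no hint is given assumes the server accepts it, since the body is documented as optional.

```python
import json
import urllib.request
import uuid
from typing import Optional

BASE_URL = "http://localhost:8000/api/v1"  # default from the setup steps


def build_upload_request(pdf_bytes: bytes, filename: str,
                         base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a multipart/form-data POST carrying the PDF in a "file" field."""
    boundary = uuid.uuid4().hex
    # The filename is not escaped; fine for ordinary names without quotes.
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        f"{base_url}/upload",
        data=body,
        method="POST",
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )


def build_extract_request(file_id: str, document_type: Optional[str] = None,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a POST to /extract/{file_id} with an optional document_type hint."""
    payload = {"document_type": document_type} if document_type else {}
    return urllib.request.Request(
        f"{base_url}/extract/{file_id}",
        data=json.dumps(payload).encode(),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request: urllib.request.Request) -> dict:
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the server running, a round trip would be `info = send(build_upload_request(open("invoice.pdf", "rb").read(), "invoice.pdf"))` followed by `send(build_extract_request(info["file_id"], "invoice"))`.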
Example usage:

Upload a PDF:
```bash
curl -X POST -F "file=@invoice.pdf" http://localhost:8000/api/v1/upload
```
Response:
```json
{
  "file_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6.pdf",
  "filename": "invoice.pdf",
  "size_bytes": 12345
}
```
Extract structured data from the uploaded file, passing an optional document-type hint:
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d '{"document_type": "invoice"}' \
  http://localhost:8000/api/v1/extract/3fa85f64-5717-4562-b3fc-2c963f66afa6.pdf
```
Response (the keys are chosen per document, since no schema is predefined):
```json
{
  "document_type": "invoice",
  "vendor": {
    "name": "ACME Corp",
    "address": "123 Main St, Anytown, USA"
  },
  "invoice_number": "INV-12345",
  "date": "2023-06-01",
  "items": [
    {
      "description": "Product A",
      "quantity": 2,
      "unit_price": 49.99,
      "amount": 99.98
    }
  ],
  "subtotal": 99.98,
  "tax": 8.00,
  "total": 107.98
}
```
The application includes a modern, user-friendly frontend designed with an Apple-inspired aesthetic. Key features:
- Drag & drop PDF uploads
- Clear visual feedback during processing
- Formatted JSON display with syntax highlighting
- One-click copy to clipboard
- Responsive design for mobile and desktop
For more details, see the frontend README (`frontend/README.md`).