A full-stack asynchronous document processing application built with FastAPI, Next.js, Celery, Redis, and PostgreSQL.
- Document Upload: Multi-file PDF upload with presigned S3 URLs.
- Async Processing: Background PDF parsing and extraction using Celery workers.
- Real-time Progress: Live status updates via WebSocket (backed by Redis Pub/Sub).
- Interactive Dashboard: Search, filter, and monitor all document processing jobs.
- Content Review: Edit extracted metadata, keywords, and summaries.
- Data Export: Export processed results in JSON and CSV formats.
- Authentication: Secure Google OAuth and JWT-based auth.
- Frontend: Next.js, TypeScript, Tailwind CSS, Zustand, Lucide React.
- Backend: Python, FastAPI, SQLModel (ORM), Alembic (Migrations).
- Task Queue: Celery with Redis as the Broker.
- Real-time: Redis Pub/Sub + WebSockets.
- Storage: AWS S3 for document persistence.
- Database: PostgreSQL (Neon.tech).
- Python 3.10+
- Node.js 18+
- Redis (running locally or via Docker)
- PostgreSQL database
- AWS S3 Bucket
- Navigate to the `backend` directory.
- Create a virtual environment: `python -m venv venv`.
- Activate the venv: `source venv/bin/activate`.
- Install dependencies: `pip install -r requirements.txt`.
- Configure `.env` (use the provided template or an existing `.env`).
- Run migrations: `alembic upgrade head`.
- Start the server: `python -m uvicorn app.main:app --reload`.
- Start the worker: `celery -A app.worker.celery_app worker --loglevel=info`.
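
For the `.env` step, it can help to fail fast when a required value is missing. The following is a minimal sketch, not the project's actual configuration code: the variable names `DATABASE_URL`, `REDIS_URL`, and `AWS_S3_BUCKET` and the `Settings` class are assumptions for illustration, so check the provided `.env` template for the real keys.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Hypothetical variable names; the project's .env template defines the real ones.
REQUIRED_VARS = ("DATABASE_URL", "REDIS_URL", "AWS_S3_BUCKET")


@dataclass(frozen=True)
class Settings:
    database_url: str
    redis_url: str
    s3_bucket: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read required settings, raising early if any are missing or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        redis_url=env["REDIS_URL"],
        s3_bucket=env["AWS_S3_BUCKET"],
    )
```

Calling `load_settings()` with no arguments reads from `os.environ`; passing a plain dictionary makes the check easy to exercise in tests.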
- Navigate to the `frontend` directory.
- Install dependencies: `npm install`.
- Start the dev server: `npm run dev`.
- Open http://localhost:3000 in your browser.
- Initiation: The user selects files. The frontend calls `/upload/initiate`, which records the task in the database and returns S3 presigned URLs.
- Upload: The frontend uploads files directly to S3.
- Trigger: The frontend calls `/upload/complete`, which triggers a Celery background task.
- Worker: The Celery worker downloads the file from S3, parses text using `pypdf`, extracts metadata, and updates the database.
- Updates: Throughout the process, the worker publishes events to Redis. A dedicated WebSocket endpoint in FastAPI listens to these events and streams them to the client.
- Review: Once processing completes, the user reviews the data, edits fields as needed, and finally "Finalizes" the record (making it read-only).
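
The initiation step can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the project's actual handler: the function name `build_upload_targets`, the `uploads/<task id>/<filename>` key layout, and the 15-minute expiry are hypothetical. The only library call used is `generate_presigned_url` on a boto3 S3 client, which is passed in as a parameter.

```python
import uuid
from typing import Any, Dict, List


def build_upload_targets(
    s3_client: Any,
    bucket: str,
    filenames: List[str],
    expires_in: int = 900,
) -> List[Dict[str, str]]:
    """Return one presigned PUT URL per file, grouped under a fresh task id."""
    task_id = uuid.uuid4().hex
    targets = []
    for name in filenames:
        # Hypothetical key layout: uploads/<task id>/<original file name>
        key = f"uploads/{task_id}/{name}"
        url = s3_client.generate_presigned_url(
            "put_object",
            Params={"Bucket": bucket, "Key": key, "ContentType": "application/pdf"},
            ExpiresIn=expires_in,
        )
        targets.append({"filename": name, "key": key, "upload_url": url})
    return targets
```

With a real client this would be called as `build_upload_targets(boto3.client("s3"), "my-bucket", ["report.pdf"])`. Because `ContentType` is part of the signed parameters, the browser's `PUT` request would need to send a matching `Content-Type: application/pdf` header. A production version would likely also sanitize the file names before using them in object keys.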
- WebSocket Broadcasting: For this version, WebSocket updates are broadcast to all connected clients. In a multi-tenant production environment, we would implement per-user channel filtering.
- Single PDF per Row: While the backend supports multi-file Tasks, the current UI treats each file as a primary "Job" row for better individual tracking.
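
One possible shape for the per-user channel filtering mentioned above is to publish each progress event on a channel derived from the owning user's id, so that a WebSocket handler subscribes only to its own user's channel. The sketch below is an assumption about how this could look, not the current implementation: the `progress:user:<id>` naming scheme, the event fields, and both function names are hypothetical. The only Redis call used is `publish(channel, message)`, which redis-py clients provide.

```python
import json
from typing import Any


def progress_channel(user_id: str) -> str:
    """Hypothetical naming scheme: one Redis Pub/Sub channel per user."""
    return f"progress:user:{user_id}"


def publish_progress(
    redis_client: Any,
    user_id: str,
    job_id: str,
    status: str,
    percent: int,
) -> str:
    """Publish a JSON progress event on the owning user's channel only."""
    event = {"job_id": job_id, "status": status, "percent": percent}
    payload = json.dumps(event)
    redis_client.publish(progress_channel(user_id), payload)
    return payload
```

On the consuming side, the WebSocket endpoint would subscribe to `progress_channel(current_user_id)` after authenticating the connection, instead of a single shared channel, so clients never receive events for other users' jobs.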
This project was developed with the assistance of Antigravity (Google DeepMind's AI coding assistant) for architecture design, API implementation, and UI development.
You can find sample PDF files used for testing in the /samples directory (if provided).