Storix: Async Document Processing Workflow System

A full-stack asynchronous document processing application built with FastAPI, Next.js, Celery, Redis, and PostgreSQL.

Features

Document Upload: Multi-file PDF upload with pre-signed S3 URLs.
Async Processing: Background PDF parsing and extraction using Celery workers.
Real-time Progress: Live status updates via WebSocket (backed by Redis Pub/Sub).
Interactive Dashboard: Search, filter, and monitor all document processing jobs.
Content Review: Edit extracted metadata, keywords, and summaries.
Data Export: Export processed results in JSON and CSV formats.
Authentication: Secure Google OAuth and JWT-based auth.
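
As a rough illustration of the Data Export feature, the sketch below serializes processed records to JSON and CSV using only the Python standard library. The field names (filename, status, keywords, summary) and the export_json / export_csv helpers are assumptions made for this example, not the project's actual schema or API.

```python
import csv
import io
import json

# Hypothetical processed records; the real schema may differ.
records = [
    {"filename": "report.pdf", "status": "finalized",
     "keywords": ["budget", "q3"], "summary": "Quarterly budget review."},
    {"filename": "notes.pdf", "status": "completed",
     "keywords": [], "summary": "Meeting notes."},
]


def export_json(rows):
    """Return the records as a pretty-printed JSON string."""
    return json.dumps(rows, indent=2)


def export_csv(rows):
    """Return the records as CSV text; list fields are joined with ';'."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["filename", "status", "keywords", "summary"]
    )
    writer.writeheader()
    for row in rows:
        # CSV cells are flat strings, so collapse the keyword list.
        flat = dict(row, keywords=";".join(row["keywords"]))
        writer.writerow(flat)
    return buffer.getvalue()
```

Joining keywords with a semicolon keeps one record per CSV row; a real export might instead emit one column per keyword or a JSON-encoded cell.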

Tech Stack

Frontend: Next.js, TypeScript, Tailwind CSS, Zustand, Lucide React.
Backend: Python, FastAPI, SQLModel (ORM), Alembic (Migrations).
Task Queue: Celery with Redis as the Broker.
Real-time: Redis Pub/Sub + WebSockets.
Storage: AWS S3 for document persistence.
Database: PostgreSQL (Neon.tech).

Setup Instructions

Prerequisites

Python 3.10+
Node.js 18+
Redis (running locally or via Docker)
PostgreSQL database
AWS S3 Bucket

Backend Setup

Navigate to the backend directory.
Create a virtual environment: python -m venv venv.
Activate the venv: source venv/bin/activate (on Windows: venv\Scripts\activate).
Install dependencies: pip install -r requirements.txt.
Configure .env with the connection settings for your environment (start from the provided template or an existing .env).
Run migrations: alembic upgrade head.
Start the server: python -m uvicorn app.main:app --reload.
Start the worker: celery -A app.worker.celery_app worker --loglevel=info.
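
Before starting the server and worker, it can help to confirm that .env defines the settings the backend needs. The sketch below is a minimal stdlib-only check; the key names in REQUIRED_KEYS are hypothetical placeholders, so replace them with the names from the project's .env template.

```python
from pathlib import Path

# Hypothetical setting names; check the project's .env template for the real ones.
REQUIRED_KEYS = ["DATABASE_URL", "REDIS_URL", "AWS_S3_BUCKET"]


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env, required=REQUIRED_KEYS):
    """Return the required keys that are absent or empty."""
    return [key for key in required if not env.get(key)]


if __name__ == "__main__":
    env_file = Path(".env")
    env = parse_env(env_file.read_text()) if env_file.exists() else {}
    print("Missing settings:", missing_keys(env) or "none")
```

Run from the backend directory, this prints any required keys that are missing or empty.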

Frontend Setup

Navigate to the frontend directory.
Install dependencies: npm install.
Start the dev server: npm run dev.
Open http://localhost:3000 in your browser.

Architecture Overview

Initiation: The user selects files. The frontend calls /upload/initiate, which records the task in the database and returns pre-signed S3 URLs.
Upload: Frontend uploads files directly to S3.
Trigger: Frontend calls /upload/complete, which triggers a Celery background task.
Worker: The Celery worker downloads the file from S3, parses text using pypdf, extracts metadata, and updates the database.
Updates: Throughout the process, the worker publishes events to Redis. A dedicated WebSocket endpoint in FastAPI listens to these events and streams them to the client.
Review: Once processing completes, the user reviews the extracted data, edits fields as needed, and finalizes the record, which makes it read-only.
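
The Worker and Updates steps above can be sketched in-process. This is a simplified illustration, not the project's code: an asyncio.Queue stands in for Redis Pub/Sub, a coroutine stands in for the Celery task, and a plain callback stands in for the WebSocket send. The stage names and event fields are assumptions.

```python
import asyncio
import json

# Hypothetical stage names; the real worker may report different ones.
STAGES = ["downloading", "parsing", "extracting", "completed"]


async def worker(job_id, channel):
    """Stand-in for the Celery task: publish one JSON event per stage."""
    for index, stage in enumerate(STAGES, start=1):
        event = {
            "job_id": job_id,
            "stage": stage,
            "progress": round(100 * index / len(STAGES)),
        }
        await channel.put(json.dumps(event))
    await channel.put(None)  # sentinel: no more events in this sketch


async def websocket_listener(channel, send):
    """Stand-in for the WebSocket endpoint: forward each event to the client."""
    while True:
        message = await channel.get()
        if message is None:
            break
        await send(json.loads(message))


async def run_job(job_id):
    channel = asyncio.Queue()  # replaces Redis Pub/Sub in this sketch
    received = []

    async def send(event):  # replaces the WebSocket send call
        received.append(event)

    await asyncio.gather(worker(job_id, channel), websocket_listener(channel, send))
    return received


events = asyncio.run(run_job("job-1"))
```

Serializing events as JSON strings mirrors how messages would travel over a pub/sub channel, so the listener only needs to decode and forward them.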

Assumptions & Tradeoffs

WebSocket Broadcasting: For this version, WebSocket updates are broadcast to all connected clients. In a multi-tenant production environment, we would implement per-user channel filtering.
Single PDF per Row: While the backend supports multi-file Tasks, the current UI treats each file as its own "Job" row so that files can be tracked individually.
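
One way the per-user channel filtering mentioned under WebSocket Broadcasting might look is sketched below. It is an in-memory stand-in for pub/sub routing, not the project's implementation; the "progress:<user_id>" channel naming scheme and the ChannelHub class are hypothetical.

```python
from collections import defaultdict


def channel_for(user_id):
    """Hypothetical naming scheme: one channel per user."""
    return f"progress:{user_id}"


class ChannelHub:
    """In-memory stand-in for per-channel pub/sub routing."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, user_id, callback):
        """Register a connection's callback on that user's channel."""
        self._subscribers[channel_for(user_id)].append(callback)

    def publish(self, user_id, event):
        """Deliver the event only to subscribers of this user's channel."""
        callbacks = self._subscribers.get(channel_for(user_id), [])
        for callback in callbacks:
            callback(event)
        return len(callbacks)


hub = ChannelHub()
alice_inbox, bob_inbox = [], []
hub.subscribe("alice", alice_inbox.append)
hub.subscribe("bob", bob_inbox.append)
delivered = hub.publish("alice", {"job_id": "job-1", "stage": "parsing"})
```

With Redis, the same idea would mean the worker publishing to a user-specific channel and each WebSocket connection subscribing only to the channel of its authenticated user.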

AI Tools Used

This project was developed with the assistance of Antigravity (Google DeepMind's AI coding assistant) for architecture design, API implementation, and UI development.

Sample Files

You can find sample PDF files used for testing in the /samples directory (if provided).
