ParseFlow is an AI-powered document intelligence platform that classifies, stores, and organizes uploaded files (PDF, JPG, JPEG, PNG) into structured folders with searchable history and feedback-driven improvements.
- Hybrid classification path with confidence-based fallback
- Category-aware storage (`Identity`, `Financial`, `Legal`, `Compliance`, `Tax`, `Business`, `Other`)
- Auto folder creation by category and document type
- History and document explorer with category filtering and search
- Feedback flow for the top history item (`Correct`/`Wrong`)
- Optional Google Drive auto-sync
- Security extensions: auth middleware, encrypted extracted payload storage, file hash integrity, secure file access route
```mermaid
flowchart TD
    A[Upload Document] --> B[Backend /upload]
    B --> C{File Type}
    C -->|PDF| D[Convert PDF to Images<br/>up to 3 pages]
    D --> E[Vision LLM Classification<br/>per page]
    E --> F{Known Doc Type?}
    F -->|Yes| G[Use Vision Result]
    F -->|No| H[Fallback Unknown/Other]
    C -->|Image| I[ML Classifier]
    I --> J{Confidence >= Threshold}
    J -->|Yes| K[Use ML Result]
    J -->|No| L[Vision LLM Fallback]
    G --> M[Derive Category + DocType]
    H --> M
    K --> M
    L --> M
    M --> N[Persist File to Storage<br/>storage/userId/category/docType/file]
    N --> O[Persist Metadata to MongoDB]
    O --> P[Security Layer<br/>fileHash + encryptedData]
    P --> Q[Optional Google Drive Sync]
    Q --> R[API Response]
    R --> S[Frontend: Documents + History + Feedback]
    S --> T[Feedback Submitted<br/>Correct/Wrong]
    T --> U[Next Top History Document]
```
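The routing in the flowchart can be sketched as a single function. This is a minimal TypeScript sketch, not the backend's actual code: the `Classification` shape, the `classifyUpload` name, the `0.8` threshold, and the rule that the first PDF page with a known doc type wins are all assumptions, since the README does not specify them.

```typescript
// Hypothetical result shape; the real backend's fields may differ.
type Classification = {
  category: string;
  docType: string;
  confidence: number;
  source: "ml" | "vision" | "fallback";
};

// A classifier takes an image path (a single upload or one rendered PDF page).
type Classifier = (imagePath: string) => Promise<Classification>;

const UNKNOWN: Classification = { category: "Other", docType: "Unknown", confidence: 0, source: "fallback" };

// ASSUMPTION: placeholder threshold; the README does not state the real value.
const CONFIDENCE_THRESHOLD = 0.8;

async function classifyUpload(
  kind: "pdf" | "image",
  imagePaths: string[], // for PDFs, the already-rendered page images
  ml: Classifier,
  vision: Classifier,
  threshold: number = CONFIDENCE_THRESHOLD,
): Promise<Classification> {
  if (imagePaths.length === 0) throw new Error("no input images");

  if (kind === "pdf") {
    // PDF path: Vision LLM on each of the first 3 pages.
    // ASSUMPTION: the first page with a known doc type decides the result.
    for (const page of imagePaths.slice(0, 3)) {
      const result = await vision(page);
      if (result.docType !== "Unknown") return result;
    }
    return UNKNOWN; // no page recognized: fall back to Unknown/Other
  }

  // Image path: ML classifier first, Vision LLM only when confidence is too low.
  const mlResult = await ml(imagePaths[0]);
  if (mlResult.confidence >= threshold) return mlResult;
  return vision(imagePaths[0]);
}
```

Passing both classifiers in as functions keeps the routing decision testable without calling a real model or LLM API.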
- Frontend: React + Vite + TypeScript
- Backend: Node.js + Express + MongoDB
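- Classification: for images, the ML classifier runs first with a Vision LLM fallback for low-confidence results; PDFs are converted to page images (up to 3 pages) and classified by the Vision LLM, falling back to Unknown/Other when no known type is found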
- Storage: Local file system under user/category/doc-type path
- Auth: Clerk token verification (plus an `x-user-id` compatibility path)
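The `storage/userId/category/docType/file` layout can be sketched as a path builder. This is a hypothetical helper, not the backend's code: the function names, the case-insensitive category matching, and the path-segment sanitization rule are assumptions; only the layout and the category list come from this README.

```typescript
// Canonical categories listed in this README.
const CATEGORIES = ["Identity", "Financial", "Legal", "Compliance", "Tax", "Business", "Other"] as const;
type Category = (typeof CATEGORIES)[number];

// ASSUMPTION: case-insensitive match; anything unrecognized maps to "Other".
function toCategory(raw: string | undefined): Category {
  const wanted = (raw ?? "").trim().toLowerCase();
  return CATEGORIES.find((c) => c.toLowerCase() === wanted) ?? "Other";
}

// ASSUMPTION: replace characters unsafe in a single path segment so a
// userId, docType, or file name can never introduce extra "/" levels.
function safeSegment(value: string, fallback: string): string {
  const cleaned = value.trim().replace(/[^A-Za-z0-9._-]+/g, "_").replace(/^\.+/, "");
  return cleaned.length > 0 ? cleaned : fallback;
}

// Builds storage/<userId>/<category>/<docType>/<file>.
function buildStoragePath(userId: string, category: string | undefined, docType: string, fileName: string): string {
  return [
    "storage",
    safeSegment(userId, "anonymous"),
    toCategory(category),
    safeSegment(docType, "Unknown"),
    safeSegment(fileName, "file"),
  ].join("/");
}
```

For example, a `compliance` result for an "Audit Report" lands under `storage/<userId>/Compliance/Audit_Report/`.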
```
parseflow_main/
  backend/      # Express API, classification orchestration, storage sync, notifications
  frontend/     # React/Vite UI
  ml-service/   # Python services for model/OCR-related tasks
  storage/      # Persisted documents
```
From the repository root:

```bash
npm --prefix "parseflow_main/backend" install
npm --prefix "parseflow_main/frontend" install
```

Backend env:
- Copy `parseflow_main/backend/.env.example` to `parseflow_main/backend/.env`
- Set the required values (`CLERK_SECRET_KEY`, `MONGO_URI`, `GROQ_API_KEY`, `DATA_ENCRYPTION_KEY`, etc.)
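A startup check that fails fast on missing configuration could look like the sketch below. It is a hypothetical helper (`missingEnv` and `assertEnv` are not names from the codebase) and only covers the four variables named above; the real `.env.example` may list more.

```typescript
// Variables this README names as required; ".env.example" may list more.
const REQUIRED_ENV = ["CLERK_SECRET_KEY", "MONGO_URI", "GROQ_API_KEY", "DATA_ENCRYPTION_KEY"] as const;

// Returns the names of required variables that are unset or blank.
function missingEnv(
  env: Record<string, string | undefined>,
  required: readonly string[] = REQUIRED_ENV,
): string[] {
  return required.filter((name) => (env[name] ?? "").trim() === "");
}

// Throws one error listing every missing variable, so all gaps show up at once.
function assertEnv(env: Record<string, string | undefined>): void {
  const missing = missingEnv(env);
  if (missing.length > 0) {
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
}
```

In the backend this would be called once before the server starts listening, e.g. `assertEnv(process.env)` after the `.env` file is loaded.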
Frontend env:
- Set `VITE_BACKEND_URL` in `parseflow_main/frontend/.env`
- The local default is usually `http://localhost:5000`
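On the frontend, requests are built from `VITE_BACKEND_URL`. The sketch below is a hypothetical `apiUrl` helper (not the project's code) that falls back to the local default and normalizes slashes; in a Vite app the value would come from `import.meta.env.VITE_BACKEND_URL`, but it is passed in as a parameter here so the sketch runs outside Vite.

```typescript
// Local default named in this README.
const DEFAULT_BACKEND_URL = "http://localhost:5000";

// Joins the backend base URL and an API path without doubled or missing slashes.
// In the Vite app, backendUrl would be import.meta.env.VITE_BACKEND_URL.
function apiUrl(path: string, backendUrl?: string): string {
  const configured = backendUrl?.trim();
  const base = (configured || DEFAULT_BACKEND_URL).replace(/\/+$/, "");
  return `${base}/${path.replace(/^\/+/, "")}`;
}
```

With no value configured, `apiUrl("/upload")` targets `http://localhost:5000/upload`.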
Backend:

```bash
cd "parseflow_main/backend"
npm start
```

Frontend:

```bash
cd "parseflow_main/frontend"
npm run dev
```

If port 5000 is already in use, stop the existing process first, then restart the backend.
- Middleware auth gate for protected routes
- Extracted payload encryption before DB write (`encryptedData`)
- SHA-256 integrity hash per stored file (`fileHash`)
- Secure file route with user-level access validation for authenticated requests
- Env-based secret handling (`DATA_ENCRYPTION_KEY` and related credentials)
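The `fileHash` and `encryptedData` fields can be illustrated with Node's built-in `crypto` module. Only SHA-256 for `fileHash` comes from this README; the AES-256-GCM cipher, the 64-hex-character key format for `DATA_ENCRYPTION_KEY`, the `iv:tag:ciphertext` storage layout, and the helper names are assumptions for this sketch.

```typescript
import { createCipheriv, createDecipheriv, createHash, randomBytes } from "node:crypto";

// SHA-256 of the stored file's bytes, hex-encoded (the fileHash field).
function fileHash(bytes: Buffer): string {
  return createHash("sha256").update(bytes).digest("hex");
}

// ASSUMPTION: DATA_ENCRYPTION_KEY holds a 32-byte key as 64 hex characters.
function keyFrom(hexKey: string): Buffer {
  const key = Buffer.from(hexKey, "hex");
  if (key.length !== 32) throw new Error("DATA_ENCRYPTION_KEY must be 64 hex characters (32 bytes)");
  return key;
}

// ASSUMPTION: AES-256-GCM; output stored as base64 "iv:tag:ciphertext" (hypothetical layout).
function encryptPayload(payload: unknown, hexKey: string): string {
  const iv = randomBytes(12); // fresh 96-bit nonce per encryption
  const cipher = createCipheriv("aes-256-gcm", keyFrom(hexKey), iv);
  const ciphertext = Buffer.concat([cipher.update(JSON.stringify(payload), "utf8"), cipher.final()]);
  const tag = cipher.getAuthTag();
  return [iv, tag, ciphertext].map((b) => b.toString("base64")).join(":");
}

function decryptPayload(encryptedData: string, hexKey: string): unknown {
  const parts = encryptedData.split(":");
  if (parts.length !== 3) throw new Error("malformed encryptedData");
  const [iv, tag, ciphertext] = parts.map((part) => Buffer.from(part, "base64"));
  const decipher = createDecipheriv("aes-256-gcm", keyFrom(hexKey), iv);
  decipher.setAuthTag(tag); // final() throws if the data, tag, or key is wrong
  const plaintext = Buffer.concat([decipher.update(ciphertext), decipher.final()]);
  return JSON.parse(plaintext.toString("utf8"));
}
```

GCM is used here because its authentication tag makes tampering with `encryptedData` detectable on read, which complements the per-file `fileHash` check.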
- The backend now preserves an explicit `Compliance` category from model output.
- Compliance documents are stored under `Compliance/...` instead of being forced to `Other`.
- UI category rendering prefers, in order:
  - `storage.category`
  - `classification.category`
  - `category`
  - fallback `Other`
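The UI precedence above can be expressed as a small resolver. This is a sketch: the `DocLike` interface models only the three fields named in this README, and treating empty or whitespace-only strings as missing is an assumption.

```typescript
// Hypothetical document shape: only the fields this README names.
interface DocLike {
  storage?: { category?: string | null } | null;
  classification?: { category?: string | null } | null;
  category?: string | null;
}

// storage.category, then classification.category, then category, then "Other".
// ASSUMPTION: blank strings count as missing.
function displayCategory(doc: DocLike): string {
  const candidates = [doc.storage?.category, doc.classification?.category, doc.category];
  for (const value of candidates) {
    if (typeof value === "string" && value.trim() !== "") return value.trim();
  }
  return "Other";
}
```

Reading `storage.category` first means the label shown in the UI matches the folder the file was actually written to.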
```powershell
# Stop whatever is listening on port 5000 (e.g. a stale backend process)
$conn = Get-NetTCPConnection -LocalPort 5000 -State Listen -ErrorAction SilentlyContinue
if ($conn) { Stop-Process -Id ($conn.OwningProcess | Select-Object -Unique) -Force }

# Restart the backend
cd "parseflow_main/backend"
npm start
```

- Ensure the backend is running the latest code (not a stale Node process)
- Re-upload the document after restarting the backend
- Hard-refresh the frontend browser tab
- Verify that `VITE_BACKEND_URL` points to the active backend
For full onboarding and operations details, see:
Built for Agentica 2.0 Hackathon