Chat with your PDFs using AI. This project uses Gemini 1.5 to answer questions based on your uploaded documents with high accuracy and cited sources.
- 📤 Upload: Your PDF is chopped into small, smart pieces to maintain context.
- 💾 Store: Those pieces are converted into "embeddings" (numerical representations) and saved in a Vector Database.
- 💬 Ask: When you ask a question, the AI finds the most relevant pieces and writes a clear answer based only on them.
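The three steps above can be sketched end to end. This toy sketch uses word-count vectors in place of real embeddings and an in-memory list in place of the vector database; the actual app uses Gemini embeddings and ChromaDB:

```python
# Toy upload -> store -> ask pipeline (illustrative only).
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a word-frequency vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# "Upload": chunks that would come from the PDF parser.
chunks = [
    "Invoices must be paid within 30 days of receipt.",
    "The warranty covers manufacturing defects for two years.",
]
# "Store": embed each chunk (a vector DB would persist these).
store = [(c, embed(c)) for c in chunks]

# "Ask": embed the question and pick the most relevant chunk.
question = "How long is the warranty?"
best = max(store, key=lambda item: cosine(embed(question), item[1]))
print(best[0])  # -> The warranty covers manufacturing defects for two years.
```

In the real pipeline, the winning chunks are not returned directly; they become the grounding context for the Gemini prompt.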
- Smart Document Processing: Layout-aware PDF handling with semantic chunking to minimize context loss.
- Vector Database (ChromaDB): Optimized storage for sub-second document retrieval and semantic ranking.
- Strict Context Grounding: Gemini 1.5 Flash is instructed to answer only from your uploaded context, sharply reducing hallucinations.
- Modern Dark Mode UI: A sleek, minimal dashboard built with Next.js 14 and Glassmorphism Styling.
- Interactive Citations: Provides the exact filename and page number for every AI-generated claim.
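As an illustration of the chunking idea, here is a minimal overlap-based splitter. The parameters and function name are hypothetical; the real `ingestion.py` may instead split on layout-aware, sentence-level boundaries:

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping windows so content that spans a
    chunk boundary still appears intact in at least one chunk."""
    chunks = []
    step = size - overlap
    for start in range(0, max(len(text) - overlap, 1), step):
        chunks.append(text[start:start + size])
    return chunks

# Each chunk shares `overlap` characters with its neighbor:
print(chunk_text("abcdefghij", size=4, overlap=2))
# -> ['abcd', 'cdef', 'efgh', 'ghij']
```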
```mermaid
graph TD
    A[Browser: Next.js Dashboard] -- "Query (JSON)" --> B[FastAPI: Intelligence Bridge]
    B -- "1. Cleanse Input" --> C{Neural Core}
    C -- "2. Vector Search" --> D[Vector Store: ChromaDB]
    D -- "3. Relevant Segments" --> C
    C -- "4. Reason with Context" --> E[AI: Gemini 1.5 Flash]
    E -- "5. Formulate Citation" --> B
    B -- "6. Verified Answer" --> A
```
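Steps 1-6 can be condensed into a single handler, sketched below. `retrieve` and `call_gemini` are hypothetical stand-ins for the project's real services, and the prompt wording is illustrative:

```python
def answer_question(query: str, retrieve, call_gemini) -> dict:
    # 1. Cleanse input
    query = query.strip()
    if not query:
        raise ValueError("empty query")
    # 2-3. Vector search returns the most relevant segments
    segments = retrieve(query)  # e.g. [{"text": ..., "source": ..., "page": ...}]
    # 4. Reason with context: ground the model in retrieved text only
    context = "\n\n".join(
        f"[{s['source']} p.{s['page']}] {s['text']}" for s in segments
    )
    prompt = (
        "Answer ONLY from the context below. If the answer is not "
        "present, say you don't know.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
    # 5-6. Formulate the cited answer and return it to the frontend
    return {
        "answer": call_gemini(prompt),
        "citations": [{"source": s["source"], "page": s["page"]} for s in segments],
    }
```

The grounding instruction in the prompt is what keeps answers tied to retrieved segments, and the `citations` list is what drives the interactive filename/page references in the UI.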
```
documind-ai/
├── backend/
│   ├── app/
│   │   ├── services/
│   │   │   ├── ingestion.py      # PDF parsing logic
│   │   │   └── rag_service.py    # AI reasoning & retrieval
│   │   ├── main.py               # API endpoints
│   │   └── models.py             # Data schemas
│   ├── Dockerfile                # Backend container config
│   └── requirements.txt          # Python dependencies
│
├── frontend/
│   ├── app/                      # Next.js 14 Dashboard
│   ├── globals.css               # Premium styles
│   ├── package.json              # NPM manifest
│   └── tailwind.config.ts        # UI theme config
│
├── docker-compose.yml            # Multi-service setup (local)
├── .env.example                  # Environment variable template
└── README.md                     # Project documentation
```
- Node.js 20+ and Python 3.11+
- Google AI API Key: Get it at Google AI Studio
When deploying the backend to Railway, ensure you add the following variables:

- `GOOGLE_API_KEY`: Your key from Google AI Studio.
- `CHROMA_DB_PATH`: Set to `/app/vector_store` for production persistence.
- `PORT`: `9000`
🚀 Persistence: In your Railway Dashboard, go to Settings -> Volumes and mount a volume at /app/vector_store. This ensures your indexed documents are not lost after an app restart.
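For local development, the same variables would look like this in a `.env` file (values are illustrative placeholders):

```shell
GOOGLE_API_KEY=your-google-ai-studio-key
CHROMA_DB_PATH=/app/vector_store
PORT=9000
```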
- PDF Parsing Errors: If a PDF is a scanned image, it contains no extractable text. Run it through an OCR tool first (PyMuPDF can do this via its Tesseract integration) or supply a text-based PDF.
- Port Conflicts: If port 3000 is taken, the frontend will fail to start. Change the port mapping in `docker-compose.yml` if needed.
- Backend Connection: If the AI doesn't respond, ensure `NEXT_PUBLIC_API_URL` in your `.env` matches your running backend port.
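If port 3000 is taken, the relevant part of `docker-compose.yml` to change is the frontend port mapping. A sketch (service name and values are illustrative; your file may differ):

```yaml
services:
  frontend:
    ports:
      - "3001:3000"   # host:container; use 3001 on the host if 3000 is taken
```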
Stack: FastAPI · LangChain v0.2 · Gemini 1.5 · Next.js 14 · ChromaDB · Tailwind CSS · Docker
Developed as a clean, accessible solution for semantic document research.