A production-ready RAG platform featuring Docling parsing, Milvus Hybrid Search, and LangGraph orchestration.
DocNexus is a full-stack Retrieval Augmented Generation system designed for complex document handling. It moves beyond simple text splitting by understanding document layout (tables, images) and orchestrating retrieval with a self-correcting agentic workflow.
- 📄 Advanced Ingestion: Uses Docling for layout-aware parsing (PDFs, Images, Tables).
- 🧠 Adaptive RAG: Powered by LangGraph, featuring query rewriting, document grading, and fallback loops.
- 🔎 Hybrid Search: Utilizes Milvus for Sparse (BM25) + Dense (Embedding) retrieval.
- ⚡ Reranking: Includes Flashrank reranking for high-precision context.
- 🖥️ Reasoning UI: A React frontend that visualizes the AI's "Thinking Process," tool usage, and source citations.
- ⚙️ Async Processing: Celery + Redis pipeline for handling large file uploads without blocking.
## Table of Contents

- [Tech Stack](#tech-stack)
- [How to Run](#how-to-run)
- [System Architecture](#system-architecture)
- [Database Strategy](#database-strategy)
- [Chunking Strategy](#chunking-strategy)
- [Frontend Overview](#frontend-overview)
- [API Documentation](#api-documentation)
- [Roadmap](#roadmap)
## Tech Stack

| Component | Technology | Description |
|---|---|---|
| Backend | FastAPI | High-performance async API server |
| Orchestration | LangGraph | State machine for RAG loops and agent logic |
| Vector DB | Milvus | Hybrid Search (Dense + Sparse) storage |
| Database | PostgreSQL | User auth, file metadata, and chat history |
| Queue | Celery + Redis | Background task processing for document ingestion |
| Parsing | Docling | Layout-aware document conversion |
| Frontend | React + Vite | Responsive UI with Tailwind & Shadcn/ui |
| LLM | Ollama / OpenRouter | Configurable model provider |
| Monitoring | pgAdmin, Flower, Attu | Admin UIs for Postgres, Celery, and Milvus |
## How to Run

1. **Clone the repository**

   ```bash
   git clone https://github.com/Fenaz12/DocNexus-1.0.git
   ```

2. **Environment Setup:** copy the example environment file and fill in your API keys (OpenRouter).

   ```bash
   cd DocNexus-1.0
   cp app/.env.example .env
   ```

3. **Start Services (Docker):** spin up the infrastructure (Milvus, Postgres, Redis, API, Worker).

   ```bash
   docker-compose up -d --build
   ```

4. **Access the Application**

   - Frontend: http://localhost:5173 (or configured port)
   - API Docs: http://localhost:8000/docs
   - Milvus UI: http://localhost:8000/attu (if configured)
## System Architecture

We do not simply split text by character count. The ingestion pipeline (`ingest.py` & `tasks.py`) ensures structure is preserved; a sketch of the background task follows the list below.
- Upload: The user uploads a PDF/DOCX via the API.
- Queue: File is saved locally; a Celery task is dispatched.
- Docling Parse: The document is converted to Markdown, identifying headers, paragraphs, and tables.
- Hybrid Chunking:
- Tables are preserved as semantic units.
- Text is chunked respecting semantic boundaries (`HybridChunker`).
- Vectorization:
- Dense Vectors: Generated via OpenAI or Ollama Embeddings.
- Sparse Vectors: Generated via the BM25 algorithm within Milvus.
- Storage: Vectors pushed to Milvus; metadata updated in Postgres.
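To make the flow concrete, here is a minimal sketch of such a Celery task. The task name, broker URL, and record shape are illustrative assumptions, not the exact code in `tasks.py`:

```python
# Hypothetical sketch of the ingestion task: Docling parse -> hybrid chunking.
from celery import Celery
from docling.chunking import HybridChunker
from docling.document_converter import DocumentConverter

app = Celery("docnexus", broker="redis://localhost:6379/0")

@app.task
def ingest_document(path: str, file_id: str) -> int:
    # 1. Layout-aware parse: Docling yields a structured document
    #    (headers, paragraphs, tables) instead of a flat text blob.
    doc = DocumentConverter().convert(path).document

    # 2. Hybrid chunking: respects semantic boundaries and keeps
    #    tables together as single units where possible.
    chunks = list(HybridChunker().chunk(dl_doc=doc))

    # 3. Attach provenance metadata so answers can cite their sources.
    records = [
        {"file_id": file_id, "text": c.text, "meta": c.meta.export_json_dict()}
        for c in chunks
    ]

    # 4. Embedding, the Milvus insert, and the Postgres status update go here.
    return len(records)
```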
Instead of a linear "Search -> Answer" flow, DocNexus uses a state graph (`graph.py`); a minimal sketch follows the list below:
- Router: Decides if the question needs retrieval or if it's general conversation.
- Retrieve: Fetches documents using Hybrid Search + Reranking.
- Grade: An LLM evaluates if the retrieved documents actually answer the question.
- If Yes: Generate Answer.
- If No: Rewrite Question and loop back to retrieval.
- Generate: Synthesizes the final answer with citations.
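The same loop can be expressed in a few lines of LangGraph. This is a simplified sketch (the router node is omitted and the node bodies are stubs); see `graph.py` for the real implementation:

```python
# Simplified sketch of the self-correcting RAG graph.
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class RAGState(TypedDict):
    question: str
    documents: list[str]
    answer: str

def retrieve(state: RAGState) -> dict:
    # Hybrid search + reranking would happen here.
    return {"documents": ["...retrieved context..."]}

def grade(state: RAGState) -> dict:
    # An LLM call scores document relevance here.
    return {}

def decide(state: RAGState) -> str:
    # Route to generation if the documents are relevant, else rewrite.
    return "generate" if state["documents"] else "rewrite"

def rewrite(state: RAGState) -> dict:
    return {"question": state["question"] + " (rephrased)"}

def generate(state: RAGState) -> dict:
    return {"answer": "final answer with citations"}

builder = StateGraph(RAGState)
builder.add_node("retrieve", retrieve)
builder.add_node("grade", grade)
builder.add_node("rewrite", rewrite)
builder.add_node("generate", generate)

builder.add_edge(START, "retrieve")
builder.add_edge("retrieve", "grade")
builder.add_conditional_edges("grade", decide,
                              {"generate": "generate", "rewrite": "rewrite"})
builder.add_edge("rewrite", "retrieve")  # the self-correcting loop
builder.add_edge("generate", END)

graph = builder.compile()
```

The conditional edge out of `grade` is what makes the flow self-correcting: a weak retrieval routes back through `rewrite` instead of producing an unsupported answer.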
## Database Strategy

We chose Milvus over Qdrant or Chroma for this implementation because of its robust support for Hybrid Search (Dense + Sparse) with customizable parameters.

- Dense Index: `HNSW` (high performance and recall).
- Sparse Index: `SPARSE_INVERTED_INDEX` (BM25).
- Why? Standard embedding search often fails on exact keyword matches (e.g., specific part numbers or acronyms). BM25 covers this gap. A sketch of the index setup follows.
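Here is how these two indexes could be declared with `pymilvus`, using the Milvus 2.5-style built-in BM25 function. Field names, the embedding dimension, and index parameters are assumptions, not the repo's exact schema:

```python
# Sketch: a collection with dense (HNSW) and sparse (BM25) indexes.
from pymilvus import DataType, Function, FunctionType, MilvusClient

client = MilvusClient(uri="http://localhost:19530")

schema = client.create_schema()
schema.add_field("pk", DataType.INT64, is_primary=True, auto_id=True)
schema.add_field("text", DataType.VARCHAR, max_length=65535, enable_analyzer=True)
schema.add_field("dense", DataType.FLOAT_VECTOR, dim=768)
schema.add_field("sparse", DataType.SPARSE_FLOAT_VECTOR)

# Milvus derives the sparse BM25 vectors from the raw text itself,
# so no client-side sparse embedding step is needed.
schema.add_function(Function(
    name="bm25",
    input_field_names=["text"],
    output_field_names=["sparse"],
    function_type=FunctionType.BM25,
))

index_params = client.prepare_index_params()
index_params.add_index(field_name="dense", index_type="HNSW",
                       metric_type="COSINE",
                       params={"M": 16, "efConstruction": 200})
index_params.add_index(field_name="sparse",
                       index_type="SPARSE_INVERTED_INDEX",
                       metric_type="BM25")

client.create_collection("doc_chunks", schema=schema, index_params=index_params)
```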
Rather than a document store such as MongoDB, DocNexus uses PostgreSQL (`dbservice.py`).

- Reasoning: Since we have structured relationships (Users -> Files -> Chunks), a relational DB offers better integrity.
- Pool: We use `psycopg_pool` for async, high-concurrency connections, as sketched below.
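A minimal sketch of such a pool with psycopg 3. The DSN, pool sizes, and the `files` table are illustrative; the real configuration lives in `dbservice.py`:

```python
# Sketch: async connection pooling with psycopg_pool.
import asyncio
from psycopg_pool import AsyncConnectionPool

pool = AsyncConnectionPool(
    "postgresql://docnexus:secret@localhost:5432/docnexus",
    min_size=2,
    max_size=10,
    open=False,  # open explicitly inside the event loop
)

async def get_file_status(file_id: str) -> str | None:
    async with pool.connection() as conn:  # borrow a pooled connection
        cur = await conn.execute(
            "SELECT status FROM files WHERE id = %s",  # table/column assumed
            (file_id,),
        )
        row = await cur.fetchone()
        return row[0] if row else None

async def main() -> None:
    await pool.open()
    print(await get_file_status("123"))
    await pool.close()

asyncio.run(main())
```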
## Chunking Strategy

Standard chunking breaks tables and context. DocNexus uses Docling + Hybrid Chunking:

- Table Preservation: Tables are detected and kept intact. If a table is too large, it is split row-wise while preserving the header row for every chunk (`vector_store.py`); see the sketch after this list.
- Semantic Merging: Small chunks are merged into the preceding paragraph to avoid "orphan" sentences.
- Metadata Enrichment: Every chunk carries `file_id`, `page_number`, and `source` metadata for precise citations.
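A minimal sketch of the row-wise split, assuming tables arrive as Markdown. The function name and row limit are illustrative; the real logic is in `vector_store.py`:

```python
# Sketch: split a large markdown table row-wise, repeating the header
# (and separator) row in every chunk so each piece stays self-describing.
def split_table_rows(table_md: str, max_rows: int = 20) -> list[str]:
    lines = [ln for ln in table_md.strip().splitlines() if ln.strip()]
    header, separator, body = lines[0], lines[1], lines[2:]

    if len(body) <= max_rows:
        return [table_md.strip()]  # small enough: keep as one semantic unit

    chunks = []
    for i in range(0, len(body), max_rows):
        rows = body[i : i + max_rows]
        chunks.append("\n".join([header, separator, *rows]))
    return chunks
```

Repeating the header in every chunk means each piece remains interpretable on its own when retrieved in isolation.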
## Frontend Overview

The frontend (`Chat.jsx`) is designed to build trust with the user.

- Thinking Block: We stream the LLM's "thinking" tokens. Users can expand this block to see the reasoning (e.g., "I need to search for X...").
- Tool Visualization: When the `retrieve` tool is called, the UI shows a badge indicating that a tool was used and its status.
- Sidebar:
  - Left: Chat history (session management).
  - Right: File management (upload status, partitioning stats, viewing recognized chunks).
## API Documentation

The main endpoints are listed below; a usage sketch follows the table.

| Endpoint | Method | Description |
|---|---|---|
| **Authentication** | | |
| `/auth/register` | POST | Register a new user |
| `/auth/login` | POST | Log in and get an access token |
| `/auth/me` | GET | Get current user details |
| **Ingestion** | | |
| `/files/upload/` | POST | Upload files |
| `/files/task/{task_id}` | GET | Get task status |
| `/files/processing-history/` | GET | Get processing history |
| `/files/user-files/` | GET | Get user files |
| `/files/` | GET | Get user files with metadata |
| `/files/{filename}/metadata` | GET | Get file metadata |
| `/files/chunks/` | GET | Get user chunks |
| `/files/{file_id}/chunks` | GET | Get file chunks |
| **Chat** | | |
| `/chat/` | POST | Chat endpoint |
| `/chat/history` | GET | Get chat history |
| `/chat/{thread_id}` | GET | Get thread messages |
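An end-to-end usage sketch. The JSON field names, the bearer-token scheme, and the task-status values are assumptions based on typical FastAPI/Celery setups; check `/docs` for the exact schemas:

```python
# Sketch: log in, upload a document, poll the ingestion task, then chat.
import time
import httpx

BASE = "http://localhost:8000"

with httpx.Client(base_url=BASE, timeout=30) as client:
    # Log in and grab an access token (OAuth2 password flow assumed).
    token = client.post("/auth/login", data={
        "username": "alice", "password": "secret",
    }).json()["access_token"]
    headers = {"Authorization": f"Bearer {token}"}

    # Upload a PDF; ingestion runs in the background via Celery.
    with open("report.pdf", "rb") as f:
        task = client.post("/files/upload/", headers=headers,
                           files={"files": f}).json()

    # Poll the task until the worker finishes parsing and indexing.
    task_id = task["task_id"]  # assumed response field
    while client.get(f"/files/task/{task_id}",
                     headers=headers).json()["status"] != "SUCCESS":
        time.sleep(2)

    # Ask a question grounded in the uploaded document.
    answer = client.post("/chat/", headers=headers,
                         json={"message": "Summarize the report"}).json()
    print(answer)
```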
## Roadmap

- Implement Knowledge Graph extraction alongside vector search and compare it with the current implementation.
- Try late chunking and compare the results with the current implementation.
- Add Ragas/Arias integration to benchmark retrieval quality.
- Populate document metadata in Milvus; summarize documents for better retrieval.
- Show images in the chat.
Built with ❤️ using FastAPI, React, and LangGraph.
