hericlesferraz/DocVault

DocVault

Intelligent Document RAG with exact citation extraction.

Upload PDFs, DOCX, PPTX, or images → ask questions → get answers with precise citations linked to source pages.



Screenshots

Chat interface with exact citations and PDF viewer
Chat with exact citations linked to source pages — click a citation to highlight the passage in the PDF viewer

Citation highlighting in PDF
Citation bounding boxes rendered directly on the PDF page for precise source verification

Knowledge Graph visualization
Interactive knowledge graph — UMAP projection of document chunks with clustering and similarity-based edges

Admin panel with user management
Admin panel — user management with role-based access control (viewer, editor, admin)

Project Metrics

| Area | Scale |
| --- | --- |
| Backend | 14,300+ lines of Python across 50+ modules |
| Frontend | 7,200+ lines of TypeScript / React |
| Tests | 70 test files (55 backend + 15 frontend) with 15,100+ lines |
| LLM Prompts | 15 prompt templates (zero hardcoded in code) |
| Documentation | 2,400+ lines across guides, ADRs, and API reference |
| Infrastructure | 9 Docker configs, 4 deployment modes, 3 Grafana dashboards |

Development Process

This project was built using AI-assisted development — a spec-driven workflow where a human architect defines the system design and AI agents implement it under review.

How it works:

  1. Human defines specs — Each of the 50 phases has a detailed specification in .ralph/specs/ covering requirements, architecture decisions, testing criteria, and rollout order
  2. Agents implement — AI coding agents (Ralph for initial phases, Claude Code for refinement and multi-agent workflows) read the spec and implement code, tests, and documentation
  3. Human reviews and iterates — Every phase goes through review for correctness, security, and architectural consistency before being marked complete

Agent orchestration artifacts:

  • .ralph/ — Phase specs and development roadmap (50 phases, all completed)
  • .claude/agents/ — Custom specialized subagents (RAG reviewer, security auditor, API consistency checker, test writer)
  • .claude/rules/ — Domain-specific conventions enforced across agent sessions
  • .claude/skills/ — Reusable slash commands for deployment, testing, evaluation, and backups

What this demonstrates:

  • Ability to decompose a complex system into 50 well-scoped, sequential phases — each producing a working, testable artifact
  • Technical judgment — the human decides architecture (async-first, LiteLLM abstraction, embedding provider protocol, ML service extraction), the agent executes
  • Multi-agent orchestration — parallel specialized agents (security audit, RAG review, API consistency, test writing) coordinating on the same codebase via isolated worktrees

Features

| Feature | Description |
| --- | --- |
| **Exact Citations** (RAG with source mapping) | Answers include verbatim quoted passages with page numbers and bounding box coordinates, rendered as highlights in the PDF viewer |
| **Multi-Modal** (figures + vision) | Extracts figures from documents and describes them via Ollama vision models (Gemma 3), indexed as searchable chunks |
| **Agentic Search** (deep multi-step retrieval) | Decomposes complex queries into sub-questions, iterates with self-verification for multi-hop reasoning |
| **Knowledge Graph** (visual exploration) | Interactive UMAP-based visualization of document chunk relationships with clustering and similarity filtering |
| **Semantic Cache** (smart deduplication) | Per-user query caching with cosine similarity matching and automatic invalidation on new uploads |
| **Multi-Format** (PDF, DOCX, PPTX, images) | Format-specific parsing with OCR for scanned documents and searchable PDF generation |
| **Streaming Chat** (real-time responses) | SSE-based streaming with markdown rendering, session management, and export (Markdown/PDF) |
| **Guardrails** (safety + quality) | Hallucination detection (NLI grounding), prompt injection defense, confidence scoring, input sanitization |
| **Air-Gapped** (fully offline) | Works entirely offline with local models via Ollama — zero external API calls |
| **Observability** (full-stack monitoring) | Prometheus + Grafana dashboards, Langfuse LLM tracing, cost tracking, and quality metrics |

Architecture

```mermaid
graph TB
    subgraph Frontend
        UI[React / TypeScript / Vite]
    end

    subgraph Edge
        Caddy[Caddy — TLS + Reverse Proxy]
    end

    subgraph Backend
        API[FastAPI]
        RAG[RAG Engine + Citations]
        Agent[Agentic Retrieval]
        Guard[Guardrails Layer]
        Ingest[Ingestion Pipeline]
    end

    subgraph Storage
        PG[(PostgreSQL)]
        QD[(Qdrant)]
        FS[File Storage]
    end

    subgraph ML["ML Service (optional)"]
        Embed[Local Embeddings]
        OCR[docTR Neural OCR]
        Rerank[Cross-Encoder Reranking]
    end

    subgraph Vision["Vision (optional)"]
        Ollama[Ollama + Gemma 3]
    end

    subgraph Monitoring["Monitoring (optional)"]
        Prom[Prometheus]
        Graf[Grafana]
        Lang[Langfuse V3]
    end

    UI --> Caddy --> API
    API --> RAG --> QD
    API --> Agent --> RAG
    API --> Guard
    API --> Ingest --> PG
    Ingest --> FS
    Ingest --> ML
    Ingest --> Ollama
    RAG --> ML
    API --> Prom --> Graf
    API --> Lang
```

Pipeline: Upload document → detect type (digital/scanned) → parse/OCR → chunk with position metadata → embed → store in Qdrant → query → retrieve → generate answer with citations → verify & validate → stream response with highlights
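The "chunk with position metadata" step of the pipeline can be sketched as follows. The function and field names are hypothetical, but the idea — keep page numbers and character offsets alongside each chunk so citations can later be mapped back to the source page — matches the pipeline above.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    page: int
    start: int  # character offset within the page
    end: int


def chunk_page(page_text: str, page: int, size: int = 200, overlap: int = 50) -> list[Chunk]:
    """Split one page into overlapping chunks, preserving offsets for citations."""
    chunks: list[Chunk] = []
    step = size - overlap
    for start in range(0, max(len(page_text), 1), step):
        end = min(start + size, len(page_text))
        chunks.append(Chunk(page_text[start:end], page, start, end))
        if end == len(page_text):
            break
    return chunks
```

Because each chunk carries `(page, start, end)`, an answer that quotes a chunk can point the PDF viewer at the exact span to highlight.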

Quick Start

Prerequisites

  • Python 3.12+, Node.js 22+, Docker
  • An LLM API key (Gemini, Anthropic, or OpenAI) — or Ollama for local models

1. Clone and configure

```bash
git clone <repo-url> && cd docvault
cp .env.dev .env
# Edit .env with your API key (see Configuration section below)
```

2. Install dependencies (first time only)

```bash
make install            # Backend (Python via uv)
make frontend-install   # Frontend (Node via pnpm)
```

3. Start

```bash
make dev-up             # Qdrant + PostgreSQL + backend + frontend
```

Open http://localhost:5173 — upload a document and start asking questions.

Deployment Modes

| Mode | Command | What it runs |
| --- | --- | --- |
| Dev (lightweight) | `make dev-up` | Qdrant + PostgreSQL + native backend/frontend |
| Dev (full-featured) | `make dev-full` | Above + ML service + Ollama vision |
| Production | `make prod` | Full Docker stack with TLS, monitoring, ML, vision |
| Air-gapped | `cp .env.airgapped .env && make prod` | Zero external calls — all inference local |

```bash
# Stop commands
make dev-down           # Stop lightweight dev
make dev-full-down      # Stop full-featured dev
make prod-down          # Stop production
make status             # Show all running services
```

Configuration

All configuration is via environment variables. Copy .env.dev (development) or .env.production (production) to .env.

LLM Provider

```bash
# Gemini (default)
DOCVAULT_LLM_MODEL=gemini/gemini-2.0-flash
GEMINI_API_KEY=your-key

# Claude
DOCVAULT_LLM_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=your-key

# OpenAI
DOCVAULT_LLM_MODEL=gpt-4o
OPENAI_API_KEY=your-key

# Local via Ollama (no API key needed)
DOCVAULT_LLM_MODEL=ollama/llama3.1:8b
DOCVAULT_LLM_BASE_URL=http://localhost:11434
```

Environment Variables

| Variable | Default | Description |
| --- | --- | --- |
| **LLM** | | |
| `DOCVAULT_LLM_MODEL` | `gemini/gemini-2.0-flash` | LiteLLM model identifier |
| `DOCVAULT_LLM_BASE_URL` | — | Base URL for local models (e.g., Ollama) |
| **Embeddings** | | |
| `DOCVAULT_EMBEDDING_PROVIDER` | `api` | `api` (LiteLLM) or `remote` (ML service) |
| `DOCVAULT_EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Embedding model name |
| `DOCVAULT_ML_SERVICE_URL` | — | ML service URL (required when provider=`remote`) |
| `DOCVAULT_ML_SHARED_VOLUME` | `false` | Use shared Docker volume for file transfer |
| `DOCVAULT_EMBEDDING_BATCH_SIZE` | `32` | Batch size for embedding requests |
| `DOCVAULT_EMBEDDING_CONCURRENCY` | `3` | Max concurrent embedding requests |
| **Search** | | |
| `DOCVAULT_SEARCH_MODE` | `hybrid` | `semantic`, `bm25`, or `hybrid` |
| `DOCVAULT_HYBRID_SEMANTIC_WEIGHT` | `0.7` | Weight for semantic vs. BM25 in hybrid mode |
| `DOCVAULT_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for search results |
| **OCR & Vision** | | |
| `DOCVAULT_OCR_BACKEND` | `tesseract` | `tesseract` or `docling` (ML service) |
| `DOCVAULT_OCR_CONCURRENCY` | `2` | Max concurrent OCR operations |
| `DOCVAULT_VISION_URL` | — | Ollama URL for vision (e.g., http://localhost:11434) |
| `DOCVAULT_VISION_MODEL` | `gemma3:4b` | Vision model for figure description |
| **Cache** | | |
| `DOCVAULT_CACHE_ENABLED` | `true` | Enable semantic query cache |
| `DOCVAULT_CACHE_THRESHOLD` | `0.92` | Cosine similarity threshold for cache hits |
| `DOCVAULT_CACHE_TTL_HOURS` | `24` | Cache entry time-to-live |
| **Security** | | |
| `DOCVAULT_JWT_SECRET` | `dev-secret-...` | JWT signing key — change in production |
| `DOCVAULT_JWT_EXPIRY_SECONDS` | `3600` | JWT token lifetime |
| `DOCVAULT_MAX_UPLOAD_SIZE_MB` | `50` | Maximum upload file size |
| `DOCVAULT_CORS_ORIGINS` | `*` | Allowed CORS origins |
| **Infrastructure** | | |
| `DOCVAULT_DATABASE_URL` | `postgresql://...localhost:5432/docvault` | PostgreSQL connection URL |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector database URL |
| `DOCVAULT_UPLOAD_DIR` | `./data/uploads` | File storage directory |
| `DOCVAULT_LOG_LEVEL` | `INFO` | Logging level |
| `DOCVAULT_DEBUG` | `true` | Enable Swagger docs (disable in production) |
| **Monitoring** | | |
| `LANGFUSE_HOST` | — | Langfuse URL (enables LLM tracing) |
| `DOCVAULT_COST_ALERT_DAILY_USD` | `5.00` | Daily LLM cost alert threshold |
| **Concurrency** | | |
| `DOCVAULT_INGESTION_WORKERS` | `2` | Parallel ingestion workers |
| `DOCVAULT_FIGURE_CONCURRENCY` | `3` | Max concurrent figure processing |
| `DOCVAULT_RATE_LIMIT_QUERY` | `30/minute` | Query endpoint rate limit |
| `DOCVAULT_RATE_LIMIT_UPLOAD` | `10/minute` | Upload endpoint rate limit |
| `DOCVAULT_RATE_LIMIT_AUTH` | `5/minute` | Auth endpoint rate limit |
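To illustrate how `DOCVAULT_HYBRID_SEMANTIC_WEIGHT` and `DOCVAULT_CONFIDENCE_THRESHOLD` could interact in hybrid mode, here is a minimal weighted-fusion sketch. The fusion formula (a linear blend of normalized scores) is an assumption for illustration, not DocVault's exact implementation.

```python
def hybrid_scores(
    semantic: dict[str, float],
    bm25: dict[str, float],
    semantic_weight: float = 0.7,       # DOCVAULT_HYBRID_SEMANTIC_WEIGHT
    confidence_threshold: float = 0.3,  # DOCVAULT_CONFIDENCE_THRESHOLD
) -> dict[str, float]:
    """Blend normalized semantic and BM25 scores, dropping low-confidence hits."""
    fused: dict[str, float] = {}
    for doc in set(semantic) | set(bm25):
        score = (semantic_weight * semantic.get(doc, 0.0)
                 + (1 - semantic_weight) * bm25.get(doc, 0.0))
        if score >= confidence_threshold:  # filter out weak matches
            fused[doc] = score
    # Highest-scoring chunks first
    return dict(sorted(fused.items(), key=lambda kv: -kv[1]))
```

With the defaults, a chunk that scores 0.9 semantically and 0.5 on BM25 fuses to 0.78 and survives the threshold, while a chunk scoring 0.2 and 0.1 fuses to 0.17 and is dropped.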

Tech Stack

| Component | Technology |
| --- | --- |
| Backend | Python 3.12, FastAPI, async/await |
| LLM Gateway | LiteLLM (Gemini, Claude, GPT, Ollama, etc.) |
| Embeddings | LiteLLM API or local via ML service |
| Vector DB | Qdrant |
| Database | PostgreSQL 16 (asyncpg) |
| Document Parsing | PyMuPDF, Docling (DOCX/PPTX/images), OCR |
| Frontend | React 19, TypeScript (strict), Vite, TailwindCSS |
| Monitoring | Prometheus, Grafana, Langfuse |
| Vision | Ollama + Gemma 3 4B |
| ML Service | docTR OCR, cross-encoder reranking, local embeddings |
| Reverse Proxy | Caddy (SPA routing, API proxy, TLS) |
| Testing | pytest + pytest-asyncio, Vitest + RTL |
| Linting | ruff + mypy strict, ESLint + Prettier |
| Package Managers | uv (backend), pnpm (frontend) |

API Overview

All endpoints are served under /api. See docs/api-reference.md for the full reference.

| Endpoint | Method | Description |
| --- | --- | --- |
| `/api/health` | GET | Application status and configuration |
| `/api/documents/upload` | POST | Upload a document (PDF, DOCX, PPTX, image) |
| `/api/documents` | GET | List documents with pagination |
| `/api/documents/{id}` | DELETE | Delete a document and its chunks |
| `/api/query` | POST | Ask a question (basic or agentic mode) |
| `/api/query/stream` | POST | Streaming query via SSE |
| `/api/sessions` | GET/POST | Chat session management |
| `/api/sessions/{id}` | GET/PATCH/DELETE | Session CRUD + title update |
| `/api/sessions/{id}/generate-title` | POST | LLM-generated session title |
| `/api/sessions/{id}/export` | GET | Export session as Markdown or PDF |
| `/api/feedback` | POST | Submit thumbs up/down feedback |
| `/api/graph` | GET | Knowledge graph data |
| `/api/auth/login` | POST | JWT authentication |
| `/api/observability/traces` | GET | LLM trace listing |
| `/api/observability/metrics` | GET | Aggregated metrics |
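Since `/api/query/stream` responds over SSE, a client accumulates the answer by reading `data:` lines from the event stream. The parser below is a generic SSE sketch; DocVault's actual event payload format is not specified here and the `[DONE]` sentinel is an assumption borrowed from common streaming APIs.

```python
def parse_sse(stream: str) -> list[str]:
    """Extract the payload of each `data:` line from a raw SSE stream."""
    events = []
    for line in stream.splitlines():
        if line.startswith("data:"):
            events.append(line[len("data:"):].strip())
    return events
```

A real client would read the HTTP response incrementally rather than from a complete string, but the line-level framing is the same.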

Development

Project Structure

docvault/
├── backend/src/docvault/   — FastAPI app
│   ├── api/                  Routes and middleware
│   ├── core/                 Config, LLM client, prompts, database, migrations
│   ├── ingestion/            Parsing, OCR, chunking, embedding, vector store
│   ├── rag/                  Retrieval, generation, citation extraction
│   ├── agent/                Agentic multi-step retrieval
│   ├── guardrails/           Sanitization, validation, injection/hallucination detection
│   ├── chat/                 Session and message storage
│   ├── auth/                 JWT + API key auth, RBAC
│   ├── feedback/             User feedback storage
│   └── prompts/              All LLM prompts as .md files
├── backend/tests/            pytest test suite
├── frontend/src/             React SPA
│   ├── components/           UI components
│   ├── hooks/                Custom React hooks
│   ├── services/             Typed API client
│   └── types/                TypeScript interfaces
├── ml-service/               Optional ML service (embeddings, OCR, reranking)
├── docker/                   Dockerfiles and Caddyfiles
├── monitoring/               Grafana dashboards and Prometheus config
├── docs/                     Guides, API reference, ADRs
├── scripts/                  Backup, migration, utility scripts
├── .ralph/                   Phase specs and fix plan
└── Makefile                  All commands

Commands

```bash
# Development
make dev-up               # Start infra + backend + frontend
make dev-down             # Stop everything
make dev-full             # Full-featured: + ML service + vision
make dev-full-down        # Stop full-featured dev

# Production
make prod                 # Full Docker stack with TLS + monitoring
make prod-down            # Stop production

# Status
make status               # Show all running services

# Backend
make install              # Install Python deps (uv)
make test                 # Run pytest (testcontainers PostgreSQL)
make lint                 # Run ruff + mypy strict

# Frontend
make frontend-install     # Install Node deps (pnpm)
make frontend-test        # Run Vitest
make frontend-lint        # Run ESLint + Prettier

# Evaluation
make eval                 # Run RAG evaluation pipeline
make eval-compare         # Compare eval configurations

# Add-ons
make monitoring-up        # Prometheus + Grafana + Langfuse
make vision-up            # Ollama + Gemma 3 (GPU required)

# Utilities
make backup               # Backup PostgreSQL + Qdrant + files
make restore BACKUP=path  # Restore from backup
make down-all             # Stop all containers
make clean                # Remove caches and build artifacts
```

Testing

Backend tests use pytest with async support against a real PostgreSQL via testcontainers:

```bash
make test                 # Backend
make frontend-test        # Frontend (Vitest + React Testing Library)
```

Code Quality

```bash
make lint                 # ruff check + ruff format --check + mypy --strict
make frontend-lint        # eslint + prettier --check
```

Monitoring

```bash
make monitoring-up        # Start Prometheus + Grafana + Langfuse
```

  • Prometheus — http://localhost:9090
  • Grafana — http://localhost:3001 (admin / docvault)

Pre-built dashboards: Overview (latency, cost, throughput), LLM Usage & Cost, RAG Quality & Guardrails.

Documentation

Detailed guides live in the docs/ directory.

Security

See SECURITY.md for vulnerability reporting, credential management, and production hardening checklist.

Contributing

See CONTRIBUTING.md for development setup, coding conventions, and PR guidelines.

License

MIT License
