Intelligent Document RAG with exact citation extraction.
Upload PDFs, DOCX, PPTX, or images → ask questions → get answers with precise citations linked to source pages.
- Chat with exact citations linked to source pages — click a citation to highlight the passage in the PDF viewer
- Citation bounding boxes rendered directly on the PDF page for precise source verification
- Interactive knowledge graph — UMAP projection of document chunks with clustering and similarity-based edges
- Admin panel — user management with role-based access control (viewer, editor, admin)
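To make the click-to-highlight behavior concrete, here is a minimal Python sketch of what a citation payload could carry. The field names (`quote`, `page`, `bbox`) and the normalized-coordinate convention are illustrative assumptions, not DocVault's actual schema.

```python
from dataclasses import dataclass

# Hypothetical citation payload; field names are illustrative only.
@dataclass
class Citation:
    quote: str   # verbatim passage from the source chunk
    page: int    # 1-based page number in the source PDF
    bbox: tuple[float, float, float, float]  # (x0, y0, x1, y1), normalized 0..1

    def to_pixels(self, page_w: float, page_h: float) -> tuple[float, float, float, float]:
        """Scale the normalized box to the rendered page size for highlighting."""
        x0, y0, x1, y1 = self.bbox
        return (x0 * page_w, y0 * page_h, x1 * page_w, y1 * page_h)

c = Citation(quote="Revenue grew 12%", page=3, bbox=(0.1, 0.2, 0.9, 0.25))
print(c.to_pixels(612, 792))  # scaled to a US Letter page in PDF points
```

Normalized coordinates keep the payload independent of the viewer's zoom level; the frontend only needs the rendered page size to draw the highlight.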
| Area | Scale |
|---|---|
| Backend | 14,300+ lines of Python across 50+ modules |
| Frontend | 7,200+ lines of TypeScript / React |
| Tests | 70 test files (55 backend + 15 frontend) with 15,100+ lines |
| LLM Prompts | 15 prompt templates (none hardcoded in source) |
| Documentation | 2,400+ lines across guides, ADRs, and API reference |
| Infrastructure | 9 Docker configs, 4 deployment modes, 3 Grafana dashboards |
This project was built using AI-assisted development — a spec-driven workflow where a human architect defines the system design and AI agents implement it under review.
How it works:
- Human defines specs — each of the 50 phases has a detailed specification in `.ralph/specs/` covering requirements, architecture decisions, testing criteria, and rollout order
- Agents implement — AI coding agents (Ralph for initial phases, Claude Code for refinement and multi-agent workflows) read the spec and implement code, tests, and documentation
- Human reviews and iterates — Every phase goes through review for correctness, security, and architectural consistency before being marked complete
Agent orchestration artifacts:
- `.ralph/` — Phase specs and development roadmap (50 phases, all completed)
- `.claude/agents/` — Custom specialized subagents (RAG reviewer, security auditor, API consistency checker, test writer)
- `.claude/rules/` — Domain-specific conventions enforced across agent sessions
- `.claude/skills/` — Reusable slash commands for deployment, testing, evaluation, and backups
What this demonstrates:
- Ability to decompose a complex system into 50 well-scoped, sequential phases — each producing a working, testable artifact
- Technical judgment — the human decides architecture (async-first, LiteLLM abstraction, embedding provider protocol, ML service extraction), the agent executes
- Multi-agent orchestration — parallel specialized agents (security audit, RAG review, API consistency, test writing) coordinating on the same codebase via isolated worktrees
| Feature | Summary | Details |
|---|---|---|
| Exact Citations | RAG with source mapping | Answers include verbatim quoted passages with page numbers and bounding box coordinates, rendered as highlights in the PDF viewer |
| Multi-Modal | Figures + vision | Extracts figures from documents and describes them via Ollama vision models (Gemma 3), indexed as searchable chunks |
| Agentic Search | Deep multi-step retrieval | Decomposes complex queries into sub-questions, iterates with self-verification for multi-hop reasoning |
| Knowledge Graph | Visual exploration | Interactive UMAP-based visualization of document chunk relationships with clustering and similarity filtering |
| Semantic Cache | Smart deduplication | Per-user query caching with cosine similarity matching and automatic invalidation on new uploads |
| Multi-Format | PDF, DOCX, PPTX, images | Format-specific parsing with OCR for scanned documents and searchable PDF generation |
| Streaming Chat | Real-time responses | SSE-based streaming with markdown rendering, session management, and export (Markdown/PDF) |
| Guardrails | Safety + quality | Hallucination detection (NLI grounding), prompt injection defense, confidence scoring, input sanitization |
| Air-Gapped | Fully offline | Works entirely offline with local models via Ollama — zero external API calls |
| Observability | Full-stack monitoring | Prometheus + Grafana dashboards, Langfuse LLM tracing, cost tracking, and quality metrics |
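The semantic cache row above can be sketched in a few lines of plain Python. Only the idea (cosine similarity matching of query embeddings) and the 0.92 default from `DOCVAULT_CACHE_THRESHOLD` come from this README; the entry shape and function names are illustrative assumptions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def cache_lookup(query_vec: list[float], cache: list[dict], threshold: float = 0.92):
    """Return the cached answer for the closest prior query, if similar enough."""
    best = max(cache, key=lambda e: cosine_similarity(query_vec, e["vec"]), default=None)
    if best is not None and cosine_similarity(query_vec, best["vec"]) >= threshold:
        return best["answer"]
    return None  # cache miss: fall through to the full RAG pipeline

cache = [{"vec": [1.0, 0.0], "answer": "cached answer"}]
print(cache_lookup([0.99, 0.14], cache))  # near-duplicate query: hits the cache
print(cache_lookup([0.0, 1.0], cache))    # orthogonal query: cache miss
```

In the real system the cache is per-user and entries are invalidated when new documents are uploaded, since new content can change the correct answer.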
```mermaid
graph TB
    subgraph Frontend
        UI[React / TypeScript / Vite]
    end
    subgraph Edge
        Caddy[Caddy — TLS + Reverse Proxy]
    end
    subgraph Backend
        API[FastAPI]
        RAG[RAG Engine + Citations]
        Agent[Agentic Retrieval]
        Guard[Guardrails Layer]
        Ingest[Ingestion Pipeline]
    end
    subgraph Storage
        PG[(PostgreSQL)]
        QD[(Qdrant)]
        FS[File Storage]
    end
    subgraph ML["ML Service (optional)"]
        Embed[Local Embeddings]
        OCR[docTR Neural OCR]
        Rerank[Cross-Encoder Reranking]
    end
    subgraph Vision["Vision (optional)"]
        Ollama[Ollama + Gemma 3]
    end
    subgraph Monitoring["Monitoring (optional)"]
        Prom[Prometheus]
        Graf[Grafana]
        Lang[Langfuse V3]
    end
    UI --> Caddy --> API
    API --> RAG --> QD
    API --> Agent --> RAG
    API --> Guard
    API --> Ingest --> PG
    Ingest --> FS
    Ingest --> ML
    Ingest --> Ollama
    RAG --> ML
    API --> Prom --> Graf
    API --> Lang
```
Pipeline: Upload document → detect type (digital/scanned) → parse/OCR → chunk with position metadata → embed → store in Qdrant → query → retrieve → generate answer with citations → verify & validate → stream response with highlights
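The ingestion half of that pipeline can be sketched as composed steps. This is a hedged toy version: `parse_or_ocr` and `embed` are stand-ins for the real PyMuPDF/OCR parsing and LiteLLM or ML-service embeddings, and the fixed-size chunker is far simpler than DocVault's actual chunking.

```python
def ingest(path: str) -> list[dict]:
    """Toy ingestion: parse pages, chunk with position metadata, embed each chunk."""
    pages = parse_or_ocr(path)  # digital parse, with OCR fallback for scans
    chunks = []
    for page_no, text in enumerate(pages, start=1):
        for start in range(0, len(text), 500):  # naive fixed-size chunking
            piece = text[start:start + 500]
            chunks.append({
                "text": piece,
                "page": page_no,        # position metadata that makes citations possible
                "offset": start,
                "vector": embed(piece), # stored in Qdrant in the real system
            })
    return chunks

def parse_or_ocr(path: str) -> list[str]:
    # Stand-in: pretend every document is one digital page of text.
    return ["This is a tiny example document used to illustrate chunking."]

def embed(text: str) -> list[float]:
    # Stand-in embedding: character frequencies, NOT a real embedding model.
    return [text.count(c) / max(len(text), 1) for c in "etaoin"]

chunks = ingest("example.pdf")
print(len(chunks), chunks[0]["page"])
```

Carrying `page` and `offset` through chunking is the key design point: it is what lets the query half of the pipeline map a retrieved chunk back to an exact highlight in the source.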
- Python 3.12+, Node.js 22+, Docker
- An LLM API key (Gemini, Anthropic, or OpenAI) — or Ollama for local models
```bash
git clone <repo-url> && cd docvault
cp .env.dev .env
# Edit .env with your API key (see Configuration section below)

make install           # Backend (Python via uv)
make frontend-install  # Frontend (Node via pnpm)

make dev-up            # Qdrant + PostgreSQL + backend + frontend
```

Open http://localhost:5173 — upload a document and start asking questions.
| Mode | Command | What it runs |
|---|---|---|
| Dev (lightweight) | `make dev-up` | Qdrant + PostgreSQL + native backend/frontend |
| Dev (full-featured) | `make dev-full` | Above + ML service + Ollama vision |
| Production | `make prod` | Full Docker stack with TLS, monitoring, ML, vision |
| Air-gapped | `cp .env.airgapped .env && make prod` | Zero external calls — all inference local |
```bash
# Stop commands
make dev-down       # Stop lightweight dev
make dev-full-down  # Stop full-featured dev
make prod-down      # Stop production
make status         # Show all running services
```

All configuration is via environment variables. Copy `.env.dev` (development) or `.env.production` (production) to `.env`.
```bash
# Gemini (default)
DOCVAULT_LLM_MODEL=gemini/gemini-2.0-flash
GEMINI_API_KEY=your-key

# Claude
DOCVAULT_LLM_MODEL=claude-sonnet-4-20250514
ANTHROPIC_API_KEY=your-key

# OpenAI
DOCVAULT_LLM_MODEL=gpt-4o
OPENAI_API_KEY=your-key

# Local via Ollama (no API key needed)
DOCVAULT_LLM_MODEL=ollama/llama3.1:8b
DOCVAULT_LLM_BASE_URL=http://localhost:11434
```

| Variable | Default | Description |
|---|---|---|
| **LLM** | | |
| `DOCVAULT_LLM_MODEL` | `gemini/gemini-2.0-flash` | LiteLLM model identifier |
| `DOCVAULT_LLM_BASE_URL` | — | Base URL for local models (e.g., Ollama) |
| **Embeddings** | | |
| `DOCVAULT_EMBEDDING_PROVIDER` | `api` | `api` (LiteLLM) or `remote` (ML service) |
| `DOCVAULT_EMBEDDING_MODEL` | `nomic-ai/nomic-embed-text-v1.5` | Embedding model name |
| `DOCVAULT_ML_SERVICE_URL` | — | ML service URL (required when provider=`remote`) |
| `DOCVAULT_ML_SHARED_VOLUME` | `false` | Use shared Docker volume for file transfer |
| `DOCVAULT_EMBEDDING_BATCH_SIZE` | `32` | Batch size for embedding requests |
| `DOCVAULT_EMBEDDING_CONCURRENCY` | `3` | Max concurrent embedding requests |
| **Search** | | |
| `DOCVAULT_SEARCH_MODE` | `hybrid` | `semantic`, `bm25`, or `hybrid` |
| `DOCVAULT_HYBRID_SEMANTIC_WEIGHT` | `0.7` | Weight for semantic vs BM25 in hybrid mode |
| `DOCVAULT_CONFIDENCE_THRESHOLD` | `0.3` | Minimum confidence for search results |
| **OCR & Vision** | | |
| `DOCVAULT_OCR_BACKEND` | `tesseract` | `tesseract` or `docling` (ML service) |
| `DOCVAULT_OCR_CONCURRENCY` | `2` | Max concurrent OCR operations |
| `DOCVAULT_VISION_URL` | — | Ollama URL for vision (e.g., `http://localhost:11434`) |
| `DOCVAULT_VISION_MODEL` | `gemma3:4b` | Vision model for figure description |
| **Cache** | | |
| `DOCVAULT_CACHE_ENABLED` | `true` | Enable semantic query cache |
| `DOCVAULT_CACHE_THRESHOLD` | `0.92` | Cosine similarity threshold for cache hits |
| `DOCVAULT_CACHE_TTL_HOURS` | `24` | Cache entry time-to-live |
| **Security** | | |
| `DOCVAULT_JWT_SECRET` | `dev-secret-...` | JWT signing key — change in production |
| `DOCVAULT_JWT_EXPIRY_SECONDS` | `3600` | JWT token lifetime |
| `DOCVAULT_MAX_UPLOAD_SIZE_MB` | `50` | Maximum upload file size |
| `DOCVAULT_CORS_ORIGINS` | `*` | Allowed CORS origins |
| **Infrastructure** | | |
| `DOCVAULT_DATABASE_URL` | `postgresql://...localhost:5432/docvault` | PostgreSQL connection URL |
| `QDRANT_URL` | `http://localhost:6333` | Qdrant vector database URL |
| `DOCVAULT_UPLOAD_DIR` | `./data/uploads` | File storage directory |
| `DOCVAULT_LOG_LEVEL` | `INFO` | Logging level |
| `DOCVAULT_DEBUG` | `true` | Enable Swagger docs (disable in production) |
| **Monitoring** | | |
| `LANGFUSE_HOST` | — | Langfuse URL (enables LLM tracing) |
| `DOCVAULT_COST_ALERT_DAILY_USD` | `5.00` | Daily LLM cost alert threshold |
| **Concurrency** | | |
| `DOCVAULT_INGESTION_WORKERS` | `2` | Parallel ingestion workers |
| `DOCVAULT_FIGURE_CONCURRENCY` | `3` | Max concurrent figure processing |
| `DOCVAULT_RATE_LIMIT_QUERY` | `30/minute` | Query endpoint rate limit |
| `DOCVAULT_RATE_LIMIT_UPLOAD` | `10/minute` | Upload endpoint rate limit |
| `DOCVAULT_RATE_LIMIT_AUTH` | `5/minute` | Auth endpoint rate limit |
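To illustrate what `DOCVAULT_HYBRID_SEMANTIC_WEIGHT=0.7` means in the hybrid search mode, a simple weighted-sum fusion looks like this. Whether DocVault fuses raw scores or ranks, and how it normalizes BM25 scores, is not specified in this README, so treat this as a sketch of the general technique.

```python
def hybrid_score(semantic: float, bm25: float, semantic_weight: float = 0.7) -> float:
    """Blend a semantic similarity score with a normalized BM25 score.

    Both inputs are assumed to be on a comparable 0..1 scale; raw BM25
    scores are unbounded and would need normalization first.
    """
    return semantic_weight * semantic + (1.0 - semantic_weight) * bm25

# A chunk that matches the query's meaning well but shares few exact terms
# still scores high, because the semantic signal dominates at weight 0.7:
print(hybrid_score(semantic=0.9, bm25=0.2))
```

Raising the weight toward 1.0 makes retrieval behave like pure semantic search; lowering it favors exact keyword matches, which helps for identifiers, part numbers, and rare terms.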
| Component | Technology |
|---|---|
| Backend | Python 3.12, FastAPI, async/await |
| LLM Gateway | LiteLLM (Gemini, Claude, GPT, Ollama, etc.) |
| Embeddings | LiteLLM API or local via ML service |
| Vector DB | Qdrant |
| Database | PostgreSQL 16 (asyncpg) |
| Document Parsing | PyMuPDF, Docling (DOCX/PPTX/images), OCR |
| Frontend | React 19, TypeScript (strict), Vite, TailwindCSS |
| Monitoring | Prometheus, Grafana, Langfuse |
| Vision | Ollama + Gemma 3 4B |
| ML Service | docTR OCR, cross-encoder reranking, local embeddings |
| Reverse Proxy | Caddy (SPA routing, API proxy, TLS) |
| Testing | pytest + pytest-asyncio, Vitest + RTL |
| Linting | ruff + mypy strict, ESLint + Prettier |
| Package Managers | uv (backend), pnpm (frontend) |
All endpoints are served under `/api`. See `docs/api-reference.md` for the full reference.
| Endpoint | Method | Description |
|---|---|---|
| `/api/health` | GET | Application status and configuration |
| `/api/documents/upload` | POST | Upload a document (PDF, DOCX, PPTX, image) |
| `/api/documents` | GET | List documents with pagination |
| `/api/documents/{id}` | DELETE | Delete a document and its chunks |
| `/api/query` | POST | Ask a question (basic or agentic mode) |
| `/api/query/stream` | POST | Streaming query via SSE |
| `/api/sessions` | GET/POST | Chat session management |
| `/api/sessions/{id}` | GET/PATCH/DELETE | Session CRUD + title update |
| `/api/sessions/{id}/generate-title` | POST | LLM-generated session title |
| `/api/sessions/{id}/export` | GET | Export session as Markdown or PDF |
| `/api/feedback` | POST | Submit thumbs up/down feedback |
| `/api/graph` | GET | Knowledge graph data |
| `/api/auth/login` | POST | JWT authentication |
| `/api/observability/traces` | GET | LLM trace listing |
| `/api/observability/metrics` | GET | Aggregated metrics |
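As a hedged example of calling the query endpoint from Python with only the standard library: the JSON field names (`question`, `mode`), the backend port, and the token placeholder are assumptions for illustration, not the documented schema (see `docs/api-reference.md` for the real one).

```python
import json
import urllib.request

def build_query_request(base_url: str, token: str, question: str,
                        agentic: bool = False) -> urllib.request.Request:
    """Construct a POST to /api/query; field names here are hypothetical."""
    payload = {"question": question, "mode": "agentic" if agentic else "basic"}
    return urllib.request.Request(
        f"{base_url}/api/query",
        data=json.dumps(payload).encode(),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # JWT obtained from /api/auth/login
        },
        method="POST",
    )

req = build_query_request("http://localhost:8000", "your-jwt-token",
                          "What does section 3 say about revenue?")
print(req.full_url, req.get_method())
# Sending it would be: urllib.request.urlopen(req) against a running backend.
```

For streaming answers the same payload would go to `/api/query/stream` and the client would read the response incrementally as server-sent events instead of a single JSON body.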
docvault/
├── backend/src/docvault/ — FastAPI app
│ ├── api/ Routes and middleware
│ ├── core/ Config, LLM client, prompts, database, migrations
│ ├── ingestion/ Parsing, OCR, chunking, embedding, vector store
│ ├── rag/ Retrieval, generation, citation extraction
│ ├── agent/ Agentic multi-step retrieval
│ ├── guardrails/ Sanitization, validation, injection/hallucination detection
│ ├── chat/ Session and message storage
│ ├── auth/ JWT + API key auth, RBAC
│ ├── feedback/ User feedback storage
│ └── prompts/ All LLM prompts as .md files
├── backend/tests/ pytest test suite
├── frontend/src/ React SPA
│ ├── components/ UI components
│ ├── hooks/ Custom React hooks
│ ├── services/ Typed API client
│ └── types/ TypeScript interfaces
├── ml-service/ Optional ML service (embeddings, OCR, reranking)
├── docker/ Dockerfiles and Caddyfiles
├── monitoring/ Grafana dashboards and Prometheus config
├── docs/ Guides, API reference, ADRs
├── scripts/ Backup, migration, utility scripts
├── .ralph/ Phase specs and fix plan
└── Makefile All commands
```bash
# Development
make dev-up            # Start infra + backend + frontend
make dev-down          # Stop everything
make dev-full          # Full-featured: + ML service + vision
make dev-full-down     # Stop full-featured dev

# Production
make prod              # Full Docker stack with TLS + monitoring
make prod-down         # Stop production

# Status
make status            # Show all running services

# Backend
make install           # Install Python deps (uv)
make test              # Run pytest (testcontainers PostgreSQL)
make lint              # Run ruff + mypy strict

# Frontend
make frontend-install  # Install Node deps (pnpm)
make frontend-test     # Run Vitest
make frontend-lint     # Run ESLint + Prettier

# Evaluation
make eval              # Run RAG evaluation pipeline
make eval-compare      # Compare eval configurations

# Add-ons
make monitoring-up     # Prometheus + Grafana + Langfuse
make vision-up         # Ollama + Gemma 3 (GPU required)

# Utilities
make backup            # Backup PostgreSQL + Qdrant + files
make restore BACKUP=path  # Restore from backup
make down-all          # Stop all containers
make clean             # Remove caches and build artifacts
```

Backend tests use pytest with async support against a real PostgreSQL via testcontainers:
```bash
make test           # Backend
make frontend-test  # Frontend (Vitest + React Testing Library)
```

```bash
make lint           # ruff check + ruff format --check + mypy --strict
make frontend-lint  # eslint + prettier --check
```

```bash
make monitoring-up  # Start Prometheus + Grafana + Langfuse
```

- Prometheus — http://localhost:9090
- Grafana — http://localhost:3001 (admin / docvault)
Pre-built dashboards: Overview (latency, cost, throughput), LLM Usage & Cost, RAG Quality & Guardrails.
Detailed guides in docs/:
- Architecture Overview — System design and component interactions
- ML Service — GPU service for OCR, embeddings, reranking
- Quick Start — Clone-to-running guide
- Configuration — Complete env var reference
- API Reference — All endpoints with examples
- Troubleshooting — Common issues and fixes
- Performance Tuning — Optimization guide
- Feature Guides: Agentic Mode · Knowledge Graph · Semantic Cache · Multi-Modal · Feedback · Sharing
- ADRs: LiteLLM · Qdrant · PostgreSQL · Prompts as Files · Caddy · Async-First · ML Service · Embedding Providers
See SECURITY.md for vulnerability reporting, credential management, and production hardening checklist.
See CONTRIBUTING.md for development setup, coding conventions, and PR guidelines.