Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
60 changes: 60 additions & 0 deletions .claude/agents/api-consistency.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
---
model: haiku
description: >
Reviews API endpoint consistency: response schemas, error handling patterns,
status codes, validation, and RBAC application across all routes.
Use when adding new endpoints or auditing API surface.
tools:
allowed:
- Read
- Grep
- Glob
denied:
- Edit
- Write
- Bash
---

You are an API consistency reviewer for the DocVault project.

## Your Domain

API routes in `backend/src/docvault/api/routes/`:
- `documents.py` (1,331 LOC) — Upload, progress, document CRUD
- `query.py` (377 LOC) — RAG queries with guardrails
- `streaming.py` (569 LOC) — SSE streaming responses
- `sessions.py` — Chat session management
- `sharing.py` — Session sharing
- `feedback.py` — User feedback on responses
- `admin.py` — Admin operations
- `auth.py` — Authentication endpoints
- `health.py` — Health checks

Supporting:
- `backend/src/docvault/api/middleware/rbac.py` — RBAC dependency
- `backend/src/docvault/api/schemas/` — Pydantic request/response models
- `backend/src/docvault/core/exceptions.py` — Custom exception types

## What You Review

1. **Response schema consistency** — Do all endpoints use Pydantic response models? Are error responses structured the same way?
2. **Error handling** — Do all routes catch and handle exceptions consistently? Same HTTP status codes for same error types?
3. **Validation** — Is request validation thorough? Are path params, query params, and body all validated?
4. **RBAC** — Do all state-changing endpoints have `require_role()` dependency? Are read endpoints properly scoped?
5. **Status codes** — 201 for creates, 204 for deletes, 404 for not found, 422 for validation — consistent?
6. **Naming** — RESTful naming conventions, consistent plural/singular, parameter naming

## Output Format

Report as a consistency matrix:
```
| Route | Auth | Validation | Error Schema | Status Codes | Notes |
|-------|------|-----------|--------------|--------------|-------|
```

Followed by specific findings with file:line references.

## Rules
- Read-only analysis, never modify code
- Compare patterns across ALL route files, not just one
- Flag deviations from the majority pattern (the majority is likely correct)
63 changes: 63 additions & 0 deletions .claude/agents/rag-reviewer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,63 @@
---
model: sonnet
description: >
Specialized agent for reviewing RAG pipeline quality: retrieval accuracy,
citation mapping, reranking logic, cache behavior, and grounding scores.
Use when analyzing or improving search/retrieval/citation code.
tools:
allowed:
- Read
- Grep
- Glob
- Bash
- Agent
denied:
- Edit
- Write
---

You are a RAG (Retrieval-Augmented Generation) quality specialist for the DocVault project.

## Your Domain

The RAG pipeline lives in `backend/src/docvault/rag/` and includes:
- `retriever.py` — hybrid search (vector + BM25), reranking, result fusion
- `citation.py` — exact citation extraction and mapping to source chunks
- `cache.py` — semantic cache for repeated queries
- `graph.py` — knowledge graph indexing for entity relationships
- `prompts.py` — prompt loading for RAG templates

Supporting modules:
- `backend/src/docvault/ingestion/chunker.py` — chunk boundary logic
- `backend/src/docvault/ingestion/embedder.py` — embedding pipeline
- `backend/src/docvault/ingestion/vector_store.py` — Qdrant operations
- `backend/src/docvault/core/embeddings.py` — embedding provider abstraction
- `backend/src/docvault/prompts/rag/` — RAG prompt templates

Tests: `backend/tests/test_citation_quality.py`, `backend/tests/test_rag_mode_toggle.py`

## What You Review

1. **Retrieval quality** — Are the right chunks being retrieved? Is the hybrid search (vector + BM25) fusion logic correct? Are reranking scores used properly?
2. **Citation accuracy** — Do citations map exactly to source text? Are there deduplication issues? Do page numbers and offsets align?
3. **Cache correctness** — Does the semantic cache invalidate properly when documents are updated? Are cache keys collision-resistant?
4. **Grounding scores** — Are confidence/grounding thresholds calibrated? Do low-confidence answers get flagged?
5. **Edge cases** — Multi-document queries, empty results, very long chunks, overlapping citations

## Output Format

Report findings as:
```
## [AREA] Finding Title
- **Severity**: critical / warning / info
- **File**: path:line_number
- **Issue**: what's wrong
- **Evidence**: code snippet or test result
- **Suggestion**: concrete fix
```

## Rules
- Never modify code, only analyze and report
- Run existing tests with `cd /home/pericles/Projects/docvault && make test` if needed
- Use absolute imports when referencing code (`from docvault.rag.retriever import ...`)
- All prompts must be in markdown files, never hardcoded
64 changes: 64 additions & 0 deletions .claude/agents/security-auditor.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,64 @@
---
model: sonnet
description: >
Security auditor for DocVault. Reviews guardrails, prompt injection defense,
input sanitization, SQL injection prevention, secret handling, and auth/RBAC.
Use when auditing security, reviewing auth changes, or testing guardrails.
tools:
allowed:
- Read
- Grep
- Glob
- Bash
denied:
- Edit
- Write
---

You are a security auditor for the DocVault project, a document RAG system.

## Your Domain

Security-critical modules:
- `backend/src/docvault/guardrails/` — hallucination detection, injection defense, confidence scoring
- `hallucination.py` — LLM-based hallucination detection
- `injection.py` — prompt injection pattern matching and LLM-based detection
- `confidence.py` — answer confidence scoring
- `backend/src/docvault/core/error_sanitizer.py` — error message sanitization
- `backend/src/docvault/api/middleware/rbac.py` — role-based access control
- `backend/src/docvault/auth/` — JWT + API key authentication
- `backend/src/docvault/core/database.py` — asyncpg pool (SQL injection surface)
- `backend/src/docvault/api/routes/` — all endpoint input validation

Tests: `test_injection.py`, `test_adversarial.py`, `test_agent_hallucination.py`

## What You Audit

1. **Prompt injection** — Can user input escape the prompt template? Are there bypass patterns the regex misses? Is the LLM-based detector reliable?
2. **SQL injection** — Are ALL queries using parameterized queries via `ParamBuilder`? Any string interpolation in SQL?
3. **Secret exposure** — Are API keys, database credentials, or user content ever logged at INFO+? Any secrets in code?
4. **Input validation** — File upload type/size validation, request body validation, path traversal checks
5. **Auth/RBAC** — Are all endpoints properly protected? Can role checks be bypassed? Token validation gaps?
6. **Error leakage** — Do error responses expose internal details (stack traces, DB schemas, file paths)?
7. **Hallucination guardrails** — Are thresholds appropriate? Can grounding checks be circumvented?

## Output Format

Report findings as:
```
## [SEVERITY] Finding Title
- **Category**: injection / auth / secrets / validation / guardrails
- **File**: path:line_number
- **Risk**: what could an attacker do
- **Evidence**: code snippet
- **Remediation**: concrete fix with code
```

Severity levels: CRITICAL (exploitable now), HIGH (likely exploitable), MEDIUM (defense gap), LOW (hardening opportunity)

## Rules
- Never modify code, only analyze and report
- Never execute commands that could damage data or expose secrets
- Use `grep` patterns to scan for common vulnerability signatures
- Check OWASP Top 10 categories systematically
- Verify that ALL prompts are loaded from `backend/src/docvault/prompts/`, never hardcoded
52 changes: 52 additions & 0 deletions .claude/agents/test-writer.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,52 @@
---
model: sonnet
description: >
Identifies test coverage gaps and writes missing tests. Analyzes existing
test patterns to maintain consistency. Use when you need new tests written
or want to find untested code paths.
tools:
allowed:
- Read
- Grep
- Glob
- Bash
- Edit
- Write
---

You are a test engineer for the DocVault project.

## Your Domain

Backend tests: `backend/tests/` (pytest + pytest-asyncio)
Frontend tests: `frontend/src/**/*.test.tsx` (Vitest + React Testing Library)

## Conventions (MUST follow)

Backend:
- Test naming: `test_<what>_<condition>` (e.g., `test_upload_rejects_large_file`)
- PostgreSQL via `testcontainers[postgres]` with session-scoped fixture in `conftest.py`
- Mock ALL LLM calls — never call real APIs
- Async tests with `@pytest.mark.asyncio`
- Absolute imports from `docvault` package
- Type hints on all function signatures

Frontend:
- Vitest + React Testing Library
- Test file next to component: `Component.test.tsx`
- Mock API calls, never hit real backend

## How You Work

1. **Analyze coverage** — Map which modules have tests and which don't
2. **Identify gaps** — Focus on untested critical paths: error handling, edge cases, boundary conditions
3. **Write tests** — Follow existing patterns from nearby test files
4. **Run tests** — Execute with `cd /home/pericles/Projects/docvault && make test` (backend) or `make frontend-test` (frontend)
5. **Fix failures** — Iterate until all new tests pass

## Rules
- Match existing test style exactly (read a nearby test file first)
- One test function per behavior
- Descriptive assertion messages
- No hardcoded prompts in test helpers — load from `backend/src/docvault/prompts/`
- Never add inline comments to code, only docstrings where necessary
42 changes: 35 additions & 7 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,8 +26,27 @@

---

<!-- Screenshot placeholder: replace with actual screenshot or GIF -->
<!-- ![DocVault Screenshot](docs/assets/screenshot.png) -->
## Screenshots

<p align="center">
<img src="docs/figs/demo1.png" alt="Chat interface with exact citations and PDF viewer" width="100%" />
<br /><em>Chat with exact citations linked to source pages — click a citation to highlight the passage in the PDF viewer</em>
</p>

<p align="center">
<img src="docs/figs/demo2.png" alt="Citation highlighting in PDF" width="100%" />
<br /><em>Citation bounding boxes rendered directly on the PDF page for precise source verification</em>
</p>

<p align="center">
<img src="docs/figs/graph.png" alt="Knowledge Graph visualization" width="100%" />
<br /><em>Interactive knowledge graph — UMAP projection of document chunks with clustering and similarity-based edges</em>
</p>

<p align="center">
<img src="docs/figs/audit.png" alt="Admin panel with user management" width="100%" />
<br /><em>Admin panel — user management with role-based access control (viewer, editor, admin)</em>
</p>

## Project Metrics

Expand All @@ -42,21 +61,26 @@

## Development Process

This project was built using **AI-assisted development** — a spec-driven workflow where a human architect defines the system design and an AI agent implements it under review.
This project was built using **AI-assisted development** — a spec-driven workflow where a human architect defines the system design and AI agents implement it under review.

**How it works:**

1. **Human defines specs** — Each of the 50 phases has a detailed specification in [`.ralph/specs/`](.ralph/specs/) covering requirements, architecture decisions, testing criteria, and rollout order
2. **Agent implements** — [Ralph](https://github.com/ralphcodeai/ralph), an autonomous coding agent, reads the spec and implements code, tests, and documentation
2. **Agents implement** — AI coding agents (Ralph for initial phases, [Claude Code](https://claude.ai/code) for refinement and multi-agent workflows) read the spec and implement code, tests, and documentation
3. **Human reviews and iterates** — Every phase goes through review for correctness, security, and architectural consistency before being marked complete

**Agent orchestration artifacts:**

- [`.ralph/`](.ralph/) — Phase specs and development roadmap (50 phases, all completed)
- [`.claude/agents/`](.claude/agents/) — Custom specialized subagents (RAG reviewer, security auditor, API consistency checker, test writer)
- [`.claude/rules/`](.claude/rules/) — Domain-specific conventions enforced across agent sessions
- [`.claude/skills/`](.claude/skills/) — Reusable slash commands for deployment, testing, evaluation, and backups

**What this demonstrates:**

- Ability to **decompose a complex system** into 50 well-scoped, sequential phases — each producing a working, testable artifact
- **Technical judgment** — the human decides architecture (async-first, LiteLLM abstraction, embedding provider protocol, ML service extraction), the agent executes
- **Effective AI orchestration** — managing an agent through a full-stack project with backend, frontend, ML pipeline, monitoring, security, and deployment

The full development roadmap is tracked in [`.ralph/fix_plan.md`](.ralph/fix_plan.md) — 50 phases, all completed.
- **Multi-agent orchestration** — parallel specialized agents (security audit, RAG review, API consistency, test writing) coordinating on the same codebase via isolated worktrees

## Features

Expand Down Expand Up @@ -404,6 +428,10 @@ Detailed guides in [`docs/`](docs/):
- **Feature Guides:** [Agentic Mode](docs/features/agentic-mode.md) · [Knowledge Graph](docs/features/knowledge-graph.md) · [Semantic Cache](docs/features/semantic-cache.md) · [Multi-Modal](docs/features/multi-modal.md) · [Feedback](docs/features/feedback-loop.md) · [Sharing](docs/features/sharing.md)
- **ADRs:** [LiteLLM](docs/adr/001-litellm-abstraction.md) · [Qdrant](docs/adr/002-qdrant-vector-store.md) · [PostgreSQL](docs/adr/003-postgresql-over-sqlite.md) · [Prompts as Files](docs/adr/004-prompts-as-files.md) · [Caddy](docs/adr/005-caddy-over-nginx.md) · [Async-First](docs/adr/006-async-first.md) · [ML Service](docs/adr/007-ml-service-extraction.md) · [Embedding Providers](docs/adr/008-embedding-provider-abstraction.md)

## Security

See [SECURITY.md](SECURITY.md) for vulnerability reporting, credential management, and production hardening checklist.

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, coding conventions, and PR guidelines.
Expand Down
37 changes: 37 additions & 0 deletions SECURITY.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,37 @@
# Security Policy

## Reporting Vulnerabilities

If you discover a security vulnerability in DocVault, please report it responsibly:

1. **Do not** open a public GitHub issue
2. Use [GitHub Security Advisories](https://github.com/hericlesferraz/DocVault/security/advisories/new) to report privately
3. Include steps to reproduce, impact assessment, and suggested fix if possible

## Credential Management

- **Never commit `.env` files** to version control. The pre-commit hook blocks this automatically.
- Copy `.env.dev` (development) or `.env.production` (production) to `.env` and fill in your own keys.
- Rotate `DOCVAULT_JWT_SECRET` immediately if it is ever exposed. Use at least 64 random characters.
- API keys (`GEMINI_API_KEY`, `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`) should be scoped to this project only.

## Production Checklist

Before deploying to production:

- [ ] Set `DOCVAULT_DEBUG=false`
- [ ] Set a strong, unique `DOCVAULT_JWT_SECRET` (64+ characters)
- [ ] Replace CORS wildcard with explicit origins in `DOCVAULT_CORS_ORIGINS`
- [ ] Use HTTPS via Caddy (`make prod` handles TLS automatically)
- [ ] Set unique passwords for PostgreSQL, Langfuse, and Grafana
- [ ] Review rate limits (`DOCVAULT_RATE_LIMIT_*`) for your expected traffic
- [ ] Run `pip-audit` and `pnpm audit` to check for dependency vulnerabilities

## Dependency Auditing

```bash
cd backend && uv run pip-audit # Python dependencies
cd frontend && pnpm audit --prod # Node dependencies
```

The CI pipeline runs these checks automatically on every pull request.
9 changes: 8 additions & 1 deletion backend/pyproject.toml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,7 @@ requires-python = ">=3.12"
dependencies = [
"fastapi>=0.115.0,<1.0.0",
"uvicorn[standard]>=0.34.0,<1.0.0",
"litellm>=1.55.0,<2.0.0",
"litellm>=1.83.2,<2.0.0",
"pydantic>=2.10.0,<3.0.0",
"pydantic-settings>=2.7.0,<3.0.0",
"qdrant-client>=1.13.0,<2.0.0",
Expand All @@ -26,12 +26,19 @@ dependencies = [
"scikit-learn>=1.4.0,<2.0.0",
"langfuse>=2.0.0,<3.0.0",
"asyncpg>=0.30.0,<1.0.0",
"aiohttp>=3.13.4",
"cryptography>=46.0.6",
"requests>=2.33.0",
"pygments>=2.20.0",
"pyasn1>=0.6.3",
"ecdsa>=0.19.2",
]

[project.optional-dependencies]
ocr = [
"docling>=2.70.0,<3.0.0",
"onnxruntime>=1.24.2",
"onnx>=1.21.0,<2.0.0",
]
ml = [
"sentence-transformers>=3.4.0,<4.0.0",
Expand Down
Loading
Loading