A production‑style, local‑first RAG MVP with SQL metadata + vector store, page‑level citations, observability, and a small evaluation harness.
- Start Postgres:

  ```bash
  docker compose up -d
  ```

- Create tables (once):

  ```bash
  psql postgresql://rocp:rocp_pw@localhost:5432/rocp_db -f db/schema.sql
  ```

- Create a Python env and install deps:

  ```bash
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```

- Optional: set an OpenAI key for embeddings + generation:

  ```bash
  export OPENAI_API_KEY=YOUR_KEY
  ```

- Ingest, query, and evaluate:

  ```bash
  python -m ingestion.ingest --paths SampleData/pdfs SampleData/docs
  python -m rag.query --query "What is this document about?"
  python -m evaluation.run_eval --eval_file evaluation/examples/eval_demo.jsonl
  ```
- SQL for metadata & traceability: `documents`, `ingestion_runs`, and `chunks` provide a durable audit trail and idempotency.
- Vector DB for retrieval: Chroma persists embeddings locally and stores chunk‑level metadata for filtering and citations (see the retrieval sketch below).
- Idempotency: documents are de‑duplicated by `uri` + `file_hash`/`text_hash`; re‑ingesting skips unchanged files (see the ingestion sketch below).
- Grounded answers: output includes citations with doc title, uri, page number, and chunk id.
- Confidence threshold: if the top retrieval score falls below the threshold, the system refuses with “I don’t know.”
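
The ingestion sketch below illustrates the contract described above: hash the file, skip it when the same `uri` + `file_hash` pair already exists in the `documents` table, and otherwise index chunks into the local Chroma store with the metadata that later powers citations. The column names, the `chunks` collection name, and the chunk dict shape are assumptions for illustration, not the actual `ingestion.ingest` code.

```python
# Minimal sketch of the idempotency check and chunk indexing, assuming a
# `documents(uri, file_hash, ...)` table and a Chroma collection named "chunks".
import hashlib

import chromadb
import psycopg2

DSN = "postgresql://rocp:rocp_pw@localhost:5432/rocp_db"


def file_sha256(path: str) -> str:
    """Hash the raw file bytes so unchanged files can be skipped on re-ingest."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def already_ingested(conn, uri: str, file_hash: str) -> bool:
    """True if this exact (uri, file_hash) pair was ingested before."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT 1 FROM documents WHERE uri = %s AND file_hash = %s",
            (uri, file_hash),
        )
        return cur.fetchone() is not None


def index_chunks(collection, doc_id: str, title: str, uri: str, chunks: list[dict]) -> None:
    """Store chunk text plus the metadata needed for page-level citations."""
    collection.add(
        ids=[f"{doc_id}:{c['page']}:{c['ordinal']}" for c in chunks],
        documents=[c["text"] for c in chunks],
        metadatas=[
            {"doc_id": doc_id, "title": title, "uri": uri, "page": c["page"]}
            for c in chunks
        ],
    )


if __name__ == "__main__":
    conn = psycopg2.connect(DSN)
    client = chromadb.PersistentClient(path="chroma_db")  # local, persistent store
    collection = client.get_or_create_collection("chunks")
    path = "SampleData/pdfs/example.pdf"  # hypothetical input; uri == path here
    if already_ingested(conn, path, file_sha256(path)):
        print("unchanged, skipping")  # idempotent re-ingest
```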
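And a minimal sketch of the query side: top‑k search over the persisted collection, a refusal when the best score is below a cutoff, and citation strings assembled from chunk metadata. The `0.25` threshold, the distance‑to‑similarity conversion, and the metadata keys are assumptions; the real logic lives in `rag.query`.

```python
# Retrieval sketch: query Chroma, enforce a confidence threshold, build citations.
import chromadb

THRESHOLD = 0.25  # hypothetical cutoff; tune it against the evaluation set


def retrieve(query: str, k: int = 5) -> tuple[list[dict], list[str]]:
    client = chromadb.PersistentClient(path="chroma_db")
    collection = client.get_or_create_collection("chunks")
    res = collection.query(query_texts=[query], n_results=k)

    hits = []
    for chunk_id, text, meta, dist in zip(
        res["ids"][0], res["documents"][0], res["metadatas"][0], res["distances"][0]
    ):
        # Assumes a cosine-style distance; smaller distance == stronger match.
        hits.append({"id": chunk_id, "text": text, "meta": meta, "score": 1.0 - dist})

    # Refuse instead of answering from weak evidence.
    if not hits or hits[0]["score"] < THRESHOLD:
        return [], []

    citations = [
        f'[{h["meta"]["title"]}] {h["meta"]["uri"]} p.{h["meta"]["page"]} (chunk {h["id"]})'
        for h in hits
    ]
    return hits, citations


if __name__ == "__main__":
    hits, citations = retrieve("What is this document about?")
    if not hits:
        print("I don't know.")
    else:
        print("\n".join(citations))
```

A reasonable way to pick the threshold is to sweep it over the eval set from `evaluation/run_eval` and choose the value that best trades off refusals against wrong answers.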