A complete Retrieval-Augmented Generation (RAG) system demonstrating how to build an end-to-end pipeline for document-based question answering. The system answers questions about company policies by retrieving relevant information from a knowledge base and generating contextual responses.
This project showcases a production-ready RAG pipeline that:
- Loads and chunks documents for efficient retrieval
- Generates embeddings using sentence transformers
- Stores documents in a persistent vector database (ChromaDB)
- Performs semantic search to find relevant context
- Augments prompts with retrieved context
- Generates accurate, context-aware responses
The RAG pipeline consists of six main components:
```
┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────────┐
│ Query Encoding  │────▶│  Vector Search   │
└─────────────────┘     └────────┬─────────┘
                                 │
                                 ▼
┌─────────────────┐     ┌──────────────────┐
│    Context      │◀────│ Document Chunks  │
│  Augmentation   │     └──────────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Response     │
│   Generation    │
└─────────────────┘
```
1. **Document Loader** (`src/document/loader.py`)
   - Loads documents from various sources
   - Splits documents into chunks using recursive character text splitting
   - Configurable chunk size and overlap

2. **Embedding Model** (`src/embeddings/model.py`)
   - Uses SentenceTransformer models for generating embeddings
   - Default model: `all-MiniLM-L6-v2` (384 dimensions)
   - Encodes both documents and queries into vector space

3. **Vector Store** (`src/vector/store.py`)
   - Persistent ChromaDB storage
   - Cosine similarity for retrieval
   - Automatic deduplication and indexing

4. **Query Processor** (`src/query/processor.py`)
   - Encodes user queries into embeddings
   - Performs similarity search in vector database
   - Returns top-k most relevant document chunks

5. **Response Generator** (`src/response/generator.py`)
   - Augments prompts with retrieved context
   - Generates context-aware responses
   - Currently uses a simulated LLM (ready for OpenAI/Anthropic integration)

6. **Main Pipeline** (`src/main.py`)
   - Orchestrates all components
   - Handles initialization and query processing
   - Manages persistent state
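The chunking parameters behave like a sliding window over the text: each new chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` characters after the previous one, so consecutive chunks share the overlap region. A minimal pure-Python sketch of that behavior (the real loader uses LangChain's recursive character splitter, which additionally prefers to break on separator boundaries):

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list:
    """Naive sliding-window chunking: each chunk starts (chunk_size - chunk_overlap)
    characters after the previous one, so adjacent chunks share `chunk_overlap` chars."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 500 distinct characters -> chunks [0:200], [150:350], [300:500]
doc = "".join(chr(97 + i % 26) for i in range(500))
chunks = chunk_text(doc)
print(len(chunks))                          # 3
print(chunks[0][-50:] == chunks[1][:50])    # True: the 50-char overlap is shared
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.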
- Python 3.8 or higher
- pip package manager
1. **Clone the repository** (if applicable):

   ```bash
   git clone <repository-url>
   cd complete-rag
   ```

2. **Create a virtual environment** (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

4. **Download the spaCy language model** (required for text processing):

   ```bash
   python -m spacy download en_core_web_sm
   ```
The easiest way to get started is to run the demo script:
```bash
python run_demo.py
```

This will:
- Initialize the RAG system (loads documents, creates embeddings, sets up vector DB)
- Process several example queries
- Display the complete pipeline workflow for each query
You can also use the RAG system programmatically:
```python
from src.main import initialize_rag_system, process_query

# Initialize the system (one-time setup)
collection, model = initialize_rag_system()

# Process queries
response = process_query(
    "What's the reimbursement policy for home office equipment?",
    collection,
    model
)
print(response)
```

To process your own queries, modify the `test_queries` list in `run_demo.py` or call `process_query()` directly with your question.
```
complete-rag/
├── README.md              # This file
├── requirements.txt       # Python dependencies
├── run_demo.py            # Demo entry point
├── chroma_db/             # Persistent vector database (auto-generated)
│   └── ...
├── src/
│   ├── __init__.py
│   ├── main.py            # Main pipeline orchestration
│   ├── config.py          # Configuration constants
│   ├── document/
│   │   ├── __init__.py
│   │   └── loader.py      # Document loading and chunking
│   ├── embeddings/
│   │   ├── __init__.py
│   │   └── model.py       # Embedding model management
│   ├── vector/
│   │   ├── __init__.py
│   │   └── store.py       # Vector database operations
│   ├── query/
│   │   ├── __init__.py
│   │   └── processor.py   # Query processing and search
│   └── response/
│       ├── __init__.py
│       └── generator.py   # Response generation
└── techcorp-docs/         # Sample documents (optional)
    ├── customer-faqs/
    ├── employee-handbook/
    ├── meeting-notes/
    └── product-specs/
```
All configuration is centralized in `src/config.py`:

```python
# Vector Database
DEFAULT_PERSIST_DIRECTORY = "./chroma_db"
COLLECTION_NAME = "techcorp_policies"
SIMILARITY_METRIC = "cosine"

# Embedding Model
EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"

# Chunking
CHUNK_SIZE = 200
CHUNK_OVERLAP = 50
CHUNK_SEPARATORS = ["\n\n", "\n", " ", ""]

# Search
DEFAULT_TOP_K = 3
```

- **Chunk Size**: Adjust `CHUNK_SIZE` to control document chunk granularity
- **Top-K Results**: Change `DEFAULT_TOP_K` to retrieve more or fewer context chunks
- **Embedding Model**: Switch to a different SentenceTransformer model for better accuracy
- **Similarity Metric**: Use `"l2"` or `"ip"` instead of `"cosine"` if needed
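For intuition on the three similarity metric options, here is a sketch of what each distance computes (in ChromaDB these correspond to the HNSW space settings: cosine distance, squared Euclidean distance, and inner-product distance; the exact formulas below are illustrative of that convention):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norm

def l2_distance(a, b):
    # Squared Euclidean distance between the two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip_distance(a, b):
    # 1 - inner product; meaningful as a ranking when vectors are normalized.
    return 1 - sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
print(l2_distance(a, b))      # 2.0
print(ip_distance(a, b))      # 1.0
```

Cosine distance ignores vector magnitude, which is why it is the usual default for sentence embeddings.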
- Document Loading: Loads sample policy documents
- Chunking: Splits documents into smaller, manageable chunks (200 chars with 50 char overlap)
- Embedding Generation: Converts each chunk into a vector embedding
- Vector Storage: Stores embeddings in ChromaDB with metadata
- Model Loading: Loads the SentenceTransformer model for query encoding
- Query Encoding: Converts the user query into an embedding vector
- Vector Search: Finds top-k most similar document chunks using cosine similarity
- Context Augmentation: Assembles retrieved chunks into context for the LLM
- Response Generation: Generates a response based on the augmented prompt
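The query-time steps above can be sketched end to end. The encoder here is a toy stand-in for the SentenceTransformer model, and generation is omitted (the project currently simulates it); only the shape of the flow matches the real pipeline:

```python
def encode(text):
    # Toy "embedding": vowel counts (stand-in for SentenceTransformer encoding).
    return [text.lower().count(c) for c in "aeiou"]

def top_k_chunks(query_vec, store, k=3):
    # Vector search: rank stored (vector, chunk) pairs by cosine similarity.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    ranked = sorted(store, key=lambda pair: cos(query_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def augment(query, chunks):
    # Context augmentation: assemble retrieved chunks into the prompt.
    return "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {query}"

# Indexing: embed each chunk once and store the pairs.
store = [(encode(c), c) for c in [
    "Home office equipment is reimbursable up to a set limit.",
    "Employees accrue vacation days monthly.",
]]

# Query time: encode, search, augment.
query = "What is the home office reimbursement limit?"
prompt = augment(query, top_k_chunks(encode(query), store, k=1))
print(prompt)
```

The retrieved chunk about reimbursement ranks above the vacation chunk because its toy vector points in a more similar direction to the query's.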
The vector database is persisted to disk (`./chroma_db/`), so:
- Initial setup only runs once
- Subsequent runs reuse existing embeddings
- No need to re-index documents unless they change
The demo includes these example queries:
- "What's the reimbursement policy for home office equipment?"
- "Can I get money back for buying a desk?"
- "How much can I claim for my home office?"
- "What's the travel expense policy?"
- "How many vacation days do I get?"
Key dependencies include:
- `chromadb`: Vector database for storing and searching embeddings
- `sentence-transformers`: Embedding model library
- `langchain`: Text splitting utilities
- `scikit-learn`: Machine learning utilities
- `spacy`: Natural language processing

See `requirements.txt` for the complete list with versions.
Potential improvements for production use:
- Real LLM Integration: Replace simulated response with OpenAI/Anthropic API calls
- Document Loaders: Support for PDF, Word, and other document formats
- Advanced Chunking: Implement semantic chunking strategies
- Hybrid Search: Combine vector search with keyword/BM25 search
- Query Expansion: Improve query understanding with query rewriting
- Response Citations: Add source citations to generated responses
- Web Interface: Build a web UI for easier interaction
- Batch Processing: Support for processing multiple queries efficiently
- Evaluation Metrics: Add RAG evaluation metrics (retrieval accuracy, response quality)
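Of the items above, hybrid search is commonly implemented with reciprocal rank fusion (RRF): each document's fused score is the sum of `1 / (k + rank)` over the ranked lists it appears in. A hedged sketch (`k=60` is the constant conventionally used with RRF, not something this project defines):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids.
    Each list contributes 1 / (k + rank) per document; higher fused score wins."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from semantic (vector) search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword/BM25 search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents that appear near the top of both lists (here `doc_b`) are promoted, without needing to make the two scoring scales comparable.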
- The current implementation uses a simulated LLM for response generation. To use a real LLM, modify `src/response/generator.py` to call your preferred API (OpenAI, Anthropic, etc.).
- The vector database persists between runs, so you only need to re-index when documents change.
- Sample documents are currently hardcoded in `src/document/loader.py`. You can extend this to load from files or external sources.
Contributions are welcome! Please feel free to submit a Pull Request.
[Specify your license here]
- Built with ChromaDB for vector storage
- Uses Sentence Transformers for embeddings
- Inspired by modern RAG architectures and best practices
TechCorp PolicyCopilot - Making policy information accessible through AI-powered search and retrieval.