A complete Retrieval-Augmented Generation (RAG) system demonstrating how to build an end-to-end pipeline for document-based question answering. The system answers questions about company policies by retrieving relevant information from a knowledge base and generating contextual responses.
This project showcases a production-ready RAG pipeline that:
- Loads and chunks documents for efficient retrieval
- Generates embeddings using sentence transformers
- Stores documents in a persistent vector database (ChromaDB)
- Performs semantic search to find relevant context
- Augments prompts with retrieved context
- Generates accurate, context-aware responses
The RAG pipeline consists of six main components:
```
┌─────────────────┐
│   User Query    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐     ┌──────────────────┐
│ Query Encoding  │────▶│  Vector Search   │
└─────────────────┘     └────────┬─────────┘
                                 │
                                 ▼
┌─────────────────┐     ┌──────────────────┐
│    Context      │◀────│ Document Chunks  │
│  Augmentation   │     └──────────────────┘
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│    Response     │
│   Generation    │
└─────────────────┘
```
1. **Document Loader** (`src/document/loader.py`)
   - Loads documents from various sources
   - Splits documents into chunks using recursive character text splitting
   - Configurable chunk size and overlap

2. **Embedding Model** (`src/embeddings/model.py`)
   - Uses SentenceTransformer models for generating embeddings
   - Default model: `all-MiniLM-L6-v2` (384 dimensions)
   - Encodes both documents and queries into vector space

3. **Vector Store** (`src/vector/store.py`)
   - Persistent ChromaDB storage
   - Cosine similarity for retrieval
   - Automatic deduplication and indexing

4. **Query Processor** (`src/query/processor.py`)
   - Encodes user queries into embeddings
   - Performs similarity search in vector database
   - Returns top-k most relevant document chunks

5. **Response Generator** (`src/response/generator.py`)
   - Augments prompts with retrieved context
   - Generates context-aware responses
   - Currently uses a simulated LLM (ready for OpenAI/Anthropic integration)

6. **Main Pipeline** (`src/main.py`)
   - Orchestrates all components
   - Handles initialization and query processing
   - Manages persistent state
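The chunking parameters behave like a sliding window over the text: each new chunk starts `CHUNK_SIZE - CHUNK_OVERLAP` characters after the previous one, so consecutive chunks share the overlap region. A minimal pure-Python sketch of that behavior (the real loader uses LangChain's recursive character splitter, which additionally prefers to break on separator boundaries):

```python
def chunk_text(text: str, chunk_size: int = 200, chunk_overlap: int = 50) -> list:
    """Naive sliding-window chunking: each chunk starts (chunk_size - chunk_overlap)
    characters after the previous one, so adjacent chunks share `chunk_overlap` chars."""
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks

# 500 distinct characters -> chunks [0:200], [150:350], [300:500]
doc = "".join(chr(97 + i % 26) for i in range(500))
chunks = chunk_text(doc)
print(len(chunks))                          # 3
print(chunks[0][-50:] == chunks[1][:50])    # True: the 50-char overlap is shared
```

The overlap keeps sentences that straddle a chunk boundary retrievable from at least one chunk.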
- Python 3.8 or higher
- pip package manager
1. **Clone the repository** (if applicable):

   ```bash
   git clone <repository-url>
   cd complete-rag
   ```

2. **Create a virtual environment** (recommended):

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

3. **Install dependencies**:

   ```bash
   pip install -r requirements.txt
   ```

4. **Download the spaCy language model** (required for text processing):

   ```bash
   python -m spacy download en_core_web_sm
   ```
The easiest way to get started is to run the demo script:
```bash
python run_demo.py
```

This will:
- Initialize the RAG system (loads documents, creates embeddings, sets up vector DB)
- Process several example queries
- Display the complete pipeline workflow for each query
You can also use the RAG system programmatically:
```python
from src.main import initialize_rag_system, process_query

# Initialize the system (one-time setup)
collection, model = initialize_rag_system()

# Process queries
response = process_query(
    "What's the reimbursement policy for home office equipment?",
    collection,
    model
)
print(response)
```

To process your own queries, modify the `test_queries` list in `run_demo.py` or call `process_query()` directly with your question.
```
complete-rag/
├── README.md              # This file
├── requirements.txt       # Python dependencies
├── run_demo.py            # Demo entry point
├── chroma_db/             # Persistent vector database (auto-generated)
│   └── ...
├── src/
│   ├── __init__.py
│   ├── main.py            # Main pipeline orchestration
│   ├── config.py          # Configuration constants
│   ├── document/
│   │   ├── __init__.py
│   │   └── loader.py      # Document loading and chunking
│   ├── embeddings/
│   │   ├── __init__.py
│   │   └── model.py       # Embedding model management
│   ├── vector/
│   │   ├── __init__.py
│   │   └── store.py       # Vector database operations
│   ├── query/
│   │   ├── __init__.py
│   │   └── processor.py   # Query processing and search
│   └── response/
│       ├── __init__.py
│       └── generator.py   # Response generation
└── techcorp-docs/         # Sample documents (optional)
    ├── customer-faqs/
    ├── employee-handbook/
    ├── meeting-notes/
    └── product-specs/
```
All configuration is centralized in `src/config.py`:

```python
# Vector Database
DEFAULT_PERSIST_DIRECTORY = "./chroma_db"
COLLECTION_NAME = "techcorp_policies"
SIMILARITY_METRIC = "cosine"

# Embedding Model
EMBEDDING_MODEL_NAME = "all-MiniLM-L6-v2"

# Chunking
CHUNK_SIZE = 200
CHUNK_OVERLAP = 50
CHUNK_SEPARATORS = ["\n\n", "\n", " ", ""]

# Search
DEFAULT_TOP_K = 3
```

- **Chunk Size**: Adjust `CHUNK_SIZE` to control document chunk granularity
- **Top-K Results**: Change `DEFAULT_TOP_K` to retrieve more or fewer context chunks
- **Embedding Model**: Switch to a different SentenceTransformer model for better accuracy
- **Similarity Metric**: Use `"l2"` or `"ip"` instead of `"cosine"` if needed
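For intuition on the three similarity metric options, here is a sketch of what each distance computes (in ChromaDB these correspond to the HNSW space settings: cosine distance, squared Euclidean distance, and inner-product distance; the exact formulas below are illustrative of that convention):

```python
import math

def cosine_distance(a, b):
    # 1 - cosine similarity: 0 for identical directions, up to 2 for opposite ones.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1 - dot / norm

def l2_distance(a, b):
    # Squared Euclidean distance between the two vectors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def ip_distance(a, b):
    # 1 - inner product; meaningful as a ranking when vectors are normalized.
    return 1 - sum(x * y for x, y in zip(a, b))

a, b = [1.0, 0.0], [0.0, 1.0]
print(cosine_distance(a, b))  # 1.0 (orthogonal vectors)
print(l2_distance(a, b))      # 2.0
print(ip_distance(a, b))      # 1.0
```

Cosine distance ignores vector magnitude, which is why it is the usual default for sentence embeddings.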
- Document Loading: Loads sample policy documents
- Chunking: Splits documents into smaller, manageable chunks (200 chars with 50 char overlap)
- Embedding Generation: Converts each chunk into a vector embedding
- Vector Storage: Stores embeddings in ChromaDB with metadata
- Model Loading: Loads the SentenceTransformer model for query encoding
- Query Encoding: Converts the user query into an embedding vector
- Vector Search: Finds top-k most similar document chunks using cosine similarity
- Context Augmentation: Assembles retrieved chunks into context for the LLM
- Response Generation: Generates a response based on the augmented prompt
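The query-time steps above can be sketched end to end. The encoder here is a toy stand-in for the SentenceTransformer model, and generation is omitted (the project currently simulates it); only the shape of the flow matches the real pipeline:

```python
def encode(text):
    # Toy "embedding": vowel counts (stand-in for SentenceTransformer encoding).
    return [text.lower().count(c) for c in "aeiou"]

def top_k_chunks(query_vec, store, k=3):
    # Vector search: rank stored (vector, chunk) pairs by cosine similarity.
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0
    ranked = sorted(store, key=lambda pair: cos(query_vec, pair[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

def augment(query, chunks):
    # Context augmentation: assemble retrieved chunks into the prompt.
    return "Context:\n" + "\n".join(chunks) + f"\n\nQuestion: {query}"

# Indexing: embed each chunk once and store the pairs.
store = [(encode(c), c) for c in [
    "Home office equipment is reimbursable up to a set limit.",
    "Employees accrue vacation days monthly.",
]]

# Query time: encode, search, augment.
query = "What is the home office reimbursement limit?"
prompt = augment(query, top_k_chunks(encode(query), store, k=1))
print(prompt)
```

The retrieved chunk about reimbursement ranks above the vacation chunk because its toy vector points in a more similar direction to the query's.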
The vector database is persisted to disk (`./chroma_db/`), so:
- Initial setup only runs once
- Subsequent runs reuse existing embeddings
- No need to re-index documents unless they change
The demo includes these example queries:
- "What's the reimbursement policy for home office equipment?"
- "Can I get money back for buying a desk?"
- "How much can I claim for my home office?"
- "What's the travel expense policy?"
- "How many vacation days do I get?"
Key dependencies include:
- `chromadb`: Vector database for storing and searching embeddings
- `sentence-transformers`: Embedding model library
- `langchain`: Text splitting utilities
- `scikit-learn`: Machine learning utilities
- `spacy`: Natural language processing

See `requirements.txt` for the complete list with versions.
Potential improvements for production use:
- Real LLM Integration: Replace simulated response with OpenAI/Anthropic API calls
- Document Loaders: Support for PDF, Word, and other document formats
- Advanced Chunking: Implement semantic chunking strategies
- Hybrid Search: Combine vector search with keyword/BM25 search
- Query Expansion: Improve query understanding with query rewriting
- Response Citations: Add source citations to generated responses
- Web Interface: Build a web UI for easier interaction
- Batch Processing: Support for processing multiple queries efficiently
- Evaluation Metrics: Add RAG evaluation metrics (retrieval accuracy, response quality)
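Of the items above, hybrid search is commonly implemented with reciprocal rank fusion (RRF): each document's fused score is the sum of `1 / (k + rank)` over the ranked lists it appears in. A hedged sketch (`k=60` is the constant conventionally used with RRF, not something this project defines):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse multiple ranked lists of document ids.
    Each list contributes 1 / (k + rank) per document; higher fused score wins."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]   # from semantic (vector) search
keyword_hits = ["doc_b", "doc_d", "doc_a"]  # from keyword/BM25 search
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
```

Documents that appear near the top of both lists (here `doc_b`) are promoted, without needing to make the two scoring scales comparable.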
- The current implementation uses a simulated LLM for response generation. To use a real LLM, modify `src/response/generator.py` to call your preferred API (OpenAI, Anthropic, etc.).
- The vector database persists between runs, so you only need to re-index when documents change.
- Sample documents are currently hardcoded in `src/document/loader.py`. You can extend this to load from files or external sources.
Contributions are welcome! Please feel free to submit a Pull Request.
[Specify your license here]
- Built with ChromaDB for vector storage
- Uses Sentence Transformers for embeddings
- Inspired by modern RAG architectures and best practices
TechCorp PolicyCopilot - Making policy information accessible through AI-powered search and retrieval.