AdityaGupta2804/Endee-Assignment-

RAG Document Assistant

A Retrieval-Augmented Generation (RAG) Document Assistant that enables users to upload documents, index them in a vector database, and query them using natural language to receive AI-generated answers with source citations.


Table of Contents

  1. Problem Statement
  2. Project Overview
  3. System Architecture
  4. Technical Approach
  5. How Endee Vector Database is Used
  6. Setup and Installation
  7. Running the Application
  8. Deployment on Streamlit Cloud
  9. Configuration Reference
  10. Project Structure
  11. Troubleshooting

Problem Statement

Traditional document search relies on keyword matching, which fails to understand the semantic meaning of queries. Users often have large collections of documents (PDFs, text files, markdown) and need to quickly find relevant information without manually reading through each document.

Challenges addressed by this project:

  1. Semantic Understanding: Keyword search cannot understand that "machine learning" and "ML algorithms" are related concepts. Users need a system that understands meaning, not just matches words.

  2. Multi-Document Search: When information is spread across multiple documents, users need a unified search that can find and synthesize relevant passages from all sources.

  3. Explainable Answers: Users need to verify the accuracy of AI-generated answers by seeing the exact source passages used to generate them.

  4. Scalable Vector Storage: Efficiently storing and querying high-dimensional embedding vectors requires a specialized vector database that can handle similarity search at scale.


Project Overview

The RAG Document Assistant solves these challenges by implementing a complete RAG pipeline:

  1. Document Ingestion: Accepts PDF, TXT, and Markdown files. Extracts text content and splits it into semantically meaningful chunks.

  2. Embedding Generation: Converts text chunks into 384-dimensional vectors using the all-MiniLM-L6-v2 sentence transformer model.

  3. Vector Storage: Stores embeddings in Endee Vector Database for efficient similarity search.

  4. Semantic Retrieval: When a user asks a question, the system finds the most semantically similar chunks using cosine similarity.

  5. Answer Generation: Uses Google Gemini API (or local Mistral model) to generate a coherent answer based on the retrieved context.

  6. Source Attribution: Displays the retrieved chunks with relevance scores so users can verify the answer.

Key Features:

  • Multi-format document support (PDF, TXT, Markdown)
  • Semantic search using vector similarity
  • AI-powered answer generation with source citations
  • Configurable LLM backend (Gemini API or local Mistral)
  • Docker-ready deployment
  • Pickle fallback storage for Streamlit Cloud

System Architecture

                                    User Interface (Streamlit)
                                              |
                    +-------------------------+-------------------------+
                    |                         |                         |
              Document Upload            Query Input              Answer Display
                    |                         |                         |
                    v                         v                         ^
            +---------------+         +---------------+         +---------------+
            |   Ingestion   |         |   Retrieval   |         |  Generation   |
            |    Module     |         |    Module     |         |    Module     |
            +---------------+         +---------------+         +---------------+
                    |                         |                         |
                    |    +-----------+        |                         |
                    +--->| Embedding |<-------+                         |
                         |   Module  |                                  |
                         +-----------+                                  |
                               |                                        |
                               v                                        |
                    +---------------------+                             |
                    |   Endee Vector DB   |                             |
                    | (or Pickle Fallback)|                             |
                    +---------------------+                             |
                                                                        |
                                                               +--------+--------+
                                                               |   Gemini API    |
                                                               | (or Local LLM)  |
                                                               +-----------------+

Data Flow:

  1. Ingestion Flow: User uploads document → Text extraction → Chunking → Embedding generation → Vector storage in Endee

  2. Query Flow: User enters question → Query embedding → Vector similarity search → Top-K retrieval → Prompt construction → LLM generation → Answer display with sources
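The two flows can be sketched end to end in a few lines. This is an illustration only: a toy character-frequency embedding stands in for the real all-MiniLM-L6-v2 model, and the document list and in-memory store are hypothetical.

```python
import numpy as np

# Toy character-frequency embedding; a stand-in for the real
# sentence-transformer model, for illustration only.
def toy_embed(text, dim=64):
    v = np.zeros(dim, dtype=np.float32)
    for ch in text.lower():
        v[ord(ch) % dim] += 1.0
    norm = np.linalg.norm(v)
    return v / norm if norm else v

# Ingestion flow: chunk -> embed -> store
docs = {"notes.txt": ["Endee stores embedding vectors.",
                      "Streamlit renders the user interface."]}
store = []
for name, texts in docs.items():
    for i, text in enumerate(texts):
        store.append({"vector": toy_embed(text),
                      "meta": {"text": text, "document_name": name,
                               "chunk_index": i}})

# Query flow: embed question -> cosine similarity -> top-k
query = toy_embed("Where are embedding vectors stored?")
ranked = sorted(store, key=lambda r: float(np.dot(query, r["vector"])),
                reverse=True)
top_chunk = ranked[0]["meta"]["text"]
```

Because both vectors are normalized, the dot product in the sort key is exactly the cosine similarity used by the real retrieval module.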


Technical Approach

Document Processing

Documents are processed through a multi-stage pipeline:

  1. Text Extraction:

    • PDF files: Extracted using PyPDF2, preserving page numbers
    • Markdown files: Converted to HTML, then stripped to plain text
    • Text files: Read directly with UTF-8 encoding
  2. Chunking Strategy:

    • Fixed-size chunks of 500 tokens with 50-token overlap
    • Overlap ensures context is not lost at chunk boundaries
    • Each chunk retains metadata (source document, page number, chunk index)
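The chunking strategy above can be sketched as a small helper. This is a simplified illustration: `chunk_tokens` is a hypothetical function, and a plain token list stands in for real tokenizer output.

```python
def chunk_tokens(tokens, chunk_size=500, overlap=50):
    """Split a token sequence into fixed-size chunks with overlap.

    Consecutive chunks share `overlap` tokens so that sentences spanning
    a chunk boundary appear in full in at least one chunk.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break
    return chunks

# A 1200-token document yields 3 chunks starting at tokens 0, 450, and 900.
chunks = chunk_tokens(list(range(1200)))
```

The last 50 tokens of each chunk are repeated as the first 50 tokens of the next, which is what preserves context at the boundaries.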

Embedding Model

The system uses the all-MiniLM-L6-v2 sentence transformer model:

  • Dimension: 384-dimensional vectors
  • Speed: Fast inference suitable for real-time applications
  • Quality: Optimized for semantic similarity tasks
  • Memory: ~80MB model size, suitable for CPU inference

Vector Search

Similarity search uses cosine similarity:

similarity = (A · B) / (||A|| × ||B||)

Where A is the query embedding and B is a stored document embedding. Higher similarity scores indicate more relevant content.
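The formula above translates directly to NumPy; this small sketch mirrors what the pickle fallback store computes (the function name is illustrative, not the project's API):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors: (A . B) / (|A| * |B|)."""
    a = np.asarray(a, dtype=np.float32)
    b = np.asarray(b, dtype=np.float32)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Scores range from -1 (opposite) through 0 (unrelated) to 1 (identical direction); retrieval keeps the highest-scoring chunks.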

RAG Prompt Construction

The prompt sent to the LLM follows this structure:

You are a helpful assistant. Answer the question based ONLY on the provided context.

CONTEXT:
[Source 1: document_name.pdf, Page 3]
<chunk text>

[Source 2: document_name.pdf, Page 7]
<chunk text>

QUESTION: <user question>

INSTRUCTIONS:
- Answer based only on the provided context
- If the answer is not in the context, say so
- Cite your sources
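Assembling that template from retrieved chunks might look like the sketch below (the `build_prompt` helper and the chunk dictionary layout are assumptions for illustration, not the project's exact code):

```python
def build_prompt(question, chunks):
    """Assemble the RAG prompt from a question and retrieved chunk metadata."""
    context_blocks = []
    for i, chunk in enumerate(chunks, start=1):
        header = f"[Source {i}: {chunk['document_name']}, Page {chunk['page_number']}]"
        context_blocks.append(f"{header}\n{chunk['text']}")
    context = "\n\n".join(context_blocks)
    return (
        "You are a helpful assistant. Answer the question based ONLY on the "
        "provided context.\n\n"
        f"CONTEXT:\n{context}\n\n"
        f"QUESTION: {question}\n\n"
        "INSTRUCTIONS:\n"
        "- Answer based only on the provided context\n"
        "- If the answer is not in the context, say so\n"
        "- Cite your sources"
    )
```

Numbering the sources in the context is what lets the LLM cite them and lets the UI map citations back to specific chunks.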

LLM Integration

Two LLM backends are supported:

  1. Google Gemini API (Recommended):

    • Model: gemini-2.0-flash
    • Fast, cost-effective, high-quality responses
    • Requires API key from Google AI Studio
  2. Local Mistral Model (Optional):

    • Model: mistralai/Mistral-7B-Instruct-v0.2
    • Runs locally, no API costs
    • Requires ~14GB RAM

How Endee Vector Database is Used

Endee is a high-performance vector database designed for similarity search. This project uses Endee for:

1. Index Management

An index named rag_documents is created to store document embeddings:

client.create_index(
    name="rag_documents",
    dimension=384,           # Matches embedding model output
    space_type="cosine",     # Cosine similarity metric
    precision=Precision.FLOAT32
)

2. Vector Storage (Upsert)

When documents are indexed, each chunk is stored with its embedding and metadata:

index.upsert([
    {
        "id": "chunk_uuid",
        "vector": [0.1, 0.2, ...],  # 384-dim embedding
        "meta": {
            "text": "Original chunk text...",
            "document_name": "report.pdf",
            "page_number": 5,
            "chunk_index": 12
        }
    }
])

3. Similarity Search (Query)

When a user asks a question, Endee performs efficient similarity search:

results = index.query(
    vector=query_embedding,  # User question embedding
    top_k=4                  # Return 4 most similar chunks
)

Returns results sorted by similarity score with full metadata for source attribution.

4. Fallback Storage

For environments without Endee (like Streamlit Cloud), the system automatically falls back to a pickle-based vector store that implements the same interface using NumPy for cosine similarity calculations.
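A minimal version of such a fallback store could look like this. The class name, file layout, and method bodies are a hypothetical sketch, not the project's actual implementation; only the `upsert`/`query` interface mirrors the Endee calls shown above.

```python
import os
import pickle
import numpy as np

class PickleVectorStore:
    """Pickle-backed stand-in for the Endee index (illustrative sketch)."""

    def __init__(self, path="vector_store.pkl"):
        self.path = path
        self.records = {}  # id -> {"vector": ndarray, "meta": dict}
        if os.path.exists(path):
            with open(path, "rb") as f:
                self.records = pickle.load(f)

    def upsert(self, items):
        # Insert or overwrite by id, then persist the whole store.
        for item in items:
            self.records[item["id"]] = {
                "vector": np.asarray(item["vector"], dtype=np.float32),
                "meta": item["meta"],
            }
        with open(self.path, "wb") as f:
            pickle.dump(self.records, f)

    def query(self, vector, top_k=4):
        # Brute-force cosine similarity over all stored vectors.
        q = np.asarray(vector, dtype=np.float32)
        qn = np.linalg.norm(q)
        scored = []
        for rid, rec in self.records.items():
            v = rec["vector"]
            sim = float(np.dot(q, v) / (qn * np.linalg.norm(v)))
            scored.append({"id": rid, "score": sim, "meta": rec["meta"]})
        scored.sort(key=lambda r: r["score"], reverse=True)
        return scored[:top_k]
```

Brute-force search is fine at this scale; the point of Endee is that the same interface keeps working as the number of vectors grows.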


Setup and Installation

Prerequisites

  • Python 3.9 or later
  • Docker (to run the Endee server locally)
  • Git
  • A Google Gemini API key from Google AI Studio (unless using the local LLM)

Step 1: Clone the Repository

git clone https://github.com/yourusername/EndeeProject.git
cd EndeeProject

Step 2: Start Endee Vector Database

Run Endee in Docker:

docker run -d -p 8080:8080 -v endee-data:/data --name endee-server endeeio/endee-server:latest

Verify Endee is running:

curl http://localhost:8080/health

Expected response: {"status":"ok"}

Step 3: Create Python Virtual Environment

Windows:

python -m venv venv
venv\Scripts\activate

Linux/Mac:

python -m venv venv
source venv/bin/activate

Step 4: Install Dependencies

pip install -r requirements.txt

Step 5: Configure Environment Variables

Copy the example environment file:

Windows:

copy .env.example .env

Linux/Mac:

cp .env.example .env

Edit .env with your configuration:

# Endee Configuration
ENDEE_HOST=localhost
ENDEE_PORT=8080

# Storage Backend (set to true for Streamlit Cloud)
USE_PICKLE_STORAGE=false

# LLM Configuration
USE_LOCAL_LLM=false
GEMINI_API_KEY=your_gemini_api_key_here
GEMINI_MODEL=gemini-2.0-flash

# Application Settings
MAX_FILE_SIZE_MB=50
TOP_K_RETRIEVAL=4

Running the Application

Local Development

Start the Streamlit application:

streamlit run app/main.py

The application will be available at: http://localhost:8501

Using Docker Compose

For a complete containerized deployment:

docker-compose up --build

This starts both Endee and the RAG application.

Docker Commands Reference

# Start in background
docker-compose up -d

# View application logs
docker-compose logs -f rag-app

# Stop all services
docker-compose down

# Stop and remove all data
docker-compose down -v

Deployment on Streamlit Cloud

Streamlit Cloud cannot run Docker containers, so we provide a pickle-based fallback storage that works without Endee.

Step 1: Prepare Repository

Ensure your code is pushed to GitHub:

git add .
git commit -m "Prepare for Streamlit Cloud"
git push origin main

Step 2: Configure Secrets

Create .streamlit/secrets.toml in your repository (add to .gitignore):

USE_PICKLE_STORAGE = "true"
GEMINI_API_KEY = "your-gemini-api-key"
GEMINI_MODEL = "gemini-2.0-flash"
USE_LOCAL_LLM = "false"

Step 3: Deploy

  1. Go to https://share.streamlit.io
  2. Click "New app"
  3. Connect your GitHub repository
  4. Set main file path: app/main.py
  5. Add secrets in "Advanced settings" (paste from secrets.toml)
  6. Click "Deploy"

Important Notes

  • The pickle store is ephemeral on Streamlit Cloud and resets whenever the app redeploys or restarts
  • For persistent storage, deploy Endee on a cloud VM and configure ENDEE_HOST
  • Local Mistral model is not available on Streamlit Cloud due to memory limits

Configuration Reference

| Variable           | Default          | Description                                              |
|--------------------|------------------|----------------------------------------------------------|
| ENDEE_HOST         | localhost        | Endee server hostname                                    |
| ENDEE_PORT         | 8080             | Endee server port                                        |
| USE_PICKLE_STORAGE | false            | Use pickle file storage instead of Endee                 |
| USE_LOCAL_LLM      | false            | Use local Mistral model instead of Gemini                |
| GEMINI_API_KEY     | -                | Google Gemini API key (required if not using local LLM)  |
| GEMINI_MODEL       | gemini-2.0-flash | Gemini model identifier                                  |
| MAX_FILE_SIZE_MB   | 50               | Maximum uploaded file size in megabytes                  |
| TOP_K_RETRIEVAL    | 4                | Number of chunks to retrieve per query                   |
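These variables can be read with the documented defaults as in the sketch below. The real project manages configuration with pydantic (see config/settings.py), so `load_settings` and the returned dict layout here are illustrative only.

```python
import os

def load_settings():
    """Read configuration from environment variables, with documented defaults."""
    return {
        "endee_host": os.getenv("ENDEE_HOST", "localhost"),
        "endee_port": int(os.getenv("ENDEE_PORT", "8080")),
        "use_pickle_storage": os.getenv("USE_PICKLE_STORAGE", "false").lower() == "true",
        "use_local_llm": os.getenv("USE_LOCAL_LLM", "false").lower() == "true",
        "gemini_api_key": os.getenv("GEMINI_API_KEY", ""),
        "gemini_model": os.getenv("GEMINI_MODEL", "gemini-2.0-flash"),
        "max_file_size_mb": int(os.getenv("MAX_FILE_SIZE_MB", "50")),
        "top_k_retrieval": int(os.getenv("TOP_K_RETRIEVAL", "4")),
    }
```

Note the boolean flags are strings in the environment ("true"/"false"), matching the .env example above.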

Project Structure

EndeeProject/
├── app/
│   ├── main.py          # Streamlit user interface
│   ├── ingestion.py     # Document processing and chunking
│   ├── embedding.py     # Sentence transformer embedding generation
│   ├── retrieval.py     # Vector storage and search (Endee/Pickle)
│   ├── generation.py    # LLM integration (Gemini/Mistral)
│   └── utils.py         # Utilities, logging, error handling
├── config/
│   └── settings.py      # Configuration management using pydantic
├── data/                # Uploaded documents and pickle store
├── models/              # Cached sentence transformer models
├── logs/                # Application logs
├── .env.example         # Environment variable template
├── requirements.txt     # Python dependencies
├── Dockerfile           # Application container definition
├── docker-compose.yml   # Multi-container orchestration
└── README.md            # This documentation

Troubleshooting

Endee Connection Failed

Error: Failed to connect to Endee at localhost:8080

Solution: Verify Endee is running:

docker ps | grep endee

If not running:

docker run -d -p 8080:8080 -v endee-data:/data --name endee-server endeeio/endee-server:latest

Gemini API Quota Exceeded (HTTP 429)

Error: You exceeded your current quota

Solution: Wait for the quota window to reset, or upgrade your plan in Google AI Studio. As a workaround, reduce query frequency or switch to the local Mistral backend by setting USE_LOCAL_LLM=false to true (requires ~14GB RAM).

Model Not Found (HTTP 404)

Error: models/gemini-1.5-flash is not found

Solution: Update GEMINI_MODEL in .env to gemini-2.0-flash

Empty Search Results

Cause: No documents have been indexed.

Solution: Upload and index documents using the sidebar before querying.

Out of Memory (Local LLM)

Error: Application crashes when using Local Mistral

Solution: Local Mistral requires ~14GB RAM. Use Gemini API instead by setting USE_LOCAL_LLM=false.


License

This project is licensed under the MIT License.


About

This is a project made for the Endee Labs on-campus drive.
