Nalini-24/endee_rag_system

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

4 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Endee-Powered RAG System

Semantic Search & Grounded Question Answering using Endee Vector Database

🚀 Project Overview

This project implements a Retrieval-Augmented Generation (RAG) system powered by the Endee Vector Database.

The system enables:

- Document ingestion from a text file
- Embedding generation using SentenceTransformers
- Storage of embeddings in Endee
- Semantic similarity search using cosine distance
- Grounded answer generation using FLAN-T5
- Hallucination prevention using similarity thresholding

This project demonstrates a practical AI/ML application where vector search is the core component of the architecture.
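The ingestion step can be sketched in a few lines of plain Python. This is an illustrative version of what `app/ingest.py` does, not the repository's exact implementation; the `chunk_text` helper and its `chunk_size`/`overlap` parameters are assumptions.

```python
def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> list[str]:
    """Split text into overlapping chunks of roughly chunk_size words.

    Overlap between consecutive chunks helps preserve context that
    would otherwise be cut at a chunk boundary.
    """
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + chunk_size])
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(words):
            break
    return chunks

# 250 words with chunk_size=100 and overlap=20 yield 3 chunks
chunks = chunk_text("word " * 250, chunk_size=100, overlap=20)
print(len(chunks))
```

Each chunk is later embedded independently, so the chunk size trades retrieval granularity against context completeness.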

🎯 Problem Statement

Traditional keyword-based search systems fail to understand semantic meaning.

Modern AI systems require:

Embedding-based similarity search

Context retrieval

Grounded answer generation

This project builds a minimal yet production-structured RAG system where:

Endee performs high-performance vector indexing and retrieval

A language model generates answers only from retrieved context

Irrelevant questions are safely rejected

πŸ— System Architecture User Question ↓ SentenceTransformer (Embedding Model) ↓ Endee Vector Database (Cosine Similarity Search) ↓ Top-K Relevant Chunks ↓ Similarity Threshold Filtering (0.55) ↓ FLAN-T5 (LLM) ↓ Grounded Answer OR "I don't know" 🧠 Why Endee?

Endee is used as the primary vector database because it provides:

High-performance similarity search

Cosine distance indexing

Scalable architecture

Efficient C++ backend

Python SDK integration

Docker deployment support

In this project, Endee handles:

Index creation (dimension: 384)

Vector storage

Top-k similarity retrieval

📂 Project Structure

endee-rag-system/
│
├── app/
│   ├── ingest.py      # Load and chunk text file
│   ├── embed.py       # Generate embeddings and upsert into Endee
│   ├── retrieve.py    # Semantic retrieval using Endee SDK
│   └── rag.py         # Full RAG pipeline (LLM + threshold)
│
├── data/
│   └── sample.txt     # Knowledge base file
│
├── requirements.txt
└── README.md

🛡 Hallucination Prevention Strategy

To ensure reliability:

A similarity threshold of 0.55 is enforced.

If no retrieved vector crosses the threshold, the system returns:

I don't know

This guarantees:

No fabricated responses

No answers outside the dataset

Strictly grounded generation
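The guard described above can be sketched as follows. This is a minimal illustration of the thresholding logic, not the exact code in `app/rag.py`; `generate_answer` is a hypothetical stand-in for the FLAN-T5 call.

```python
SIMILARITY_THRESHOLD = 0.55

def generate_answer(question: str, context: str) -> str:
    # Hypothetical stand-in for the FLAN-T5 generation step.
    return f"[answer grounded in {len(context.splitlines())} chunk(s)]"

def answer(question: str, retrieved: list[tuple[str, float]]) -> str:
    """retrieved: (chunk_text, similarity_score) pairs from the vector search."""
    relevant = [(c, s) for c, s in retrieved if s >= SIMILARITY_THRESHOLD]
    if not relevant:
        # No sufficiently similar context: refuse rather than hallucinate.
        return "I don't know"
    context = "\n".join(c for c, _ in relevant)
    return generate_answer(question, context)

# An off-topic question retrieves only low-similarity chunks
print(answer("What is psychology?", [("Endee is a vector DB", 0.21)]))
```

The key design choice is that the LLM is never invoked when no retrieved chunk clears the threshold, so there is no opportunity for it to fabricate an answer.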

⚙️ Setup Instructions

1️⃣ Start Endee (Docker)

Build the Endee container:

docker build --build-arg BUILD_ARCH=avx2 -t endee-oss -f infra/Dockerfile .

Run the container:

docker run -d -p 8080:8080 --name endee-server endee-oss

Verify health:

curl http://localhost:8080/api/v1/health

2️⃣ Create Index

Open:

http://localhost:8080

Create index with:

Name: docs

Dimension: 384

Space Type: Cosine Similarity

3️⃣ Install Dependencies

pip install -r requirements.txt

4️⃣ Embed & Store Documents

python app/embed.py

This script:

Loads the sample text file

Generates embeddings using all-MiniLM-L6-v2

Upserts vectors into Endee

5️⃣ Run RAG System

python app/rag.py

Example:

Ask a question: What is Endee?

For unrelated questions:

Ask a question: What is psychology?

Expected Output:

I don't know
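Grounding can also be reinforced at the prompt level by instructing the model to answer only from the retrieved context. The template below is an illustrative sketch, not the exact prompt used in `app/rag.py`.

```python
# Hypothetical prompt template for the FLAN-T5 generation step.
PROMPT_TEMPLATE = (
    "Answer the question using ONLY the context below. "
    "If the context does not contain the answer, say \"I don't know\".\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble retrieved chunks and the user question into one prompt."""
    return PROMPT_TEMPLATE.format(context="\n".join(chunks), question=question)

prompt = build_prompt(
    "What is Endee?",
    ["Endee is a high-performance vector database."],
)
print("ONLY the context" in prompt)
```

In this project the assembled prompt would then be passed to google/flan-t5-small via HuggingFace Transformers for generation.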

🧪 Technologies Used

Endee Vector Database

Endee Python SDK

SentenceTransformers (all-MiniLM-L6-v2)

HuggingFace Transformers

FLAN-T5 (google/flan-t5-small)

Docker

🔍 Endee Use Case in This Project

In this system, Endee serves as the core semantic retrieval engine powering the RAG pipeline. All document chunks are converted into 384-dimensional embeddings using SentenceTransformers and stored inside an Endee index configured with cosine similarity. When a user submits a question, the query is embedded and searched against the stored vectors using Endee's high-performance similarity search. The top-k most relevant chunks are retrieved and passed to the language model for grounded response generation. By handling vector indexing, storage, and efficient nearest-neighbor retrieval, Endee enables fast, scalable semantic search and ensures that the generated answers are contextually accurate and strictly based on indexed knowledge.

🔍 Core Features

✔ Semantic similarity search
✔ Vector-based retrieval
✔ Grounded answer generation
✔ Similarity threshold enforcement
✔ Dockerized vector database
✔ Modular Python architecture

📌 Example Use Cases

This system can be extended for:

Domain-specific chatbots

Enterprise knowledge base search

Technical documentation assistants

Research assistants

Internal AI knowledge systems

🚧 Future Improvements

Hybrid search (dense + sparse)

Metadata-based filtering

PDF ingestion pipeline

REST API wrapper (FastAPI)

Web UI frontend

Model upgrade for stronger generation

🎤 Interview Explanation (If Asked)

Q: What happens if a user asks something outside the dataset?

Answer:

The system retrieves top-k vectors from Endee using cosine similarity. If no result exceeds the similarity threshold (0.55), the LLM is not invoked and the system returns "I don't know", preventing hallucination.

📄 License

This project was developed as part of an AI/ML evaluation using the Endee Vector Database.

Endee itself is licensed under Apache 2.0.
