Endee-Powered RAG System
Semantic Search & Grounded Question Answering using Endee Vector Database
Project Overview
This project implements a Retrieval-Augmented Generation (RAG) system powered by the Endee Vector Database.
The system enables:
- Document ingestion from a text file
- Embedding generation using SentenceTransformers
- Storage of embeddings in Endee
- Semantic similarity search using cosine distance
- Grounded answer generation using FLAN-T5
- Hallucination prevention using similarity thresholding
This project demonstrates a practical AI/ML application where vector search is the core component of the architecture.
Problem Statement
Traditional keyword-based search systems fail to understand semantic meaning.
Modern AI systems require:
Embedding-based similarity search
Context retrieval
Grounded answer generation
This project builds a minimal yet production-structured RAG system where:
Endee performs high-performance vector indexing and retrieval
A language model generates answers only from retrieved context
Irrelevant questions are safely rejected
System Architecture
User Question
→ SentenceTransformer (embedding model)
→ Endee Vector Database (cosine similarity search)
→ Top-k relevant chunks
→ Similarity threshold filtering (0.55)
→ FLAN-T5 (LLM)
→ Grounded answer, or "I don't know" if no chunk passes the threshold
Why Endee?
Endee is used as the primary vector database because it provides:
High-performance similarity search
Cosine distance indexing
Scalable architecture
Efficient C++ backend
Python SDK integration
Docker deployment support
In this project, Endee handles:
Index creation (dimension: 384)
Vector storage
Top-k similarity retrieval
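To make these three responsibilities concrete, here is a minimal in-memory stand-in for what the Endee index does: fixed-dimension storage, upsert by ID, and top-k cosine retrieval. It does not use the Endee SDK. The class name `InMemoryCosineIndex` and its methods are hypothetical, and the brute-force scan is only for illustration; Endee's own indexing replaces it in the real system.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryCosineIndex:
    """Hypothetical brute-force stand-in for an Endee index (NOT the Endee SDK API)."""

    def __init__(self, dimension: int = 384) -> None:
        self.dimension = dimension
        self._vectors: Dict[str, List[float]] = {}
        self._texts: Dict[str, str] = {}

    def upsert(self, item_id: str, vector: Sequence[float], text: str) -> None:
        # Reject vectors that do not match the index dimension (384 for all-MiniLM-L6-v2).
        if len(vector) != self.dimension:
            raise ValueError(f"expected {self.dimension} dims, got {len(vector)}")
        # Inserting an existing ID overwrites it, which is what "upsert" means.
        self._vectors[item_id] = list(vector)
        self._texts[item_id] = text

    def query(self, vector: Sequence[float], top_k: int = 3) -> List[Tuple[str, str, float]]:
        # Score every stored vector, then return the top_k as (id, text, similarity), best first.
        scored = [
            (item_id, self._texts[item_id], cosine_similarity(vector, stored))
            for item_id, stored in self._vectors.items()
        ]
        scored.sort(key=lambda item: item[2], reverse=True)
        return scored[:top_k]
```

For a quick check it can be constructed with a small `dimension` and toy vectors; with real data, the vectors would come from the embedding model.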
Project Structure
endee-rag-system/
├── app/
│   ├── ingest.py      # Load and chunk text file
│   ├── embed.py       # Generate embeddings and upsert into Endee
│   ├── retrieve.py    # Semantic retrieval using Endee SDK
│   └── rag.py         # Full RAG pipeline (LLM + threshold)
├── data/
│   └── sample.txt     # Knowledge base file
├── requirements.txt
└── README.md
Hallucination Prevention Strategy
To ensure reliability:
A similarity threshold of 0.55 is enforced.
If no retrieved vector crosses the threshold, the system returns:
I don't know
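This is designed to ensure:
No fabricated responses when the knowledge base contains nothing relevant
No answers to questions outside the dataset
Generation grounded only in retrieved context
The threshold decides whether the LLM runs at all; it does not by itself guarantee that an answer generated from relevant context is correct.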
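The filtering step can be sketched as a small pure function. It assumes the top-k search returns `(chunk_text, similarity)` pairs; the function name is hypothetical. It keeps only chunks whose score exceeds 0.55 (strictly greater, following the wording "exceeds" used later in this README; the exact boundary rule is an assumption) and returns `None` when nothing qualifies, which is the signal to skip the LLM and answer "I don't know".

```python
from typing import List, Optional, Tuple

SIMILARITY_THRESHOLD = 0.55  # value stated in this README


def select_context(
    results: List[Tuple[str, float]], threshold: float = SIMILARITY_THRESHOLD
) -> Optional[List[str]]:
    """Keep retrieved chunks whose similarity is strictly above the threshold.

    `results` is assumed to hold (chunk_text, similarity) pairs from the top-k search.
    Returns None when no chunk qualifies, meaning the LLM should not be called.
    """
    kept = [text for text, score in results if score > threshold]
    return kept or None
```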
Setup Instructions
1. Start Endee (Docker)
Build the Endee image (run from the root of the Endee source repository, which contains infra/Dockerfile):
docker build --build-arg BUILD_ARCH=avx2 -t endee-oss -f infra/Dockerfile .
Run the container:
docker run -d -p 8080:8080 --name endee-server endee-oss
Verify that the server is healthy:
curl http://localhost:8080/api/v1/health
2. Create Index
Open:
Create index with:
Name: docs
Dimension: 384
Space Type: Cosine Similarity
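For reference, the cosine similarity between a query embedding q and a stored chunk embedding d (both 384-dimensional here) is given below, and cosine distance is commonly defined as one minus it. This README does not state whether Endee's search results report similarity or distance, so treat the mapping between Endee's scores and the 0.55 threshold as something to confirm.

```latex
\mathrm{sim}(\mathbf{q}, \mathbf{d})
  = \frac{\mathbf{q} \cdot \mathbf{d}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{d} \rVert}
  = \frac{\sum_{i=1}^{384} q_i d_i}{\sqrt{\sum_{i=1}^{384} q_i^2}\,\sqrt{\sum_{i=1}^{384} d_i^2}},
\qquad
\mathrm{dist}(\mathbf{q}, \mathbf{d}) = 1 - \mathrm{sim}(\mathbf{q}, \mathbf{d})
```

As a toy 2-D illustration, q = (1, 0) and d = (1, 1) give sim = 1/√2 ≈ 0.707, which would pass a 0.55 similarity threshold.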
3. Install Dependencies
pip install -r requirements.txt
4. Embed & Store Documents
python app/embed.py
This:
Loads the sample text file
Generates embeddings using all-MiniLM-L6-v2
Upserts vectors into Endee
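This README does not specify how `ingest.py` splits the text file, so the following is only a minimal sketch of one common approach: fixed-size word windows with overlap. The function name `chunk_text` and the default `chunk_size` and `overlap` values are hypothetical. Each resulting chunk would then be embedded (for example with `SentenceTransformer("all-MiniLM-L6-v2").encode(chunks)`, which produces 384-dimensional vectors) and upserted into the `docs` index.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 100, overlap: int = 20) -> List[str]:
    """Split text into overlapping windows of `chunk_size` words (hypothetical defaults)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks
```

Overlap keeps sentences that straddle a window boundary retrievable from at least one chunk, at the cost of storing some words twice.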
5. Run the RAG System
python app/rag.py
Example:
Ask a question: What is Endee?
For unrelated questions:
Ask a question: What is psychology?
Expected Output:
I don't know
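The exchange above implies an interactive loop in `rag.py`. Here is a minimal sketch of such a loop. It assumes the pipeline is exposed as a single `answer_fn(question) -> str` callable (a hypothetical name), and stopping on an empty line or end of input is also an assumption rather than documented behavior. Passing `read` and `write` as parameters keeps the loop testable without a terminal.

```python
from typing import Callable


def interactive_loop(
    answer_fn: Callable[[str], str],
    read: Callable[[str], str] = input,
    write: Callable[[str], None] = print,
) -> None:
    """Prompt repeatedly and print each answer; answer_fn is the RAG pipeline (hypothetical)."""
    while True:
        try:
            question = read("Ask a question: ").strip()
        except EOFError:
            break  # e.g. Ctrl-D or the end of piped input
        if not question:
            break  # an empty line ends the session (assumed behavior)
        write(answer_fn(question))
```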
Technologies Used
Endee Vector Database
Endee Python SDK
SentenceTransformers (all-MiniLM-L6-v2)
HuggingFace Transformers
FLAN-T5 (google/flan-t5-small)
Docker
Endee Use Case in This Project
In this system, Endee serves as the core semantic retrieval engine of the RAG pipeline. All document chunks are converted into 384-dimensional embeddings with SentenceTransformers and stored in an Endee index configured for cosine similarity. When a user submits a question, the query is embedded with the same model and searched against the stored vectors using Endee's similarity search. The top-k most relevant chunks are retrieved and, if they pass the similarity threshold, passed to the language model for grounded response generation. Because Endee handles vector indexing, storage, and nearest-neighbor retrieval, the application code stays small, and the language model only ever sees context drawn from the indexed knowledge.
Core Features
- Semantic similarity search
- Vector-based retrieval
- Grounded answer generation
- Similarity threshold enforcement
- Dockerized vector database
- Modular Python architecture
Example Use Cases
This system can be extended for:
Domain-specific chatbots
Enterprise knowledge base search
Technical documentation assistants
Research assistants
Internal AI knowledge systems
Future Improvements
Hybrid search (dense + sparse)
Metadata-based filtering
PDF ingestion pipeline
REST API wrapper (FastAPI)
Web UI frontend
Model upgrade for stronger generation
Interview Explanation (If Asked)
Q: What happens if a user asks something outside the dataset?
Answer:
The system retrieves top-k vectors from Endee using cosine similarity. If no result exceeds the similarity threshold (0.55), the LLM is not invoked and the system returns "I don't know", preventing hallucination.
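The behavior described in this answer can be sketched end to end with the retriever and the LLM passed in as plain callables. `answer_question`, the `top_k=3` default, and the prompt wording are all hypothetical. In the real project, `retrieve` would wrap the Endee query and `generate` would wrap FLAN-T5 (for instance through a Hugging Face Transformers pipeline). The key point is that `generate` is never called when no chunk exceeds the threshold.

```python
from typing import Callable, List, Tuple

# (question, top_k) -> [(chunk_text, similarity)], e.g. a wrapper around the Endee search (assumed)
Retriever = Callable[[str, int], List[Tuple[str, float]]]
# prompt -> answer text, e.g. a wrapper around FLAN-T5 (assumed)
Generator = Callable[[str], str]


def answer_question(
    question: str,
    retrieve: Retriever,
    generate: Generator,
    top_k: int = 3,
    threshold: float = 0.55,
) -> str:
    """Retrieve top-k chunks; call the LLM only if at least one chunk exceeds the threshold."""
    results = retrieve(question, top_k)
    context = [chunk for chunk, score in results if score > threshold]
    if not context:
        return "I don't know"  # the LLM is never invoked for out-of-scope questions
    # Hypothetical prompt template; the project's actual wording is not shown in this README.
    prompt = (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say \"I don't know\".\n\n"
        "Context:\n" + "\n".join(context) + f"\n\nQuestion: {question}\nAnswer:"
    )
    return generate(prompt)
```

Only chunks above the threshold reach the prompt, so low-scoring neighbors returned by the top-k search cannot leak into the context.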
License
This project is developed as part of an AI/ML evaluation using Endee Vector Database.
Endee itself is licensed under Apache 2.0.