Endee Semantic Search Demo

This project demonstrates a simple semantic search system built using transformer embeddings and inspired by vector database principles used in Endee.

It converts text documents into vector embeddings and retrieves the most relevant result (Top-1) based on semantic similarity.

Overview

Traditional keyword search matches exact words. Semantic search understands meaning.

This project:

Converts documents into embeddings using a transformer model

Stores embeddings as vectors

Computes cosine similarity

Returns the most relevant document (Top-1)

Tech Stack

Python
sentence-transformers
NumPy
Transformer Model: all-MiniLM-L6-v2
Similarity Metric: Cosine similarity

Key Features

Transformer-based embeddings using all-MiniLM-L6-v2
Cosine similarity ranking
Top-5 semantic retrieval
Lightweight and efficient implementation
Modular structure for future scaling (RAG-ready)

System Architecture

Text documents are stored in data/sample_docs/
Documents are converted into vector embeddings using SentenceTransformers
A user query is converted into an embedding
Cosine similarity is computed between the query and all documents
The Top-5 most similar documents are ranked and displayed

Model Used

all-MiniLM-L6-v2
384-dimensional sentence embeddings
Optimized for semantic similarity tasks
Fast and lightweight transformer model

Project Structure

endee-semantic-search/
│
├── search.py
├── README.md

Setup and Execution

1. Clone the Repository

git clone https://github.com/ad8tea/endee-semantic-search.git
cd endee-semantic-search

2. Create Virtual Environment (Recommended)

python -m venv venv
venv\Scripts\activate   # Windows

3. Install Dependencies

pip install sentence-transformers numpy

4. Run Semantic Search

python search.py

Example Query

health

Example Output

Top Matching Document:

Score: 0.5298
Text: Staying healthy requires regular exercise and proper nutrition.

How It Works

Documents are encoded into dense vectors using a transformer model.
The query is converted into a vector.
Cosine similarity is computed between the query vector and document vectors.
The document with the highest similarity score (Top-1) is returned.

Purpose

This project demonstrates:

Understanding of semantic search Use of transformer embeddings Vector similarity computation Retrieval system design Application of vector database concepts

Future Improvements

Integrate a vector database (e.g., Endee, FAISS)
Add persistent embedding storage
Implement REST API with FastAPI
Extend to Retrieval-Augmented Generation (RAG)
Add evaluation metrics for retrieval performance

Author

Aditi Thakur GitHub: https://www.github.com/ad8tea

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
data/sample_docs		data/sample_docs
README.md		README.md
ingest.py		ingest.py
requirements.txt		requirements.txt
search.py		search.py

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Endee Semantic Search Demo

Overview

Tech Stack

Key Features

System Architecture

Model Used

Project Structure

Setup and Execution

1. Clone the Repository

2. Create Virtual Environment (Recommended)

3. Install Dependencies

4. Run Semantic Search

Example Query

Example Output

How It Works

Purpose

Future Improvements

Author

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Endee Semantic Search Demo

Overview

Tech Stack

Key Features

System Architecture

Model Used

Project Structure

Setup and Execution

1. Clone the Repository

2. Create Virtual Environment (Recommended)

3. Install Dependencies

4. Run Semantic Search

Example Query

Example Output

How It Works

Purpose

Future Improvements

Author

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages