A semantic search app for personal notes. It embeds note text with a sentence transformer model and retrieves notes by vector similarity rather than exact keyword match.
- 🔍 Semantic Search: Powered by HuggingFace's all-MiniLM-L6-v2 sentence transformer model
- 🎯 Vector Similarity: Efficient cosine similarity search using pgvector
- 📅 Date Filtering: Filter notes by date range
- 💾 Saved Searches: Save and quickly access your frequent searches
- 🔒 Password Protection: Secure access to your notes
- Frontend: Next.js, React, TypeScript, Mantine UI
- Backend: Next.js API Routes
- Database: Supabase (PostgreSQL)
- AI/ML:
  - HuggingFace Sentence Transformers (all-MiniLM-L6-v2)
  - pgvector for vector similarity search
- Authentication: Custom password protection
Semantic search runs as a three-stage pipeline:
- Text Embedding:
  - Uses HuggingFace's all-MiniLM-L6-v2 model
  - Converts text into 384-dimensional vector embeddings
  - Optimized for semantic similarity tasks
- Vector Search:
  - Implements cosine similarity using pgvector
  - Efficient nearest-neighbor search in high-dimensional space
  - Configurable similarity threshold (default: 0.7)
- RPC-based Retrieval:
  - Custom PostgreSQL function for efficient similarity search
  - Supports pagination and date filtering
  - Optimized for large-scale note retrieval
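
To make the scoring step concrete, here is a minimal in-memory sketch of what the database computes. In the project itself this work is done by pgvector, whose `<=>` operator returns cosine distance, so the score is `1 - distance`. The function names `cosineSimilarity` and `rankByScore` are hypothetical and exist only for illustration; returning 0 for a zero-length vector is this sketch's choice, not pgvector's behavior.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // Assumption of this sketch: treat an all-zero vector as "no similarity".
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface EmbeddedNote {
  id: number;
  embedding: number[];
}

interface ScoredNote {
  id: number;
  score: number;
}

// Keep notes scoring strictly above the threshold, best match first.
function rankByScore(
  query: number[],
  notes: EmbeddedNote[],
  scoreMinimum = 0.7, // default threshold stated above
): ScoredNote[] {
  return notes
    .map((n) => ({ id: n.id, score: cosineSimilarity(query, n.embedding) }))
    .filter((n) => n.score > scoreMinimum)
    .sort((x, y) => y.score - x.score);
}
```

With real embeddings the vectors have 384 components, but the arithmetic is the same as for the small vectors you might use to try this out.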
- Node.js 18+
- npm or yarn
- Supabase account
- HuggingFace account (for API key)
Create a .env.local file in the project root:
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
SUPABASE_URL=your_supabase_url_here
SUPABASE_KEY=your_supabase_anon_key_here
PASSWORD=your_app_password_here
NEXT_PUBLIC_SITE_DOMAIN=http://localhost:3000
- Create a new Supabase project
- Enable the pgvector extension:

      -- Enable vector extension if not already enabled
      create extension if not exists vector;

- Create the notes and searches tables and match_notes function:

      -- Create notes table
      create table notes (
        id bigint generated by default as identity primary key,
        text text not null,
        date timestamp with time zone default timezone('utc'::text, now()) not null,
        embedding vector(384)
      );

      -- Create searches table
      create table searches (
        id bigint generated by default as identity primary key,
        text text not null,
        date timestamp with time zone default timezone('utc'::text, now()) not null
      );

      -- Create match_notes function
      create or replace function match_notes(
        search_embedding vector(384),
        score_minimum float,
        page int,
        page_size int,
        match_start timestamp with time zone default null,
        match_end timestamp with time zone default null
      )
      returns table (
        id bigint,
        text text,
        date timestamp with time zone,
        score float
      )
      language plpgsql
      as $$
      begin
        return query
        select
          notes.id,
          notes.text,
          notes.date,
          1 - (notes.embedding <=> search_embedding) as score
        from notes
        where
          (match_start is null or notes.date >= match_start)
          and (match_end is null or notes.date <= match_end)
          and 1 - (notes.embedding <=> search_embedding) > score_minimum
        order by notes.embedding <=> search_embedding
        limit page_size
        offset page * page_size;
      end;
      $$;
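
As an illustration of how application code might call match_notes, the sketch below builds the argument object and invokes the function through an RPC-style client. The argument keys mirror the SQL parameter names above, and `page` is zero-based because the function uses `offset page * page_size`. Everything else is an assumption: `RpcClient` is a hypothetical minimal interface (a supabase-js client's `rpc` method is expected to fit it, possibly with a cast), and the default page size of 10 is made up for the example.

```typescript
// Keys mirror the parameters of the match_notes SQL function.
interface MatchNotesArgs {
  search_embedding: number[];
  score_minimum: number;
  page: number;
  page_size: number;
  match_start: string | null;
  match_end: string | null;
}

// Row shape returned by match_notes.
interface MatchedNote {
  id: number;
  text: string;
  date: string;
  score: number;
}

// Hypothetical minimal client interface, not the project's actual type.
interface RpcClient {
  rpc(
    fn: string,
    args: MatchNotesArgs,
  ): PromiseLike<{ data: unknown; error: { message: string } | null }>;
}

interface SearchOptions {
  scoreMinimum?: number;
  page?: number; // zero-based
  pageSize?: number;
  start?: Date;
  end?: Date;
}

function buildMatchNotesArgs(
  embedding: number[],
  opts: SearchOptions = {},
): MatchNotesArgs {
  if (embedding.length !== 384) {
    throw new Error(`expected 384 dimensions, got ${embedding.length}`);
  }
  const page = opts.page ?? 0;
  const pageSize = opts.pageSize ?? 10; // assumed default for this sketch
  if (!Number.isInteger(page) || page < 0) {
    throw new Error("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize <= 0) {
    throw new Error("pageSize must be a positive integer");
  }
  return {
    search_embedding: embedding,
    score_minimum: opts.scoreMinimum ?? 0.7,
    page,
    page_size: pageSize,
    // null disables the corresponding date bound in the SQL function.
    match_start: opts.start ? opts.start.toISOString() : null,
    match_end: opts.end ? opts.end.toISOString() : null,
  };
}

async function matchNotes(
  client: RpcClient,
  embedding: number[],
  opts: SearchOptions = {},
): Promise<MatchedNote[]> {
  const { data, error } = await client.rpc(
    "match_notes",
    buildMatchNotesArgs(embedding, opts),
  );
  if (error) throw new Error(`match_notes failed: ${error.message}`);
  return (data ?? []) as MatchedNote[];
}
```

Keeping the argument builder separate from the network call makes the pagination and date-range handling easy to check without a database.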
- Clone the repository:

      git clone https://github.com/yourusername/notesearch.git
      cd notesearch

- Install dependencies:

      npm install

- Run the development server:

      npm run dev

- Open http://localhost:3000 with your browser to see the result.
- Note Creation:
  - When a note is created, its text is sent to the HuggingFace API
  - The API returns a 384-dimensional vector embedding
  - Both the text and embedding are stored in Supabase
- Search Process:
  - User enters a search query
  - Query is converted to an embedding using the same model
  - pgvector performs cosine similarity search
  - Results are ranked by similarity score
  - Notes are displayed with their similarity scores
- Performance Optimization:
  - Vector similarity search is performed at the database level
  - Results are paginated for efficient loading
  - Date filtering reduces the search space
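
The note creation flow above can be sketched roughly as follows. This is not the project's actual code: `Embedder` and `InsertNote` are hypothetical injected functions standing in for the HuggingFace API call and the Supabase insert, and the validation step is an assumption added so that a malformed embedding fails early instead of at the `vector(384)` column.

```typescript
const EMBEDDING_DIM = 384; // output size of all-MiniLM-L6-v2

// Hypothetical stand-ins for the HuggingFace call and the Supabase insert.
type Embedder = (text: string) => Promise<number[]>;

interface NoteRow {
  text: string;
  embedding: number[];
}

type InsertNote = (row: NoteRow) => Promise<void>;

// Check that a value looks like a 384-dimensional embedding.
function validateEmbedding(value: unknown): number[] {
  if (!Array.isArray(value)) {
    throw new Error("embedding must be an array");
  }
  if (value.length !== EMBEDDING_DIM) {
    throw new Error(`expected ${EMBEDDING_DIM} dimensions, got ${value.length}`);
  }
  for (const x of value) {
    if (typeof x !== "number" || !Number.isFinite(x)) {
      throw new Error("embedding must contain only finite numbers");
    }
  }
  return value as number[];
}

// Embed the text, validate the result, then store text and embedding together.
async function createNote(
  text: string,
  embed: Embedder,
  insertNote: InsertNote,
): Promise<NoteRow> {
  if (text.trim().length === 0) {
    throw new Error("note text must not be empty");
  }
  const embedding = validateEmbedding(await embed(text));
  const row: NoteRow = { text, embedding };
  await insertNote(row);
  return row;
}
```

A search query would go through the same `embed` function before being passed to the database, which is what keeps note and query vectors comparable.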
The semantic search is designed to understand the meaning behind your queries, not just match exact words. Here are some example queries to try:
- Topic-based Queries
  - "What did we discuss about the project?"
  - "Show me shopping-related notes"
  - "Family plans and events"
- Time-based Queries
  - "What's happening next week?"
  - "Recent meeting notes"
  - "Things to do today"
- Context-based Queries
  - "Important things to remember"
  - "Work-related tasks"
  - "Personal errands"
- Mixed Context Queries
  - "Things I need to do"
  - "Recent updates"
  - "Important information"
The search will return notes that are semantically related to your query, even if they don't contain the exact words you used. You can combine these queries with date filters to narrow down your results.
Here is a set of example notes and queries you can use to test how the semantic search behaves:
1. "Meeting with John about Q2 project deliverables and timeline"
2. "Need to buy groceries: milk, eggs, bread, and vegetables"
3. "Project deadline moved to next Friday, need to update team"
4. "Doctor's appointment scheduled for next Tuesday at 2 PM"
5. "Remember to call mom about weekend plans"
6. "Team meeting notes: discussed new feature implementation"
7. "Shopping list: milk, eggs, bread, vegetables, and fruits"
8. "Q2 project status: 75% complete, on track for deadline"
9. "Family dinner plans for Saturday evening"
10. "Weekly team sync meeting moved to Thursday"
- Shopping Related
  - Query: "What do I need to buy?"
  - Should match: Notes about shopping lists and groceries
  - Example matches: Notes 2 and 7
- Meeting Related
  - Query: "When are my next meetings?"
  - Should match: Notes containing meeting information
  - Example matches: Notes 1, 3, 6, and 10
- Project Related
  - Query: "Project status and deadlines"
  - Should match: Notes about project progress and timelines
  - Example matches: Notes 1, 3, and 8
- Family Related
  - Query: "Family plans"
  - Should match: Notes about family activities and events
  - Example matches: Notes 5 and 9
These examples demonstrate how the semantic search can find related content even when the exact words don't match. The search understands the context and meaning behind both the notes and queries.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.