Semantic Search Platform for Personal Notes

A semantic search application for personal notes. Notes and queries are embedded with a sentence-transformer model and matched by vector similarity, so results are ranked by meaning rather than exact keyword overlap.

Key Features

🔍 Semantic Search: Powered by HuggingFace's all-MiniLM-L6-v2 sentence transformer model
🎯 Vector Similarity: Efficient cosine similarity search using pgvector
📅 Date Filtering: Filter notes by date range
💾 Saved Searches: Save and quickly access your frequent searches
🔒 Password Protection: Secure access to your notes

Tech Stack

Frontend: Next.js, React, TypeScript, Mantine UI
Backend: Next.js API Routes
Database: Supabase (PostgreSQL)
AI/ML:
- HuggingFace Sentence Transformers (all-MiniLM-L6-v2)
- pgvector for vector similarity search
Authentication: Custom password protection

AI/ML Architecture

Semantic search runs as a three-stage pipeline:

Text Embedding:
- Uses HuggingFace's all-MiniLM-L6-v2 model
- Converts text into 384-dimensional vector embeddings
- Optimized for semantic similarity tasks
Vector Search:
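- Implements cosine similarity using pgvector's cosine distance operator (<=>), with similarity computed as 1 - distance
- Ranks notes by nearest-neighbor distance to the query embedding in the 384-dimensional space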
- Configurable similarity threshold (default: 0.7)
RPC-based Retrieval:
- Custom PostgreSQL function for efficient similarity search
- Supports pagination and date filtering
- Filtering, ranking, and pagination all run inside the database, so only the requested page of results is returned
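
To make the scoring concrete, here is a minimal sketch of the arithmetic behind the similarity score. In the project this calculation happens inside pgvector, not in application code; the function names cosineSimilarity, cosineDistance, and passesThreshold are hypothetical and exist only for illustration.

```typescript
// Cosine similarity between two equal-length vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// pgvector's <=> operator returns cosine distance, which is 1 - similarity.
function cosineDistance(a: number[], b: number[]): number {
  return 1 - cosineSimilarity(a, b);
}

// Mirrors the threshold check: a note is kept only if its score
// strictly exceeds the minimum (0.7 is the default named in this README).
function passesThreshold(score: number, scoreMinimum = 0.7): boolean {
  return score > scoreMinimum;
}
```

Identical directions score 1, orthogonal vectors score 0, and the comparison is strict, so a note scoring exactly 0.7 would be excluded at the default threshold.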

Getting Started

Prerequisites

Node.js 18+
npm or yarn
Supabase account
HuggingFace account (for API key)

Environment Setup

Create a .env.local file in the project root:

```
HUGGINGFACE_API_KEY=your_huggingface_api_key_here
SUPABASE_URL=your_supabase_url_here
SUPABASE_KEY=your_supabase_anon_key_here
PASSWORD=your_app_password_here
NEXT_PUBLIC_SITE_DOMAIN=http://localhost:3000
```
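
A missing variable is easier to diagnose at startup than at the first failed request. The following is a minimal sketch of such a check; the loadConfig helper and the AppConfig type are hypothetical and not part of the project, and only the variable names come from the example above.

```typescript
// Variable names are taken from the .env.local example in this README.
const REQUIRED_KEYS = [
  "HUGGINGFACE_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "PASSWORD",
  "NEXT_PUBLIC_SITE_DOMAIN",
] as const;

type ConfigKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<ConfigKey, string>;

// Hypothetical helper: returns a fully populated config, or throws an
// error that lists every missing or empty variable at once.
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In server-side code this could be called once as loadConfig(process.env), so the rest of the application works with plain strings instead of possibly undefined values.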

Database Setup

Create a new Supabase project

Enable the pgvector extension:

```
-- Enable vector extension if not already enabled
create extension if not exists vector;
```

Create the notes and searches tables and match_notes function:

```
-- Create notes table
create table notes (
    id bigint generated by default as identity primary key,
    text text not null,
    date timestamp with time zone default timezone('utc'::text, now()) not null,
    embedding vector(384) -- matches the 384-dimensional all-MiniLM-L6-v2 output
);

-- Create searches table
create table searches (
    id bigint generated by default as identity primary key,
    text text not null,
    date timestamp with time zone default timezone('utc'::text, now()) not null
);

-- Create match_notes function
-- page is zero-based; match_start and match_end are optional (null disables that bound)
create or replace function match_notes(
    search_embedding vector(384),
    score_minimum float,
    page int,
    page_size int,
    match_start timestamp with time zone default null,
    match_end timestamp with time zone default null
)
returns table (
    id bigint,
    text text,
    date timestamp with time zone,
    score float
)
language plpgsql
as $$
begin
    return query
    select
        notes.id,
        notes.text,
        notes.date,
        -- <=> is pgvector's cosine distance, so 1 - distance is cosine similarity
        1 - (notes.embedding <=> search_embedding) as score
    from notes
    where
        (match_start is null or notes.date >= match_start)
        and (match_end is null or notes.date <= match_end)
        and 1 - (notes.embedding <=> search_embedding) > score_minimum
    order by notes.embedding <=> search_embedding -- closest notes first
    limit page_size
    offset page * page_size;
end;
$$;
```
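
The match_notes function could be called from the application roughly as follows. This is a minimal sketch: the helper names buildMatchNotesArgs and rowOffset, the SearchOptions shape, and the default page size of 10 are assumptions for illustration. Only the argument names and the 0.7 default threshold come from this README.

```typescript
// Argument names mirror the parameters of the match_notes SQL function.
interface MatchNotesArgs {
  search_embedding: number[];
  score_minimum: number;
  page: number;
  page_size: number;
  match_start: string | null;
  match_end: string | null;
}

// Hypothetical options object; every field is optional.
interface SearchOptions {
  scoreMinimum?: number;
  page?: number; // zero-based page index
  pageSize?: number; // default of 10 is an assumption, not from the project
  start?: Date;
  end?: Date;
}

// Validates the inputs and builds the argument object for the RPC call.
function buildMatchNotesArgs(
  embedding: number[],
  options: SearchOptions = {}
): MatchNotesArgs {
  const { scoreMinimum = 0.7, page = 0, pageSize = 10, start, end } = options;
  if (embedding.length !== 384) {
    throw new Error(`Expected 384 dimensions, got ${embedding.length}`);
  }
  if (!Number.isInteger(page) || page < 0) {
    throw new Error("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  return {
    search_embedding: embedding,
    score_minimum: scoreMinimum,
    page,
    page_size: pageSize,
    // Dates are sent as ISO 8601 strings; null leaves that bound open.
    match_start: start ? start.toISOString() : null,
    match_end: end ? end.toISOString() : null,
  };
}

// Number of rows skipped by the function's "offset page * page_size" clause.
function rowOffset(page: number, pageSize: number): number {
  return page * pageSize;
}
```

With the supabase-js client, the resulting object would typically be passed as the second argument of rpc, for example supabase.rpc('match_notes', args). Because page is zero-based, page 0 returns the first page_size rows.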

Installation

Clone the repository:

```
git clone https://github.com/yourusername/notesearch.git
cd notesearch
```

Install dependencies:
```
npm install
```
Run the development server:
```
npm run dev
```
Open http://localhost:3000 in your browser to use the app.

How It Works

Note Creation:
- When a note is created, its text is sent to the HuggingFace API
- The API returns a 384-dimensional vector embedding
- Both the text and embedding are stored in Supabase
Search Process:
- User enters a search query
- Query is converted to an embedding using the same model
- pgvector performs cosine similarity search
- Results are ranked by similarity score
- Notes are displayed with their similarity scores
Performance Optimization:
- Vector similarity search is performed at the database level
- Results are paginated for efficient loading
- Date filtering reduces the search space
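
The embedding step shared by note creation and search can be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: the function names, the FetchLike type, and the endpointUrl parameter are hypothetical, and the sketch assumes the API accepts a JSON body of the form { inputs: text } with a Bearer token and returns a flat array of 384 numbers for a single input string. Check the HuggingFace documentation for the endpoint and response shape that apply to your account.

```typescript
const EMBEDDING_DIMENSIONS = 384;

// Minimal shape of the fetch function this sketch needs. Injecting it
// keeps the example testable without network access.
type FetchLike = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

// Checks that a parsed response is a flat array of 384 finite numbers.
function parseEmbedding(payload: unknown): number[] {
  if (!Array.isArray(payload) || payload.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected an array of ${EMBEDDING_DIMENSIONS} numbers`);
  }
  for (const value of payload) {
    if (typeof value !== "number" || !Number.isFinite(value)) {
      throw new Error("Embedding contains a non-numeric value");
    }
  }
  return payload as number[];
}

// Sends the text to the embedding endpoint and returns the validated vector.
// endpointUrl is a placeholder parameter; no specific URL is assumed here.
async function embedText(
  text: string,
  endpointUrl: string,
  apiKey: string,
  fetchFn: FetchLike
): Promise<number[]> {
  const response = await fetchFn(endpointUrl, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: text }),
  });
  if (!response.ok) {
    throw new Error(`Embedding request failed with status ${response.status}`);
  }
  return parseEmbedding(await response.json());
}
```

Validating the dimension before storing or searching catches a mismatched model early, since the embedding column and the match_notes parameter are both declared as vector(384).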

Example Queries

The semantic search is designed to understand the meaning behind your queries, not just match exact words. Here are some example queries to try:

Topic-based Queries
- "What did we discuss about the project?"
- "Show me shopping-related notes"
- "Family plans and events"
Time-based Queries
- "What's happening next week?"
- "Recent meeting notes"
- "Things to do today"
Context-based Queries
- "Important things to remember"
- "Work-related tasks"
- "Personal errands"
Mixed Context Queries
- "Things I need to do"
- "Recent updates"
- "Important information"

The search will return notes that are semantically related to your query, even if they don't contain the exact words you used. You can combine these queries with date filters to narrow down your results.

Example Notes and Queries

The following sample notes and queries demonstrate how the semantic search behaves. You can add the notes to your own instance and try the queries yourself:

Sample Notes

1. "Meeting with John about Q2 project deliverables and timeline"
2. "Need to buy groceries: milk, eggs, bread, and vegetables"
3. "Project deadline moved to next Friday, need to update team"
4. "Doctor's appointment scheduled for next Tuesday at 2 PM"
5. "Remember to call mom about weekend plans"
6. "Team meeting notes: discussed new feature implementation"
7. "Shopping list: milk, eggs, bread, vegetables, and fruits"
8. "Q2 project status: 75% complete, on track for deadline"
9. "Family dinner plans for Saturday evening"
10. "Weekly team sync meeting moved to Thursday"

Example Queries and Expected Results

Shopping Related
- Query: "What do I need to buy?"
- Should match: Notes about shopping lists and groceries
- Example matches: Notes 2 and 7
Meeting Related
- Query: "When are my next meetings?"
- Should match: Notes containing meeting information
- Example matches: Notes 1, 3, 6, and 10
Project Related
- Query: "Project status and deadlines"
- Should match: Notes about project progress and timelines
- Example matches: Notes 1, 3, and 8
Family Related
- Query: "Family plans"
- Should match: Notes about family activities and events
- Example matches: Notes 5 and 9

These examples demonstrate how the semantic search can find related content even when the exact words don't match. The search understands the context and meaning behind both the notes and queries.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: jessecui/semantic-note-search
