A note-taking app with semantic search capabilities, powered by HuggingFace's all-MiniLM-L6-v2 model and pgvector for efficient similarity search

jessecui/semantic-note-search

Semantic Search Platform for Personal Notes

A semantic search platform for personal notes. Notes and queries are embedded with HuggingFace's all-MiniLM-L6-v2 model and matched by cosine similarity using pgvector, so results reflect meaning rather than exact wording.

Key Features

  • 🔍 Semantic Search: Powered by HuggingFace's all-MiniLM-L6-v2 sentence transformer model
  • 🎯 Vector Similarity: Efficient cosine similarity search using pgvector
  • 📅 Date Filtering: Filter notes by date range
  • 💾 Saved Searches: Save and quickly access your frequent searches
  • 🔒 Password Protection: Secure access to your notes

Tech Stack

  • Frontend: Next.js, React, TypeScript, Mantine UI
  • Backend: Next.js API Routes
  • Database: Supabase (PostgreSQL)
  • AI/ML:
    • HuggingFace Sentence Transformers (all-MiniLM-L6-v2)
    • pgvector for vector similarity search
  • Authentication: Custom password protection

AI/ML Architecture

Semantic search runs as a three-stage pipeline:

  1. Text Embedding:

    • Uses HuggingFace's all-MiniLM-L6-v2 model
    • Converts text into 384-dimensional vector embeddings
    • Optimized for semantic similarity tasks
  2. Vector Search:

    • Implements cosine similarity using pgvector
    • Efficient nearest-neighbor search in high-dimensional space
    • Configurable similarity threshold (default: 0.7)
  3. RPC-based Retrieval:

    • Custom PostgreSQL function (match_notes) invoked through Supabase RPC
    • Supports pagination and date filtering
    • Optimized for large-scale note retrieval
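To make the scoring concrete: pgvector's `<=>` operator returns cosine distance, and the `match_notes` function in this README reports `1 - distance` as the score. The sketch below reproduces that arithmetic in TypeScript. The helper names `cosineSimilarity` and `passesThreshold` are illustrative assumptions, not code from this repository, and in the app itself the comparison runs inside PostgreSQL.

```typescript
// Illustrative helpers; the names are assumptions, not code from this repository.

/** Cosine similarity of two equal-length vectors: dot(a, b) / (|a| * |b|). */
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error("Cosine similarity is undefined for a zero vector");
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Mirrors the SQL predicate `1 - (embedding <=> query) > score_minimum`. */
function passesThreshold(score: number, scoreMinimum = 0.7): boolean {
  return score > scoreMinimum;
}
```

Note that the comparison is strict, so a note scoring exactly at the threshold is excluded.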

Getting Started

Prerequisites

  • Node.js 18+
  • npm or yarn
  • Supabase account
  • HuggingFace account (for an API key)

Environment Setup

Create a .env.local file in the project root:

HUGGINGFACE_API_KEY=your_huggingface_api_key_here
SUPABASE_URL=your_supabase_url_here
SUPABASE_KEY=your_supabase_anon_key_here
PASSWORD=your_app_password_here
NEXT_PUBLIC_SITE_DOMAIN=http://localhost:3000
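A missing variable is easier to diagnose at startup than at the first failed request. The following is a minimal sketch of such a check, assuming all five variables above are required. `loadConfig` and `AppConfig` are hypothetical names and are not part of this repository.

```typescript
// Hypothetical startup check; assumes every variable listed above is required.
const REQUIRED_KEYS = [
  "HUGGINGFACE_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "PASSWORD",
  "NEXT_PUBLIC_SITE_DOMAIN",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
type AppConfig = Record<RequiredKey, string>;

/** Returns the required variables, or throws listing every one that is unset or blank. */
function loadConfig(env: Record<string, string | undefined>): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => !env[key]?.trim());
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = env[key] as string;
  }
  return config;
}
```

In a Node.js or Next.js server context this would typically be called as `loadConfig(process.env)`.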

Database Setup

  1. Create a new Supabase project

  2. Enable the pgvector extension:

    -- Enable vector extension if not already enabled
    create extension if not exists vector;
  3. Create the notes and searches tables and the match_notes function:

    -- Create notes table
    create table notes (
        id bigint generated by default as identity primary key,
        text text not null,
        date timestamp with time zone default timezone('utc'::text, now()) not null,
        embedding vector(384)
    );
    
    -- Create searches table
    create table searches (
        id bigint generated by default as identity primary key,
        text text not null,
        date timestamp with time zone default timezone('utc'::text, now()) not null
    );
    
    -- Create match_notes function
    create or replace function match_notes(
        search_embedding vector(384),
        score_minimum float,
        page int,
        page_size int,
        match_start timestamp with time zone default null,
        match_end timestamp with time zone default null
    )
    returns table (
        id bigint,
        text text,
        date timestamp with time zone,
        score float
    )
    language plpgsql
    as $$
    begin
        return query
        select
            notes.id,
            notes.text,
            notes.date,
            -- <=> is pgvector's cosine distance operator; 1 - distance gives cosine similarity
            1 - (notes.embedding <=> search_embedding) as score
        from notes
        where
            (match_start is null or notes.date >= match_start)
            and (match_end is null or notes.date <= match_end)
            and 1 - (notes.embedding <=> search_embedding) > score_minimum
        order by notes.embedding <=> search_embedding
        limit page_size
        -- page is zero-based: page 0 returns the first page_size rows
        offset page * page_size;
    end;
    $$;
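The function's parameters map directly onto the argument object a client passes when calling it. Below is a minimal sketch of building and validating that object. `buildMatchNotesArgs`, `SearchOptions`, and the default page size of 10 are assumptions made for illustration. The snake_case keys and the 384-dimension check come from the SQL above, and 0.7 is the default threshold mentioned earlier in this README.

```typescript
const EMBEDDING_DIMENSIONS = 384; // matches vector(384) in the schema above

// Hypothetical option and argument shapes; only the snake_case keys come from the SQL.
interface SearchOptions {
  embedding: number[];
  scoreMinimum?: number;
  page?: number; // zero-based, because the SQL uses `offset page * page_size`
  pageSize?: number;
  start?: Date;
  end?: Date;
}

interface MatchNotesArgs {
  search_embedding: number[];
  score_minimum: number;
  page: number;
  page_size: number;
  match_start: string | null;
  match_end: string | null;
}

function buildMatchNotesArgs(options: SearchOptions): MatchNotesArgs {
  // pageSize = 10 is an assumed default, not a value taken from the repository.
  const { embedding, scoreMinimum = 0.7, page = 0, pageSize = 10, start, end } = options;
  if (embedding.length !== EMBEDDING_DIMENSIONS) {
    throw new Error(`Expected ${EMBEDDING_DIMENSIONS} dimensions, got ${embedding.length}`);
  }
  if (!Number.isInteger(page) || page < 0) {
    throw new Error("page must be a non-negative integer");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new Error("pageSize must be a positive integer");
  }
  if (start && end && start.getTime() > end.getTime()) {
    throw new Error("start must not be after end");
  }
  return {
    search_embedding: embedding,
    score_minimum: scoreMinimum,
    page,
    page_size: pageSize,
    // null leaves the corresponding date bound unset, matching the SQL defaults
    match_start: start ? start.toISOString() : null,
    match_end: end ? end.toISOString() : null,
  };
}
```

With the supabase-js client, the resulting object would be passed as the second argument to `rpc("match_notes", args)`.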

Installation

  1. Clone the repository:

    git clone https://github.com/jessecui/semantic-note-search.git
    cd semantic-note-search
  2. Install dependencies:

    npm install
  3. Run the development server:

    npm run dev
  4. Open http://localhost:3000 in your browser to see the result.

How It Works

  1. Note Creation:

    • When a note is created, its text is sent to the HuggingFace API
    • The API returns a 384-dimensional vector embedding
    • Both the text and embedding are stored in Supabase
  2. Search Process:

    • User enters a search query
    • Query is converted to an embedding using the same model
    • pgvector performs cosine similarity search
    • Results are ranked by similarity score
    • Notes are displayed with their similarity scores
  3. Performance Optimization:

    • Vector similarity search is performed at the database level
    • Results are paginated for efficient loading
    • Date filtering reduces the search space
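The create and search flows described above share one rule: notes and queries must be embedded by the same model. The sketch below shows that shape with the embedding call and the database hidden behind injected interfaces. `Embed`, `NoteStore`, `createNote`, and `searchNotes` are hypothetical names chosen for illustration, and the repository's actual code may be organized differently.

```typescript
// Hypothetical abstractions: `Embed` stands in for the HuggingFace API call,
// `NoteStore` for the Supabase table and the match_notes function.
type Embed = (text: string) => Promise<number[]>;

interface StoredNote {
  text: string;
  embedding: number[];
}

interface SearchResult {
  text: string;
  score: number;
}

interface NoteStore {
  insert(note: StoredNote): Promise<void>;
  search(queryEmbedding: number[]): Promise<SearchResult[]>;
}

/** Note creation: embed the text, then store text and embedding together. */
async function createNote(text: string, embed: Embed, store: NoteStore): Promise<void> {
  const embedding = await embed(text);
  await store.insert({ text, embedding });
}

/** Search: embed the query with the same function, then let the store rank notes. */
async function searchNotes(query: string, embed: Embed, store: NoteStore): Promise<SearchResult[]> {
  const queryEmbedding = await embed(query);
  return store.search(queryEmbedding);
}
```

Passing the same `embed` function to both calls keeps note and query vectors in the same space, which is what makes the similarity scores comparable.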

Example Queries

The semantic search is designed to understand the meaning behind your queries, not just match exact words. Here are some example queries to try:

  1. Topic-based Queries

    • "What did we discuss about the project?"
    • "Show me shopping-related notes"
    • "Family plans and events"
  2. Time-based Queries

    • "What's happening next week?"
    • "Recent meeting notes"
    • "Things to do today"
  3. Context-based Queries

    • "Important things to remember"
    • "Work-related tasks"
    • "Personal errands"
  4. Mixed Context Queries

    • "Things I need to do"
    • "Recent updates"
    • "Important information"

The search will return notes that are semantically related to your query, even if they don't contain the exact words you used. You can combine these queries with date filters to narrow down your results.

Example Notes and Queries

The following sample notes and queries show how the semantic search behaves. You can add the notes to your own instance and run the queries yourself:

Sample Notes

  1. "Meeting with John about Q2 project deliverables and timeline"
  2. "Need to buy groceries: milk, eggs, bread, and vegetables"
  3. "Project deadline moved to next Friday, need to update team"
  4. "Doctor's appointment scheduled for next Tuesday at 2 PM"
  5. "Remember to call mom about weekend plans"
  6. "Team meeting notes: discussed new feature implementation"
  7. "Shopping list: milk, eggs, bread, vegetables, and fruits"
  8. "Q2 project status: 75% complete, on track for deadline"
  9. "Family dinner plans for Saturday evening"
  10. "Weekly team sync meeting moved to Thursday"

Example Queries and Expected Results

  1. Shopping Related

    • Query: "What do I need to buy?"
    • Should match: Notes about shopping lists and groceries
    • Example matches: Notes 2 and 7
  2. Meeting Related

    • Query: "When are my next meetings?"
    • Should match: Notes containing meeting information
    • Example matches: Notes 1, 3, 6, and 10
  3. Project Related

    • Query: "Project status and deadlines"
    • Should match: Notes about project progress and timelines
    • Example matches: Notes 1, 3, and 8
  4. Family Related

    • Query: "Family plans"
    • Should match: Notes about family activities and events
    • Example matches: Notes 5 and 9

These examples demonstrate how the semantic search can find related content even when the exact words don't match. The search understands the context and meaning behind both the notes and queries.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

This project is licensed under the MIT License - see the LICENSE file for details.
