A technical walkthrough of Endee AI Search: how it indexes products, understands natural language queries, and delivers semantically relevant results — explained from scratch.
- What Are We Building?
- The Complete Architecture at a Glance
- Tech Stack — What Each Tool Does
- Flow 1 — The Indexing Pipeline
- Why Two Separate Indexes and Not One?
- Flow 2 — The Webhook Pipeline (Keeping Index Fresh)
- Flow 3 — The Query Pipeline (Search)
- How Everything Binds Together
- Optimization Techniques Used
- Key Design Decisions Explained
- Glossary for Beginners
Traditional e-commerce search is keyword-based. You type "red shoes" and the search engine looks for products that literally contain the words "red" and "shoes". If a product is called "crimson sneakers", it won't show up — even though it's exactly what you meant.
Endee AI Search solves this. It is a Shopify app that replaces the default search with an AI-powered system that understands what you mean, not just what you typed.
| Traditional Search | Endee AI Search |
|---|---|
| Matches exact keywords | Understands meaning and intent |
| "red shoes" misses "crimson sneakers" | Finds "crimson sneakers" for "red shoes" |
| No understanding of "for my wife" | Extracts gender filter automatically |
| No image understanding | Finds products visually similar to the query |
| Static, rule-based | AI-powered, learns from the product catalog |
The system is built on three core ideas:
- Dense retrieval — understand meaning using AI embeddings (CLIP)
- Sparse retrieval — match keywords efficiently using BM25
- NLP understanding — parse user intent using spaCy
┌─────────────────────────────────────────────────────────────────┐
│ MERCHANT STORE │
│ Products: create / update / delete │
└────────────────────────┬────────────────────────────────────────┘
│ Shopify Webhooks
▼
┌─────────────────────────────────────────────────────────────────┐
│ GOOGLE PUB/SUB │
│ Message broker — buffers webhook events │
└────────────────────────┬────────────────────────────────────────┘
│ HTTP Push
▼
┌───────────────────────────────────────────────────────────────┐
│ REMIX APP (Node.js) │
│ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Webhook Route │ │ Search Route │ │
│ │ /webhooks/... │ │ /api/search │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────────────┐ │
│ │ pg-boss │ │ spaCy NLP Service │ │
│ │ (Job Queue) │ │ (Python, port 8100) │ │
│ └────────┬─────────┘ └────────┬─────────────────┘ │
│ │ │ │
│ ▼ ▼ │
│ ┌──────────────────┐ ┌──────────────────┐ │
│ │ Worker Process │ │ CLIP Model │ │
│ │ (processJob) │ │ (ONNX/local) │ │
│ └────────┬─────────┘ └────────┬─────────┘ │
└───────────┼──────────────────────┼────────────────────────────┘
│ │
▼ ▼
┌───────────────────────────────────────────────────────────────┐
│ POSTGRESQL (Neon) │
│ - Merchant sessions │
│ - BM25 index data (per shop) │
│ - Analytics & metrics │
│ - pg-boss job queue tables │
└───────────────────────────────────────────────────────────────┘
│
▼
┌───────────────────────────────────────────────────────────────┐
│ ENDEE VECTOR DB │
│ - {shop}_text index (dense + sparse vectors) │
│ - {shop}_image index (image vectors) │
└───────────────────────────────────────────────────────────────┘
The backbone of the application. Remix is a full-stack web framework built on top of React. In this app, it serves two purposes:
- Admin UI — the dashboard merchants see after installing the app
- API endpoints — the `/api/search` route that the storefront calls, and the `/webhooks/...` routes that receive events from Shopify
Think of it as the central nervous system that connects everything.
A relational database. In this system it stores:
- Merchant session data (auth tokens)
- BM25 corpus data per shop (the pre-computed term statistics)
- Analytics (daily search metrics, product counts)
- pg-boss job queue tables (the webhook job queue lives here)
A TypeScript-friendly database client that talks to PostgreSQL. Instead of writing raw SQL, we write TypeScript.
A job queue library that uses PostgreSQL as its storage backend. When a product is created/updated/deleted in a Shopify store, we don't process it immediately (that would be slow and fragile). Instead, we create a "job" in pg-boss and a background worker picks it up and processes it asynchronously.
Why not just process immediately?
- The webhook must respond in under 5 seconds or Shopify considers it failed
- Embedding a product with CLIP can take several seconds
- A merchant might bulk-update 500 products — we need to queue them, not crash
A machine learning model developed by OpenAI. The magic of CLIP is that it can encode both text and images into the same vector space. This means "red shoes" as text and an actual photo of red shoes produce vectors that are mathematically close to each other.
We use Xenova/clip-vit-base-patch16 — a pre-trained CLIP model running locally via ONNX (no API calls to OpenAI). It produces 512-dimensional vectors.
A Python NLP (Natural Language Processing) library. It runs as a separate microservice on port 8100. When a search query comes in, we first send it to spaCy to understand:
- What product category is being searched (head noun)
- Gender signals ("for my wife" → filter by women)
- Price constraints ("under $100")
- Recipient hints ("for my boyfriend")
- The best query text for BM25 (expanded) and for CLIP (cleaned)
BM25 (Best Match 25) is a classical information retrieval algorithm — the same one that powers search engines like Elasticsearch and Solr. It scores products based on keyword relevance using term frequency, document frequency, and document length normalization.
Unlike CLIP which understands meaning, BM25 is very good at exact keyword matching. Together they complement each other.
A vector database purpose-built for this system. It stores vectors (arrays of numbers representing products) and can find the most similar ones given a query vector. Each shop gets two indices:
- `{shop}_text` — stores text vectors (dense from CLIP + sparse from BM25)
- `{shop}_image` — stores image vectors (one per product image)
A message queue service. Shopify sends webhooks to Pub/Sub, which then forwards them to our app. It acts as a reliable buffer — if our app is temporarily down, Pub/Sub holds the messages for up to 7 days and retries.
The shopify.app.toml file is the configuration contract between the app and Shopify's platform. It tells Shopify:
- Where to send webhook events
- What OAuth scopes (permissions) the app needs
- Where to redirect merchants after OAuth
- The app proxy URL for storefront search
This runs once when a merchant installs the app or manually triggers a re-index.
The goal: take all products from a merchant's Shopify store and represent them as vectors so they can be searched later.
A merchant installs the app or clicks "Re-index" in the dashboard. This fires off the indexing process for their shop.
Using Shopify's Admin GraphQL API, we fetch all products from the merchant's store in batches. Each product contains: title, description (HTML), handle, vendor, product type, tags, variants (with prices), and image URLs.
Shopify GraphQL API
→ products (title, description, tags, vendor, images, variants, ...)
→ paginated in batches of 250
For each product, we build a single text "passage" by combining all textual fields:
"id: 12345 title: Classic White Sneakers description: Lightweight canvas sneakers
handle: classic-white-sneakers vendor: NikeStore productType: Footwear tags: shoes,white,casual"
This passage represents the product as a single string that both BM25 and CLIP will process.
Before we can use BM25 to score products, we need to build a corpus — a statistical model of the entire product catalog.
BM25 needs to know:
- Total documents (N) — how many products the shop has
- Document frequency (DF) — how many products contain each term
- Average document length — to normalize for product description length
Why do we need this?
Imagine the word "men" appears in 900 out of 1000 products — it's everywhere. BM25 will give it a low weight because it doesn't help distinguish products.
But the word "cashmere" appears in only 5 products — it's rare and meaningful. BM25 gives it a high weight.
This is captured by IDF (Inverse Document Frequency):
IDF = log( 1 + (N - DF + 0.5) / (DF + 0.5) )
We compute this for every term in the catalog and store the result in PostgreSQL (in the bm25Data table) so we don't have to recompute it on every search query.
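To make the formula concrete, here is a minimal sketch; the catalog counts (1000 products, document frequencies of 900 and 5) are the hypothetical ones from the example above.

```typescript
// BM25 inverse document frequency, matching the formula above:
// IDF = log(1 + (N - DF + 0.5) / (DF + 0.5))
function idf(totalDocs: number, docFreq: number): number {
  return Math.log(1 + (totalDocs - docFreq + 0.5) / (docFreq + 0.5));
}

// Hypothetical 1000-product catalog:
const idfCommon = idf(1000, 900); // "men": appears nearly everywhere, low weight
const idfRare = idf(1000, 5);     // "cashmere": appears in 5 products, high weight
```

Here `idfRare` comes out roughly fifty times larger than `idfCommon`, which is exactly the weighting effect described above.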
BM25 Parameters used:
- `k1 = 2.0` — controls term frequency saturation. A higher k1 means repeated words still contribute more score; tuned above the standard 1.2 for recall.
- `b = 0.85` — controls document length normalization. Penalizes long documents that contain a term simply because they have more words.
- `IDF floor = 0.25` — even very common terms get a small minimum weight, so no term is completely ignored.
Each product's passage is encoded through the CLIP text encoder to produce a 512-dimensional dense vector.
"Classic White Sneakers, Footwear, casual, white, shoes"
→ CLIP Text Encoder
→ [0.021, -0.134, 0.089, ..., 0.045] (512 numbers)
This vector captures the semantic meaning of the product. Products with similar meanings will have similar vectors (small cosine distance between them).
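"Small cosine distance" can be made precise in a few lines; this is the standard formula, not code from the app.

```typescript
// Cosine similarity between two equal-length vectors:
// 1 = same direction (identical meaning), 0 = unrelated.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```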
For each product image URL, we download the image and encode it through the CLIP vision encoder:
[product image] → CLIP Vision Encoder → [0.033, -0.092, 0.101, ..., 0.077] (512 numbers)
The critical property: text vectors and image vectors live in the same space. So when a user searches "red sneakers" (text), the text vector will be close to the image vector of a red sneaker photo.
Images are processed in batches of 8 to manage memory.
Each product gets stored in Endee as two separate records:
Text index ({shop}_text):
{
id: "12345",
vector: [dense CLIP text vector — 512 dims],
sparseIndices: [23, 441, 1092, ...], ← BM25 term indices
sparseValues: [0.82, 0.61, 0.33, ...], ← BM25 weights
meta: { title, vendor, tags, variants, images, ... }
}
Image index ({shop}_image):
{
id: "12345__img_0",
vector: [dense CLIP image vector — 512 dims],
meta: { productId: "12345", imageUrl: "https://...", ...productData }
}
After this step, the merchant's store is fully indexed and ready for search.
This is an important design decision. A beginner might ask: since CLIP produces 512-dimensional vectors for both text and images, why not just put everything in one index?
The core problem: one product has multiple images, but only one text representation.
A product like "Classic White Sneakers" has:
- 1 text passage → 1 text vector
- 5 images → 5 image vectors
If you put everything in one index, each record needs a single id. But what id do you give the 5 image vectors? You'd use 12345__img_0, 12345__img_1, etc. — which means the same product now has 6 entries in one index.
This creates serious problems:
Problem 1 — Result duplication in search
When you query the combined index, the same product could appear 6 times in the top results (once for the text match, 5 times for each image match). You'd have to deduplicate manually and figure out which result "represents" the product.
With two separate indexes, text and image retrieval are cleanly separated — each gives you one ranked list. RRF then merges them at the product level (not the record level).
Problem 2 — Incompatible scoring signals
The text index stores both dense and sparse vectors (CLIP + BM25). BM25 produces sparse vectors based on keywords — keywords that only make sense for text passages, not images. An image vector has no BM25 representation. Mixing image records (no sparse vector) with text records (has sparse vector) in one index would corrupt the sparse retrieval entirely.
Problem 3 — Different granularity
Text search operates at product granularity — one product, one score. Image search operates at image granularity — one product can match through any of its images, and we want the best image match per product.
Separating the indexes lets us keep this distinction clean. In the image index, we find the best-ranked image per product using the productId stored in each record's metadata.
If everything were in one index, this aggregation logic would be much harder and messier.
Problem 4 — Index pollution
Text search using BM25 (sparse retrieval) should only search over text documents. If image records are in the same index, the BM25 query vector would try to match against image records that have zero sparse representation — they'd contribute 0 to sparse scores, effectively polluting the result set with irrelevant low-scoring entries.
Summary:
| Aspect | One Combined Index | Two Separate Indexes |
|---|---|---|
| Result deduplication | Manual, complex | Clean — separate ranked lists |
| BM25 sparse search | Polluted by image records | Only searches text records |
| Image matching | Messy multi-record per product | Clean, best image per product via RRF |
| Granularity | Mixed | Each index has consistent granularity |
| Scoring logic | Complex workarounds needed | Simple RRF merge at the end |
Two indexes mean more storage, but the retrieval quality and code simplicity more than justify the overhead.
This runs continuously in the background, keeping the index in sync with the store.
When a merchant creates, updates, or deletes a product after the initial index is built, we need to reflect that change in Endee. This is done asynchronously through a webhook pipeline.
Merchant creates a product in Shopify
↓
Shopify fires a "products/create" webhook
↓
Google Pub/Sub receives it (configured in shopify.app.toml)
↓
Pub/Sub pushes an HTTP POST to /webhooks/app/products/create
↓
Remix route parses the Pub/Sub message format:
- topic: "products/create"
- shopDomain: "merchant-store.myshopify.com"
- eventId: "abc-123"
- productData: { id, title, images, ... } (base64 decoded)
↓
addProductWebhookJob() is called
↓
pg-boss inserts a job into PostgreSQL:
{
queue: "product-webhooks",
data: { type, shopDomain, productId, productData, eventId },
singletonKey: "product.create-abc-123" ← deduplication
}
↓
Route returns HTTP 200 to Pub/Sub immediately
↓
Background worker (startProductWorker) picks up the job
↓
processJob() runs:
- Build text passage
- CLIP embed text
- CLIP embed images (for create/update)
- BM25 encode
- Endee upsert (or deleteVector for delete)
↓
Job marked complete in pg-boss
The webhook route must respond with 200 in under 5 seconds or Pub/Sub/Shopify marks it as failed and retries. But embedding a product takes 2-10 seconds depending on image count.
The queue decouples the two:
- Webhook route: receive → queue job → return 200 (takes <100ms)
- Worker: pick up job → embed → store (takes however long it needs)
When Shopify updates a product, it fires both a products/create and products/update event in quick succession. Without deduplication, we'd process the same product twice.
Two layers of dedup:
- pg-boss `singletonKey` — the same `eventId` can't be queued twice
- In-memory `recentOperations` map — if an update comes within 10 seconds of a create for the same product, it's skipped
If embedding fails (network error, CLIP timeout), pg-boss automatically retries the job:
- Up to 5 retries
- Exponential backoff (2s → 4s → 8s → 16s → 32s)
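In pg-boss, a retry policy like this is expressed through job send options; the sketch below mirrors the behavior described above (exact values are the app's choice, not pg-boss defaults):

```typescript
// pg-boss job options for the webhook queue (sketch).
const jobOptions = {
  retryLimit: 5,      // up to 5 retries before the job is failed
  retryDelay: 2,      // first retry after 2 seconds
  retryBackoff: true, // double the delay each attempt: 2s, 4s, 8s, 16s, 32s
};
```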
This runs on every search request from a shopper in a merchant's storefront.
Shopper types: "gift for my wife under $100"
↓
Storefront sends: GET /api/search?q=gift+for+my+wife+under+100&shop=store.myshopify.com
↓
Remix loader handles the request
↓
searchProducts(shop, query) is called
The raw query is sent to the Python spaCy microservice (running on port 8100):
POST http://127.0.0.1:8100/parse
{ "query": "gift for my wife under $100" }
spaCy returns a rich analysis:
{
"head_noun": "gift",
"intent": "gift",
"recipient": "wife",
"filters": {
"gender": "women",
"price": { "max": 100 }
},
"expanded_query": "gift wife women accessories",
"dense_query": "gift for wife"
}
What spaCy extracts:
| What | How | Example |
|---|---|---|
| Head noun | Syntactic parse tree root | "gift for my wife" → "gift" |
| Gender | Pronouns + relationship words | "for my wife" → "women" |
| Recipient | Preposition "for" + possessive | "for my wife" → "wife" |
| Price | Named entity recognition + regex | "under $100" → { max: 100 } |
| Expanded query | Adds modifiers, lemmas | Better BM25 recall |
| Dense query | Price terms stripped | Better CLIP semantics |
From spaCy's output, we build structured filters to pass to Endee:
filters = [{ gender: { "$eq": "women" } }]
These are applied to both sparse and dense search to pre-filter candidates.
The cleaned query (without price noise) is encoded through CLIP to get the query vector:
"gift for wife" → CLIP Text Encoder → [query vector — 512 dims]
This same vector is used for both text index dense search AND image index search (because CLIP puts text and images in the same space).
The expanded query is encoded through BM25 to get a sparse vector:
"gift wife women accessories" → BM25 Encoder (loaded from PostgreSQL) → {
sparseIndices: [23, 441, 1092, 55],
sparseValues: [0.82, 0.61, 0.33, 0.91]
}
The BM25 encoder uses the pre-built corpus statistics (term frequencies, document frequencies, average length) that were computed during indexing.
All three searches fire simultaneously (in parallel) using Promise.all:
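A sketch of the fan-out, with the three Endee queries stubbed out (function names and result shapes are illustrative, not the app's actual API):

```typescript
type Hit = { id: string; score: number };
type SparseVector = { indices: number[]; values: number[] };

// Stand-ins for the three Endee queries (illustrative only).
async function searchTextSparse(_q: SparseVector, _topK: number): Promise<Hit[]> {
  return [{ id: "12345", score: 0.82 }]; // BM25 keyword match
}
async function searchTextDense(_q: number[], _topK: number): Promise<Hit[]> {
  return [{ id: "12345", score: 0.91 }]; // CLIP semantic match
}
async function searchImages(_q: number[], _topK: number): Promise<Hit[]> {
  return [{ id: "67890", score: 0.77 }]; // CLIP cross-modal match
}

// All three retrievals fire concurrently; total latency is roughly
// that of the slowest single search, not the sum of all three.
async function retrieveAll(denseVec: number[], sparse: SparseVector) {
  const [sparseHits, denseHits, imageHits] = await Promise.all([
    searchTextSparse(sparse, 500),
    searchTextDense(denseVec, 500),
    searchImages(denseVec, 500),
  ]);
  return { sparseHits, denseHits, imageHits };
}
```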
Each retrieval returns up to 500 candidates with their product metadata.
Why three separate retrievals?
| Retrieval | Strength | Example |
|---|---|---|
| BM25 sparse | Exact keywords, product types | "Nike Air Max" → exact brand match |
| CLIP text dense | Semantic meaning | "something cozy" → finds "fleece jacket" |
| CLIP image | Visual similarity | "red dress" → matches red-colored products visually |
Each alone is incomplete. Together they cover all the ways a shopper might search.
Each retrieval method returns its own ranked list. We need to combine them into one final ranking. This is done with RRF (Reciprocal Rank Fusion).
How RRF works:
For each product, calculate its score from each retrieval method based on its rank:
RRF score = 1 / (60 + rank)
- Rank 1 → 1/61 = 0.0164
- Rank 10 → 1/70 = 0.0143
- Rank 100 → 1/160 = 0.0063
- Not in results → 0
Why 60? The constant 60 dampens the difference between high and low ranks. A product ranked 1st doesn't get astronomically more score than one ranked 2nd.
Combined score:
baseScore = sparseRRF + denseRRF + imageRRF
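The rank-to-score mapping above is simple to implement; here is a minimal sketch (list contents and names are illustrative):

```typescript
// Reciprocal Rank Fusion with the standard k = 60 constant.
const RRF_K = 60;

function rrfScore(rank: number): number {
  return 1 / (RRF_K + rank); // rank is 1-based
}

// Each inner array is one retrieval method's product ids, best first.
function fuseRankings(rankedLists: string[][]): Map<string, number> {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, i) => {
      scores.set(id, (scores.get(id) ?? 0) + rrfScore(i + 1));
    });
  }
  return scores;
}
```

A product ranked 1st by all three methods scores 3/61 ≈ 0.049; one that appears in only a single list still gets a score, just a lower one.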
spaCy identified the head noun ("gift"). We apply a 1.5x boost to products whose type, title, or tags contain that noun:
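A sketch of the boost (field names and the candidate shape are illustrative):

```typescript
const HEAD_NOUN_BOOST = 1.5;

type Candidate = { title: string; productType: string; tags: string[]; score: number };

// Multiply the fused score when the head noun appears in type, title, or tags.
function applyHeadNounBoost(candidates: Candidate[], headNoun: string): Candidate[] {
  const noun = headNoun.toLowerCase();
  return candidates.map((c) => {
    const haystack = `${c.title} ${c.productType} ${c.tags.join(" ")}`.toLowerCase();
    return haystack.includes(noun) ? { ...c, score: c.score * HEAD_NOUN_BOOST } : c;
  });
}
```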
This ensures "gift boxes" rank higher than unrelated products that happened to score well on keyword matching.
If spaCy extracted a price constraint ("under $100"), we filter out candidates that don't match:
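A sketch of the post-retrieval filter (shapes are illustrative; in the real system prices come from variant metadata):

```typescript
type PriceFilter = { min?: number; max?: number };
type Scored = { id: string; price: number; score: number };

// Drop candidates whose price falls outside the parsed constraint.
function applyPriceFilter(candidates: Scored[], filter?: PriceFilter): Scored[] {
  if (!filter) return candidates;
  return candidates.filter(
    (c) =>
      (filter.min === undefined || c.price >= filter.min) &&
      (filter.max === undefined || c.price <= filter.max),
  );
}
```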
This is done post-retrieval (after scoring) rather than pre-retrieval because price data is stored in product metadata, not as a separate indexed field.
The top 30 results are returned to the storefront with their scores included for debugging.
Here is the complete picture — all three flows unified:
┌─────────────────────────────────────────────────────────────────┐
│ ONE-TIME: INITIAL INDEXING │
│ │
│ Merchant installs → Fetch all products → Build BM25 corpus │
│ → CLIP embed text → CLIP embed images → Store in Endee │
│ → Save BM25 stats to PostgreSQL │
└─────────────────────────────────────────────────────────────────┘
↓ Index is ready
┌─────────────────────────────────────────────────────────────────┐
│ CONTINUOUS: WEBHOOK PIPELINE │
│ │
│ Product change → Pub/Sub → Webhook route → pg-boss queue │
│ → Background worker → Re-embed changed product → Update Endee │
└─────────────────────────────────────────────────────────────────┘
↓ Index stays fresh
┌─────────────────────────────────────────────────────────────────┐
│ PER QUERY: SEARCH PIPELINE │
│ │
│ Query → spaCy (NLP) → CLIP (encode) → BM25 (encode) │
│ → 3 parallel searches in Endee │
│ → RRF scoring → Head noun boost → Price filter → Top 30 │
└─────────────────────────────────────────────────────────────────┘
All three index searches (sparse, dense, image) happen simultaneously:
- Without this: 3 sequential searches × ~50ms each = ~150ms
- With this: all 3 run in parallel = ~50ms total
CLIP models are large (hundreds of MB). We load them once and reuse:
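A sketch of the caching pattern; the loader here is a stub standing in for the real ONNX model load.

```typescript
type ClipEncoder = { encodeText(text: string): number[] };

let encoderPromise: Promise<ClipEncoder> | null = null;

// Stand-in for the expensive model load (hundreds of MB, seconds of work).
async function loadClipEncoder(): Promise<ClipEncoder> {
  return { encodeText: () => new Array(512).fill(0) };
}

// First call starts the load; every later caller awaits the same promise,
// so the model is loaded exactly once per process.
function getClipEncoder(): Promise<ClipEncoder> {
  encoderPromise ??= loadClipEncoder();
  return encoderPromise;
}
```

Caching the promise (rather than the model) also means concurrent first requests share one load instead of racing to start several.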
- Without this: the model loads on every request = seconds of latency
- With this: the model loads once on startup and is reused on every request
The BM25 corpus (term frequencies, document frequencies) is expensive to build — it requires scanning all product text. We compute it once during indexing and store the result in PostgreSQL.
On every search query, we just load the pre-computed statistics:
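One plausible shape for that lookup, with a per-shop in-process cache layered on top; `fetchFromDb` stands in for the Prisma query against the bm25Data table, and the stats fields shown are assumptions.

```typescript
type Bm25Stats = {
  totalDocs: number;               // N
  avgDocLength: number;            // for length normalization
  docFreq: Record<string, number>; // DF per term
};

const statsCache = new Map<string, Bm25Stats>();

// Load a shop's corpus statistics, hitting PostgreSQL only on first use.
async function getBm25Stats(
  shop: string,
  fetchFromDb: (shop: string) => Promise<Bm25Stats>,
): Promise<Bm25Stats> {
  let stats = statsCache.get(shop);
  if (!stats) {
    stats = await fetchFromDb(shop);
    statsCache.set(shop, stats);
  }
  return stats;
}
```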
CLIP image embedding is memory-intensive. Processing images in groups of 8 keeps memory usage bounded:
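A sketch of the batching loop; the `embed` function stands in for the CLIP vision encoder.

```typescript
const IMAGE_BATCH_SIZE = 8;

// Embed image URLs in fixed-size batches so at most 8 images
// are downloaded and encoded at any one time.
async function embedImagesInBatches(
  urls: string[],
  embed: (url: string) => Promise<number[]>,
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (let i = 0; i < urls.length; i += IMAGE_BATCH_SIZE) {
    const batch = urls.slice(i, i + IMAGE_BATCH_SIZE);
    vectors.push(...(await Promise.all(batch.map(embed)))); // one batch in flight
  }
  return vectors;
}
```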
The background worker processes 5 webhook jobs in parallel per poll cycle:
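A generic sketch of one poll cycle; pg-boss does the actual job fetching and locking, so the shape below only illustrates the fan-out.

```typescript
const JOBS_PER_CYCLE = 5;

// Process up to 5 jobs concurrently per poll cycle. allSettled ensures
// one failing job doesn't abort its siblings.
async function pollCycle<T>(
  fetchJobs: (limit: number) => Promise<T[]>,
  processJob: (job: T) => Promise<void>,
): Promise<number> {
  const jobs = await fetchJobs(JOBS_PER_CYCLE);
  await Promise.allSettled(jobs.map(processJob));
  return jobs.length;
}
```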
This provides throughput without overwhelming the database or CLIP model.
Unknown words (Out-Of-Vocabulary) are mapped to a special __OOV__ bucket with a document frequency of 90% of total documents:
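A sketch of the fallback; the 0.9 factor mirrors the 90% figure above.

```typescript
const OOV_DF_RATIO = 0.9;

// Look up a term's document frequency, falling back to the __OOV__ bucket
// (DF = 90% of N) for terms never seen in the catalog.
function termDocFreq(
  term: string,
  docFreq: Map<string, number>,
  totalDocs: number,
): number {
  return docFreq.get(term) ?? Math.floor(totalDocs * OOV_DF_RATIO);
}
```

Because a high DF produces a low (but non-zero) IDF, unknown terms end up with a small weight instead of crashing or dominating the score.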
This means unknown query terms get a small but non-zero weight — they don't contribute much but they don't break the scoring either.
Even very common terms get a minimum IDF of 0.25:
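In code, the floor is just a clamp, sketched here:

```typescript
const IDF_FLOOR = 0.25;

// Clamp IDF so even catalog-wide terms keep a small positive weight.
function flooredIdf(rawIdf: number): number {
  return Math.max(IDF_FLOOR, rawIdf);
}
```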
Without this floor, extremely common terms (like "product") would get near-zero IDF and be completely ignored. The floor ensures they still contribute weakly.
Short search queries (1-2 words) are not penalized by the length normalization factor:
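A sketch of the clamp, with tokenization shown as a simple whitespace split (an assumption for illustration):

```typescript
const MIN_QUERY_LEN = 3;

// Clamp the effective query length used in BM25's length normalization
// so one- and two-word queries aren't penalized.
function effectiveQueryLength(query: string): number {
  const tokens = query.trim().split(/\s+/).filter(Boolean);
  return Math.max(MIN_QUERY_LEN, tokens.length);
}
```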
Without clamping, a 1-word query like "shoes" would be penalized because BM25's normalization expects the query to be at least as long as the average document. Clamping to minimum 3 ensures short queries work well.
A product can have many images. In image results, we keep only the best-ranked image per product to avoid the same product dominating results:
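A sketch of the aggregation, keyed on the productId stored in each image record's metadata (record shape is illustrative):

```typescript
type ImageHit = { recordId: string; productId: string; rank: number };

// Collapse image hits to one entry per product, keeping the best rank.
function bestImagePerProduct(hits: ImageHit[]): ImageHit[] {
  const best = new Map<string, ImageHit>();
  for (const hit of hits) {
    const current = best.get(hit.productId);
    if (!current || hit.rank < current.rank) best.set(hit.productId, hit);
  }
  return [...best.values()].sort((a, b) => a.rank - b.rank);
}
```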
Shopify sometimes sends both a products/create and products/update event for the same product in rapid succession. We suppress the redundant update:
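A sketch of the in-memory `recentOperations` window; the 10-second value matches the text above.

```typescript
const DEDUP_WINDOW_MS = 10_000;
const recentOperations = new Map<string, number>(); // productId -> create timestamp

function recordCreate(productId: string, now = Date.now()): void {
  recentOperations.set(productId, now);
}

// An update arriving within 10s of a create for the same product is redundant.
function shouldSkipUpdate(productId: string, now = Date.now()): boolean {
  const createdAt = recentOperations.get(productId);
  return createdAt !== undefined && now - createdAt < DEDUP_WINDOW_MS;
}
```

Note this map is per-process state; the durable deduplication still comes from the pg-boss singletonKey.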
CLIP understands both text and images in a unified embedding space. This means:
- A text query ("red floral dress") is close in vector space to an image of a red floral dress
- This enables cross-modal search — text queries find products by their visual appearance
Text-only models can't do this.
CLIP is great at semantic understanding but can miss exact matches. If someone searches "Nike Air Max 270", CLIP might return other Nike shoes or similar styles. BM25 will score the exact product very high because it matches the exact model number.
Combining both gives you the best of both worlds: exact keyword precision from BM25, semantic generalization from CLIP.
spaCy's en_core_web_md model runs in Python. Our app is Node.js/TypeScript. Running spaCy as a separate HTTP microservice (on port 8100) lets both coexist:
- Node.js handles the web server and API
- Python handles NLP
- They communicate via HTTP
Redis is an in-memory data store, and BullMQ uses it as its job queue backend. The problem: BullMQ's workers poll Redis every few milliseconds for new jobs, so Redis commands are consumed continuously even when there are no webhook events.
pg-boss uses PostgreSQL — which the app already uses for sessions and BM25 data. Job polling happens every 2 seconds, consuming negligible database resources. No additional service to manage or pay for.
Shopify requires webhook endpoints to respond in under 5 seconds. If our app goes down during maintenance, Pub/Sub holds the messages for up to 7 days. Combined with pg-boss's retry logic, this creates a robust, fault-tolerant pipeline where no product events are lost.
This file is the single source of truth for the app's relationship with Shopify. It defines webhook subscriptions, OAuth scopes, and the app proxy URL. Running shopify app deploy pushes this configuration to Shopify's servers — without it, none of the webhook routing or OAuth would work.
| Term | Simple Explanation |
|---|---|
| Vector / Embedding | A list of numbers that represents the meaning of text or an image. Similar things have similar numbers. |
| Dense vector | A full 512-number list from CLIP. Every number has a value. |
| Sparse vector | A list where most values are zero. Only terms present in the query have non-zero weights. Used in BM25. |
| BM25 | A scoring formula that says "how relevant is this product to this query based on keywords?" |
| IDF | How rare a word is. Rare words matter more in search. |
| TF (Term Frequency) | How many times a word appears in a document. |
| k1 parameter | Controls how much extra TF helps. Beyond a point, repeating a word doesn't help much more. |
| b parameter | Controls how much document length matters. Longer documents get slightly penalized. |
| Cosine similarity | A way to measure how similar two vectors are. 1 = identical, 0 = unrelated. |
| RRF | A formula to merge multiple ranked lists into one. Each result gets points based on its rank in each list. |
| spaCy | A Python library that understands grammar and meaning in sentences. |
| Head noun | The main noun in a phrase. In "casual red shoes for men", the head noun is "shoes". |
| NER | Named Entity Recognition — identifying things like names, prices, locations in text. |
| ONNX | A format for AI models. Lets you run models trained in Python within JavaScript. |
| Webhook | An HTTP notification Shopify sends to your app when something happens in a store. |
| Pub/Sub | A publish-subscribe system. Shopify publishes events; your app subscribes to receive them. |
| Job Queue | A list of tasks waiting to be processed. Workers pick tasks off the queue and process them. |
| pg-boss | A job queue that stores jobs in PostgreSQL tables. |
| Upsert | Insert if the record doesn't exist, update if it does. |
| Singleton key | A unique identifier that prevents the same job from being queued twice. |
| SIGTERM | A signal sent to a process asking it to shut down gracefully. |
| ORM | Object-Relational Mapper — maps database tables to TypeScript objects (Prisma does this). |
| Remix resource route | A Remix route file that only handles API requests (no UI). Named with dots in the filename. |