BetterSearch is a low-latency, neuro-symbolic search engine designed for high-context e-commerce queries.
It solves the precision-recall trade-off in domain-specific search (specifically Indian cuisine) by combining Small Language Models (SLMs) for intent classification with SQLite FTS5 for deterministic retrieval. This architecture eliminates the non-deterministic behaviors of pure vector search (hallucinations, negation failures) while maintaining semantic understanding.
Standard search implementations fail on complex intent queries due to specific limitations:
- Keyword Search (FTS): Fails on semantic queries (e.g., "Pet kharab" / "Stomach upset") due to lack of lexical overlap.
- Vector Search (Embeddings): Struggles with strict constraints (e.g., "Not spicy", "No onion") and exact SKU retrieval, because semantically similar items sit close together in embedding space, making hard exclusions and exact matches unreliable.
BetterSearch implements a Retrieve-and-Rank pipeline where an SLM acts as a query compiler, translating natural language into a structured intermediate representation (IR) that executes against a strictly governed SQL index.
| Query Category | Input Example | System Resolution |
|---|---|---|
| Contextual | "Pet kharab hai" (Upset stomach) | Maps intent to dietary attributes: tag:gut_friendly, tag:light_meal, exclude:spicy. |
| Regional/Hinglish | "Vrat ka khana" (Fasting food) | Enforces strict dietary filtering: diet_type:veg + tag:fasting. |
| Negation | "Not spicy please" | Logic gate application: `NOT LIKE '%spicy%'` (deterministic exclusion). |
| Abstract | "Bhookh lagi hai" (I'm hungry) | Maps abstract intent to high-volume categories: category:combo, category:meal. |
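The mappings in the table above can be thought of as compiling into a small structured IR that is then lowered to SQL. The sketch below uses hypothetical field names (`phrases`, `tags`, `exclude_tags`, `price_max`), not the project's actual schema:

```python
# Hypothetical IR for a query like "Not spicy veg breakfast under 150".
# Field names are illustrative, not the project's actual schema.
ir = {
    "phrases": ["breakfast"],      # phrase matching via FTS5 omitted for brevity
    "tags": ["veg"],
    "exclude_tags": ["spicy"],
    "price_max": 150,
}

def ir_to_sql(ir: dict) -> tuple[str, list]:
    """Compile the IR into a parameterized SQL WHERE clause."""
    clauses, params = [], []
    for tag in ir.get("tags", []):
        clauses.append("tags LIKE ?")
        params.append(f"%{tag}%")
    for tag in ir.get("exclude_tags", []):
        clauses.append("tags NOT LIKE ?")   # deterministic negation
        params.append(f"%{tag}%")
    if ir.get("price_max") is not None:
        clauses.append("price <= ?")
        params.append(ir["price_max"])
    return " AND ".join(clauses), params

where, params = ir_to_sql(ir)
```

Because exclusions become literal `NOT LIKE` clauses, negation is enforced by the database rather than left to the model.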
The system follows a split-stack architecture separating Data Preparation from Live Search.
- Objective: Prepare and enrich the data before any user searches occur.
- Network Dependency: High. Uses Cloud APIs (Gemini Pro/OpenAI) to generate intelligence.
- Process:
- Extraction: Raw CSV data is parsed.
- Enrichment (The "Keyword Blast"): An LLM generates a rich semantic tag cloud for each item (Synonyms, Colloquialisms, Dietary attributes).
- Normalization: Tags are normalized against a "Constitution" (controlled vocabulary) to prevent index drift.
- Indexing: Data is stored in SQLite using FTS5 with Trigram tokenization for robust typo tolerance.
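The indexing step can be sketched with the stdlib `sqlite3` module. The project additionally uses `tokenize='trigram'` (requires SQLite >= 3.34) for typo tolerance; the default tokenizer is used here so the sketch runs on any build:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The real index adds tokenize='trigram' for typo tolerance (SQLite >= 3.34);
# the default unicode61 tokenizer is used here so the sketch runs anywhere.
conn.execute("CREATE VIRTUAL TABLE items USING fts5(name, tags)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("Paneer Tikka", "veg grilled protein"),
        ("Chicken Biryani", "nonveg rice spicy"),
    ],
)
# Full-text match over the enriched tag cloud.
rows = conn.execute(
    "SELECT name FROM items WHERE items MATCH ?", ("grilled",)
).fetchall()
```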
- Objective: Execute user searches with sub-300ms latency.
- Network Dependency: None. Runs entirely on the local server/edge device for speed and privacy.
- Model: Qwen 2.5-3B-Instruct (GGUF quantized).
- Execution Flow:
- L0 - Short Circuit (In-Memory): Checks for exact string matches or high-confidence fuzzy matches (Levenshtein distance) in RAM. (< 10ms).
- L1 - Intent Parsing (Neural): If L0 fails, the query is passed to the local SLM.
- Constraint Enforcement: The SLM output is constrained by a GBNF Grammar, ensuring 100% valid JSON output and adherence to the defined schema.
- L2 - Progressive Loosening (SQL): The engine executes the query against SQLite. If strict criteria yield zero results, constraints are relaxed in a predefined order (Price -> Category -> Phrase Match -> Negation) to maximize recall.
- Runtime: Python 3.11+, FastAPI
- Storage: SQLite (FTS5 Extension)
- Inference Engine: `llama-cpp-python` (with Apple Metal / CUDA acceleration support)
- Model: Qwen 2.5-3B-Instruct (Int4 quantization)
- Caching: `diskcache` (semantic caching of parsed intents)
- Data Processing: `litellm`, `pandas`
- Python 3.10+
- Hardware Acceleration (Highly Recommended): Apple Silicon M-Series or NVIDIA GPU.
```bash
# 1. Clone repository
git clone https://github.com/mitulagr2/bettersearch.git
cd bettersearch

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate

# 3. Install dependencies (Enable Metal support for Mac)
CMAKE_ARGS="-DGGML_METAL=on" pip install -r requirements.txt
```

Download the GGUF model to the `models/` directory.
- Model: `qwen2.5-3b-instruct-q4_k_m.gguf`
- Source: HuggingFace
Set environment variables in `.env` & `app/config.py`:

```bash
GEMINI_API_KEY="your_key_here"  # Required only for indexing pipeline
```

Run this once (or on menu updates) to generate the search index and controlled vocabulary. This step calls external APIs.

```bash
python scripts/ingest.py
```

Outputs: `data/soku.db` (Search Index) and `data/constitution.json` (Vocabulary Schema).
Starts the local API server. This does not make external API calls.

```bash
python main.py
```

Server runs at http://localhost:8000.
```bash
curl -X POST "http://localhost:8000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "High protein veg breakfast"}'
```

The repository includes a benchmark suite to validate latency and accuracy across query classifications.

```bash
python scripts/benchmark.py
```

Test Suites:
- Direct/Fuzzy: Validates memory-layer regex and fuzzy matching.
- Semantic/Hinglish: Validates LLM translation capabilities.
- Logical Constraints: Validates price filtering and negation logic.
- Cache Efficiency: Measures cold vs. warm inference latency.
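The cache-efficiency suite exercises the intent cache, which can be sketched as follows. The project persists entries with `diskcache`; this sketch substitutes an in-memory dict, and `fake_parse` stands in for the local SLM:

```python
import hashlib

class IntentCache:
    """In-memory stand-in for the project's diskcache layer: parsed intents
    are keyed on the normalized query so repeat queries skip SLM inference."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def _key(self, query: str) -> str:
        normalized = " ".join(query.lower().split())   # collapse case/whitespace
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_parse(self, query: str, parse_fn):
        k = self._key(query)
        if k not in self._store:
            self._store[k] = parse_fn(query)           # cold path: run the SLM
        return self._store[k]

calls = []
def fake_parse(q):                                     # stands in for local inference
    calls.append(q)
    return {"tags": ["veg"]}

cache = IntentCache()
first = cache.get_or_parse("Veg  breakfast", fake_parse)
second = cache.get_or_parse("veg breakfast", fake_parse)  # warm: served from cache
```

Normalizing before hashing is what makes the warm path catch trivially different spellings of the same query.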
To prevent semantic drift, where the LLM hallucinates filters (e.g., `category="yummy_food"`), the system generates a dynamic JSON schema during ingestion; at runtime, the model is prompted to restrict its output strictly to this schema.
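As a belt-and-braces measure, model output can also be validated against the vocabulary after parsing. A minimal sketch, using a hypothetical hard-coded vocabulary in place of the generated `data/constitution.json`:

```python
# Hypothetical controlled vocabulary; the real "Constitution" is generated
# into data/constitution.json during ingestion.
CONSTITUTION = {
    "category": {"combo", "meal", "snack", "beverage"},
    "tag": {"spicy", "gut_friendly", "light_meal", "fasting", "veg"},
}

def sanitize_ir(ir: dict) -> dict:
    """Drop any filter values the LLM emitted that are not in the schema."""
    return {
        "category": [c for c in ir.get("category", [])
                     if c in CONSTITUTION["category"]],
        "tags": [t for t in ir.get("tags", [])
                 if t in CONSTITUTION["tag"]],
    }

# A hallucinated category ("yummy_food") is filtered out; valid values pass.
out = sanitize_ir({"category": ["yummy_food", "meal"], "tags": ["spicy"]})
```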
To minimize "Zero Result" pages, the search engine treats constraints hierarchically. If a specific query (e.g., "Spicy Chicken under 100") fails, the engine rewrites the SQL query iteratively:
- Strict: Match Phrases + Tags + Price + Category.
- Relax Price: Drop the `price <= 100` constraint.
- Relax Category: Drop the strict `category` filter.
- Relax Semantic: Drop Phrase matching, rely on Tags only.
- Fallback: Return high-ranking global items.
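The relaxation loop can be sketched as follows; `execute` is a hypothetical stand-in for the SQL layer (it takes a constraint dict and returns matching rows), and the field names are illustrative:

```python
def progressive_search(execute, constraints: dict) -> list:
    """Relax constraints in a fixed order until rows come back.

    `execute` stands in for the SQL layer; the relaxation order mirrors
    the hierarchy described above: price -> category -> phrase match,
    then a global fallback.
    """
    relax_order = ["price_max", "category", "phrases"]
    current = dict(constraints)
    rows = execute(current)
    for field in relax_order:
        if rows:
            break
        current.pop(field, None)       # loosen one constraint and retry
        rows = execute(current)
    return rows or execute({})         # fallback: high-ranking global items

# Toy backend: nothing matches while the price cap is in place.
def fake_execute(c):
    return [] if "price_max" in c else ["Spicy Chicken"]

rows = progressive_search(fake_execute, {"price_max": 100, "phrases": ["spicy"]})
```

Because each relaxation step reuses the same compiled IR minus one field, the loop adds at most a handful of cheap SQLite round trips to a zero-result query.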
- Tier 1 (Deterministic): Exact substring matches or high-confidence fuzzy matches are served immediately from memory (bypassing the LLM).
- Tier 2 (Probabilistic): Complex or abstract queries are routed to the LLM for intent extraction, then executed via SQL.