BetterSearch is a low-latency, neuro-symbolic search engine designed for high-context e-commerce queries.
It solves the precision-recall trade-off in domain-specific search (specifically Indian cuisine) by combining Small Language Models (SLMs) for intent classification with SQLite FTS5 for deterministic retrieval. This architecture eliminates the non-deterministic behaviors of pure vector search (hallucinations, negation failures) while maintaining semantic understanding.
Standard search implementations fail on complex intent queries due to specific limitations:
- Keyword Search (FTS): Fails on semantic queries (e.g., "Pet kharab" / "Stomach upset") due to lack of lexical overlap.
- Vector Search (Embeddings): Struggles with strict constraints (e.g., "Not spicy", "No onion") and exact SKU retrieval, because semantically similar items sit close together in embedding space, making hard exclusions and exact matches unreliable.
BetterSearch implements a Retrieve-and-Rank pipeline where an SLM acts as a query compiler, translating natural language into a structured intermediate representation (IR) that executes against a strictly governed SQL index.
| Query Category | Input Example | System Resolution |
|---|---|---|
| Contextual | "Pet kharab hai" (Upset stomach) | Maps intent to dietary attributes: tag:gut_friendly, tag:light_meal, exclude:spicy. |
| Regional/Hinglish | "Vrat ka khana" (Fasting food) | Enforces strict dietary filtering: diet_type:veg + tag:fasting. |
| Negation | "Not spicy please" | Logic gate application: `NOT LIKE '%spicy%'` (deterministic exclusion). |
| Abstract | "Bhookh lagi hai" (I'm hungry) | Maps abstract intent to high-volume categories: category:combo, category:meal. |
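The mappings in the table above can be thought of as compiling into a small structured IR that is then lowered to SQL. The sketch below uses hypothetical field names (`phrases`, `tags`, `exclude_tags`, `price_max`), not the project's actual schema:

```python
# Hypothetical IR for a query like "Not spicy veg breakfast under 150".
# Field names are illustrative, not the project's actual schema.
ir = {
    "phrases": ["breakfast"],      # phrase matching via FTS5 omitted for brevity
    "tags": ["veg"],
    "exclude_tags": ["spicy"],
    "price_max": 150,
}

def ir_to_sql(ir: dict) -> tuple[str, list]:
    """Compile the IR into a parameterized SQL WHERE clause."""
    clauses, params = [], []
    for tag in ir.get("tags", []):
        clauses.append("tags LIKE ?")
        params.append(f"%{tag}%")
    for tag in ir.get("exclude_tags", []):
        clauses.append("tags NOT LIKE ?")   # deterministic negation
        params.append(f"%{tag}%")
    if ir.get("price_max") is not None:
        clauses.append("price <= ?")
        params.append(ir["price_max"])
    return " AND ".join(clauses), params

where, params = ir_to_sql(ir)
```

Because exclusions become literal `NOT LIKE` clauses, negation is enforced by the database rather than left to the model.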
The system follows a split-stack architecture separating Data Preparation from Live Search.
- Objective: Prepare and enrich the data before any user searches occur.
- Network Dependency: High. Uses Cloud APIs (Gemini Pro/OpenAI) to generate intelligence.
- Process:
- Extraction: Raw CSV data is parsed.
- Enrichment (The "Keyword Blast"): An LLM generates a rich semantic tag cloud for each item (Synonyms, Colloquialisms, Dietary attributes).
- Normalization: Tags are normalized against a "Constitution" (controlled vocabulary) to prevent index drift.
- Indexing: Data is stored in SQLite using FTS5 with Trigram tokenization for robust typo tolerance.
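The indexing step can be sketched with the stdlib `sqlite3` module. The project additionally uses `tokenize='trigram'` (requires SQLite >= 3.34) for typo tolerance; the default tokenizer is used here so the sketch runs on any build:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The real index adds tokenize='trigram' for typo tolerance (SQLite >= 3.34);
# the default unicode61 tokenizer is used here so the sketch runs anywhere.
conn.execute("CREATE VIRTUAL TABLE items USING fts5(name, tags)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?)",
    [
        ("Paneer Tikka", "veg grilled protein"),
        ("Chicken Biryani", "nonveg rice spicy"),
    ],
)
# Full-text match over the enriched tag cloud.
rows = conn.execute(
    "SELECT name FROM items WHERE items MATCH ?", ("grilled",)
).fetchall()
```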
- Objective: Execute user searches with sub-300ms latency.
- Network Dependency: None. Runs entirely on the local server/edge device for speed and privacy.
- Model: Qwen 2.5-3B-Instruct (GGUF quantized).
- Execution Flow:
- L0 - Short Circuit (In-Memory): Checks for exact string matches or high-confidence fuzzy matches (Levenshtein distance) in RAM. (< 10ms).
- L1 - Intent Parsing (Neural): If L0 fails, the query is passed to the local SLM.
- Constraint Enforcement: The SLM output is constrained by a GBNF Grammar, ensuring 100% valid JSON output and adherence to the defined schema.
- L2 - Progressive Loosening (SQL): The engine executes the query against SQLite. If strict criteria yield zero results, constraints are relaxed in a predefined order (Price -> Category -> Phrase Match -> Negation) to maximize recall.
- Runtime: Python 3.11+, FastAPI
- Storage: SQLite (FTS5 Extension)
- Inference Engine: `llama-cpp-python` (with Apple Metal / CUDA acceleration support)
- Model: Qwen 2.5-3B-Instruct (Int4 quantization)
- Caching: `diskcache` (semantic caching of parsed intents)
- Data Processing: `litellm`, `pandas`
- Python 3.10+
- Hardware Acceleration (Highly Recommended): Apple Silicon M-Series or NVIDIA GPU.
```bash
# 1. Clone repository
git clone https://github.com/mitulagr2/bettersearch.git
cd bettersearch

# 2. Create virtual environment
python -m venv venv
source venv/bin/activate

# 3. Install dependencies (Enable Metal support for Mac)
CMAKE_ARGS="-DGGML_METAL=on" pip install -r requirements.txt
```

Download the GGUF model to the `models/` directory.
- Model: `qwen2.5-3b-instruct-q4_k_m.gguf`
- Source: HuggingFace
Set environment variables in `.env` & `app/config.py`:

```bash
GEMINI_API_KEY="your_key_here"  # Required only for indexing pipeline
```

Run this once (or on menu updates) to generate the search index and controlled vocabulary. This step calls external APIs.

```bash
python scripts/ingest.py
```

Outputs: `data/soku.db` (Search Index) and `data/constitution.json` (Vocabulary Schema).
Starts the local API server. This does not make external API calls.

```bash
python main.py
```

Server runs at http://localhost:8000.
```bash
curl -X POST "http://localhost:8000/api/v1/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "High protein veg breakfast"}'
```

The repository includes a benchmark suite to validate latency and accuracy across query classifications.

```bash
python scripts/benchmark.py
```

Test Suites:
- Direct/Fuzzy: Validates memory-layer regex and fuzzy matching.
- Semantic/Hinglish: Validates LLM translation capabilities.
- Logical Constraints: Validates price filtering and negation logic.
- Cache Efficiency: Measures cold vs. warm inference latency.
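The cache-efficiency suite exercises the intent cache, which can be sketched as follows. The project persists entries with `diskcache`; this sketch substitutes an in-memory dict, and `fake_parse` stands in for the local SLM:

```python
import hashlib

class IntentCache:
    """In-memory stand-in for the project's diskcache layer: parsed intents
    are keyed on the normalized query so repeat queries skip SLM inference."""

    def __init__(self) -> None:
        self._store: dict[str, dict] = {}

    def _key(self, query: str) -> str:
        normalized = " ".join(query.lower().split())   # collapse case/whitespace
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_parse(self, query: str, parse_fn):
        k = self._key(query)
        if k not in self._store:
            self._store[k] = parse_fn(query)           # cold path: run the SLM
        return self._store[k]

calls = []
def fake_parse(q):                                     # stands in for local inference
    calls.append(q)
    return {"tags": ["veg"]}

cache = IntentCache()
first = cache.get_or_parse("Veg  breakfast", fake_parse)
second = cache.get_or_parse("veg breakfast", fake_parse)  # warm: served from cache
```

Normalizing before hashing is what makes the warm path catch trivially different spellings of the same query.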
To prevent semantic drift, where the LLM hallucinates filters (e.g., `category="yummy_food"`), the system generates a dynamic JSON schema during ingestion; at runtime, the model is prompted to restrict its output strictly to this schema.
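As a belt-and-braces measure, model output can also be validated against the vocabulary after parsing. A minimal sketch, using a hypothetical hard-coded vocabulary in place of the generated `data/constitution.json`:

```python
# Hypothetical controlled vocabulary; the real "Constitution" is generated
# into data/constitution.json during ingestion.
CONSTITUTION = {
    "category": {"combo", "meal", "snack", "beverage"},
    "tag": {"spicy", "gut_friendly", "light_meal", "fasting", "veg"},
}

def sanitize_ir(ir: dict) -> dict:
    """Drop any filter values the LLM emitted that are not in the schema."""
    return {
        "category": [c for c in ir.get("category", [])
                     if c in CONSTITUTION["category"]],
        "tags": [t for t in ir.get("tags", [])
                 if t in CONSTITUTION["tag"]],
    }

# A hallucinated category ("yummy_food") is filtered out; valid values pass.
out = sanitize_ir({"category": ["yummy_food", "meal"], "tags": ["spicy"]})
```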
To minimize "Zero Result" pages, the search engine treats constraints hierarchically. If a specific query (e.g., "Spicy Chicken under 100") fails, the engine rewrites the SQL query iteratively:
- Strict: Match Phrases + Tags + Price + Category.
- Relax Price: Drop the `price <= 100` constraint.
- Relax Category: Drop the strict `category` filter.
- Relax Semantic: Drop Phrase matching, rely on Tags only.
- Fallback: Return high-ranking global items.
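The relaxation loop can be sketched as follows; `execute` is a hypothetical stand-in for the SQL layer (it takes a constraint dict and returns matching rows), and the field names are illustrative:

```python
def progressive_search(execute, constraints: dict) -> list:
    """Relax constraints in a fixed order until rows come back.

    `execute` stands in for the SQL layer; the relaxation order mirrors
    the hierarchy described above: price -> category -> phrase match,
    then a global fallback.
    """
    relax_order = ["price_max", "category", "phrases"]
    current = dict(constraints)
    rows = execute(current)
    for field in relax_order:
        if rows:
            break
        current.pop(field, None)       # loosen one constraint and retry
        rows = execute(current)
    return rows or execute({})         # fallback: high-ranking global items

# Toy backend: nothing matches while the price cap is in place.
def fake_execute(c):
    return [] if "price_max" in c else ["Spicy Chicken"]

rows = progressive_search(fake_execute, {"price_max": 100, "phrases": ["spicy"]})
```

Because each relaxation step reuses the same compiled IR minus one field, the loop adds at most a handful of cheap SQLite round trips to a zero-result query.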
- Tier 1 (Deterministic): Exact substring matches or high-confidence fuzzy matches are served immediately from memory (bypassing the LLM).
- Tier 2 (Probabilistic): Complex or abstract queries are routed to the LLM for intent extraction, then executed via SQL.