
feat: Add OpenAI embedding and query expansion support#116

Open

jonesj38 wants to merge 6 commits into tobi:main from jonesj38:feat/openai-embeddings

Conversation

@jonesj38 jonesj38 commented Feb 5, 2026

Summary

Optional OpenAI integration for embeddings and query expansion. Dramatically faster for users who prefer API-based inference over local models.

Performance

Operation                    Local (llama-cpp)   OpenAI
Query expansion              30-40s              200ms
Full re-embed (30k chunks)   ~2 hours            ~10 min
Tokenizer load               30s                 0s
Search latency               30-60s              3-5s
Reranking (30 docs)          10-15s              1-2s

Features

• OpenAI Embeddings — text-embedding-3-small (1536 dims), native batch API, ~$0.02/1M tokens
• OpenAI Query Expansion — gpt-4o-mini for lex/vec/hyde variants
• OpenAI Reranking — API-based reranking replaces local qwen3-reranker, eliminating model download and GGUF inference overhead
• Tiktoken chunking — eliminates model load time for tokenization
• Robust retry logic — exponential backoff with jitter for rate limits
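The retry behavior in the last bullet might look roughly like this sketch (names are illustrative, not from the PR; a full-jitter policy is assumed, since the PR does not spell out the exact jitter scheme):

```typescript
// Full jitter: pick uniformly in [0, min(cap, base * 2^attempt)].
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 30_000): number {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}

// Retry only on rate limits (HTTP 429) until attempts are exhausted.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 5): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await fn();
    } catch (err: any) {
      if (err?.status !== 429 || attempt + 1 >= maxAttempts) throw err;
      await new Promise((r) => setTimeout(r, backoffDelayMs(attempt)));
    }
  }
}
```

Full jitter keeps concurrent clients from retrying in lockstep after a shared 429, which matters for the batch embedding path.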
Usage

  export OPENAI_API_KEY="sk-..."
  export QMD_OPENAI=1
  qmd embed -f          # Re-embed with OpenAI
  qmd search "query"

Design

• Opt-in — local models remain the default
• Graceful fallback — errors don't crash, just skip
• Replace local reranking with OpenAI — no GGUF model download or local inference needed
• No breaking changes — existing workflows unchanged
Files Changed

• src/openai-llm.ts — new OpenAI LLM implementation
• src/llm.ts — embedding config, provider switching
• src/store.ts — tiktoken chunking integration
• src/qmd.ts — QMD_OPENAI env var support
Dependencies

• openai — API client
• tiktoken — fast BPE tokenization
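The tiktoken-based chunking amounts to packing paragraphs under a token budget. A minimal sketch, with the token counter left pluggable so it stays self-contained (the PR would pass a tiktoken-backed counter; the names and the paragraph-level packing strategy here are assumptions):

```typescript
type TokenCounter = (text: string) => number;

// Pack paragraphs into chunks that stay under maxTokens each.
function chunkByTokens(paragraphs: string[], maxTokens: number, count: TokenCounter): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let used = 0;
  for (const p of paragraphs) {
    const n = count(p);
    // Start a new chunk when this paragraph would exceed the budget.
    if (used + n > maxTokens && current.length > 0) {
      chunks.push(current.join("\n\n"));
      current = [];
      used = 0;
    }
    current.push(p);
    used += n;
  }
  if (current.length > 0) chunks.push(current.join("\n\n"));
  return chunks;
}

// Trivial whitespace stand-in; with tiktoken this would count BPE tokens instead.
const wordCount: TokenCounter = (s) => s.split(/\s+/).filter(Boolean).length;
```

The point of the swap is that counting tokens with tiktoken is a pure computation, so no model has to be loaded just to size chunks.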

@lyrl commented Feb 8, 2026

Great, I was looking for this. But the rerank in there doesn't support API calls?

@darkhanakh

This is great! :) Good job man 🔥

oscar1byte added a commit to runtimecorp/qmd that referenced this pull request Feb 13, 2026
Port PR tobi#116 (tobi/qmd) to current main, adapting to the refactored
codebase. Adds OpenAI as an alternative to local GGUF models, fixing
the ARM64 segfault during hybrid search (issue tobi#68).

Changes:
- New src/openai-llm.ts: OpenAI API client (embed, embedBatch, rerank,
  expandQuery) with exponential backoff and rate limiting
- llm.ts: setEmbeddingConfig(), getDefaultEmbeddingLLM(), isUsingOpenAI()
- collections.ts: EmbeddingProviderConfig type, getEmbeddingConfig()
- store.ts: Provider-aware embedding, chunking (tiktoken), expand, rerank
- qmd.ts: Startup config loading, provider-aware embed command
- package.json: openai + tiktoken dependencies

Config via ~/.config/qmd/index.yml:
  embedding:
    provider: openai
    openai:
      model: text-embedding-3-small

Or env: QMD_OPENAI=1 + OPENAI_API_KEY
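The env/YAML switch that commit describes might look roughly like this (type and function names are illustrative, and the precedence shown, env var over YAML, is an assumption):

```typescript
interface EmbeddingProviderConfig {
  provider?: "local" | "openai";
  openai?: { model?: string; apiKey?: string };
}

// QMD_OPENAI=1 in the environment, or embedding.provider: openai in
// ~/.config/qmd/index.yml, switches from local GGUF to the OpenAI path.
function resolveProvider(
  config: EmbeddingProviderConfig,
  env: Record<string, string | undefined>,
): "local" | "openai" {
  if (env.QMD_OPENAI === "1") return "openai"; // env var as an explicit override
  return config.provider ?? "local";           // local models stay the default
}
```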
@vincentkoc (Contributor)

Love this!

@alexleach

Can one change config.baseUrl easily? I would like to connect to my own hosted OpenAI-compatible server. It is actually local, but as qmd is running in a container, I need to host the models in Docker Model Runtime to gain GPU acceleration. That is OpenAI-compatible, and like other hosted implementations, it just needs a way to configure the baseUrl...

jonesj38 and others added 5 commits March 28, 2026 01:41
Adds support for using OpenAI's text-embedding-3-small model as an
alternative to local llama-cpp embeddings.

Changes:
- New openai-llm.ts: OpenAI API client implementing LLM interface
- llm.ts: Embedding config management, getDefaultEmbeddingLLM()
- collections.ts: EmbeddingProviderConfig for YAML config schema
- store.ts: Use configurable embedding LLM, skip local model for
  query expansion/rerank when using OpenAI
- qmd.ts: Load embedding config on startup
- package.json: Add openai dependency
- README.md: Documentation for OpenAI embeddings

Configuration (in ~/.config/qmd/index.yml):
  embedding:
    provider: openai
    openai:
      api_key: sk-...  # Optional, falls back to OPENAI_API_KEY env
      model: text-embedding-3-small  # Optional, this is the default

Benefits:
- Much faster embedding (~10x vs local models on CPU)
- No GPU/VRAM requirements
- More reliable (no local model loading issues)
- Cost: ~$0.02 per 1M tokens

- OpenAI embeddings (text-embedding-3-small, 1536d) via QMD_OPENAI=1
- Query expansion with gpt-4o-mini (~200ms vs 30s local)
- Tiktoken for fast tokenization (no model loading)
- Exponential backoff with jitter for rate limits (429)
- Inter-batch delay (150ms) to avoid hitting RPM limits
- Performance: search 3-5s (was 30-60s), embed ~10min (was 2hrs)

Files: openai-llm.ts, llm.ts, store.ts, qmd.ts
Deps: openai, tiktoken

Replace the rerank() stub with a real listwise reranker using gpt-4o-mini.

- Sends top candidates with query to gpt-4o-mini as a ranking task
- Parses comma-separated index output, handles missing/duplicate indices
- Skips API call for ≤2 documents (not worth the latency)
- Falls back to original order on API failure
- Cost: ~$0.001 per rerank call
- Updated qmd.ts to route through OpenAI reranker instead of skipping

The full qmd query pipeline with OpenAI now:
1. Query expansion (gpt-4o-mini)
2. BM25 + vector search (parallel)
3. RRF fusion
4. Cross-encoder reranking (gpt-4o-mini) ← NEW
5. Position-aware blending
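The index-output parsing in step 4, with the missing/duplicate handling the commit message mentions, might look like this sketch (the function name is hypothetical):

```typescript
// Parse the model's comma-separated ranking output into a permutation
// of document indices, tolerating junk tokens, duplicates, and gaps.
function parseRankedIndices(raw: string, numDocs: number): number[] {
  const seen = new Set<number>();
  const order: number[] = [];
  for (const tok of raw.split(/[,\s]+/)) {
    const i = Number.parseInt(tok, 10);
    // Drop non-numbers, out-of-range values, and duplicates.
    if (Number.isInteger(i) && i >= 0 && i < numDocs && !seen.has(i)) {
      seen.add(i);
      order.push(i);
    }
  }
  // Append any indices the model omitted, preserving original order.
  for (let i = 0; i < numDocs; i++) if (!seen.has(i)) order.push(i);
  return order;
}
```

Because omitted indices are appended in their original order, a completely garbled model response degrades to the pre-rerank ranking, which matches the "falls back to original order on API failure" behavior above.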

Accept comma-separated collection names in -c flag for cross-collection
search. All three search modes (search, vsearch, query) now support
querying multiple collections simultaneously.

Changes:
- resolveCollectionFilter() helper parses and validates comma-separated names
- searchFTS() accepts string | string[] for collection filtering
- searchVec() accepts string | string[] for collection filtering
- SQL uses IN clause for multi-collection filtering
- Updated interface types and test for new parameter types

Usage:
  qmd search 'auth' -c repo-a,repo-b
  qmd vsearch 'auth patterns' -c docs,examples
  qmd query 'OAuth implementation' -c project,patterns,docs

This enables Shad's multi-vault search to pass all vault collections
in a single qmd call instead of running separate searches per collection.
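The comma-separated parsing and IN-clause construction described above might be sketched as follows (the helper name follows the commit message; the exact SQL column name is an assumption):

```typescript
// Split a -c flag value like "repo-a,repo-b" into validated names.
function resolveCollectionFilter(flag: string): string[] {
  const names = flag.split(",").map((s) => s.trim()).filter(Boolean);
  if (names.length === 0) throw new Error("no collection names given");
  return names;
}

// Build a parameterized IN clause so names are bound, not interpolated.
function collectionWhereClause(names: string[]): { sql: string; params: string[] } {
  const placeholders = names.map(() => "?").join(", ");
  return { sql: `collection IN (${placeholders})`, params: names };
}
```

Using one IN clause keeps multi-collection search a single SQL query instead of one query per collection.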
@jonesj38 force-pushed the feat/openai-embeddings branch from 7a718f6 to fc2f137 on March 28, 2026 07:44
@jonesj38 (Author)

Thanks for the patience on this. I've refreshed it:

Update (2026-03-28)
- Rebased feat/openai-embeddings onto current main
- Resolved conflicts and cleaned up the commit history
- Force-pushed the updated branch (--force-with-lease)
- Verified the local build passes (bun run build)

The PR is now mergeable.

I also incorporated feedback from @alexleach, who runs OpenAI-compatible remote endpoints in minimal Docker environments. The branch now:

- adds a configurable OPENAI_BASE_URL, and
- avoids initializing/building node-llama-cpp when OpenAI mode is selected.

Waiting for him to re-submit his PRs.

Cheers

@jonesj38 (Author)

PR rebased to main
