feat: add OpenAI-compatible and Gemini API embedding support#427

Open
ysys143 wants to merge 2 commits into tobi:main from ysys143:feat/api-embedding

Conversation

@ysys143 ysys143 commented Mar 18, 2026

What

Adds external embedding API support as an alternative to local GGUF models. Three environment variables activate API mode:

| Variable | Description | Example |
|---|---|---|
| `QMD_EMBED_API_URL` | Base URL (presence activates API mode) | `https://api.openai.com/v1` |
| `QMD_EMBED_API_KEY` | API key | `sk-...` |
| `QMD_EMBED_API_MODEL` | Model name | `text-embedding-3-small` |

API type is auto-detected from the URL — no extra config needed:

  • URL contains `googleapis.com` → Gemini (`batchEmbedContents` endpoint)
  • Otherwise → OpenAI-compatible format (works with Ollama, LM Studio, OpenAI, etc.)
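The detection rule above can be sketched in a few lines. This is an illustrative sketch, not the PR's actual code; the function and type names are hypothetical:

```typescript
// Hypothetical sketch of the URL-based API detection described above.
type EmbedApiKind = "gemini" | "openai";

function detectEmbedApi(baseUrl: string): EmbedApiKind {
  // Any URL containing googleapis.com is treated as Gemini
  // (batchEmbedContents); everything else falls back to the
  // OpenAI-compatible embeddings format.
  return baseUrl.includes("googleapis.com") ? "gemini" : "openai";
}
```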

Usage

```bash
# OpenAI
export QMD_EMBED_API_URL="https://api.openai.com/v1"
export QMD_EMBED_API_KEY="sk-..."
export QMD_EMBED_API_MODEL="text-embedding-3-small"
qmd embed && qmd vsearch "test query"

# Gemini (free tier)
export QMD_EMBED_API_URL="https://generativelanguage.googleapis.com/v1beta"
export QMD_EMBED_API_KEY="AIza..."
export QMD_EMBED_API_MODEL="gemini-embedding-001"
qmd embed && qmd vsearch "test query"

# Ollama (OpenAI-compatible)
export QMD_EMBED_API_URL="http://localhost:11434/v1"
export QMD_EMBED_API_MODEL="nomic-embed-text"
qmd embed && qmd vsearch "test query"
```

Why

Local GGUF embedding is great for privacy and offline use, but has tradeoffs:

  • Requires downloading a model (~300MB+)
  • Slower on CPU, needs Metal/CUDA for reasonable speed
  • Limited model selection

API mode is useful when:

  • Speed matters (cloud APIs are 2–5× faster in practice)
  • You want higher-dimensional embeddings (3072 vs 768)
  • Running in CI or on a machine without GPU

Implementation notes

  • `formatQueryForEmbedding` / `formatDocForEmbedding` return raw text when `modelUri` starts with `api:` — cloud models don't use nomic-style task prefixes
  • Model stored in `content_vectors.model` as `api:<model-name>` — switching between local and API requires re-embedding (same as switching local models, existing behavior)
  • Vector dimension is set dynamically on first embed, so any dimension works without schema changes
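For the OpenAI-compatible path, the request shape is the standard embeddings endpoint (`POST {base}/embeddings` with `{model, input}`). A minimal sketch of building that request; the helper name and types are illustrative, not the PR's actual code:

```typescript
// Hypothetical request builder for the OpenAI-compatible embeddings
// endpoint described above (works with OpenAI, Ollama, LM Studio, etc.).
interface EmbedRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

function buildEmbedRequest(
  baseUrl: string,
  apiKey: string,
  model: string,
  texts: string[],
): EmbedRequest {
  return {
    // Standard OpenAI embeddings route: POST {base}/embeddings
    url: `${baseUrl.replace(/\/$/, "")}/embeddings`,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // The API accepts a batch of inputs in one call, so all chunk
      // texts can be embedded per request rather than one at a time.
      body: JSON.stringify({ model, input: texts }),
    },
  };
}
```

The response carries vectors under `data[i].embedding`, which is also where the dimension is read dynamically on first embed.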

Benchmark (63 chunks, Korean+English, M3 Max)

| Model | Per chunk | Cost (63 chunks) | Dims |
|---|---|---|---|
| embeddinggemma-300M (local) | 72ms | $0 | 768 |
| text-embedding-3-small | 16ms | $0.000290 | 1536 |
| text-embedding-3-large | 13ms | $0.001882 | 3072 |
| gemini-embedding-001 | 38ms | free | 3072 |
| gemini-embedding-2-preview | 41ms | free | 3072 |

Full benchmark results + script in `docs/embedding-benchmark.md` and `scripts/benchmark-embed.ts`.

Relation to other PRs

#406 and #415 also address remote embedding, though the approaches differ.

Happy to collaborate with the other authors or adjust the approach based on feedback.

