feat: pluggable embedding providers (OpenAI + Gemini)#206

Open
aloysiusmartis wants to merge 1 commit into garrytan:master from
aloysiusmartis:feat/pluggable-embedding-providers

Conversation

@aloysiusmartis

Summary

  • Adds a provider-agnostic EmbeddingProvider interface so gbrain can use Gemini (gemini-embedding-001, 1–3072 Matryoshka dims) or OpenAI (text-embedding-3-large, 1536 dims) interchangeably
  • Public embed/embedBatch API in embedding.ts is unchanged — existing call sites see no diff
  • Fixes a critical silent bug: operations.ts put_page had !process.env.OPENAI_API_KEY hardcoded, so Gemini users got no embeddings on every page import
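The shape of that silent bug can be sketched as follows. This is an illustrative reproduction, not the PR's actual code: the stub `isEmbeddingAvailable` here takes an env record for testability, and the `GEMINI_API_KEY` variable name is an assumption.

```typescript
// Stub standing in for the PR's isEmbeddingAvailable(): checks the key for
// whichever provider is active, rather than hardcoding OpenAI's key.
function isEmbeddingAvailable(env: Record<string, string | undefined>): boolean {
  const provider = env.GBRAIN_EMBEDDING_PROVIDER ?? "openai";
  return provider === "gemini"
    ? Boolean(env.GEMINI_API_KEY) // assumed key variable name
    : Boolean(env.OPENAI_API_KEY);
}

// A Gemini-only environment: no OPENAI_API_KEY set.
const geminiEnv: Record<string, string | undefined> = {
  GBRAIN_EMBEDDING_PROVIDER: "gemini",
  GEMINI_API_KEY: "test-key",
};

// Old check: always skips embedding for Gemini users (the bug).
const oldCheckSkips = !geminiEnv.OPENAI_API_KEY;
// New check: consults the active provider, so embedding proceeds.
const newCheckSkips = !isEmbeddingAvailable(geminiEnv);
```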

Architecture

src/core/embedding-provider.ts      — EmbeddingProvider interface + factory
src/core/providers/openai-embedder.ts  — OpenAI impl (extracted from embedding.ts)
src/core/providers/gemini-embedder.ts  — Gemini impl with Matryoshka dim support
src/core/providers/retry-utils.ts      — shared exponentialDelay + sleep
src/commands/migrate-provider.ts       — gbrain migrate --provider openai|gemini

getActiveProvider() reads GBRAIN_EMBEDDING_PROVIDER env and returns the right singleton. isEmbeddingAvailable() replaces all !process.env.OPENAI_API_KEY checks so hybrid search and page ingestion work correctly regardless of which provider is active.
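A minimal sketch of what that interface and factory might look like, going by the names in this PR (`EmbeddingProvider`, `getActiveProvider`, `isEmbeddingAvailable`). The stub embedder, the default dims per provider, and the `GEMINI_API_KEY` variable name are illustrative assumptions, not the PR's implementation.

```typescript
interface EmbeddingProvider {
  readonly model: string;
  readonly dimensions: number;
  isAvailable(): boolean; // e.g. checks the provider's API key
  embed(text: string): Promise<number[]>;
  embedBatch(texts: string[]): Promise<number[][]>;
}

// Stub implementation used only to illustrate the factory wiring.
class StubEmbedder implements EmbeddingProvider {
  constructor(
    readonly model: string,
    readonly dimensions: number,
    private keyVar: string,
  ) {}
  isAvailable(): boolean {
    return Boolean(process.env[this.keyVar]);
  }
  async embed(_text: string): Promise<number[]> {
    return new Array(this.dimensions).fill(0); // placeholder vector
  }
  async embedBatch(texts: string[]): Promise<number[][]> {
    return Promise.all(texts.map((t) => this.embed(t)));
  }
}

let active: EmbeddingProvider | null = null;

function getActiveProvider(): EmbeddingProvider {
  if (active) return active; // singleton per process
  const name = process.env.GBRAIN_EMBEDDING_PROVIDER ?? "openai";
  switch (name) {
    case "openai":
      active = new StubEmbedder("text-embedding-3-large", 1536, "OPENAI_API_KEY");
      break;
    case "gemini":
      active = new StubEmbedder("gemini-embedding-001", 768, "GEMINI_API_KEY");
      break;
    default:
      throw new Error(`Unknown embedding provider: ${name}`);
  }
  return active;
}

function isEmbeddingAvailable(): boolean {
  return getActiveProvider().isAvailable();
}
```

Because the singleton is keyed off the env var at first use, call sites never name a concrete provider, which is what keeps the `embed`/`embedBatch` surface in embedding.ts unchanged.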

New command: gbrain migrate --provider

gbrain migrate --provider gemini               # migrate existing brain
gbrain migrate --provider gemini --dimensions 512  # Matryoshka sub-dim
gbrain migrate --provider openai               # migrate back to OpenAI
gbrain migrate --provider gemini --dry-run     # preview only
  • ALTER TABLE runs only when dims actually change; if the target dims match the current ones, the migration skips DDL and only re-embeds
  • Re-embeds all chunks with the new provider after schema change
  • Updates config_table + ~/.gbrain/config.json
  • Remote guard: CLI-only, errors if called via MCP
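The dims-change decision above can be sketched roughly as follows. `needsSchemaChange` and `migrationPlan` are hypothetical helpers for illustration; the table and column names are assumptions.

```typescript
interface BrainConfig {
  embedding_provider: string;
  embedding_dimensions: number;
}

// The vector column type only encodes dimensionality, so a provider swap
// at identical dims can skip DDL and go straight to re-embedding.
function needsSchemaChange(current: BrainConfig, target: BrainConfig): boolean {
  return current.embedding_dimensions !== target.embedding_dimensions;
}

function migrationPlan(current: BrainConfig, target: BrainConfig): string[] {
  const steps: string[] = [];
  if (needsSchemaChange(current, target)) {
    steps.push(
      `ALTER TABLE chunks ALTER COLUMN embedding TYPE vector(${target.embedding_dimensions})`,
    );
  }
  steps.push(`re-embed all chunks with ${target.embedding_provider}`);
  steps.push("update config table + ~/.gbrain/config.json");
  return steps;
}
```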

Init-time provider selection

gbrain init --provider gemini               # new Gemini brain (768 dims default)
gbrain init --provider gemini --dimensions 512
gbrain init --provider openai               # explicit OpenAI (1536 dims default)

Schema

getPGLiteSchema(dims, model) replaces hardcoded vector(1536) in PGLite DDL so gbrain init --provider gemini creates a vector(768) schema from the start.
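A minimal sketch of what such a templated DDL helper could look like. Only a single table is shown and the table/column names are assumptions; the real `getPGLiteSchema` presumably templates the full schema.

```typescript
function getPGLiteSchema(dims: number, model: string): string {
  if (!Number.isInteger(dims) || dims < 1) {
    throw new Error(`Invalid embedding dimensions: ${dims}`);
  }
  // Parameterise the vector column instead of hardcoding vector(1536).
  return `
-- embedding model: ${model}
CREATE TABLE IF NOT EXISTS chunks (
  id SERIAL PRIMARY KEY,
  content TEXT NOT NULL,
  embedding vector(${dims})
);`;
}
```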

Config persistence

GBrainConfig gains embedding_provider and embedding_dimensions. loadConfig() propagates them to env vars at startup so subsequent sessions use the same provider without repeating the flag.
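The no-override propagation described here might be sketched as below. The `GBrainConfig` field names match the PR; the helper itself and the `GBRAIN_EMBEDDING_DIMENSIONS` variable name are illustrative assumptions.

```typescript
interface GBrainConfig {
  embedding_provider?: string;
  embedding_dimensions?: number;
}

// Fill env vars from persisted config, but never clobber values the user
// already set, so an explicit GBRAIN_EMBEDDING_PROVIDER=... still wins.
function propagateToEnv(
  config: GBrainConfig,
  env: Record<string, string | undefined>,
): void {
  if (config.embedding_provider && env.GBRAIN_EMBEDDING_PROVIDER === undefined) {
    env.GBRAIN_EMBEDDING_PROVIDER = config.embedding_provider;
  }
  if (config.embedding_dimensions && env.GBRAIN_EMBEDDING_DIMENSIONS === undefined) {
    env.GBRAIN_EMBEDDING_DIMENSIONS = String(config.embedding_dimensions);
  }
}
```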

Tests

  • test/embedding-provider.test.ts — 22 unit + 3 live (skipped without API key): factory, fallback, unknown provider error, boundary dims
  • test/pglite-schema-provider.test.ts — 6 tests for getPGLiteSchema() substitutions
  • test/config-embedding-provider.test.ts — 4 tests for env-var propagation (no-override behavior)
  • test/migrate-provider-args.test.ts — 8 tests for dims-change logic and API key guard

All 2627 unit tests pass (bun test, 0 fail).

Relation to PR #197

This overlaps with trymhaak's voyage embedding PR (#197). The approach here uses an interface/factory pattern (EmbeddingProvider) so adding future providers (voyage, cohere, local) is additive — no call site changes. The factory is a single getActiveProvider() call; all embed/embedBatch callers go through embedding.ts unchanged.

If #197 lands first, this PR could subsume it by adding a VoyageEmbedder to src/core/providers/. Happy to coordinate.

Checklist

  • Public API unchanged (embed, embedBatch, EMBEDDING_MODEL, EMBEDDING_DIMENSIONS in embedding.ts)
  • All existing tests pass (2627/2627)
  • No // FORK: comments in this diff
  • COORDINATION.md and fork-only scripts not included
  • Remote guard on migrate --provider (MCP callers get a clear error)
  • NaN guard on --dimensions arg parsing
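The NaN guard from the checklist might look something like this. `parseDimensions` is a hypothetical parser, and the 3072 upper bound is an assumption based on the Matryoshka range mentioned in the summary.

```typescript
function parseDimensions(raw: string): number {
  const dims = Number(raw); // Number("abc") is NaN, Number("") is 0
  if (!Number.isInteger(dims) || dims < 1 || dims > 3072) {
    throw new Error(
      `--dimensions must be an integer in [1, 3072], got "${raw}"`,
    );
  }
  return dims;
}
```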

Adds a provider-agnostic EmbeddingProvider interface so gbrain can use
Gemini (text-embedding-004/gemini-embedding-001) instead of OpenAI, selected
via GBRAIN_EMBEDDING_PROVIDER env var. The public embed/embedBatch API in
embedding.ts is unchanged — callers see no diff.

Architecture:
- src/core/embedding-provider.ts — EmbeddingProvider interface, factory
  (getActiveProvider), isEmbeddingAvailable(), resetActiveProvider()
- src/core/providers/openai-embedder.ts — OpenAI impl extracted from embedding.ts
- src/core/providers/gemini-embedder.ts — Gemini impl with Matryoshka dims
- src/core/providers/retry-utils.ts — shared exponentialDelay + sleep

Critical fix: operations.ts put_page had hardcoded !process.env.OPENAI_API_KEY,
so Gemini users got silent no-embed on every import. Replaced with
isEmbeddingAvailable() which checks whichever provider is active.

New command: gbrain migrate --provider openai|gemini [--dimensions N]
- ALTER TABLE (only when dims change)
- Re-embeds all chunks with the new provider
- Updates config table + config.json
- Remote guard: CLI-only, cannot be called via MCP

Schema: getPGLiteSchema(dims, model) replaces hardcoded vector(1536) in
PGLite DDL so new Gemini brains get vector(768) from init.

Config: GBrainConfig gains embedding_provider + embedding_dimensions;
loadConfig() propagates them to env on startup (does not override if already set).

Init: gbrain init --provider gemini [--dimensions N] wires provider at
brain creation time.

Usage:
  GBRAIN_EMBEDDING_PROVIDER=gemini gbrain init   # Gemini brain, 768 dims
  gbrain migrate --provider gemini               # migrate existing brain
  gbrain migrate --provider openai               # migrate back

Relates to: upstream PR garrytan#197 (voyage embedding) — same territory but this
approach uses an interface/factory pattern that supports N providers without
modifying the call sites each time.

Co-authored-by: Al's bot <aloysiusmartis@users.noreply.github.com>
aloysiusmartis force-pushed the feat/pluggable-embedding-providers branch from dcad9d4 to 2445e3f on April 18, 2026 at 19:24.