fix(migrate): parse PGLite string embeddings before re-serializing for Postgres#199

Open
ShadowRaptor wants to merge 1 commit into garrytan:master from ShadowRaptor:fix/migrate-vector-serialization

Conversation

@ShadowRaptor

Summary

gbrain migrate --to <postgres-url> (e.g. PGLite → Supabase) fails on the very first page with:

invalid input syntax for type vector: "[[,-,0,.,0,1,2,5,4,2,7,2,5,,,-,..."

The corrupted vector value is the embedding string being iterated character by character and joined with commas.

Root cause

pglite-engine.ts:getChunksWithEmbeddings returns the embedding column as a JSON-stringified array (e.g. "[0.1,0.2,...]"), not a Float32Array. PGLite's pgvector returns vector columns this way.

The migrate command then passes those chunks to postgres-engine.ts:upsertChunks, which serializes embeddings:

const embeddingStr = chunk.embedding
  ? '[' + Array.from(chunk.embedding).join(',') + ']'
  : null;

When chunk.embedding is a string, Array.from(string) iterates the string character by character (["[", "0", ".", "1", ",", ...]). Joining those characters with "," produces the malformed "[[,0,.,1,,,0,.,2,..." and pgvector rejects it.

The sister method pglite-engine.ts:getEmbeddingsByChunkIds already does the right thing — parses the string back to Float32Array. getChunksWithEmbeddings was missing the same defensive parse.

Fix (defense in depth)

  1. pglite-engine.ts getChunksWithEmbeddings — parse string embeddings to Float32Array at source, mirroring the pattern in getEmbeddingsByChunkIds.
  2. postgres-engine.ts upsertChunks — also accept already-stringified pgvector-format embeddings (pass-through), in case any other call site hands us a string.
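The shape of the two changes, sketched (function names per the files above; bodies are illustrative, not the literal patch):

```typescript
// 1. At the source (getChunksWithEmbeddings): normalize PGLite's
//    stringified vector back to Float32Array, mirroring the existing
//    pattern in getEmbeddingsByChunkIds.
function parseEmbedding(value: Float32Array | string | null): Float32Array | null {
  if (value == null) return null;
  if (typeof value === 'string') return new Float32Array(JSON.parse(value));
  return value;
}

// 2. At the sink (upsertChunks): pass through a value that is already a
//    pgvector literal instead of re-iterating it.
function serializeEmbedding(value: Float32Array | string | null): string | null {
  if (value == null) return null;
  if (typeof value === 'string') return value; // already "[0.1,0.2,...]"
  return '[' + Array.from(value).join(',') + ']';
}
```

With both in place, either representation round-trips to a valid pgvector literal, so the migrate path is safe even if only one of the two changes lands.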

13 lines added, 2 lines changed.

Reproduction

Any PGLite source brain with chunks.embedding populated:

gbrain init                          # creates PGLite brain
# add some content with embeddings
gbrain migrate --to <postgres-url>   # fails on first page

Verification

After patch: migrate completes successfully, embeddings transfer intact.

Verified on a real 153-page PGLite → Supabase migration (472 chunks, 1536-dim text-embedding-3-large vectors). Pre-patch failed immediately; post-patch completed in ~3 minutes with all embeddings intact.

Notes

  • No new tests added — happy to add them if you want a specific test format. The fix follows the same defensive pattern already in getEmbeddingsByChunkIds, so it's symmetry rather than a new contract.
  • Found while migrating a 0.10.2 brain. Verified bug still present on master (81b3f7a).
  • Separate (lower-priority) finding from the same investigation: gbrain put returns the misleading error Page not found: <slug> when the slug contains uppercase letters. I'll file that as a separate issue.

🤖 Generated with Claude Code

fix(migrate): parse PGLite string embeddings before re-serializing for Postgres

When `gbrain migrate --to <postgres-target>` reads chunks via
`pglite-engine.getChunksWithEmbeddings`, PGLite returns the `embedding`
column as a JSON-stringified array (e.g. `"[0.1,0.2,...]"`), not a
`Float32Array`. The migrate then passes those chunks to
`postgres-engine.upsertChunks` which calls
`Array.from(chunk.embedding).join(',')` — but `Array.from(string)`
iterates the string CHARACTER-BY-CHARACTER, producing
`"[,0,.,1,,,0,.,2,...]"` and a pgvector parse error:

  invalid input syntax for type vector: "[[,-,0,.,0,1,2,5,4,2,7,2,5,,,..."

`pglite-engine.getEmbeddingsByChunkIds` already does the right thing
(parses the string back to `Float32Array`), but `getChunksWithEmbeddings`
was missing the same defensive conversion.

Fix in two places (defense in depth):

1. `pglite-engine.ts getChunksWithEmbeddings` — parse string embeddings
   to `Float32Array` at the source, mirroring the pattern in
   `getEmbeddingsByChunkIds`.
2. `postgres-engine.ts upsertChunks` — also accept already-stringified
   pgvector-format embeddings (pass-through), in case any other call
   site hands us a string.

Reproduction (any PGLite source brain that has `chunks.embedding` set):
  gbrain init                          # creates PGLite brain
  echo "..." > brain/test.md
  gbrain sync                          # populates chunks + embeddings
  gbrain migrate --to <postgres-url>   # fails

After patch: migrate completes successfully, embeddings transfer intact.

Verified locally on a 153-page PGLite → Supabase migration (472 chunks,
1536-dim text-embedding-3-large vectors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
