fix(migrate): parse PGLite string embeddings before re-serializing for Postgres#199
Open
ShadowRaptor wants to merge 1 commit into garrytan:master from
Conversation
fix(migrate): parse PGLite string embeddings before re-serializing for Postgres
When `gbrain migrate --to <postgres-target>` reads chunks via
`pglite-engine.getChunksWithEmbeddings`, PGLite returns the `embedding`
column as a JSON-stringified array (e.g. `"[0.1,0.2,...]"`), not a
`Float32Array`. The migrate then passes those chunks to
`postgres-engine.upsertChunks` which calls
`Array.from(chunk.embedding).join(',')` — but `Array.from(string)`
iterates the string CHARACTER-BY-CHARACTER, producing
`"[,0,.,1,,,0,.,2,...]"` and a pgvector parse error:
invalid input syntax for type vector: "[[,-,0,.,0,1,2,5,4,2,7,2,5,,,..."
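The failure mode is easy to reproduce in isolation (a minimal sketch; `vec` and `raw` are illustrative values, not the project's data):

```typescript
// What the serializer expects: element-wise iteration over a Float32Array.
const vec = new Float32Array([1, 2]);
const good = `[${Array.from(vec).join(",")}]`; // "[1,2]"

// What it actually received from PGLite: the already-stringified column.
// Array.from(string) yields one element per character.
const raw = "[0.1,0.2]";
const bad = `[${Array.from(raw).join(",")}]`;  // "[[,0,.,1,,,0,.,2,]]"
```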
`pglite-engine.getEmbeddingsByChunkIds` already does the right thing
(parses the string back to `Float32Array`), but `getChunksWithEmbeddings`
was missing the same defensive conversion.
Fix in two places (defense in depth):
1. `pglite-engine.ts getChunksWithEmbeddings` — parse string embeddings
to `Float32Array` at the source, mirroring the pattern in
`getEmbeddingsByChunkIds`.
2. `postgres-engine.ts upsertChunks` — also accept already-stringified
pgvector-format embeddings (pass-through), in case any other call
site hands us a string.
Reproduction (any PGLite source brain that has `chunks.embedding` set):
gbrain init # creates PGLite brain
echo "..." > brain/test.md
gbrain sync # populates chunks + embeddings
gbrain migrate --to <postgres-url> # fails
After patch: migrate completes successfully, embeddings transfer intact.
Verified locally on a 153-page PGLite → Supabase migration (472 chunks,
1536-dim text-embedding-3-large vectors).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary
`gbrain migrate --to <postgres-url>` (e.g. PGLite → Supabase) fails on the very first page with an `invalid input syntax for type vector` error from pgvector. The corrupted vector value is the embedding being iterated character-by-character with commas.
Root cause
`pglite-engine.ts:getChunksWithEmbeddings` returns the `embedding` column as a JSON-stringified array (e.g. `"[0.1,0.2,...]"`), not a `Float32Array`. PGLite's pgvector returns vector columns this way. The migrate then passes those chunks to `postgres-engine.ts:upsertChunks`, which serializes embeddings with `Array.from(chunk.embedding).join(',')`. When `chunk.embedding` is a string, `Array.from(string)` iterates characters (`["[", "0", ".", "1", ",", ...]`). Joining with `,` produces the malformed `"[[,0,.,1,,,0,.,2,..."` and pgvector rejects it.

The sister method `pglite-engine.ts:getEmbeddingsByChunkIds` already does the right thing: it parses the string back to `Float32Array`. `getChunksWithEmbeddings` was missing the same defensive parse.

Fix (defense in depth)

1. `pglite-engine.ts getChunksWithEmbeddings`: parse string embeddings to `Float32Array` at the source, mirroring the pattern in `getEmbeddingsByChunkIds`.
2. `postgres-engine.ts upsertChunks`: also accept already-stringified pgvector-format embeddings (pass-through), in case any other call site hands us a string.

13 lines added, 2 lines changed.
Reproduction
Any PGLite source brain with `chunks.embedding` populated:

gbrain init                         # creates PGLite brain
echo "..." > brain/test.md
gbrain sync                         # populates chunks + embeddings
gbrain migrate --to <postgres-url>  # fails

Verification
After patch: migrate completes successfully, embeddings transfer intact.
Verified on a real 153-page PGLite → Supabase migration (472 chunks, 1536-dim `text-embedding-3-large` vectors). Pre-patch failed immediately; post-patch completed in ~3 minutes with all embeddings intact.

Notes
- The parse in `getChunksWithEmbeddings` mirrors the existing one in `getEmbeddingsByChunkIds`, so it's symmetry rather than a new contract.
- Single commit (81b3f7a).
- Unrelated: `gbrain put` returns the misleading error `Page not found: <slug>` when the slug contains uppercase letters. I'll file that as a separate issue.

🤖 Generated with Claude Code