fix: JSONB double-encode + splitBody wiki + parseEmbedding (v0.12.1)#196
Merged
- splitBody now requires an explicit timeline sentinel (<!-- timeline -->, --- timeline ---, or --- directly before ## Timeline / ## History). A bare --- in body text is a markdown horizontal rule, not a separator. This fixes the 83% content truncation @knee5 reported on a 1,991-article wiki where 4,856 of 6,680 wikilinks were lost.
- serializeMarkdown emits the <!-- timeline --> sentinel for round-trip stability.
- inferType extended with /writing/, /wiki/analysis/, /wiki/guides/, /wiki/hardware/, /wiki/architecture/, /wiki/concepts/. Path order is most-specific-first so projects/blog/writing/essay.md → writing, not project.
- PageType union extended: writing, analysis, guide, hardware, architecture.
Updates test/import-file.test.ts to use the new sentinel.
Co-Authored-By: @knee5 (PR #187)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
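The sentinel rule and the most-specific-first path ordering from this commit can be sketched roughly as below. This is an illustration, not the repository's code: splitBodySketch, inferTypeSketch, the result shape, the "note" fallback, the "concept" mapping, and the decision to tolerate blank lines between --- and the heading are all assumptions made for the example.

```typescript
// Hypothetical sketch; names and fallbacks are assumptions, not the real implementation.
interface SplitResultSketch {
  body: string;
  timeline: string | null;
}

const SENTINELS = ["<!-- timeline -->", "--- timeline ---"];
const TIMELINE_HEADING = /^##\s+(Timeline|History)\s*$/;

function splitBodySketch(markdown: string): SplitResultSketch {
  const lines = markdown.split("\n");
  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (SENTINELS.indexOf(line) !== -1) {
      return {
        body: lines.slice(0, i).join("\n").trimEnd(),
        timeline: lines.slice(i + 1).join("\n").trimStart(),
      };
    }
    if (line === "---") {
      // A bare --- only separates when the next non-blank line is a Timeline/History heading.
      let j = i + 1;
      while (j < lines.length && lines[j].trim() === "") j++;
      if (j < lines.length && TIMELINE_HEADING.test(lines[j].trim())) {
        return {
          body: lines.slice(0, i).join("\n").trimEnd(),
          timeline: lines.slice(j).join("\n"),
        };
      }
      // Otherwise it is an ordinary horizontal rule and stays in the body.
    }
  }
  return { body: markdown, timeline: null };
}

type PageTypeSketch =
  | "writing" | "analysis" | "guide" | "hardware"
  | "architecture" | "concept" | "project" | "note";

// Ordered most-specific-first: the first matching rule wins.
const TYPE_RULES: Array<[string, PageTypeSketch]> = [
  ["/writing/", "writing"],
  ["/wiki/analysis/", "analysis"],
  ["/wiki/guides/", "guide"],
  ["/wiki/hardware/", "hardware"],
  ["/wiki/architecture/", "architecture"],
  ["/wiki/concepts/", "concept"],
  ["/projects/", "project"],
];

function inferTypeSketch(path: string): PageTypeSketch {
  const normalized = "/" + path.replace(/^\/+/, "");
  for (const [segment, type] of TYPE_RULES) {
    if (normalized.indexOf(segment) !== -1) return type;
  }
  return "note"; // assumed default for unmatched paths
}
```

With this ordering, projects/blog/writing/essay.md matches /writing/ before /projects/ is ever consulted, and a document containing only horizontal rules comes back with timeline set to null and its body intact.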
Two related Postgres-string-typed-data bugs that PGLite hid:
1. JSONB double-encode (postgres-engine.ts:107,668,846 + files.ts:254):
${JSON.stringify(value)}::jsonb in postgres.js v3 stringified again
on the wire, storing JSONB columns as quoted string literals. Every
frontmatter->>'key' returned NULL on Postgres-backed brains; GIN
indexes were inert. Switched to sql.json(value), which is the
postgres.js-native JSONB encoder (Parameter with OID 3802).
Affected columns: pages.frontmatter, raw_data.data,
ingest_log.pages_updated, files.metadata. page_versions.frontmatter
is downstream via INSERT...SELECT and propagates the fix.
2. pgvector embeddings returning as strings (utils.ts):
getEmbeddingsByChunkIds returned "[0.1,0.2,...]" instead of
Float32Array on Supabase, producing [NaN] cosine scores.
Adds parseEmbedding() helper handling Float32Array, numeric arrays,
and pgvector string format. Throws loud on malformed vectors
(per Codex's no-silent-NaN requirement); returns null for
non-vector strings (treated as "no embedding here"). rowToChunk
delegates to parseEmbedding.
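The parsing rules in item 2 could look something like the sketch below. It is a hypothetical reimplementation written for illustration: parseEmbeddingSketch and toFloat32Sketch are invented names, and returning null for null/undefined input and throwing on unsupported types are assumptions that the commit message does not confirm.

```typescript
// Hypothetical sketch of the behaviour described above, not the code in utils.ts.
function toFloat32Sketch(values: unknown[]): Float32Array {
  const out = new Float32Array(values.length);
  values.forEach((v, i) => {
    if (typeof v !== "number" || !Number.isFinite(v)) {
      // Fail loudly instead of letting NaN leak into cosine scores.
      throw new Error(`malformed embedding component at index ${i}`);
    }
    out[i] = v;
  });
  return out;
}

function parseEmbeddingSketch(value: unknown): Float32Array | null {
  if (value === null || value === undefined) return null; // assumption
  if (value instanceof Float32Array) return value;
  if (Array.isArray(value)) return toFloat32Sketch(value);
  if (typeof value === "string") {
    const trimmed = value.trim();
    // Strings that are not bracketed are treated as "no embedding here".
    if (!trimmed.startsWith("[") || !trimmed.endsWith("]")) return null;
    const parts = trimmed.slice(1, -1).split(",");
    // Number("") is 0, so empty components are mapped to NaN explicitly.
    const numbers = parts.map((p) => (p.trim() === "" ? NaN : Number(p.trim())));
    return toFloat32Sketch(numbers);
  }
  throw new TypeError(`unsupported embedding type: ${typeof value}`); // assumption
}
```

The empty-component guard matters because a naive Number() conversion would silently turn "[0.1,,0.2]" into a vector containing 0 rather than rejecting it.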
E2E regression test at test/e2e/postgres-jsonb.test.ts asserts
jsonb_typeof = 'object' AND col->>'k' returns expected scalar across
all 5 affected columns — the test that should have caught the original
bug. Runs in CI via the existing pgvector service.
Co-Authored-By: @knee5 (PR #187 — JSONB triple-fix)
Co-Authored-By: @leonardsellem (PR #175 — parseEmbedding)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
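The double-encode in item 1 can be reproduced without a database. The sketch below uses plain JSON.stringify / JSON.parse to mimic what the commit message describes; jsonbTypeofSketch and lookupKeySketch are hypothetical stand-ins for Postgres's jsonb_typeof and ->> and only approximate their behaviour.

```typescript
// Illustration only: models the stored value with plain JSON, no postgres.js involved.
const frontmatter = { title: "Hello", tags: ["a", "b"] };

// What the application meant to store: one JSON document.
const encodedOnce = JSON.stringify(frontmatter);
// What reached the column per the commit message: the text stringified a second time.
const encodedTwice = JSON.stringify(encodedOnce);

// Rough stand-in for jsonb_typeof(col).
function jsonbTypeofSketch(value: unknown): string {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value;
}

// Rough stand-in for col->>'key': yields null unless the value is a JSON object.
function lookupKeySketch(value: unknown, key: string): unknown {
  if (value !== null && typeof value === "object" && !Array.isArray(value)) {
    return (value as Record<string, unknown>)[key] ?? null;
  }
  return null;
}

const intended: unknown = JSON.parse(encodedOnce);
const stored: unknown = JSON.parse(encodedTwice);

console.log(jsonbTypeofSketch(intended), lookupKeySketch(intended, "title")); // object Hello
console.log(jsonbTypeofSketch(stored), lookupKeySketch(stored, "title")); // string null
```

The second line is the failure mode: the column holds a JSON string whose text happens to look like an object, so key lookups find nothing. Passing the value through sql.json(value), as the commit does, avoids the manual stringify step altogether.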
extractMarkdownLinks now handles [[page]] and [[page|Display Text]] alongside standard [text](page.md). For wiki KBs where authors omit leading ../ (thinking in wiki-root-relative terms), resolveSlug walks ancestor directories until it finds a matching slug. Without this, wikilinks under tech/wiki/analysis/ targeting [[../../finance/wiki/concepts/foo]] silently dangled, because the correct relative depth was three ../ segments rather than two.
Co-Authored-By: @knee5 (PR #187)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
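A rough sketch of the wikilink extraction and ancestor walk described in this commit follows. The function names, the known-slug Set, and the exact matching rules are assumptions for illustration; the real extractMarkdownLinks and resolveSlug may be structured differently.

```typescript
// Hypothetical sketch; not the repository's extractMarkdownLinks / resolveSlug.
interface WikilinkSketch {
  target: string;
  display: string | null;
}

function extractWikilinksSketch(markdown: string): WikilinkSketch[] {
  // Matches [[page]] and [[page|Display Text]].
  const pattern = /\[\[([^\]|]+)(?:\|([^\]]+))?\]\]/g;
  const links: WikilinkSketch[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    links.push({
      target: match[1].trim(),
      display: match[2] ? match[2].trim() : null,
    });
  }
  return links;
}

// Collapses "." and ".." segments; returns null if the path climbs above the root.
function normalizeSketch(parts: string[]): string | null {
  const out: string[] = [];
  for (const part of parts) {
    if (part === "" || part === ".") continue;
    if (part === "..") {
      if (out.length === 0) return null;
      out.pop();
    } else {
      out.push(part);
    }
  }
  return out.join("/");
}

function resolveSlugSketch(
  sourceDir: string,
  target: string,
  knownSlugs: Set<string>,
): string | null {
  const targetParts = target.replace(/\.md$/, "").split("/");
  const dirParts = sourceDir.split("/").filter((p) => p !== "");
  // Try the page's own directory first, then each ancestor up to the root.
  for (let depth = dirParts.length; depth >= 0; depth--) {
    const candidate = normalizeSketch(dirParts.slice(0, depth).concat(targetParts));
    if (candidate !== null && knownSlugs.has(candidate)) return candidate;
  }
  return null;
}
```

For the example in the commit message, resolving ../../finance/wiki/concepts/foo from tech/wiki/analysis misses at the literal directory (it lands on tech/finance/...), then succeeds one ancestor up, which is equivalent to the author having written three ../ segments.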
- New gbrain repair-jsonb command. Detects rows where
jsonb_typeof(col) = 'string' and rewrites them via
(col #>> '{}')::jsonb across 5 affected columns:
pages.frontmatter, raw_data.data, ingest_log.pages_updated,
files.metadata, page_versions.frontmatter. Idempotent — re-running
is a no-op. PGLite engines short-circuit cleanly (the bug never
affected the parameterized encode path PGLite uses). --dry-run
shows what would be repaired; --json for scripting.
- New v0_12_1.ts migration orchestrator. Phases: schema → repair → verify.
Modeled on v0_12_0 pattern, registered in migrations/index.ts.
Runs automatically via gbrain upgrade / apply-migrations.
- CI grep guard at scripts/check-jsonb-pattern.sh fails the build if
anyone reintroduces the ${JSON.stringify(x)}::jsonb interpolation
pattern. Wired into bun test via package.json. Best-effort static
analysis (multi-line and helper-wrapped variants are caught by the
E2E round-trip test instead).
- Updates apply-migrations.test.ts expectations to account for the new
v0.12.1 entry in the registry.
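The detect-and-rewrite step of repair-jsonb can be pictured as one pair of SQL statements per target column. The sketch below only builds the SQL text; TARGETS_SKETCH and buildRepairSqlSketch are hypothetical names, and the actual command may batch, count, or log differently.

```typescript
// Hypothetical sketch: builds SQL strings only, it does not connect to a database.
const TARGETS_SKETCH = [
  { table: "pages", column: "frontmatter" },
  { table: "raw_data", column: "data" },
  { table: "ingest_log", column: "pages_updated" },
  { table: "files", column: "metadata" },
  { table: "page_versions", column: "frontmatter" },
] as const;

interface RepairSqlSketch {
  count: string;  // what a --dry-run could report
  repair: string; // the rewrite itself
}

function buildRepairSqlSketch(table: string, column: string): RepairSqlSketch {
  // Rows are double-encoded when the JSONB value is a string rather than an object/array.
  const predicate = `jsonb_typeof(${column}) = 'string'`;
  return {
    count: `SELECT count(*) FROM ${table} WHERE ${predicate}`,
    // #>> '{}' extracts the string's text; casting that text back to jsonb decodes it once.
    repair: `UPDATE ${table} SET ${column} = (${column} #>> '{}')::jsonb WHERE ${predicate}`,
  };
}

for (const target of TARGETS_SKETCH) {
  console.log(buildRepairSqlSketch(target.table, target.column).repair);
}
```

Idempotence falls out of the WHERE clause: once a row has been rewritten its jsonb_typeof is no longer 'string', so a second run matches nothing. Interpolating identifiers into SQL is only reasonable here because they come from a fixed list rather than user input.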
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
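The CI grep guard from the list above is a shell script; the same line-based check can be sketched in TypeScript to show what it does and does not catch. The regular expression and findDoubleEncodeSketch are assumptions for illustration and are not taken from scripts/check-jsonb-pattern.sh.

```typescript
// Hypothetical sketch of a single-line pattern check; the real guard is a shell script.
const DOUBLE_ENCODE_PATTERN = /\$\{\s*JSON\.stringify\([^)]*\)\s*\}\s*::jsonb/;

// Returns the 1-based line numbers that contain the banned interpolation.
function findDoubleEncodeSketch(source: string): number[] {
  const hits: number[] = [];
  source.split("\n").forEach((line, index) => {
    if (DOUBLE_ENCODE_PATTERN.test(line)) hits.push(index + 1);
  });
  return hits;
}

const sample = [
  "const fm = { title: 'x' };",
  "await sql`UPDATE pages SET frontmatter = ${JSON.stringify(fm)}::jsonb`;",
  "await sql`UPDATE pages SET frontmatter = ${sql.json(fm)}`;",
].join("\n");

console.log(findDoubleEncodeSketch(sample)); // [ 2 ]
```

Because the check works one line at a time, an interpolation split across lines or hidden behind a helper slips through, which is why the commit leans on the E2E round-trip test for those variants.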
- CLAUDE.md: document repair-jsonb command, v0_12_1 migration, splitBody sentinel contract, inferType wiki subtypes, CI grep guard, new test files (repair-jsonb, migrations-v0_12_1, markdown)
- README.md: add gbrain repair-jsonb to ADMIN command reference
- INSTALL_FOR_AGENTS.md: fix verification count (6 -> 7), add v0.12.1 upgrade guidance for Postgres brains
- docs/GBRAIN_VERIFY.md: add check #8 for JSONB integrity on Postgres-backed brains
- docs/UPGRADING_DOWNSTREAM_AGENTS.md: add v0.12.1 section with migration steps, splitBody contract, wiki subtype inference
- skills/migrate/SKILL.md: document native wikilink extraction via gbrain extract links (v0.12.1+)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This was referenced Apr 18, 2026
# Conflicts: # CHANGELOG.md # CLAUDE.md
Summary
Data-correctness hotfix for v0.12.0 Postgres-backed brains. PGLite users were unaffected. Bundles community PRs #187 (@knee5) and #175 (@leonardsellem) with expanded migration scope, schema audit (5 affected JSONB columns vs 3 originally reported), CI grep guard, and an E2E regression test that should have caught the original bug.
- JSONB double-encode (Postgres only). Every ${JSON.stringify(value)}::jsonb interpolation in postgres-engine.ts and files.ts caused postgres.js v3 to stringify again on the wire, storing JSONB columns as quoted string literals. Every frontmatter->>'key' returned NULL on Postgres-backed brains. GIN indexes were inert. Switched to sql.json(value) (postgres.js native, OID 3802). Affected columns: pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter. PGLite hid this bug entirely because it uses a different driver path.
- splitBody truncation. Treated any standalone --- as a timeline separator, causing 83% content truncation on wiki corpora (1,991-article wiki, 4,856 of 6,680 wikilinks lost). New behavior requires an explicit sentinel: <!-- timeline -->, --- timeline ---, or --- directly before a ## Timeline / ## History heading.
- inferType wiki subtypes. Added /writing/, /wiki/analysis/, /wiki/guides/, /wiki/hardware/, /wiki/architecture/, /wiki/concepts/. Path order is most-specific-first so projects/blog/writing/essay.md → writing.
- pgvector NaN scores (Supabase). getEmbeddingsByChunkIds returned strings instead of Float32Array on Supabase, producing [NaN] cosine scores. Adds parseEmbedding() helper. Throws loud on malformed vectors (no silent NaN); returns null for non-vector strings.
- Wikilink extraction. [[page]] and [[page|Display]] syntaxes are now extracted alongside standard [text](page.md). resolveSlug() does an ancestor search for wiki KBs that omit ../.
- Migration. New gbrain repair-jsonb command + v0_12_1 orchestrator (schema → repair → verify → record). Idempotent: re-running is a no-op. PGLite engines short-circuit cleanly.
- CI grep guard at scripts/check-jsonb-pattern.sh fails the build if anyone reintroduces the ${JSON.stringify(x)}::jsonb pattern.
Test Coverage
E2E suite: 120/120 pass against pgvector/pg16 Docker container. Unit suite: 1415/1415 pass. CI grep guard passes on this diff (no JSON.stringify(x)::jsonb patterns in src/).
Pre-Landing Review
No new issues found. Specialist review was already comprehensively covered by /plan-eng-review + Codex outside-voice review during planning (25+ findings, 3 material tensions adjudicated). repair-jsonb.ts uses sql.unsafe with table/column names from a hardcoded TARGETS array, so there is no injection vector. Migration is idempotent. parseEmbedding throws loud on malformed input per Codex's no-silent-NaN requirement.
Plan Completion
All 24 planned items DONE. Scope reduced from 9-PR bundle to 2-PR hotfix per Codex outside-voice scope challenge. The remaining 7 PRs (#184, #177, #132, #114, #115, #119, #123) are deferred to the v0.12.2 follow-up wave per /Users/garrytan/.claude/plans/system-instruction-you-are-working-elegant-squid.md.
TODOS
No items in TODOS.md were specifically completed by this PR (it focused on BrainBench eval work).
Documentation
Documentation was synced to v0.12.1 in commit 998ef82. Six files updated to reflect the JSONB hotfix, splitBody sentinel contract, wiki inferType, and native wikilink extraction.
- README.md: added gbrain repair-jsonb [--dry-run] to the ADMIN command reference
- skills/migrate/SKILL.md: documented [[wikilink]] and [[wikilink|Display]] extraction
CHANGELOG.md was left untouched (already comprehensive). VERSION bumped to 0.12.1.
Test plan
- E2E regression test (test/e2e/postgres-jsonb.test.ts)
- gbrain put a page with frontmatter, query frontmatter->>'key' returns the value
- gbrain repair-jsonb --dry-run against a brain with double-encoded rows reports the correct count
Attribution
Built on community PRs:
- #187 (@knee5): JSONB triple-fix.
- #175 (@leonardsellem): parseEmbedding() helper, getEmbeddingsByChunkIds fix.
Both PRs reported the bugs and proposed the fixes. Codex outside-voice review during planning surfaced the missed page_versions.frontmatter propagation path, dropped the noisy-truncated-diagnostic anti-pattern from scope, and pushed for the engine-aware migration.
🤖 Generated with Claude Code