diff --git a/CHANGELOG.md b/CHANGELOG.md
index b4558be2..130fd5e2 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,6 +2,81 @@
 
 All notable changes to GBrain will be documented in this file.
 
+## [0.12.2] - 2026-04-19
+
+## **Postgres frontmatter queries actually work now.**
+## **Wiki articles stop disappearing when you import them.**
+
+This is a data-correctness hotfix for the `v0.12.0`-and-earlier Postgres-backed brains. If you run gbrain on Postgres or Supabase, you've been losing data without knowing it. PGLite users were unaffected. Upgrade auto-repairs your existing rows. Lands on top of v0.12.1 (extract N+1 fix + migration timeout fix) — pull `gbrain upgrade` and you get both.
+
+### What was broken
+
+**Frontmatter columns were silently stored as quoted strings, not JSON.** Every `put_page` wrote `frontmatter` to Postgres via `${JSON.stringify(value)}::jsonb` — postgres.js v3 stringified again on the wire, so the column ended up holding `"\"{\\\"author\\\":\\\"garry\\\"}\""` instead of `{"author":"garry"}`. Every `frontmatter->>'key'` query returned NULL. GIN indexes on JSONB were inert. Same bug on `raw_data.data`, `ingest_log.pages_updated`, `files.metadata`, and `page_versions.frontmatter`. PGLite hid this entirely (different driver path) — which is exactly why it slipped past the existing test suite.
+
+**Wiki articles got truncated by 83% on import.** `splitBody` treated *any* standalone `---` line in body content as a timeline separator. Discovered by @knee5 migrating a 1,991-article wiki where a 23,887-byte article landed in the DB as 593 bytes (4,856 of 6,680 wikilinks lost).
+
+**`/wiki/` subdirectories silently typed as `concept`.** Articles under `/wiki/analysis/`, `/wiki/guides/`, `/wiki/hardware/`, `/wiki/architecture/`, and `/writing/` defaulted to `type='concept'` — type-filtered queries lost everything in those buckets.
+
+**pgvector embeddings sometimes returned as strings → NaN search scores.** Discovered by @leonardsellem on Supabase, where `getEmbeddingsByChunkIds` returned `"[0.1,0.2,…]"` instead of `Float32Array`, producing `[NaN]` query scores.
+
+### What you can do now that you couldn't before
+
+- **`frontmatter->>'author'` returns `garry`, not NULL.** GIN indexes work. Postgres queries by frontmatter key actually retrieve pages.
+- **Wiki articles round-trip intact.** Markdown horizontal rules in body text are horizontal rules, not timeline separators.
+- **Recover already-truncated pages with `gbrain sync --full`.** Re-import from your source-of-truth markdown rebuilds `compiled_truth` correctly.
+- **Search scores stop going `NaN` on Supabase.** Cosine rescoring sees real `Float32Array` embeddings.
+- **Type-filtered queries find your wiki articles.** `/wiki/analysis/` becomes type `analysis`, `/writing/` becomes `writing`, etc.
+
+### How to upgrade
+
+```bash
+gbrain upgrade
+```
+
+The `v0.12.2` orchestrator runs automatically: applies any schema changes, then `gbrain repair-jsonb` rewrites every double-encoded row in place using `jsonb_typeof = 'string'` as the guard. Idempotent — re-running is a no-op. PGLite engines short-circuit cleanly. Batches well on large brains.
+
+If you want to recover pages that were truncated by the splitBody bug:
+
+```bash
+gbrain sync --full
+```
+
+That re-imports every page from disk, so the new `splitBody` rebuilds the full `compiled_truth` correctly.
+
+### What's new under the hood
+
+- **`gbrain repair-jsonb`** — standalone command for the JSONB fix. Run it manually if needed; the migration runs it automatically. `--dry-run` shows what would be repaired without touching data. `--json` for scripting.
+- **CI grep guard** at `scripts/check-jsonb-pattern.sh` — fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern. Wired into `bun test` so it runs on every CI invocation.
+- **New E2E regression test** at `test/e2e/postgres-jsonb.test.ts` — round-trips all four JSONB write sites against real Postgres and asserts `jsonb_typeof = 'object'` plus `->>` returns the expected scalar. The test that should have caught the original bug.
+- **Wikilink extraction** — `[[page]]` and `[[page|Display Text]]` syntaxes now extracted alongside standard `[text](page.md)` markdown links. Includes ancestor-search resolution for wiki KBs where authors omit one or more leading `../`.
+
+### Migration scope
+
+The repair touches five JSONB columns:
+- `pages.frontmatter`
+- `raw_data.data`
+- `ingest_log.pages_updated`
+- `files.metadata`
+- `page_versions.frontmatter` (downstream of `pages.frontmatter` via INSERT...SELECT)
+
+Other JSONB columns in the schema (`minion_jobs.{data,result,progress,stacktrace}`, `minion_inbox.payload`) were always written via the parameterized `$N::jsonb` form so they were never affected.
+
+### Behavior changes (read this if you upgrade)
+
+`splitBody` now requires an explicit sentinel for timeline content. Recognized markers (in priority order):
+1. `` (preferred — what `serializeMarkdown` emits)
+2. `--- timeline ---` (decorated separator)
+3. `---` directly before `## Timeline` or `## History` heading (backward-compat fallback)
+
+If you intentionally used a plain `---` to mark your timeline section in source markdown, add `` above it manually. The fallback covers the common case (`---` followed by `## Timeline`).
+
+### Attribution
+
+Built from community PRs #187 (@knee5) and #175 (@leonardsellem). The original PRs reported the bugs and proposed the fixes; this release re-implements them on top of the v0.12.0 knowledge graph release with expanded migration scope, schema audit (all 5 affected columns vs the 3 originally reported), engine-aware behavior, CI grep guard, and an E2E regression test that should have caught this in the first place. Codex outside-voice review during planning surfaced the missed `page_versions.frontmatter` propagation path and the noisy-truncated-diagnostic anti-pattern that was dropped from this scope. Thanks for finding the bugs and providing the recovery path — both PRs left work to do but the foundation was right.
+
+Co-Authored-By: @knee5 (PR #187 — splitBody, inferType wiki, JSONB triple-fix)
+Co-Authored-By: @leonardsellem (PR #175 — parseEmbedding, getEmbeddingsByChunkIds fix)
+
 ## [0.12.1] - 2026-04-19
 
 ## **Extract no longer hangs on large brains.**
diff --git a/CLAUDE.md b/CLAUDE.md
index 8b8bd0b2..9578a49d 100644
--- a/CLAUDE.md
+++ b/CLAUDE.md
@@ -61,7 +61,10 @@ strict behavior when unset.
 - `src/mcp/server.ts` — MCP stdio server (generated from operations)
 - `src/commands/auth.ts` — Standalone token management (create/list/revoke/test)
 - `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally.
-- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). All orchestrators are idempotent and resumable from `partial` status.
+- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). All orchestrators are idempotent and resumable from `partial` status.
+- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent.
+- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (``, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
+- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`.
 - `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks (Wintermute etc.) to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich.
 - `src/core/schema-embedded.ts` — AUTO-GENERATED from schema.sql (run `bun run build:schema`)
 - `src/schema.sql` — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts)
@@ -129,6 +132,9 @@ Key commands added for Minions (job queue):
 - `gbrain jobs stats` — job health dashboard
 - `gbrain jobs work [--queue Q] [--concurrency N]` — start worker daemon (Postgres only)
 
+Key commands added in v0.12.2:
+- `gbrain repair-jsonb [--dry-run] [--json]` — repair double-encoded JSONB rows left over from v0.12.0-and-earlier Postgres writes. Idempotent; PGLite no-ops. The `v0_12_2` migration runs this automatically on `gbrain upgrade`.
+
 ## Testing
 
 `bun test` runs all tests. After the v0.12.1 release: ~75 unit test files + 8 E2E test files (1412 unit pass, 119 E2E when `DATABASE_URL` is set — skip gracefully otherwise). Unit tests run
@@ -172,12 +178,16 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac
 `test/features.test.ts` (feature scanning, brain_score calculation, CLI routing, persistence),
 `test/file-upload-security.test.ts` (symlink traversal, cwd confinement, slug + filename allowlists, remote vs local trust),
 `test/query-sanitization.test.ts` (prompt-injection stripping, output sanitization, structural boundary),
-`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log).
+`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log),
+`test/repair-jsonb.test.ts` (v0.12.2 JSONB repair: TARGETS list, idempotency, engine-awareness),
+`test/migrations-v0_12_2.test.ts` (v0.12.2 orchestrator phases: schema → repair → verify → record),
+`test/markdown.test.ts` (splitBody sentinel precedence, horizontal-rule preservation, inferType wiki subtypes).
 
 E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
 - `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys). Includes 9 dedicated cases for the postgres-engine `addLinksBatch` / `addTimelineEntriesBatch` bind path — postgres-js's `unnest()` binding is structurally different from PGLite's and gets its own coverage.
 - `test/e2e/search-quality.test.ts` runs search quality E2E against PGLite (no API keys, in-memory)
 - `test/e2e/graph-quality.test.ts` runs the v0.10.3 knowledge graph pipeline (auto-link via put_page, reconciliation, traversePaths) against PGLite in-memory
+- `test/e2e/postgres-jsonb.test.ts` — v0.12.2 regression test. Round-trips all 5 JSONB write sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter) against real Postgres and asserts `jsonb_typeof='object'` plus `->>'key'` returns the expected scalar. The test that should have caught the original double-encode bug.
 - `test/e2e/upgrade.test.ts` runs check-update E2E against real GitHub API (network required)
 - Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI
 - If `.env.testing` doesn't exist in this directory, check sibling worktrees for one:
diff --git a/INSTALL_FOR_AGENTS.md b/INSTALL_FOR_AGENTS.md
index 6456f6e5..af7bac5b 100644
--- a/INSTALL_FOR_AGENTS.md
+++ b/INSTALL_FOR_AGENTS.md
@@ -127,7 +127,7 @@ Verify: `gbrain integrations doctor` (after at least one is configured)
 
 ## Step 9: Verify
 
-Read `docs/GBRAIN_VERIFY.md` and run all 6 verification checks. Check #4 (live sync
+Read `docs/GBRAIN_VERIFY.md` and run all 7 verification checks. Check #4 (live sync
 actually works) is the most important.
 
 ## Upgrade
@@ -145,3 +145,10 @@ this is how features ship in the binary but stay dormant in the user's brain.
 For v0.12.0+ specifically: if your brain was created before v0.12.0, run
 `gbrain extract links --source db && gbrain extract timeline --source db` to
 backfill the new graph layer (see Step 4.5 above).
+
+For v0.12.2+ specifically: if your brain is Postgres- or Supabase-backed and
+predates v0.12.2, the `v0_12_2` migration runs `gbrain repair-jsonb`
+automatically during `gbrain post-upgrade` to fix the double-encoded JSONB
+columns. PGLite brains no-op. If wiki-style imports were truncated by the old
+`splitBody` bug, run `gbrain sync --full` after upgrading to rebuild
+`compiled_truth` from source markdown.
diff --git a/README.md b/README.md
index f8e88a00..309731b4 100644
--- a/README.md
+++ b/README.md
@@ -536,6 +536,7 @@ ADMIN
   gbrain integrations                Integration recipe dashboard
   gbrain check-backlinks check|fix   Back-link enforcement
   gbrain lint [--fix]                LLM artifact detection
+  gbrain repair-jsonb [--dry-run]    Repair v0.12.0 double-encoded JSONB (Postgres)
   gbrain transcribe