diff --git a/CHANGELOG.md b/CHANGELOG.md index 8d4b8b50..fa100f37 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -4,8 +4,309 @@ All notable changes to PBrain will be documented in this file. > **Fork notice.** PBrain is a fork of [GBrain](https://github.com/garrytan/gbrain) by [Garry Tan](https://github.com/garrytan). All entries below `[0.1.0]` describe work done on the GBrain project under its original name and are preserved for historical context. See [NOTICE](NOTICE) and [docs/ATTRIBUTION.md](docs/ATTRIBUTION.md) for attribution. + +## [0.12.3] - 2026-04-19 + +## **Reliability wave: the pieces v0.12.2 didn't cover.** +## **Sync stops hanging. Search timeouts stop leaking. `[[Wikilinks]]` are edges.** + +v0.12.2 shipped the data-correctness hotfix (JSONB double-encode, splitBody, `/wiki/` types, parseEmbedding). This wave lands the remaining reliability fixes from the same community review pass, plus a graph-layer feature a 2,100-page brain needed to stop bleeding edges. No schema changes. No migration. `gbrain upgrade` pulls it. + +### What was broken + +**Incremental sync deadlocked past 10 files.** `src/commands/sync.ts` wrapped the whole import in `engine.transaction`, and `importFromContent` also wrapped each file. PGLite's `_runExclusiveTransaction` is non-reentrant — the inner call parks on the mutex the outer call holds, forever. In practice: 3 files synced fine, 15 files hung in `ep_poll` until you killed the process. Bulk Minions jobs and citation-fixer dream-cycles regularly hit this. Discovered by @sunnnybala. + +**`statement_timeout` leaked across the postgres.js pool.** `searchKeyword` and `searchVector` bounded queries with `SET statement_timeout='8s'` + `finally SET 0`. But every tagged template picks an arbitrary pool connection, so the SET, the query, and the reset could land on three different sockets. The 8s cap stuck to whichever connection ran the SET, got returned to the pool, and the next unrelated caller inherited it. 
Long-running `embed --all` jobs and imports clipped silently. Fix by @garagon. + +**Obsidian `[[WikiLinks]]` were invisible to the auto-link post-hook.** `extractEntityRefs` only matched `[Name](people/slug)`. On a 2,100-page brain with wikilinks throughout, `put_page` extracted zero auto-links. `DIR_PATTERN` also missed domain-organized wiki roots (`entities`, `projects`, `tech`, `finance`, `personal`, `openclaw`). After the fix: 1,377 new typed edges on a single `extract --source db` pass. Discovered and fixed by @knee5. + +**Corrupt embedding rows broke every query that touched them.** `getEmbeddingsByChunkIds` on Supabase could return a pgvector string instead of a `Float32Array`. v0.12.2 fixed the normal path by normalizing inputs, but one genuinely bad row still threw and killed the ranking pass. Availability matters more than strictness on the read path. + +### What you can do now that you couldn't before + +- **Sync 100 files without hanging.** Per-file atomicity preserved, outer wrap removed. Regression test asserts `engine.transaction` is not called at the top level of `src/commands/sync.ts`. Contributed by @sunnnybala. +- **Run a long `embed --all` on Supabase without strangling unrelated queries.** `searchKeyword` / `searchVector` use `sql.begin` + `SET LOCAL` so the timeout dies with the transaction. 5 regression tests in `test/postgres-engine.test.ts` pin the new shape. Contributed by @garagon. +- **Write `[[people/balaji|Balaji Srinivasan]]` in a page and see a typed edge.** Same extractor, two syntaxes. Matches the filesystem walker — the db and fs sources now produce the same link graph from the same content. Contributed by @knee5. +- **Find your under-connected pages.** `gbrain orphans` surfaces pages with zero inbound wikilinks, grouped by domain. `--json`, `--count`, and `--include-pseudo` flags. Also exposed as the `find_orphans` MCP operation so agents can run enrichment cycles without CLI glue. Contributed by @knee5. 
+- **Degraded embedding rows skip+warn instead of throwing.** New `tryParseEmbedding()` sibling of `parseEmbedding()`: returns `null` on unknown input and warns once per process. Used on the search/rescore path. Migration and ingest paths still throw — data integrity there is non-negotiable. +- **`gbrain doctor` tells you which brains still need repair.** Two new checks: `jsonb_integrity` scans the four v0.12.0 write sites and reports rows where `jsonb_typeof = 'string'`; `markdown_body_completeness` heuristically flags pages whose `compiled_truth` is <30% of raw source length when raw has multiple H2/H3 boundaries. Fix hint points at `gbrain repair-jsonb` and `gbrain sync --force`. + +### How to upgrade + +```bash +gbrain upgrade +``` + +No migration, no schema change, no data touch. If you're on Postgres and haven't run `gbrain repair-jsonb` since v0.12.2, the v0.12.2 orchestrator still runs on upgrade. New `gbrain doctor` will tell you if anything still looks off. + +### Itemized changes + +**Sync deadlock fix (#132)** +- `src/commands/sync.ts` — remove outer `engine.transaction` wrap; per-file atomicity preserved by `importFromContent`'s own wrap. +- `test/sync.test.ts` — new regression guard asserting top-level `engine.transaction` is not called on > 10-file sync paths. +- Contributed by @sunnnybala. + +**postgres-engine statement_timeout scoping (#158)** +- `src/core/postgres-engine.ts` — `searchKeyword` and `searchVector` rewritten to `sql.begin(async (tx) => { await tx\`SET LOCAL statement_timeout = ...\`; ... })`. GUC dies with the transaction; pool reuse is safe. +- `test/postgres-engine.test.ts` — 5 regression tests including a source-level guardrail grep against the production file (not a test fixture) asserting no bare `SET statement_timeout` outside `sql.begin`. +- Contributed by @garagon. 
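Why the old `SET … finally SET 0` pattern leaked is easy to model. The sketch below is a toy — the `Conn`/`Pool` classes and method names are invented for illustration, not postgres.js internals: tagged templates outside a transaction may each land on a different pooled connection, while `sql.begin` pins one connection and transaction-scoped (`SET LOCAL`) state dies at commit.

```typescript
// Toy model of the leak. Conn/Pool are invented for illustration only.
class Conn {
  timeoutMs = 0; // 0 = no cap (the server default)
}

class Pool {
  conns = [new Conn(), new Conn(), new Conn()];
  private i = 0;
  // Each statement outside a transaction picks an arbitrary connection.
  next(): Conn {
    return this.conns[this.i++ % this.conns.length];
  }
  // A transaction pins one connection; SET LOCAL state is discarded at COMMIT/ROLLBACK.
  begin<T>(fn: (tx: Conn) => T): T {
    const conn = this.next();
    const saved = conn.timeoutMs;
    try {
      return fn(conn);
    } finally {
      conn.timeoutMs = saved; // transaction-scoped GUCs do not survive
    }
  }
  dirtyConns(): number {
    return this.conns.filter((c) => c.timeoutMs !== 0).length;
  }
}

// Buggy shape: SET, query, and reset each grab their own connection.
const leaky = new Pool();
leaky.next().timeoutMs = 8000; // SET statement_timeout = '8s'
leaky.next();                  // the actual search query — a different socket
leaky.next().timeoutMs = 0;    // finally: SET statement_timeout = 0 — yet another socket
console.log(leaky.dirtyConns()); // 1 — an 8s cap stranded on a pooled connection

// v0.12.3 shape: one pinned connection, timeout scoped to the transaction.
const scoped = new Pool();
scoped.begin((tx) => {
  tx.timeoutMs = 8000; // SET LOCAL statement_timeout = '8s'
  // ...bounded search query runs here on the same connection...
});
console.log(scoped.dirtyConns()); // 0 — nothing leaks back into the pool
```

The real fix holds the same invariant: the GUC and the query it bounds share one connection, and the pool only ever sees clean connections afterward.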
+ +**Obsidian wikilinks + extended domain patterns (#187 slice)** +- `src/core/link-extraction.ts` — `extractEntityRefs` matches both `[Name](people/slug)` and `[[people/slug|Name]]`. `DIR_PATTERN` extended with `entities`, `projects`, `tech`, `finance`, `personal`, `openclaw`. +- Matches existing filesystem-walker behavior. +- Contributed by @knee5. + +**`gbrain orphans` command (#187 slice)** +- `src/commands/orphans.ts` — new command with text/JSON/count outputs and domain grouping. +- `src/core/operations.ts` — `find_orphans` MCP operation. +- `src/cli.ts` — `orphans` added to `CLI_ONLY`. +- `test/orphans.test.ts` — 203 lines covering detection, filters, and all output modes. +- Contributed by @knee5. + +**`tryParseEmbedding()` availability helper** +- `src/core/utils.ts` — new `tryParseEmbedding(value)`: returns `null` on unknown input, warns once per process via a module-level flag. +- `src/core/postgres-engine.ts` — `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one bad row degrades ranking instead of killing the query. +- `test/utils.test.ts` — new cases for null-return and single-warn. +- Hand-authored; codifies the split-by-call-site rule from the #97/#175 review. + +**Doctor detection checks** +- `src/commands/doctor.ts` — `jsonb_integrity` scans `pages.frontmatter`, `raw_data.data`, `ingest_log.pages_updated`, `files.metadata` and reports `jsonb_typeof='string'` counts; `markdown_body_completeness` heuristic for ≥30% shrinkage vs raw source on multi-H2 pages. +- `test/doctor.test.ts` — detection unit tests assert both checks exist and cover the four JSONB sites. +- `test/e2e/jsonb-roundtrip.test.ts` — the regression test that should have caught the original v0.12.0 double-encode bug; round-trips all four JSONB write sites against real Postgres. +- `docs/integrations/reliability-repair.md` — guide for v0.12.0 users: detect via `gbrain doctor`, repair via `gbrain repair-jsonb`. + +**No schema changes. No migration. 
No data touch.**

## [0.12.2] - 2026-04-19

## **Postgres frontmatter queries actually work now.**
## **Wiki articles stop disappearing when you import them.**

This is a data-correctness hotfix for `v0.12.0`-and-earlier Postgres-backed brains. If you run gbrain on Postgres or Supabase, you've been losing data without knowing it. PGLite users were unaffected. Upgrade auto-repairs your existing rows. Lands on top of v0.12.1 (extract N+1 fix + migration timeout fix) — pull `gbrain upgrade` and you get both.

### What was broken

**Frontmatter columns were silently stored as quoted strings, not JSON.** Every `put_page` wrote `frontmatter` to Postgres via `${JSON.stringify(value)}::jsonb` — postgres.js v3 stringified again on the wire, so the column ended up holding `"\"{\\\"author\\\":\\\"garry\\\"}\""` instead of `{"author":"garry"}`. Every `frontmatter->>'key'` query returned NULL. GIN indexes on JSONB were inert. Same bug on `raw_data.data`, `ingest_log.pages_updated`, `files.metadata`, and `page_versions.frontmatter`. PGLite hid this entirely (different driver path) — which is exactly why it slipped past the existing test suite.

**Wiki articles got truncated by 83% on import.** `splitBody` treated *any* standalone `---` line in body content as a timeline separator. Discovered by @knee5 migrating a 1,991-article wiki where a 23,887-byte article landed in the DB as 593 bytes (4,856 of 6,680 wikilinks lost).

**`/wiki/` subdirectories silently typed as `concept`.** Articles under `/wiki/analysis/`, `/wiki/guides/`, `/wiki/hardware/`, `/wiki/architecture/`, and `/writing/` defaulted to `type='concept'` — type-filtered queries lost everything in those buckets.

**pgvector embeddings sometimes returned as strings → NaN search scores.** Discovered by @leonardsellem on Supabase, where `getEmbeddingsByChunkIds` returned `"[0.1,0.2,…]"` instead of `Float32Array`, producing `[NaN]` query scores.
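The normalization behind that last fix can be sketched as a small pure function (the name and the exact accepted shapes are assumptions for illustration; the real helper is the codebase's `parseEmbedding`). The key observation: pgvector's text representation is itself valid JSON, so one guarded parse recovers the vector.

```typescript
// Hedged sketch — parseEmbeddingSketch is an illustrative stand-in for the
// real parseEmbedding helper; accepted input shapes are assumptions.
function parseEmbeddingSketch(value: unknown): Float32Array {
  if (value instanceof Float32Array) return value;
  if (Array.isArray(value)) return Float32Array.from(value as number[]);
  if (typeof value === "string") {
    // pgvector's wire text "[0.1,0.2,...]" is valid JSON — one parse recovers it.
    const parsed: unknown = JSON.parse(value);
    if (Array.isArray(parsed)) return Float32Array.from(parsed as number[]);
  }
  throw new Error(`unparseable embedding (${typeof value})`);
}

// Without the string branch, cosine rescoring multiplies characters by
// numbers and every score degrades to NaN — the Supabase symptom above.
const emb = parseEmbeddingSketch("[0.1,0.2,0.3]");
console.log(emb.length, emb.some(Number.isNaN)); // 3 false
```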
+ +### What you can do now that you couldn't before + +- **`frontmatter->>'author'` returns `garry`, not NULL.** GIN indexes work. Postgres queries by frontmatter key actually retrieve pages. +- **Wiki articles round-trip intact.** Markdown horizontal rules in body text are horizontal rules, not timeline separators. +- **Recover already-truncated pages with `gbrain sync --full`.** Re-import from your source-of-truth markdown rebuilds `compiled_truth` correctly. +- **Search scores stop going `NaN` on Supabase.** Cosine rescoring sees real `Float32Array` embeddings. +- **Type-filtered queries find your wiki articles.** `/wiki/analysis/` becomes type `analysis`, `/writing/` becomes `writing`, etc. + +### How to upgrade + +```bash +gbrain upgrade +``` + +The `v0.12.2` orchestrator runs automatically: applies any schema changes, then `gbrain repair-jsonb` rewrites every double-encoded row in place using `jsonb_typeof = 'string'` as the guard. Idempotent — re-running is a no-op. PGLite engines short-circuit cleanly. Batches well on large brains. + +If you want to recover pages that were truncated by the splitBody bug: + +```bash +gbrain sync --full +``` + +That re-imports every page from disk, so the new `splitBody` rebuilds the full `compiled_truth` correctly. + +### What's new under the hood + +- **`gbrain repair-jsonb`** — standalone command for the JSONB fix. Run it manually if needed; the migration runs it automatically. `--dry-run` shows what would be repaired without touching data. `--json` for scripting. +- **CI grep guard** at `scripts/check-jsonb-pattern.sh` — fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern. Wired into `bun test` so it runs on every CI invocation. +- **New E2E regression test** at `test/e2e/postgres-jsonb.test.ts` — round-trips all four JSONB write sites against real Postgres and asserts `jsonb_typeof = 'object'` plus `->>` returns the expected scalar. 
The test that should have caught the original bug. +- **Wikilink extraction** — `[[page]]` and `[[page|Display Text]]` syntaxes now extracted alongside standard `[text](page.md)` markdown links. Includes ancestor-search resolution for wiki KBs where authors omit one or more leading `../`. + +### Migration scope + +The repair touches five JSONB columns: +- `pages.frontmatter` +- `raw_data.data` +- `ingest_log.pages_updated` +- `files.metadata` +- `page_versions.frontmatter` (downstream of `pages.frontmatter` via INSERT...SELECT) + +Other JSONB columns in the schema (`minion_jobs.{data,result,progress,stacktrace}`, `minion_inbox.payload`) were always written via the parameterized `$N::jsonb` form so they were never affected. + +### Behavior changes (read this if you upgrade) + +`splitBody` now requires an explicit sentinel for timeline content. Recognized markers (in priority order): +1. `` (preferred — what `serializeMarkdown` emits) +2. `--- timeline ---` (decorated separator) +3. `---` directly before `## Timeline` or `## History` heading (backward-compat fallback) + +If you intentionally used a plain `---` to mark your timeline section in source markdown, add `` above it manually. The fallback covers the common case (`---` followed by `## Timeline`). + +### Attribution + +Built from community PRs #187 (@knee5) and #175 (@leonardsellem). The original PRs reported the bugs and proposed the fixes; this release re-implements them on top of the v0.12.0 knowledge graph release with expanded migration scope, schema audit (all 5 affected columns vs the 3 originally reported), engine-aware behavior, CI grep guard, and an E2E regression test that should have caught this in the first place. Codex outside-voice review during planning surfaced the missed `page_versions.frontmatter` propagation path and the noisy-truncated-diagnostic anti-pattern that was dropped from this scope. 
Thanks for finding the bugs and providing the recovery path — both PRs left work to do but the foundation was right. + +Co-Authored-By: @knee5 (PR #187 — splitBody, inferType wiki, JSONB triple-fix) +Co-Authored-By: @leonardsellem (PR #175 — parseEmbedding, getEmbeddingsByChunkIds fix) + + + + +## [0.12.1] - 2026-04-19 + +## **Extract no longer hangs on large brains.** +## **v0.12.0 upgrade no longer times out on duplicates.** + +Two production-blocking bugs Garry hit on his 47K-page brain on April 18. `gbrain extract` was effectively unusable on any brain with 20K+ existing links or timeline entries — it pre-loaded the entire dedup set with one `getLinks()` call per page over the Supabase pooler, hanging for 10+ minutes producing zero output before any work started. The v0.12.0 schema migration that creates `idx_timeline_dedup` was failing on brains with pre-existing duplicate timeline rows because the `DELETE ... USING` self-join was O(n²) without an index, hitting Supabase Management API's 60-second ceiling on 80K+ duplicates. Both bugs end here. + +### The numbers that matter + +Measured on the new `test/extract-fs.test.ts` and `test/migrate.test.ts` regression suites, plus 73 E2E tests against real Postgres+pgvector. Reproducible: `bun test` + `bun run test:e2e`. 
+ +| Metric | BEFORE v0.12.1 | AFTER v0.12.1 | Δ | +|-----------------------------------------|--------------------|--------------------|--------------------| +| extract hang on 47K-page brain | 10+ min, zero output | immediate work, ~30-60s wall clock | usable | +| DB round-trips per re-extract | 47K reads + 235K writes | 0 reads + ~2.4K writes | **~99% fewer** | +| v0.12.0 migration on 80K duplicate rows | timed out at 60s | completes <1s | **~60x+ faster** | +| Re-run on already-extracted brain | 235K row-writes | 0 row-writes | true no-op | +| Tests | 1297 unit / 105 E2E | **1412 unit / 119 E2E** | +115 unit / +14 E2E | +| `created` counter on re-runs | "5000 created" (lie) | "0 created" (truth)| accurate | + +Per-batch round-trip math: a re-extract on a 47K-page brain with ~5 links per page used to do 235K sequential round-trips over the Supabase pooler. With 100-row batched INSERTs it does ~2,400. The hang came from the read pre-load (47K serial `getLinks()` calls), which is now gone entirely. The DB enforces uniqueness via `ON CONFLICT DO NOTHING`. + +### What this means for GBrain users + +If you've been afraid to re-run `gbrain extract` because it might never finish, that's over. The command starts producing output immediately, batch-writes 100 rows per round-trip, and reports a truthful insert count even on re-runs. If your v0.12.0 upgrade got stuck on the timeline migration (or you had to manually run `CREATE TABLE ... AS SELECT DISTINCT ON ...` to unblock it), the next `gbrain init --migrate-only` is sub-second. Run `gbrain extract all` on your largest brain and watch it actually work. + +### Itemized changes + +#### Performance + +- **`gbrain extract` no longer pre-loads the dedup set.** Removed the N+1 read loop in `extractLinksFromDir`, `extractTimelineFromDir`, `extractLinksFromDB`, and `extractTimelineFromDB` that called `engine.getLinks(slug)` (or `getTimeline`) once per page across `engine.listPages({ limit: 100000 })`. 
On a 47K-page brain that was 47K serial network round-trips before the first file was even read. Both engines already enforced uniqueness at the SQL layer (`UNIQUE(from_page_id, to_page_id, link_type)` on `links`, `idx_timeline_dedup` on `timeline_entries`); the in-memory dedup `Set` was redundant insurance that turned into the bottleneck. +- **Batched multi-row INSERTs replace per-row writes.** All four extract paths now buffer 100 candidates and flush via new `addLinksBatch` / `addTimelineEntriesBatch` engine methods. Round-trips drop ~100x: ~235K → ~2,400 per full re-extract. Each batch uses `INSERT ... SELECT FROM unnest($1::text[], $2::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4 (links) or 5 (timeline) array-typed bound parameters regardless of batch size, sidestepping Postgres's 65535-parameter cap entirely. PGLite uses the same SQL shape with manual `$N` placeholders. + +#### Correctness + +- **`created` counter is now truthful on re-runs.** Returns count of rows actually inserted (via `RETURNING 1` row count), not "calls that didn't throw." A re-run on a fully-extracted brain prints `Done: 0 links, 0 timeline entries from 47000 pages`. Before this release it would print `Done: 5000 links` while inserting zero new rows. +- **`--dry-run` deduplicates candidates across files.** A link extracted from 3 different markdown files now prints exactly once in `--dry-run` output, matching what the batch insert would actually create. Before this release the dedup was tied to the now-deleted DB pre-load, so dry-run would over-print. +- **Whole-batch errors are visible in both JSON and human modes.** When a batch flush fails (DB connection drop, malformed row), the error prints to stderr in JSON mode AND to console in human mode, with the lost-row count. No more silent loss of 100 rows because of one bad row. 
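The batched write shape described above can be sketched as follows (type names, the helper, and the exact column list are assumptions for illustration; the SQL mirrors the pattern quoted in this entry). The essential move is transposing N rows into per-column arrays, so the statement binds the same fixed number of array-typed parameters whether the batch holds 1 row or 100:

```typescript
// Illustrative types and names — not the actual engine API surface.
type LinkCandidate = { fromSlug: string; toSlug: string; linkType: string; context?: string };

// Transpose rows into 4 column arrays: 4 bound parameters regardless of
// batch size, so Postgres's 65535-parameter cap never comes into play.
function toUnnestParams(rows: LinkCandidate[]): [string[], string[], string[], string[]] {
  return [
    rows.map((r) => r.fromSlug),
    rows.map((r) => r.toSlug),
    rows.map((r) => r.linkType),
    rows.map((r) => r.context ?? ""), // missing optionals normalize to empty strings
  ];
}

// The statement those arrays bind into (column names assumed, shape simplified):
const BATCH_SQL = `
  INSERT INTO links (from_page_id, to_page_id, link_type, context)
  SELECT pf.id, pt.id, t.link_type, t.context
  FROM unnest($1::text[], $2::text[], $3::text[], $4::text[])
       AS t(from_slug, to_slug, link_type, context)
  JOIN pages pf ON pf.slug = t.from_slug  -- rows with unknown slugs drop out here
  JOIN pages pt ON pt.slug = t.to_slug
  ON CONFLICT DO NOTHING                  -- duplicates become no-ops, not errors
  RETURNING 1`;                           // returned row count = truthful "created"

const params = toUnnestParams([
  { fromSlug: "a", toSlug: "b", linkType: "mentions" },
  { fromSlug: "a", toSlug: "c", linkType: "mentions", context: "intro" },
]);
console.log(params.length, params[0].length); // 4 2
```

Counting `RETURNING 1` rows rather than "calls that didn't throw" is also what makes the re-run counter truthful: `ON CONFLICT DO NOTHING` rows and JOIN-dropped rows return nothing.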
+ +#### Schema migrations — v0.12.0 upgrade is now sub-second on duplicate-heavy brains + +- **Migration v9 (timeline_entries) and v8 (links) pre-create a btree helper index** on the dedup columns before the `DELETE ... USING` self-join runs. Turns the O(n²) sequential-scan dedup into O(n log n) index-backed dedup. On 80K+ duplicate rows the migration completes in well under a second instead of timing out at 60s. The helper index is dropped after dedup, leaving the original schema unchanged. Same fix applied defensively to migration v8 — Garry's brain didn't trip it (links had fewer duplicates) but the same trap was loaded. +- **`phaseASchema` timeout in the v0.12.0 orchestrator bumped 60s → 600s.** Belt-and-suspenders: the helper-index fix should make dedup sub-second on most brains, but the outer wall-clock budget shouldn't be the failure mode for unforeseen slowness. + +#### New engine API + +- **`addLinksBatch(LinkBatchInput[]) → Promise`** and **`addTimelineEntriesBatch(TimelineBatchInput[]) → Promise`** on both `PostgresEngine` and `PGLiteEngine`. Returns count of actually-inserted rows (excluding ON CONFLICT no-ops and JOIN-dropped rows whose slugs don't exist). Per-row `addLink` / `addTimelineEntry` are unchanged — all 10 existing call sites compile and behave identically. Plugin authors building agent integrations on `BrainEngine` can adopt the batch methods at their own pace. + +#### Tests + +- **Migration regression tests guard the fix structurally + behaviorally.** New `test/migrate.test.ts` cases assert the v8 + v9 SQL literally contains the helper `CREATE INDEX IF NOT EXISTS ... DROP INDEX IF EXISTS` sequence in the right order (deterministic, fast, catches a regression even at 0-row scale where wall-clock can't distinguish O(n²) from O(1)) AND that the migration completes under wall-clock cap on 1000-row fixtures. 
+- **`test/extract-fs.test.ts` (new file)** covers the FS-source extract path end-to-end on PGLite: first-run inserts, second-run reports zero, dry-run dedups duplicate candidates across 3 files into one printed line, second-run perf regression guard. +- **9 new E2E tests for the postgres-engine batch methods** in `test/e2e/mechanical.test.ts`. The postgres-js bind path is structurally different from PGLite's (array params via `unnest()` vs manual `$N` placeholders) and gets its own coverage against real Postgres+pgvector. +- **11 new PGLite batch method tests** in `test/pglite-engine.test.ts` (empty batch, missing optionals normalize to empty strings, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch returns count of new only, batch of 100). + +#### Pre-ship review + +This release was reviewed by `/plan-eng-review` (5 issues, all addressed including a P0 plan reshape that dropped a redundant orchestrator phase in favor of fixing migration v9 directly), `/codex` outside-voice review on the plan (15 findings, all P1 + P2 incorporated — most consequential: forced a cleaner separation between per-row API stability and new batch APIs so all 10 existing `addLink` callers stay untouched), and 5 specialist subagents (testing, maintainability, performance, security, data-migration) at ship time. The testing specialist caught a real bug in the postgres-engine batch SQL: postgres-js's `sql(rows, ...)` helper doesn't compose with `(VALUES) AS v(...)` JOIN syntax the way originally written. Switched to the cleaner `unnest()` array-parameter pattern in both engines, verified end-to-end against a real Postgres+pgvector container. + +## [0.12.0] - 2026-04-18 + + + ## [Unreleased] +### Integrated from upstream GBrain + +Pulling forward security, data-correctness, and reliability fixes that landed in upstream GBrain (`garrytan/gbrain`) after our v0.1.0 fork point. 
This wave takes only the must-have fixes; large new feature layers (Minions agent orchestration, knowledge graph) are deferred to a separate Wave-2 evaluation. See the individual historical entries for per-fix detail.

- **Security — Wave 3 (9 vulnerabilities closed, from upstream #174).** `file_upload` arbitrary-file-read is closed, recipe trust boundary is real, string health_checks are blocked for untrusted recipes, SSRF defense for HTTP health_checks, prompt-injection hardening for query expansion, and `list_pages`/`get_ingest_log` actually cap now. Original fixes contributed by @garagon (#105-#109) and @Hybirdss (#139). See the historical `[0.10.2]` entry below for the full breakdown.
- **Migrations runner infrastructure (subset of upstream #130).** Adds `pbrain apply-migrations` and the `src/commands/migrations/` framework. The runner framework is in place; the actual orchestrators (Minions adoption, knowledge-graph auto-wire) are deferred to Wave-2. Registry begins empty and is populated by the JSONB repair entry below.
- **Data correctness — JSONB double-encode + splitBody + parseEmbedding (from upstream #196).** Fixes the `${JSON.stringify(x)}::jsonb` interpolation bug that silently stored Postgres JSONB columns as quoted strings (broke every `frontmatter->>'key'` query on Postgres-backed brains — PGLite was unaffected). Fixes the `splitBody` greedy `---` match that truncated wiki articles by up to 83%. Fixes `parseEmbedding` returning strings instead of `Float32Array` on Supabase, yielding NaN search scores. Adds `pbrain repair-jsonb`, the `scripts/check-jsonb-pattern.sh` CI grep guard, and an E2E regression test. Original fixes contributed by @knee5 (#187) and @leonardsellem (#175). See the historical `[0.12.2]` entry above for the full breakdown.
- **Perf — extract N+1 hang fix (from upstream #198).** New `addLinksBatch` and `addTimelineEntriesBatch` engine methods that use a single `INSERT ... SELECT FROM unnest(...) ... ON CONFLICT DO NOTHING RETURNING 1` query regardless of batch size. File-source `pbrain extract` now flushes candidates 100 at a time instead of issuing one write per link/entry. Mirrors the same pattern across PGLite and Postgres engines. Original fix was bundled with the Minions work by upstream; here it's isolated to the batch-insert API surface so it stands independent of the knowledge-graph layer. See the historical `[0.12.1]` entry above for the full breakdown.
- **Reliability wave (from upstream #216).** Sync deadlock fix on PGLite's non-reentrant transaction mutex (10+ files hung; now cleanly processes bulk syncs). `statement_timeout` scoped to the search transaction via `sql.begin` + `SET LOCAL` so it can't leak onto pooled connections and clip unrelated `embed --all` jobs. `tryParseEmbedding` in search/rescore paths skips+warns on one corrupt row instead of killing the query. New `pbrain orphans` command (and `find_orphans` MCP op) for content-enrichment cycles. Two new `doctor` checks (`jsonb_integrity`, `markdown_body_completeness`) surface v0.12.0-era residual data issues with actionable fix hints. Contributed by @sunnnybala and @garagon (upstream community). See the historical `[0.12.3]` entry above for the full breakdown.

## [0.1.0] - 2026-04-17

The first PBrain release.
Adaptation work was phased across four PRs merged to master incrementally; this release tags the final state after all four phases plus the pre-tag polish wave below. @@ -128,6 +429,29 @@ Final phase — tags `v0.1.0` and cuts the first PBrain release. --- +## [0.10.2] - 2026-04-17 + +### Security — Wave 3 (9 vulnerabilities closed) + +This wave closes a high-severity arbitrary-file-read in `file_upload`, fixes a fake trust boundary that let any cwd-local recipe execute arbitrary commands, and lays down real SSRF defense for HTTP health checks. If you ran `pbrain` in a directory where someone could drop a `recipes/` folder, this matters. + +- **Arbitrary file read via `file_upload` is closed.** Remote (MCP) callers were able to read `/etc/passwd` or any other host file. Path validation now uses `realpathSync` + `path.relative` to catch symlinked-parent traversal, plus an allowlist regex for slugs and filenames (control chars, backslashes, RTL-override Unicode all rejected). Local CLI users still upload from anywhere — only remote callers are confined. Fixes Issue #139, contributed by @Hybirdss; original fix #105 by @garagon. +- **Recipe trust boundary is real now.** `loadAllRecipes()` previously marked every recipe as `embedded=true`, including ones from `./recipes/` in your cwd or `$PBRAIN_RECIPES_DIR`. Anyone who could drop a recipe in cwd could bypass every health-check gate. Now only package-bundled recipes (source install + global install) are trusted. Original fixes #106, #108 by @garagon. +- **String health_checks blocked for untrusted recipes.** Even with the recipe trust fix, the string health_check path ran `execSync` before reaching the typed-DSL switch — a malicious "embedded" recipe could `curl http://169.254.169.254/metadata` and exfiltrate cloud credentials. Non-embedded recipes are now hard-blocked from string health_checks; embedded recipes still get the `isUnsafeHealthCheck` defense-in-depth guard. 
+- **SSRF defense for HTTP health_checks.** New `isInternalUrl()` blocks loopback, RFC1918, link-local (incl. AWS metadata 169.254.169.254), CGNAT, IPv6 loopback, and IPv4-mapped IPv6 (`[::ffff:127.0.0.1]` canonicalized to hex hextets — both forms blocked). Bypass encodings handled: hex IPs (`0x7f000001`), octal (`0177.0.0.1`), single decimal (`2130706433`). Scheme allowlist rejects `file:`, `data:`, `blob:`, `ftp:`, `javascript:`. `fetch` runs with `redirect: 'manual'` and re-validates every Location header up to 3 hops. Original fix #108 by @garagon. +- **Prompt injection hardening for query expansion.** Restructured the LLM prompt with a system instruction that declares the query as untrusted data, plus an XML-tagged `` boundary. Layered with regex sanitization (strips code fences, tags, injection prefixes) and output-side validation on the model's `alternative_queries` array (cap length, strip control chars, dedup, drop empties). The `console.warn` on stripped content never logs the query text itself. Original fix #107 by @garagon. +- **`list_pages` and `get_ingest_log` actually cap now.** Wave 3 found that `clampSearchLimit(limit, default)` was always allowing up to 100 — the second arg was the default, not the cap. Added a third `cap` parameter so `list_pages` caps at 100 and `get_ingest_log` caps at 50. Internal bulk commands (embed --all, export, migrate-engine) bypass the operation layer entirely and remain uncapped. Original fix #109 by @garagon. + +### Added + +- `OperationContext.remote` flag distinguishes trusted local CLI callers from untrusted MCP callers. Security-sensitive operations (currently `file_upload`) tighten their behavior when `remote=true`. Defaults to strict (treat as remote) when unset. +- Exported security helpers for testing and reuse: `validateUploadPath`, `validatePageSlug`, `validateFilename`, `parseOctet`, `hostnameToOctets`, `isPrivateIpv4`, `isInternalUrl`, `getRecipeDirs`, `sanitizeQueryForPrompt`, `sanitizeExpansionOutput`. 
+- 49 new tests covering symlink traversal, scheme allowlist, IPv4 bypass forms, IPv6 mapped addresses, prompt injection patterns, and recipe trust boundaries. Plus an E2E regression proving remote callers can't escape cwd. + +### Contributors + +Wave 3 fixes were contributed by **@garagon** (PRs #105-#109) and **@Hybirdss** (Issue #139). The collector branch re-implemented each fix with additional hardening for the residuals Codex caught during outside-voice review (parent-symlink traversal, fake `isEmbedded` boundary, redirect-following SSRF, scheme bypasses, `clampSearchLimit` semantics). + ## [0.10.1] - 2026-04-15 ### Fixed diff --git a/CLAUDE.md b/CLAUDE.md index ae376d30..9dfb7004 100644 --- a/CLAUDE.md +++ b/CLAUDE.md @@ -9,20 +9,26 @@ cron scheduling, reports, identity, and access control. ## Architecture -Contract-first: `src/core/operations.ts` defines ~30 shared operations. CLI and MCP +Contract-first: `src/core/operations.ts` defines ~30 shared operations (adds `find_orphans` from the upstream v0.12.3 reliability wave). CLI and MCP server are both generated from this single source. Engine factory (`src/core/engine-factory.ts`) dynamically imports the configured engine (`'pglite'` or `'postgres'`). Skills are fat markdown files (tool-agnostic, work with the CLI and MCP server contexts). +**Trust boundary:** `OperationContext.remote` distinguishes trusted local CLI callers +(`remote: false` set by `src/cli.ts`) from untrusted agent-facing callers +(`remote: true` set by `src/mcp/server.ts`). Security-sensitive operations like +`file_upload` tighten filesystem confinement when `remote=true` and default to +strict behavior when unset. + ## Key files -- `src/core/operations.ts` — Contract-first operation definitions (the foundation) -- `src/core/engine.ts` — Pluggable engine interface (BrainEngine) +- `src/core/operations.ts` — Contract-first operation definitions (the foundation). 
Also exports upload validators: `validateUploadPath`, `validatePageSlug`, `validateFilename`. `OperationContext.remote` flags untrusted callers. +- `src/core/engine.ts` — Pluggable engine interface (BrainEngine). `clampSearchLimit(limit, default, cap)` takes an explicit cap so per-operation caps can be tighter than `MAX_SEARCH_LIMIT`. Exports `LinkBatchInput` / `TimelineBatchInput` for the v0.12.1 bulk-insert API (`addLinksBatch` / `addTimelineEntriesBatch`). - `src/core/engine-factory.ts` — Engine factory with dynamic imports (`'pglite'` | `'postgres'`) -- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation, all 37 BrainEngine methods +- `src/core/pglite-engine.ts` — PGLite (embedded Postgres 17.5 via WASM) implementation. `addLinksBatch` / `addTimelineEntriesBatch` use multi-row `unnest()` with manual `$N` placeholders. - `src/core/pglite-schema.ts` — PGLite-specific DDL (pgvector, pg_trgm, triggers) -- `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted) -- `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts +- `src/core/postgres-engine.ts` — Postgres + pgvector implementation (Supabase / self-hosted). `addLinksBatch` / `addTimelineEntriesBatch` use `INSERT ... SELECT FROM unnest($1::text[], ...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` — 4-5 array params regardless of batch size, sidesteps the 65535-parameter cap. As of v0.12.3, `searchKeyword` / `searchVector` scope `statement_timeout` via `sql.begin` + `SET LOCAL` so the GUC dies with the transaction instead of leaking across the pooled postgres.js connection (contributed by @garagon). `getEmbeddingsByChunkIds` uses `tryParseEmbedding` so one corrupt row skips+warns instead of killing the query. +- `src/core/utils.ts` — Shared SQL utilities extracted from postgres-engine.ts. 
Exports `parseEmbedding(value)` (throws on unknown input, used by migration + ingest paths where data integrity matters) and as of v0.12.3 `tryParseEmbedding(value)` (returns `null` + warns once per process, used by search/rescore paths where availability matters more than strictness). - `src/core/db.ts` — Connection management, schema initialization - `src/commands/migrate-engine.ts` — Bidirectional engine migration (`pbrain migrate --to supabase/pglite`) - `src/core/import-file.ts` — importFromFile + importFromContent (chunk + embed + tags) @@ -42,15 +48,23 @@ markdown files (tool-agnostic, work with the CLI and MCP server contexts). - `src/core/transcription.ts` — Audio transcription: Groq Whisper (default), OpenAI fallback, ffmpeg segmentation for >25MB - `src/core/enrichment-service.ts` — Global enrichment service: entity slug generation, tier auto-escalation, batch throttling - `src/core/data-research.ts` — Recipe validation, field extraction (MRR/ARR regex), dedup, tracker parsing, HTML stripping -- `src/commands/extract.ts` — `pbrain extract links|timeline|all`: batch link/timeline extraction from markdown +- `src/commands/extract.ts` — `pbrain extract links|timeline|all`: batch link/timeline extraction from markdown files. As of the v0.12.1 N+1 fix, candidates are buffered 100 at a time and flushed via `addLinksBatch` / `addTimelineEntriesBatch`; `ON CONFLICT DO NOTHING` enforces uniqueness at the DB layer, and the `created` counter returns real rows inserted (truthful on re-runs). The DB-source extractor (`--source db`) remains deferred with the knowledge-graph layer. 
- `src/commands/features.ts` — `pbrain features --json --auto-fix`: usage scan + feature adoption salesman - `src/commands/autopilot.ts` — `pbrain autopilot --install`: self-maintaining brain daemon (sync+extract+embed) - `src/mcp/server.ts` — MCP stdio server (generated from operations) - `src/commands/auth.ts` — Standalone token management (create/list/revoke/test) - `src/commands/upgrade.ts` — Self-update CLI with post-upgrade feature discovery + features hook +- `src/commands/apply-migrations.ts` — `pbrain apply-migrations [--list] [--dry-run] [--migration vX.Y.Z]`: runs pending migration orchestrators from the TS registry. +- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). All orchestrators are idempotent and resumable from `partial` status. Upstream's v0.11.0 (Minions) and v0.12.0 (knowledge-graph) orchestrators are intentionally NOT registered in this fork. +- `src/commands/repair-jsonb.ts` — `pbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent. +- `src/commands/orphans.ts` — `pbrain orphans [--json] [--count] [--include-pseudo]`: surfaces pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. Integrated from upstream's v0.12.3 reliability wave (contributed by @knee5). +- `src/commands/doctor.ts` — `pbrain doctor [--json] [--fast] [--fix]`: health checks. 
v0.12.3 adds two reliability detection checks: `jsonb_integrity` (scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for `jsonb_typeof='string'` rows left over from v0.12.0) and `markdown_body_completeness` (flags pages whose compiled_truth is <30% of raw source when raw has multiple H2/H3 boundaries). Fix hints point at `pbrain repair-jsonb` and `pbrain sync --force`. +- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (``, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics). +- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`. - `src/core/schema-embedded.ts` — AUTO-GENERATED from schema.sql (run `bun run build:schema`) - `src/schema.sql` — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts) -- `src/commands/integrations.ts` — Standalone integration recipe management (no DB needed) +- `src/commands/integrations.ts` — Standalone integration recipe management (no DB needed). Exports `getRecipeDirs()` (trust-tagged recipe sources), SSRF helpers (`isInternalUrl`, `parseOctet`, `hostnameToOctets`, `isPrivateIpv4`). Only package-bundled recipes are `embedded=true`; `$PBRAIN_RECIPES_DIR` and cwd `./recipes/` are untrusted and cannot run `command`/`http`/string health checks. +- `src/core/search/expansion.ts` — Multi-query expansion via Haiku. Exports `sanitizeQueryForPrompt` + `sanitizeExpansionOutput` (prompt-injection defense-in-depth). 
Sanitized query is only used for the LLM channel; original query still drives search. - `recipes/` — Integration recipe files (YAML frontmatter + markdown setup instructions) - `docs/guides/` — Individual SKILLPACK guides (broken out from monolith) - `docs/integrations/` — "Getting Data In" guides and integration docs @@ -103,23 +117,30 @@ Key commands added in v0.7: - `pbrain init` — defaults to PGLite (no Supabase needed), scans repo size, suggests Supabase for 1000+ files - `pbrain migrate --to supabase` / `pbrain migrate --to pglite` — bidirectional engine migration +Key commands added in v0.12.2: +- `pbrain repair-jsonb [--dry-run] [--json]` — repair double-encoded JSONB rows left over from v0.12.0-and-earlier Postgres writes. Idempotent; PGLite no-ops. The `v0_12_2` migration runs this automatically on `pbrain upgrade`. + +Key commands added in v0.12.3: +- `pbrain orphans [--json] [--count] [--include-pseudo]` — surface pages with zero inbound wikilinks, grouped by domain. Auto-generated/raw/pseudo pages filtered by default. Also exposed as `find_orphans` MCP operation. The natural consumer of the v0.12.0 knowledge graph layer: once edges are captured, find the gaps. +- `pbrain doctor` gains two new reliability detection checks: `jsonb_integrity` (v0.12.0 Postgres double-encode damage) and `markdown_body_completeness` (pages truncated by the old splitBody bug). Detection only; fix hints point at `pbrain repair-jsonb` and `pbrain sync --force`. + ## Testing -`bun test` runs all tests (34 unit test files + 5 E2E test files). Unit tests run +`bun test` runs all tests. Unit tests run without a database. E2E tests skip gracefully when `DATABASE_URL` is not set. 
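The three-argument `clampSearchLimit(limit, default, cap)` shape that `test/search-limit.test.ts` pins can be sketched as follows. This is a hedged reconstruction for illustration, not the actual `src/core/engine.ts` implementation: the caps of 100 (`list_pages`) and 50 (`get_ingest_log`) come from the notes above, while the default of 20 and the floor at 1 are assumptions.

```typescript
// Hedged sketch of the three-argument clampSearchLimit shape, reconstructed
// for illustration; not the actual src/core/engine.ts source.
function clampSearchLimit(
  limit: number | undefined,
  defaultLimit: number,
  cap: number,
): number {
  // No caller-supplied limit: use the per-operation default (itself capped).
  if (limit === undefined || !Number.isFinite(limit)) {
    return Math.min(defaultLimit, cap);
  }
  // Clamp to [1, cap]: the cap is enforced independently of the default.
  return Math.max(1, Math.min(Math.floor(limit), cap));
}

// get_ingest_log-style call: hypothetical default of 20, hard cap of 50.
console.log(clampSearchLimit(undefined, 20, 50)); // 20
console.log(clampSearchLimit(500, 20, 50)); // 50, the cap, not 100
```

The point of the third parameter is that the old two-argument form had no per-operation cap at all, so every caller silently inherited the global `MAX_SEARCH_LIMIT` of 100 regardless of what its second argument said.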
-Unit tests: `test/markdown.test.ts` (frontmatter parsing), `test/chunkers/recursive.test.ts` +Unit tests: `test/markdown.test.ts` (frontmatter parsing; plus v0.12.3 splitBody sentinel precedence, horizontal-rule preservation, inferType wiki subtypes), `test/chunkers/recursive.test.ts` -(chunking), `test/sync.test.ts` (sync logic), `test/parity.test.ts` (operations contract +(chunking), `test/parity.test.ts` (operations contract parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redaction), `test/files.test.ts` (MIME/hash), `test/import-file.test.ts` (import pipeline), -`test/upgrade.test.ts` (schema migrations), `test/doctor.test.ts` (doctor command), +`test/upgrade.test.ts` (schema migrations), `test/file-migration.test.ts` (file migration), `test/file-resolver.test.ts` (file resolution), -`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration), +`test/import-resume.test.ts` (import checkpoints), `test/migrate.test.ts` (migration; v8/v9 helper-btree-index SQL structural assertions + 1000-row wall-clock fixtures that guard the O(n²)→O(n log n) fix), `test/setup-branching.test.ts` (setup flow), `test/slug-validation.test.ts` (slug validation), `test/storage.test.ts` (storage backends), `test/supabase-admin.test.ts` (Supabase admin), `test/yaml-lite.test.ts` (YAML parsing), `test/check-update.test.ts` (version check + update CLI), -`test/pglite-engine.test.ts` (PGLite engine, all 37 BrainEngine methods), -`test/utils.test.ts` (shared SQL utilities), `test/engine-factory.test.ts` (engine factory + dynamic imports), +`test/pglite-engine.test.ts` (PGLite engine, all 40 BrainEngine methods including 11 cases for `addLinksBatch` / `addTimelineEntriesBatch`: empty batch, missing optionals, within-batch dedup via ON CONFLICT, missing-slug rows dropped by JOIN, half-existing batch, batch of 100), +`test/engine-factory.test.ts` (engine factory + dynamic imports), `test/integrations.test.ts` (recipe parsing, CLI routing, recipe validation), `test/publish.test.ts` (content stripping, encryption, password generation, HTML output), `test/backlinks.test.ts` (entity extraction, back-link detection, timeline entry 
generation), @@ -138,11 +159,24 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac `test/enrichment-service.test.ts` (entity slugification, extraction, tier escalation), `test/data-research.test.ts` (recipe validation, MRR/ARR extraction, dedup, tracker parsing, HTML stripping), `test/extract.test.ts` (link extraction, timeline extraction, frontmatter parsing, directory type inference), -`test/features.test.ts` (feature scanning, brain_score calculation, CLI routing, persistence). +`test/extract-fs.test.ts` (pbrain extract: first-run inserts + second-run reports zero, dry-run dedups candidates across files, second-run perf regression guard — the v0.12.1 N+1 dedup bug), +`test/features.test.ts` (feature scanning, brain_score calculation, CLI routing, persistence), +`test/file-upload-security.test.ts` (symlink traversal, cwd confinement, slug + filename allowlists, remote vs local trust), +`test/query-sanitization.test.ts` (prompt-injection stripping, output sanitization, structural boundary), +`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log), +`test/repair-jsonb.test.ts` (v0.12.2 JSONB repair: TARGETS list, idempotency, engine-awareness), +`test/migrations-v0_12_2.test.ts` (v0.12.2 orchestrator phases: schema → repair → verify → record), +`test/orphans.test.ts` (v0.12.3 orphans command: detection, pseudo filtering, text/json/count outputs, MCP op), +`test/postgres-engine.test.ts` (v0.12.3 statement_timeout scoping: `sql.begin` + `SET LOCAL` shape, source-level grep guardrail against reintroduced bare `SET statement_timeout`), +`test/sync.test.ts` (sync logic + v0.12.3 regression guard asserting top-level `engine.transaction` is not called), +`test/doctor.test.ts` (doctor command + v0.12.3 assertions that `jsonb_integrity` scans the four v0.12.0 write sites and 
`markdown_body_completeness` is present), +`test/utils.test.ts` (shared SQL utilities + `tryParseEmbedding` null-return and single-warn semantics). E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`. -- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys) +- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys). Includes 9 dedicated cases for the postgres-engine `addLinksBatch` / `addTimelineEntriesBatch` bind path — postgres-js's `unnest()` binding is structurally different from PGLite's and gets its own coverage. - `test/e2e/search-quality.test.ts` runs search quality E2E against PGLite (no API keys, in-memory) +- `test/e2e/postgres-jsonb.test.ts` — v0.12.2 regression test. Round-trips all 5 JSONB write sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter) against real Postgres and asserts `jsonb_typeof='object'` plus `->>'key'` returns the expected scalar. The test that should have caught the original double-encode bug. +- `test/e2e/jsonb-roundtrip.test.ts` — v0.12.3 companion regression against the 4 doctor-scanned JSONB sites. Assertion-level overlap with `postgres-jsonb.test.ts` is intentional defense-in-depth: if doctor's scan surface ever drifts from the actual write surface, one of these tests catches it. 
- `test/e2e/upgrade.test.ts` runs check-update E2E against real GitHub API (network required) - Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI - If `.env.testing` doesn't exist in this directory, check sibling worktrees for one: diff --git a/INSTALL_FOR_AGENTS.md b/INSTALL_FOR_AGENTS.md index 866839ea..4968f4f8 100644 --- a/INSTALL_FOR_AGENTS.md +++ b/INSTALL_FOR_AGENTS.md @@ -175,3 +175,14 @@ pbrain install-skills # refresh IDE skill symlinks if Claude Code / Curs Re-running `pbrain init` with no flags on an already-configured machine reuses the `brain_path` saved in `~/.pbrain/config.json` — no re-prompt. + +After upgrading, read `~/pbrain/skills/migrations/v.md` (and any intermediate +versions you skipped) and run any backfill or verification steps it lists. Skipping +this is how features ship in the binary but stay dormant in the user's brain. + +For the v0.12.2 JSONB fix specifically: if your brain is Postgres- or Supabase-backed +and was written against a v0.12.0-or-earlier schema, run `pbrain apply-migrations` +(or it auto-runs during `pbrain upgrade`) to invoke `pbrain repair-jsonb`, which +rewrites every double-encoded JSONB row in place. PGLite brains no-op. If wiki-style +imports were truncated by the old `splitBody` bug, run `pbrain sync --full` after +upgrading to rebuild `compiled_truth` from source markdown. diff --git a/README.md b/README.md index e02a9b0e..5550324b 100644 --- a/README.md +++ b/README.md @@ -422,6 +422,9 @@ ADMIN pbrain integrations Integration recipe dashboard pbrain check-backlinks check|fix Back-link enforcement pbrain lint [--fix] LLM artifact detection + pbrain apply-migrations [--list] Run pending migration orchestrators + pbrain repair-jsonb [--dry-run] Repair v0.12.0 double-encoded JSONB (Postgres) + pbrain orphans [--json] [--count] Find pages with zero inbound wikilinks pbrain transcribe