75 changes: 75 additions & 0 deletions CHANGELOG.md
@@ -2,6 +2,81 @@

All notable changes to GBrain will be documented in this file.

## [0.12.2] - 2026-04-19

## **Postgres frontmatter queries actually work now.**
## **Wiki articles stop disappearing when you import them.**

This is a data-correctness hotfix for Postgres-backed brains on `v0.12.0` and earlier. If you run gbrain on Postgres or Supabase, you've been losing data without knowing it. PGLite users were unaffected. Upgrading auto-repairs your existing rows. This release lands on top of v0.12.1 (extract N+1 fix + migration timeout fix), so running `gbrain upgrade` gets you both.

### What was broken

**Frontmatter columns were silently stored as quoted strings, not JSON.** Every `put_page` wrote `frontmatter` to Postgres via `${JSON.stringify(value)}::jsonb` — postgres.js v3 stringified again on the wire, so the column ended up holding `"\"{\\\"author\\\":\\\"garry\\\"}\""` instead of `{"author":"garry"}`. Every `frontmatter->>'key'` query returned NULL. GIN indexes on JSONB were inert. Same bug on `raw_data.data`, `ingest_log.pages_updated`, `files.metadata`, and `page_versions.frontmatter`. PGLite hid this entirely (different driver path) — which is exactly why it slipped past the existing test suite.
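
The double-encoding described above can be reproduced without a database. The following is a minimal sketch that models both encoding steps with plain `JSON.stringify` calls; it assumes nothing about postgres.js internals, and all variable names are illustrative.

```typescript
// Illustrative model of the double-encoding using plain JSON calls; no database or driver is involved.
const frontmatter = { author: "garry" };

// Encode #1: the application stringifies the object before interpolating it into the query.
const encodedOnce: string = JSON.stringify(frontmatter);

// Encode #2: stands in for the driver serializing the already-stringified value again.
const encodedTwice: string = JSON.stringify(encodedOnce);

// The server decodes the wire text once, so the stored value is a JSON string, not an object.
const stored: unknown = JSON.parse(encodedTwice);
const storedKind: string = typeof stored;

// Decoding one more time recovers the object that was meant to be stored.
const recovered = (typeof stored === "string" ? JSON.parse(stored) : stored) as { author: string };
```

Because the stored value is a string rather than an object, key lookups such as `frontmatter->>'author'` have nothing to index into, which is consistent with the NULL results reported above.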

**Wiki articles got truncated by 83% on import.** `splitBody` treated *any* standalone `---` line in body content as a timeline separator. Discovered by @knee5 migrating a 1,991-article wiki where a 23,887-byte article landed in the DB as 593 bytes (4,856 of 6,680 wikilinks lost).

**`/wiki/` subdirectories silently typed as `concept`.** Articles under `/wiki/analysis/`, `/wiki/guides/`, `/wiki/hardware/`, `/wiki/architecture/`, and `/writing/` defaulted to `type='concept'` — type-filtered queries lost everything in those buckets.

**pgvector embeddings sometimes returned as strings → NaN search scores.** Discovered by @leonardsellem on Supabase, where `getEmbeddingsByChunkIds` returned `"[0.1,0.2,…]"` instead of `Float32Array`, producing `[NaN]` query scores.
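
To illustrate the embedding fix, here is a hedged sketch of a normalizer that accepts either a typed array or pgvector's text form. The helper name `toFloat32Array` is hypothetical and is not the project's `parseEmbedding`; the sketch assumes the text form is a bracketed, comma-separated list of finite numbers, which is also valid JSON.

```typescript
// Hypothetical helper (not the project's parseEmbedding): coerce an embedding into a Float32Array.
function toFloat32Array(value: unknown): Float32Array {
  if (value instanceof Float32Array) return value;
  if (Array.isArray(value)) return Float32Array.from(value.map(Number));
  if (typeof value === "string") {
    // Assumption: the driver returned pgvector's text form, e.g. "[0.1,0.2,0.3]".
    const parsed: unknown = JSON.parse(value);
    if (!Array.isArray(parsed)) throw new TypeError("embedding text is not an array");
    return Float32Array.from(parsed.map(Number));
  }
  throw new TypeError(`unsupported embedding value: ${typeof value}`);
}

// Plain cosine similarity; it only yields a real score when both inputs are numeric arrays.
function cosine(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```

Normalizing at the read boundary means the rescoring code never has to care which form the driver handed back.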

### What you can do now that you couldn't before

- **`frontmatter->>'author'` returns `garry`, not NULL.** GIN indexes work. Postgres queries by frontmatter key actually retrieve pages.
- **Wiki articles round-trip intact.** Markdown horizontal rules in body text are horizontal rules, not timeline separators.
- **Recover already-truncated pages with `gbrain sync --full`.** Re-import from your source-of-truth markdown rebuilds `compiled_truth` correctly.
- **Search scores stop going `NaN` on Supabase.** Cosine rescoring sees real `Float32Array` embeddings.
- **Type-filtered queries find your wiki articles.** `/wiki/analysis/` becomes type `analysis`, `/writing/` becomes `writing`, etc.

### How to upgrade

```bash
gbrain upgrade
```

The `v0.12.2` orchestrator runs automatically: applies any schema changes, then `gbrain repair-jsonb` rewrites every double-encoded row in place using `jsonb_typeof = 'string'` as the guard. Idempotent — re-running is a no-op. PGLite engines short-circuit cleanly. Batches well on large brains.
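
Why the string-type guard makes the repair idempotent can be shown on in-memory values. This is a sketch under stated assumptions, not the `repair-jsonb` implementation: `typeof value === "string"` stands in for `jsonb_typeof = 'string'`, the function names are hypothetical, and leaving unparseable strings untouched is an assumption of the sketch.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Decode one level only when the stored value is a string (the stand-in for jsonb_typeof = 'string').
function repairValue(value: Json): Json {
  if (typeof value !== "string") return value;
  try {
    return JSON.parse(value) as Json;
  } catch {
    // Assumption of this sketch: a string that is not JSON is left as it was.
    return value;
  }
}

// Apply the repair to every value of a column and count how many values changed.
function repairColumn(column: Json[]): { column: Json[]; totalRepaired: number } {
  let totalRepaired = 0;
  const repairedColumn = column.map((value) => {
    const next = repairValue(value);
    if (next !== value) totalRepaired += 1;
    return next;
  });
  return { column: repairedColumn, totalRepaired };
}
```

A second pass over an already-repaired column finds no string-typed values, so it changes nothing and reports zero, which is the behavior the dry-run check relies on.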

If you want to recover pages that were truncated by the splitBody bug:

```bash
gbrain sync --full
```

That re-imports every page from disk, so the new `splitBody` rebuilds the full `compiled_truth` correctly.

### What's new under the hood

- **`gbrain repair-jsonb`** — standalone command for the JSONB fix. Run it manually if needed; the migration runs it automatically. `--dry-run` shows what would be repaired without touching data. `--json` for scripting.
- **CI grep guard** at `scripts/check-jsonb-pattern.sh` — fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern. Wired into `bun test` so it runs on every CI invocation.
- **New E2E regression test** at `test/e2e/postgres-jsonb.test.ts` — round-trips all four JSONB write sites against real Postgres and asserts `jsonb_typeof = 'object'` plus `->>` returns the expected scalar. The test that should have caught the original bug.
- **Wikilink extraction** — `[[page]]` and `[[page|Display Text]]` syntaxes now extracted alongside standard `[text](page.md)` markdown links. Includes ancestor-search resolution for wiki KBs where authors omit one or more leading `../`.
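
As a rough illustration of the two wikilink syntaxes named in the last bullet, the sketch below extracts `[[page]]` and `[[page|Display Text]]` with a single regular expression. The function and interface names are hypothetical, and the ancestor-search resolution for missing `../` prefixes is deliberately left out.

```typescript
interface WikiLink {
  target: string;
  display: string | null;
}

// Hypothetical extractor covering only the two bracket syntaxes; path resolution is out of scope.
function extractWikilinks(markdown: string): WikiLink[] {
  // Group 1 is the target; optional group 2 is the display text after the pipe.
  const pattern = /\[\[([^\[\]|]+)(?:\|([^\[\]]+))?\]\]/g;
  const links: WikiLink[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markdown)) !== null) {
    const display = match[2] !== undefined ? match[2].trim() : null;
    links.push({ target: match[1].trim(), display });
  }
  return links;
}
```

For example, `extractWikilinks("See [[page]] and [[other|Display Text]].")` yields two links, the second carrying `Display Text` as its display value.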

### Migration scope

The repair touches five JSONB columns:
- `pages.frontmatter`
- `raw_data.data`
- `ingest_log.pages_updated`
- `files.metadata`
- `page_versions.frontmatter` (downstream of `pages.frontmatter` via INSERT...SELECT)

Other JSONB columns in the schema (`minion_jobs.{data,result,progress,stacktrace}`, `minion_inbox.payload`) were always written via the parameterized `$N::jsonb` form so they were never affected.

### Behavior changes (read this if you upgrade)

`splitBody` now requires an explicit sentinel for timeline content. Recognized markers (in priority order):
1. `<!-- timeline -->` (preferred — what `serializeMarkdown` emits)
2. `--- timeline ---` (decorated separator)
3. `---` directly before `## Timeline` or `## History` heading (backward-compat fallback)

If you intentionally used a plain `---` to mark your timeline section in source markdown, add `<!-- timeline -->` above it manually. The fallback covers the common case (`---` followed by `## Timeline`).
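
The sentinel precedence above can be sketched as follows. This is not the project's `splitBody`: the function name is hypothetical, and two details are assumptions of the sketch, namely that a higher-priority marker wins wherever it appears and that "directly before" tolerates blank lines between `---` and the heading.

```typescript
interface SplitResult {
  body: string;
  timeline: string;
}

// Hypothetical sketch of the sentinel precedence; a bare `---` alone never splits.
function splitBodySketch(content: string): SplitResult {
  const lines = content.split("\n");
  const isTimelineHeading = (line: string | undefined): boolean =>
    line !== undefined && /^##\s+(Timeline|History)\b/.test(line.trim());

  // 1. Preferred sentinel, then 2. decorated separator.
  let index = lines.findIndex((line) => line.trim() === "<!-- timeline -->");
  if (index === -1) index = lines.findIndex((line) => line.trim() === "--- timeline ---");
  // 3. Fallback: `---` whose next non-blank line is a Timeline/History heading.
  if (index === -1) {
    index = lines.findIndex((line, i) => {
      if (line.trim() !== "---") return false;
      const next = lines.slice(i + 1).find((candidate) => candidate.trim() !== "");
      return isTimelineHeading(next);
    });
  }
  if (index === -1) return { body: content, timeline: "" };
  return {
    body: lines.slice(0, index).join("\n").trimEnd(),
    timeline: lines.slice(index + 1).join("\n").trimStart(),
  };
}
```

With this shape, a horizontal rule in the middle of an article stays in `body`, and only an explicit marker moves content into `timeline`.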

### Attribution

Built from community PRs #187 (@knee5) and #175 (@leonardsellem). The original PRs reported the bugs and proposed the fixes; this release re-implements them on top of the v0.12.0 knowledge graph release with expanded migration scope, schema audit (all 5 affected columns vs the 3 originally reported), engine-aware behavior, CI grep guard, and an E2E regression test that should have caught this in the first place. Codex outside-voice review during planning surfaced the missed `page_versions.frontmatter` propagation path and the noisy-truncated-diagnostic anti-pattern that was dropped from this scope. Thanks for finding the bugs and providing the recovery path — both PRs left work to do but the foundation was right.

Co-Authored-By: @knee5 (PR #187 — splitBody, inferType wiki, JSONB triple-fix)
Co-Authored-By: @leonardsellem (PR #175 — parseEmbedding, getEmbeddingsByChunkIds fix)

## [0.12.1] - 2026-04-19

## **Extract no longer hangs on large brains.**
14 changes: 12 additions & 2 deletions CLAUDE.md
@@ -61,7 +61,10 @@ strict behavior when unset.
- `src/mcp/server.ts` — MCP stdio server (generated from operations)
- `src/commands/auth.ts` — Standalone token management (create/list/revoke/test)
- `src/commands/upgrade.ts` — Self-update CLI. `runPostUpgrade()` enumerates migrations from the TS registry (src/commands/migrations/index.ts) and tail-calls `runApplyMigrations(['--yes', '--non-interactive'])` so the mechanical side of every outstanding migration runs unconditionally.
- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). All orchestrators are idempotent and resumable from `partial` status.
- `src/commands/migrations/` — TS migration registry (compiled into the binary; no filesystem walk of `skills/migrations/*.md` needed at runtime). `index.ts` lists migrations in semver order. `v0_11_0.ts` = Minions adoption orchestrator (8 phases). `v0_12_0.ts` = Knowledge Graph auto-wire orchestrator (5 phases: schema → config check → backfill links → backfill timeline → verify). `phaseASchema` has a 600s timeout (bumped from 60s in v0.12.1 for duplicate-heavy brains). `v0_12_2.ts` = JSONB double-encode repair orchestrator (4 phases: schema → repair-jsonb → verify → record). All orchestrators are idempotent and resumable from `partial` status.
- `src/commands/repair-jsonb.ts` — `gbrain repair-jsonb [--dry-run] [--json]`: rewrites `jsonb_typeof='string'` rows in place across 5 affected columns (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter). Fixes v0.12.0 double-encode bug on Postgres; PGLite no-ops. Idempotent.
- `src/core/markdown.ts` — Frontmatter parsing + body splitter. `splitBody` requires an explicit timeline sentinel (`<!-- timeline -->`, `--- timeline ---`, or `---` immediately before `## Timeline`/`## History`). Plain `---` in body text is a markdown horizontal rule, not a separator. `inferType` auto-types `/wiki/analysis/` → analysis, `/wiki/guides/` → guide, `/wiki/hardware/` → hardware, `/wiki/architecture/` → architecture, `/writing/` → writing (plus the existing people/companies/deals/etc heuristics).
- `scripts/check-jsonb-pattern.sh` — CI grep guard. Fails the build if anyone reintroduces the `${JSON.stringify(x)}::jsonb` interpolation pattern (which postgres.js v3 double-encodes). Wired into `bun test`.
- `docs/UPGRADING_DOWNSTREAM_AGENTS.md` — Patches for downstream agent skill forks (Wintermute etc.) to apply when upgrading. Each release appends a new section. v0.10.3 includes diffs for brain-ops, meeting-ingestion, signal-detector, enrich.
- `src/core/schema-embedded.ts` — AUTO-GENERATED from schema.sql (run `bun run build:schema`)
- `src/schema.sql` — Full Postgres + pgvector DDL (source of truth, generates schema-embedded.ts)
@@ -129,6 +132,9 @@ Key commands added for Minions (job queue):
- `gbrain jobs stats` — job health dashboard
- `gbrain jobs work [--queue Q] [--concurrency N]` — start worker daemon (Postgres only)

Key commands added in v0.12.2:
- `gbrain repair-jsonb [--dry-run] [--json]` — repair double-encoded JSONB rows left over from v0.12.0-and-earlier Postgres writes. Idempotent; PGLite no-ops. The `v0_12_2` migration runs this automatically on `gbrain upgrade`.

## Testing

`bun test` runs all tests. After the v0.12.1 release: ~75 unit test files + 8 E2E test files (1412 unit pass, 119 E2E when `DATABASE_URL` is set — skip gracefully otherwise). Unit tests run
@@ -172,12 +178,16 @@ parity), `test/cli.test.ts` (CLI structure), `test/config.test.ts` (config redac
`test/features.test.ts` (feature scanning, brain_score calculation, CLI routing, persistence),
`test/file-upload-security.test.ts` (symlink traversal, cwd confinement, slug + filename allowlists, remote vs local trust),
`test/query-sanitization.test.ts` (prompt-injection stripping, output sanitization, structural boundary),
`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log).
`test/search-limit.test.ts` (clampSearchLimit default/cap behavior across list_pages and get_ingest_log),
`test/repair-jsonb.test.ts` (v0.12.2 JSONB repair: TARGETS list, idempotency, engine-awareness),
`test/migrations-v0_12_2.test.ts` (v0.12.2 orchestrator phases: schema → repair → verify → record),
`test/markdown.test.ts` (splitBody sentinel precedence, horizontal-rule preservation, inferType wiki subtypes).

E2E tests (`test/e2e/`): Run against real Postgres+pgvector. Require `DATABASE_URL`.
- `bun run test:e2e` runs Tier 1 (mechanical, all operations, no API keys). Includes 9 dedicated cases for the postgres-engine `addLinksBatch` / `addTimelineEntriesBatch` bind path — postgres-js's `unnest()` binding is structurally different from PGLite's and gets its own coverage.
- `test/e2e/search-quality.test.ts` runs search quality E2E against PGLite (no API keys, in-memory)
- `test/e2e/graph-quality.test.ts` runs the v0.10.3 knowledge graph pipeline (auto-link via put_page, reconciliation, traversePaths) against PGLite in-memory
- `test/e2e/postgres-jsonb.test.ts` — v0.12.2 regression test. Round-trips all 5 JSONB write sites (pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter) against real Postgres and asserts `jsonb_typeof='object'` plus `->>'key'` returns the expected scalar. The test that should have caught the original double-encode bug.
- `test/e2e/upgrade.test.ts` runs check-update E2E against real GitHub API (network required)
- Tier 2 (`skills.test.ts`) requires OpenClaw + API keys, runs nightly in CI
- If `.env.testing` doesn't exist in this directory, check sibling worktrees for one:
9 changes: 8 additions & 1 deletion INSTALL_FOR_AGENTS.md
@@ -127,7 +127,7 @@ Verify: `gbrain integrations doctor` (after at least one is configured)

## Step 9: Verify

Read `docs/GBRAIN_VERIFY.md` and run all 6 verification checks. Check #4 (live sync
Read `docs/GBRAIN_VERIFY.md` and run all 7 verification checks. Check #4 (live sync
actually works) is the most important.

## Upgrade
@@ -145,3 +145,10 @@ this is how features ship in the binary but stay dormant in the user's brain.
For v0.12.0+ specifically: if your brain was created before v0.12.0, run
`gbrain extract links --source db && gbrain extract timeline --source db` to
backfill the new graph layer (see Step 4.5 above).

For v0.12.2+ specifically: if your brain is Postgres- or Supabase-backed and
predates v0.12.2, the `v0_12_2` migration runs `gbrain repair-jsonb`
automatically during `gbrain post-upgrade` to fix the double-encoded JSONB
columns. PGLite brains no-op. If wiki-style imports were truncated by the old
`splitBody` bug, run `gbrain sync --full` after upgrading to rebuild
`compiled_truth` from source markdown.
1 change: 1 addition & 0 deletions README.md
@@ -536,6 +536,7 @@ ADMIN
gbrain integrations Integration recipe dashboard
gbrain check-backlinks check|fix Back-link enforcement
gbrain lint [--fix] LLM artifact detection
gbrain repair-jsonb [--dry-run] Repair v0.12.0 double-encoded JSONB (Postgres)
gbrain transcribe <audio> Transcribe audio (Groq Whisper)
gbrain research init <name> Scaffold a data-research recipe
gbrain research list Show available recipes
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
0.12.1
0.12.2
42 changes: 41 additions & 1 deletion docs/GBRAIN_VERIFY.md
@@ -224,6 +224,43 @@ heuristics won't find them — file an issue with a sample page.

---

## 8. JSONB Frontmatter Integrity (v0.12.2)

Postgres-backed brains created before v0.12.2 had double-encoded JSONB columns
(`frontmatter->>'key'` returned NULL, GIN indexes were inert). `gbrain upgrade`
runs `gbrain repair-jsonb` automatically via the `v0_12_2` orchestrator.
Verify the repair succeeded.

**Command:**

```bash
gbrain repair-jsonb --dry-run --json
```

**Expected:** `totalRepaired: 0` across all 5 columns (`pages.frontmatter`,
`raw_data.data`, `ingest_log.pages_updated`, `files.metadata`,
`page_versions.frontmatter`). A zero count means every row holds a properly
typed JSON value, not string-encoded JSON.

**If the count is > 0:** The repair didn't run or was interrupted. Re-run
without `--dry-run`:

```bash
gbrain repair-jsonb
```

Idempotent. PGLite brains always report 0 (unaffected by the original bug).

**Bonus check** — frontmatter-keyed queries actually resolve:

```bash
gbrain call list_pages '{"frontmatterKey": "type", "frontmatterValue": "person"}'
```

If this returns rows on a brain with person pages, the JSONB path is healthy.

---

## Quick Verification (all checks in one pass)

```bash
@@ -247,7 +284,10 @@ gbrain check-update --json

# 7. Knowledge graph populated (links + timeline > 0)
gbrain stats | grep -E 'links|timeline'

# 8. JSONB integrity (v0.12.2 — Postgres only, PGLite always 0)
gbrain repair-jsonb --dry-run --json
```

If all seven return successfully, the installation is healthy. For the full
If all eight return successfully, the installation is healthy. For the full
end-to-end sync test (4c), push a real change and verify it appears in search.
68 changes: 68 additions & 0 deletions docs/UPGRADING_DOWNSTREAM_AGENTS.md
@@ -177,6 +177,74 @@ Timeline entries still need explicit `gbrain timeline-add` calls.
```
Should return an indented tree of typed edges.

---

## v0.12.2 hotfix (data-correctness, no skill edits)

v0.12.2 is a Postgres data-correctness hotfix. No forked skill files need to
change — the skill contracts are unchanged. But you DO need to run the migration,
and you should know about one behavior change in markdown parsing.

### 1. Run the migration (Postgres-backed brains)

```bash
gbrain upgrade
```

The `v0_12_2` orchestrator runs `gbrain repair-jsonb` automatically. It rewrites
rows where `jsonb_typeof = 'string'` across `pages.frontmatter`, `raw_data.data`,
`ingest_log.pages_updated`, `files.metadata`, and `page_versions.frontmatter`.
Idempotent, safe to re-run. PGLite brains no-op cleanly.

Verify after upgrade:

```bash
gbrain repair-jsonb --dry-run --json # expect totalRepaired: 0
```

### 2. Recover any truncated wiki articles

If your brain imported wiki-style markdown before v0.12.2, some pages were
silently truncated (any standalone `---` in body content was treated as a
timeline separator). Re-import from source:

```bash
gbrain sync --full
```

The new `splitBody` rebuilds `compiled_truth` correctly.

### 3. Know the splitBody contract going forward

`splitBody` now requires an explicit timeline sentinel. Recognized markers
(priority order):

1. `<!-- timeline -->` (preferred — what `serializeMarkdown` emits)
2. `--- timeline ---` (decorated separator)
3. `---` directly before `## Timeline` or `## History` heading (backward-compat)

A bare `---` in body text is now a markdown horizontal rule, not a timeline
separator. If your agent writes pages with a bare `---` delimiter, migrate to
`<!-- timeline -->` — the `serializeMarkdown` helper already does this.

### 4. Wiki subtypes now auto-typed

`inferType` now auto-detects five additional directory patterns as their own
page types (previously they all defaulted to `concept`):

| Path pattern | New type |
|------------------------|----------------|
| `/wiki/analysis/` | `analysis` |
| `/wiki/guides/` | `guide` |
| `/wiki/hardware/` | `hardware` |
| `/wiki/architecture/` | `architecture` |
| `/writing/` | `writing` |

If your skills or queries filter by `type=concept` and expect wiki content in
that bucket, update them to include the new types.
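
The table above amounts to a small path-to-type lookup. The sketch below is a hedged illustration rather than the project's `inferType`: the names are hypothetical, the existing people/companies/deals heuristics are omitted, and the `concept` fallback reflects only the previous default described in this section.

```typescript
// Path segments and the page type each one maps to, taken from the table above.
const WIKI_TYPE_RULES: ReadonlyArray<readonly [string, string]> = [
  ["/wiki/analysis/", "analysis"],
  ["/wiki/guides/", "guide"],
  ["/wiki/hardware/", "hardware"],
  ["/wiki/architecture/", "architecture"],
  ["/writing/", "writing"],
];

// Hypothetical sketch: only the five new directory rules, with `concept` as the fallback.
function inferTypeSketch(path: string): string {
  // Normalize so that relative paths like "wiki/guides/x.md" match the leading-slash patterns.
  const normalized = "/" + path.replace(/^\/+/, "");
  for (const [segment, type] of WIKI_TYPE_RULES) {
    if (normalized.includes(segment)) return type;
  }
  return "concept";
}
```

A query that previously matched these pages via `type=concept` would need to list the new types explicitly, since the lookup no longer falls through for these directories.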

---

## Future versions

When gbrain ships a new version, this doc will be updated with the diffs for that
5 changes: 3 additions & 2 deletions package.json
@@ -1,6 +1,6 @@
{
"name": "gbrain",
"version": "0.12.0",
"version": "0.12.2",
"description": "Postgres-native personal knowledge brain with hybrid RAG search",
"type": "module",
"main": "src/core/index.ts",
@@ -20,8 +20,9 @@
"build": "bun build --compile --outfile bin/gbrain src/cli.ts",
"build:all": "bun build --compile --target=bun-darwin-arm64 --outfile bin/gbrain-darwin-arm64 src/cli.ts && bun build --compile --target=bun-linux-x64 --outfile bin/gbrain-linux-x64 src/cli.ts",
"build:schema": "bash scripts/build-schema.sh",
"test": "bun test",
"test": "scripts/check-jsonb-pattern.sh && bun test",
"test:e2e": "bun test test/e2e/",
"check:jsonb": "scripts/check-jsonb-pattern.sh",
"postinstall": "gbrain --version >/dev/null 2>&1 && gbrain apply-migrations --yes --non-interactive 2>/dev/null || true",
"prepublish:clawhub": "bun run build:all",
"publish:clawhub": "clawhub package publish . --family bundle-plugin"