
integrate: upstream security + reliability fixes (v0.10.1 → v0.12.3)#31

Open
joedanz wants to merge 5 commits into master from integrate/upstream-v0.12.x

joedanz commented Apr 19, 2026

Summary

Integrates Wave-1 of the upstream garrytan/gbrain catch-up — 6 upstream PRs merged since our fork point at b7e3005 (v0.10.1), selectively pulled to take security, data-correctness, perf, and reliability fixes while deferring the larger new feature layers (Minions agent orchestration, knowledge graph) to a separate Wave-2 PR.

What's in Wave 1

5 commits, each isolated to one upstream fix set, with full attribution via "cherry picked from" trailers and Co-Authored-By credits to the original contributors (see the individual commit messages below).

What's deliberately NOT in this PR (Wave-2 material)

  • Upstream garrytan/gbrain#130 (Minions v7 + v0.11.1 canonical migration + skillify), the Minions agent-orchestration stack: src/commands/jobs.ts, src/commands/skillpack-check.ts, skills/minion-orchestrator/, skills/skillify/, skills/skillpack-check/, src/core/minions/ (not present upstream but referenced in docs), the full src/commands/autopilot.ts rewrite, the v0.11.0 migration orchestrator, and the pbrain jobs CLI. Only the canonical migration-runner skeleton from this PR is taken here. The rest conflicts with our install-skills command and skill layout and needs a separate design review.
  • Upstream garrytan/gbrain#188 (knowledge graph layer: auto-link, typed relationships, graph-query; v0.10.3): src/core/link-extraction.ts, src/commands/graph-query.ts, the auto-link post-hook in put_page, the v0.12.0 auto-wire migration orchestrator, 300+ eval fixtures under eval/data/world-v1/, and typed-edge traversal. It complements our markdown-first direction but needs its own large, standalone evaluation.
  • Upstream's schema migrations v5-v7 (Minions job-queue tables) are applied by the inline src/core/migrate.ts runner and create empty tables. They are harmless and forward-compatible with a future Minions integration, but no code in this PR uses them.

Rebrand adaptation

Every integrated hunk was checked for gbrain / GBRAIN_ / ~/.gbrain references and rebranded to pbrain / PBRAIN_ / ~/.pbrain. Attribution references to upstream GBrain (URLs in CHANGELOG, NOTICE, docs/ATTRIBUTION.md) are preserved intentionally.

Test plan

  • bun test: 1201 unit tests pass, 8 skip, no unit-test failures. The 34 failing tests are all E2E tests that need DATABASE_URL (an expected environment limitation on this run).
  • bun test test/e2e/search-quality.test.ts: 12 pass, 0 fail (PGLite in-memory, no Docker needed).
  • bun run src/cli.ts apply-migrations --list: the registry lists v0.12.2 (JSONB repair).
  • scripts/check-jsonb-pattern.sh: passes (no ${JSON.stringify(x)}::jsonb interpolation pattern in src/).
  • grep -r "gbrain\|GBRAIN" src/: only the 2 pre-existing intentional refs (schema.sql comment, init.ts legacy migration).
  • E2E against real Postgres + pgvector: needs a running Docker daemon on the merger's machine. The reviewer should run:
    docker run -d --name pbrain-test-pg -e POSTGRES_USER=postgres \
      -e POSTGRES_PASSWORD=postgres -e POSTGRES_DB=pbrain_test \
      -p 5435:5432 pgvector/pgvector:pg16
    DATABASE_URL=postgresql://postgres:postgres@localhost:5435/pbrain_test bun run test:e2e
    docker stop pbrain-test-pg && docker rm pbrain-test-pg
    
  • Smoke test on a scratch brain before landing: pbrain init --pglite, pbrain apply-migrations --list, pbrain repair-jsonb --dry-run, pbrain orphans, pbrain doctor — confirm all pass cleanly and the v0.12.3 checks (jsonb_integrity, markdown_body_completeness) show green on a fresh brain.

Follow-ups

A separate Wave-2 PR should evaluate the deferred items. The [Unreleased] section of CHANGELOG.md lists what this PR integrated; its bullets point at the historical [0.10.2], [0.12.2], [0.12.1], and [0.12.3] sections further down in the changelog for the full upstream write-ups.

joedanz and others added 5 commits April 19, 2026 10:25
Pulls forward upstream GBrain's security-wave-3 fix set with the
gbrain→pbrain rebrand applied. Original PR by @garagon (garrytan#105-garrytan#109)
and bug report by @Hybirdss (garrytan#139); see attribution in CHANGELOG.

Fixes:
- file_upload arbitrary file read (realpathSync + slug/filename
  allowlist) closed for remote MCP callers.
- The recipe trust boundary is now enforced: only package-bundled
  recipes are embedded=true. $PBRAIN_RECIPES_DIR and cwd ./recipes
  are untrusted and cannot run command/http/string health_checks.
- String health_checks blocked for untrusted recipes.
- SSRF defense for HTTP health_checks: isInternalUrl() blocks
  loopback, RFC1918, link-local (AWS metadata), CGNAT, IPv6
  loopback; handles hex/octal/decimal encoding bypasses; scheme
  allowlist; manual redirect following with revalidation.
- Prompt-injection hardening for query expansion: XML-boundary
  prompt + regex sanitization + output validation.
- clampSearchLimit(limit, default, cap) takes an explicit cap so
  list_pages caps at 100 and get_ingest_log caps at 50.
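A minimal sketch of the kind of literal-host check the isInternalUrl() bullet describes, assuming the WHATWG URL parser that Bun and Node provide. The helper names (parseIPv4, isInternalIPv4, isInternalIPv6) and the exact blocklist are illustrative and are not upstream's code. The sketch inspects only the literal host. It does not resolve DNS names, and it omits the redirect-revalidation step.

```typescript
// Hypothetical SSRF guard sketch; not the upstream isInternalUrl().
const ALLOWED_SCHEMES = new Set(["http:", "https:"]);

// Blocked IPv4 ranges as [base address, prefix length].
const BLOCKED_V4: Array<[string, number]> = [
  ["0.0.0.0", 8], // "this network"
  ["10.0.0.0", 8], // RFC1918
  ["100.64.0.0", 10], // CGNAT (RFC 6598)
  ["127.0.0.0", 8], // loopback
  ["169.254.0.0", 16], // link-local, incl. cloud metadata at 169.254.169.254
  ["172.16.0.0", 12], // RFC1918
  ["192.168.0.0", 16], // RFC1918
];

// Parse an IPv4 literal the way inet_aton does: 1-4 dot-separated parts, each
// decimal, hex (0x...) or octal (leading 0). "0x7f.1", "2130706433" and
// "0177.0.0.1" all decode to 127.0.0.1. Returns the address or null.
function parseIPv4(host: string): number | null {
  const parts = host.replace(/\.$/, "").split(".");
  if (parts.length > 4) return null;
  const nums: number[] = [];
  for (const p of parts) {
    if (/^0x[0-9a-f]*$/i.test(p)) nums.push(p.length === 2 ? 0 : parseInt(p.slice(2), 16));
    else if (/^0[0-7]*$/.test(p)) nums.push(parseInt(p, 8));
    else if (/^[1-9][0-9]*$/.test(p)) nums.push(parseInt(p, 10));
    else return null; // not a numeric host, e.g. "example"
  }
  const last = nums.pop()!; // the last part fills all remaining bytes
  if (nums.some((n) => n > 255) || last >= 256 ** (4 - nums.length)) return null;
  return nums.reduce((acc, n, i) => acc + n * 256 ** (3 - i), 0) + last;
}

function isInternalIPv4(addr: number): boolean {
  return BLOCKED_V4.some(([base, bits]) => {
    const blockSize = 2 ** (32 - bits);
    return Math.floor(addr / blockSize) === Math.floor(parseIPv4(base)! / blockSize);
  });
}

function isInternalIPv6(h: string): boolean {
  if (h === "::1" || h === "::") return true; // loopback, unspecified
  if (/^fe[89ab][0-9a-f]:/.test(h)) return true; // fe80::/10 link-local
  if (/^f[cd][0-9a-f]{2}:/.test(h)) return true; // fc00::/7 unique local
  // IPv4-mapped addresses: the URL parser serializes ::ffff:127.0.0.1 as ::ffff:7f00:1.
  const mapped = h.match(/^::ffff:([0-9a-f]{1,4}):([0-9a-f]{1,4})$/);
  if (mapped) return isInternalIPv4(parseInt(mapped[1], 16) * 65536 + parseInt(mapped[2], 16));
  const dotted = h.match(/^::ffff:([0-9.]+)$/);
  const v4 = dotted ? parseIPv4(dotted[1]) : null;
  return v4 !== null && isInternalIPv4(v4);
}

function isInternalUrl(raw: string): boolean {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return true; // fail closed on unparseable input
  }
  if (!ALLOWED_SCHEMES.has(url.protocol)) return true; // scheme allowlist
  const host = url.hostname.toLowerCase();
  if (host === "localhost" || host.endsWith(".localhost")) return true;
  if (host.startsWith("[")) return isInternalIPv6(host.slice(1, -1)); // "[::1]" -> "::1"
  const v4 = parseIPv4(host);
  return v4 !== null && isInternalIPv4(v4);
}
```

With manual redirect following, this check has to run again on each Location hop before the next request is sent.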

Adds:
- OperationContext.remote flag (set false by CLI, true by MCP).
- 49 new security tests + E2E regression.
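A minimal sketch of realpath-based confinement for a remote file_upload call. It assumes a single allowed root directory and assumes that local CLI callers (remote: false) stay unrestricted. resolveUploadPath, assertSafeUploadTarget, and both allowlist regexes are hypothetical. Only OperationContext.remote comes from the commit above.

```typescript
import { realpathSync } from "node:fs";
import * as path from "node:path";

// Only `remote` is taken from the commit message; everything else is illustrative.
interface OperationContext {
  remote: boolean; // false for the CLI, true for MCP callers
}

// Hypothetical allowlists: lowercase slash-separated slug segments, and plain
// filenames with no separators that cannot start with "." (rejects "..", ".env").
const SLUG_RE = /^[a-z0-9][a-z0-9_-]*(\/[a-z0-9][a-z0-9_-]*)*$/;
const FILENAME_RE = /^[A-Za-z0-9][A-Za-z0-9._-]*$/;

function assertSafeUploadTarget(slug: string, filename: string): void {
  if (!SLUG_RE.test(slug)) throw new Error(`file_upload: invalid slug ${JSON.stringify(slug)}`);
  if (!FILENAME_RE.test(filename)) {
    throw new Error(`file_upload: invalid filename ${JSON.stringify(filename)}`);
  }
}

// Resolve `requested`. For remote callers, require that its real path (with
// symlinks followed) stays inside `root`. realpathSync throws if the path is missing.
function resolveUploadPath(ctx: OperationContext, root: string, requested: string): string {
  if (!ctx.remote) return path.resolve(requested); // assumption: local callers are trusted
  const realRoot = realpathSync(root);
  const real = realpathSync(path.resolve(realRoot, requested));
  const rel = path.relative(realRoot, real);
  if (rel === ".." || rel.startsWith(".." + path.sep) || path.isAbsolute(rel)) {
    throw new Error(`file_upload: ${requested} resolves outside ${root}`);
  }
  return real;
}
```

Comparing real paths, rather than the caller's string, is what stops "../" sequences and symlinks inside the root from pointing at arbitrary files.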

(cherry picked from commit 7bbfc3e9b3c3d63ca09d3c8cd0e8e3c5c27bf8ae)

Co-Authored-By: Garry Tan <garry@garrytan.com>
Co-Authored-By: garagon <noreply@github.com>
…rytan#130)

Adds the canonical migration runner framework introduced in upstream's
v0.11.1 PR (Minions + canonical migration + skillify). Takes only the
skeleton — the runner CLI, the registry shell, the shared types, and
the preferences/cli-util support files needed by future orchestrators.
The actual v0.11.0 orchestrator is Minions-adoption-specific (agent
queue setup, AGENTS.md routing, jobs smoke test) and is deliberately
NOT taken — PBrain has not adopted Minions.

Registry starts empty; subsequent commits in this wave add the JSONB
repair orchestrator (v0.12.2) which is the only migration we need
right now. The `compareVersions`, `getMigration`, and migration-list
plumbing are already in place for future additions.
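Versions have to be compared numerically, because plain string comparison orders "0.10.1" before "0.9.0". The following is a minimal sketch of such a comparison, plus a planner that selects unapplied migrations in version order. The Migration shape, pendingMigrations, and the list of applied versions are hypothetical. The fork's real types live in src/commands/migrations/types.ts and may differ.

```typescript
// Hypothetical shape for illustration only.
interface Migration {
  version: string; // e.g. "0.12.2"
  description: string;
}

// Compare dotted versions numerically, ignoring a leading "v"; returns -1, 0 or 1.
function compareVersions(a: string, b: string): number {
  const pa = a.replace(/^v/, "").split(".").map(Number);
  const pb = b.replace(/^v/, "").split(".").map(Number);
  for (let i = 0; i < Math.max(pa.length, pb.length); i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0); // missing parts count as 0
    if (diff !== 0) return Math.sign(diff);
  }
  return 0;
}

function getMigration(registry: Migration[], version: string): Migration | undefined {
  return registry.find((m) => compareVersions(m.version, version) === 0);
}

// Every registered migration not yet applied, oldest first.
function pendingMigrations(registry: Migration[], applied: string[]): Migration[] {
  return registry
    .filter((m) => !applied.some((v) => compareVersions(v, m.version) === 0))
    .sort((x, y) => compareVersions(x.version, y.version));
}
```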

Also wires the matching `pbrain init --migrate-only` branch used by
orchestrators to apply schema migrations against an already-configured
engine without clobbering the user's engine choice.

New commands:
  pbrain apply-migrations [--list] [--dry-run] [--migration vX.Y.Z]

New files:
  src/commands/apply-migrations.ts
  src/commands/migrations/index.ts       (empty registry)
  src/commands/migrations/types.ts
  src/core/cli-util.ts
  src/core/preferences.ts

(cherry picked subset of commit d861336cfa78e4f2ae9b67ecb79da70d0f6bb630)

Co-Authored-By: Garry Tan <garry@garrytan.com>
…an#196)

Data-correctness hotfix for Postgres-backed brains. Pulls forward the
v0.12.2 fix wave from upstream GBrain. PGLite brains were unaffected.

Three related Postgres-string-typed-data bugs that PGLite hid:

1. JSONB double-encode (postgres-engine.ts + files.ts): the
   ${JSON.stringify(value)}::jsonb interpolation pattern made
   postgres.js v3 stringify again on the wire, storing JSONB columns as
   quoted string literals. Every frontmatter->>'key' returned NULL on
   Postgres-backed brains; GIN indexes were inert. Fix: switch to
   sql.json(value), the postgres.js-native JSONB encoder. Affected:
   pages.frontmatter, raw_data.data, ingest_log.pages_updated,
   files.metadata, page_versions.frontmatter.
2. splitBody greedy --- match (markdown.ts): any standalone --- in body
   content was treated as a timeline separator, truncating wiki imports
   by up to 83% (reported by @knee5). Fix: require an explicit sentinel
   (<!-- timeline -->, --- timeline ---, or --- immediately before
   ## Timeline / ## History). Plain --- is a markdown horizontal rule.
3. parseEmbedding NaN scores (utils.ts): Supabase returned embedding
   columns as JSON strings; getEmbeddingsByChunkIds yielded NaN query
   scores. Fix: normalize via parseEmbedding.
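A minimal sketch of sentinel-based splitting along the lines of fix 2. It makes three assumptions: the body is split at the first sentinel; blank lines may sit between a bare --- and the ## Timeline / ## History heading; and that heading stays with the timeline half. The field names compiledTruth/timeline are illustrative. The real splitBody() signature and return shape in markdown.ts may differ.

```typescript
interface SplitBody {
  compiledTruth: string;
  timeline: string;
}

const EXPLICIT_SENTINELS = new Set(["<!-- timeline -->", "--- timeline ---"]);
const TIMELINE_HEADING = /^##\s+(Timeline|History)\s*$/;

// Split only on an explicit sentinel. A bare "---" is a horizontal rule unless
// the next non-blank line is a "## Timeline" or "## History" heading.
function splitBody(body: string): SplitBody {
  const lines = body.split("\n");
  const head = (end: number) => lines.slice(0, end).join("\n").trimEnd();
  const tail = (start: number) => lines.slice(start).join("\n").trim();

  for (let i = 0; i < lines.length; i++) {
    const line = lines[i].trim();
    if (EXPLICIT_SENTINELS.has(line)) {
      return { compiledTruth: head(i), timeline: tail(i + 1) };
    }
    if (line === "---") {
      let j = i + 1;
      while (j < lines.length && lines[j].trim() === "") j++;
      if (j < lines.length && TIMELINE_HEADING.test(lines[j].trim())) {
        return { compiledTruth: head(i), timeline: tail(j) }; // heading kept with timeline
      }
    }
  }
  return { compiledTruth: body.trimEnd(), timeline: "" }; // no sentinel: nothing truncated
}
```

With this rule, a wiki page that uses --- as a section divider keeps its whole body in the compiled-truth half.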

Also adds /wiki/ subdirectory type inference (analysis, guides,
hardware, architecture, writing).

New:
- pbrain repair-jsonb [--dry-run] [--json] — standalone repair CLI
- src/commands/migrations/v0_12_2.ts — orchestrator (schema → repair
  → verify → record). Registered in migrations/index.ts.
- scripts/check-jsonb-pattern.sh — CI grep guard against regressions.
  Wired into `bun test` via the check:jsonb npm script.
- test/e2e/postgres-jsonb.test.ts — E2E round-trip regression.
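The regression guard is essentially a grep for the forbidden interpolation. Below is a TypeScript sketch of the same idea. It assumes the pattern always fits on one line. The regex, findJsonbInterpolations, and assertNoJsonbInterpolation are hypothetical and do not reproduce scripts/check-jsonb-pattern.sh. The fix above replaces the interpolation with postgres.js's sql.json(value).

```typescript
// Matches `${JSON.stringify(...)}::jsonb`, the interpolation that double-encodes
// JSONB under postgres.js v3. Lazy `.*?` (rather than `[^}]*`) still matches when
// the argument is an object literal such as JSON.stringify({ a: 1 }).
const JSONB_INTERPOLATION = /\$\{\s*JSON\.stringify\(.*?\)\s*\}\s*::jsonb/;

// Returns the 1-based line numbers that contain the forbidden pattern.
function findJsonbInterpolations(source: string): number[] {
  return source
    .split("\n")
    .flatMap((line, i) => (JSONB_INTERPOLATION.test(line) ? [i + 1] : []));
}

// CI-style wrapper: fail with file:line locations if any file matches.
function assertNoJsonbInterpolation(files: Record<string, string>): void {
  const hits = Object.entries(files).flatMap(([name, src]) =>
    findJsonbInterpolations(src).map((line) => `${name}:${line}`),
  );
  if (hits.length > 0) throw new Error(`JSONB interpolation found at ${hits.join(", ")}`);
}
```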

Notes on deferred upstream work: upstream's v0.12.0 orchestrator
(knowledge-graph auto-wire) and v0_12_0.ts are NOT registered in this
fork's migration registry — PBrain has not adopted the knowledge graph
layer. The apply-migrations unit tests have been rewritten to assert
the planner's invariants against v0.12.2 instead of upstream's v0.11.0.

Original fixes contributed by:
- @knee5 (PR garrytan#187 — splitBody, inferType wiki, JSONB triple-fix)
- @leonardsellem (PR garrytan#175 — parseEmbedding, getEmbeddingsByChunkIds)

(cherry picked from commit c0b621923b641eae0e7d6228e50d9cdaa6bd97ae)

Co-Authored-By: Garry Tan <garry@garrytan.com>
Co-Authored-By: knee5 <noreply@github.com>
Co-Authored-By: leonardsellem <noreply@github.com>
…rytan#198)

Adds addLinksBatch / addTimelineEntriesBatch to the BrainEngine
interface. Both engines (PGLite + Postgres) implement the batch as a
single `INSERT ... SELECT FROM unnest(...) JOIN pages ON CONFLICT
DO NOTHING RETURNING 1` statement — 4-5 array-typed bound parameters
regardless of batch size, sidestepping the 65535-parameter cap and
the postgres-js sql(rows, ...) helper's identifier-escape gotcha.

pbrain extract (file-source) now flushes candidates 100 at a time via
the batch API instead of one write per link/entry. On large brains
this drops wall-clock from "tens of minutes" to "seconds" end-to-end.
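A sketch of how the batch write can keep a fixed parameter count: transpose each chunk of candidates into one array per column, then bind those arrays to a single INSERT ... SELECT FROM unnest(...). The SQL text, the slug and context columns, LinkCandidate, toColumnArrays, and chunk are assumptions for illustration. Only from_page_id, to_page_id, link_type, and the overall statement shape come from the commit message.

```typescript
// Hypothetical input row: links are extracted by slug and resolved to page ids in SQL.
interface LinkCandidate {
  fromSlug: string;
  toSlug: string;
  linkType: string;
  context: string;
}

// Four array-typed parameters no matter how many rows the batch holds.
const ADD_LINKS_BATCH_SQL = `
  INSERT INTO links (from_page_id, to_page_id, link_type, context)
  SELECT f.id, t.id, v.link_type, v.context
  FROM unnest($1::text[], $2::text[], $3::text[], $4::text[])
    AS v(from_slug, to_slug, link_type, context)
  JOIN pages f ON f.slug = v.from_slug
  JOIN pages t ON t.slug = v.to_slug
  ON CONFLICT (from_page_id, to_page_id, link_type) DO NOTHING
  RETURNING 1`;

// Transpose rows into column arrays matching $1..$4 above.
function toColumnArrays(rows: LinkCandidate[]): [string[], string[], string[], string[]] {
  return [
    rows.map((r) => r.fromSlug),
    rows.map((r) => r.toSlug),
    rows.map((r) => r.linkType),
    rows.map((r) => r.context),
  ];
}

// Split candidates into flush-sized batches (100 per the commit message).
function chunk<T>(items: T[], size = 100): T[][] {
  const out: T[][] = [];
  for (let i = 0; i < items.length; i += size) out.push(items.slice(i, i + size));
  return out;
}
```

The inner joins drop candidates whose slugs do not resolve to a page, and DO NOTHING skips existing links. The number of returned rows therefore counts only the links actually inserted.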

Schema changes required for ON CONFLICT (from_page_id, to_page_id,
link_type) to match a real unique index:

- links table UNIQUE constraint widened from 2 columns to 3
  (from_page_id, to_page_id, link_type), matching upstream v0.10.3
  knowledge-graph layer. addLink's ON CONFLICT clause updated in
  both engines. Fresh installs get the new constraint from schema.sql
  / pglite-schema.ts; existing installs pick it up via migration v8.
- src/core/migrate.ts picks up upstream's v5-v10 schema migrations.
  v5-v7 (minion_jobs / inbox / attachments) create empty tables that
  this fork doesn't use; they're harmless and keep the schema
  forward-compatible with a future Minions integration. v8-v10 touch
  the links and timeline_entries tables we already have.

Notes on deferred upstream work:
- The DB-source `pbrain extract --source db` path remains deferred
  with the knowledge-graph layer (depends on src/core/link-extraction.ts
  which we haven't taken). File-source extract is what uses the new
  batch API.
- Upstream's v0.12.0 Knowledge Graph auto-wire orchestrator is NOT
  registered (src/commands/migrations/v0_12_0.ts deleted).

(cherry picked from commit 699db50d6cb3b96c3ba3cea8cd55c78f0f9c3bae)

Co-Authored-By: Garry Tan <garry@garrytan.com>
Pulls forward the v0.12.3 reliability fixes — sync deadlock,
search-timeout scoping, tryParseEmbedding for search corruption
tolerance, pbrain orphans command + find_orphans MCP op, and two new
doctor checks (jsonb_integrity, markdown_body_completeness) that
point at the repair-jsonb / sync --force remediation.

What changed:

- src/commands/sync.ts: dropped the outer engine.transaction() wrap
  so importFromContent's per-file transaction isn't nested. PGLite's
  _runExclusiveTransaction is non-reentrant, so the inner call used
  to park on the mutex forever once ~10 files hit the pipeline.
- src/core/postgres-engine.ts: searchKeyword / searchVector now scope
  statement_timeout via sql.begin + SET LOCAL so the GUC can't leak
  onto a pooled connection and clip unrelated long-running queries.
- src/core/utils.ts: new tryParseEmbedding() (returns null + warns
  once per process on bad input) for search/rescore paths where
  availability matters more than strictness. parseEmbedding() stays
  strict for migration/ingest paths.
- src/commands/orphans.ts: new. Domain-grouped report of pages with
  zero inbound wikilinks; --include-pseudo flag, --json, --count.
  Also wired as find_orphans MCP operation.
- src/commands/doctor.ts: +2 reliability checks. jsonb_integrity
  scans pages.frontmatter, raw_data.data, ingest_log.pages_updated,
  files.metadata for jsonb_typeof='string' rows (v0.12.0 residue);
  markdown_body_completeness flags pages with compiled_truth <30%
  of raw source when raw has multiple H2/H3 boundaries.
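A minimal sketch of the strict/tolerant split between parseEmbedding() and tryParseEmbedding(). It assumes embeddings arrive as a Float32Array, a number array, or a JSON/pgvector-style text string such as "[0.1,0.2]". The inputs the real functions in utils.ts accept, and their messages, may differ.

```typescript
// Strict: for migration/ingest paths. Throws on anything malformed.
function parseEmbedding(value: unknown): Float32Array {
  if (value instanceof Float32Array) return value;
  if (typeof value === "string") {
    let parsed: unknown;
    try {
      parsed = JSON.parse(value); // "[0.1,0.2]" is both pgvector text and JSON
    } catch {
      throw new Error("parseEmbedding: string is not a JSON array");
    }
    return parseEmbedding(parsed);
  }
  if (
    Array.isArray(value) &&
    value.length > 0 &&
    value.every((x) => typeof x === "number" && Number.isFinite(x))
  ) {
    return Float32Array.from(value);
  }
  throw new Error("parseEmbedding: expected a non-empty finite numeric array");
}

let warnedBadEmbedding = false;

// Tolerant: for search/rescore paths. Returns null instead of throwing, and
// warns only once per process so a corrupt column cannot flood the logs.
function tryParseEmbedding(value: unknown): Float32Array | null {
  try {
    return parseEmbedding(value);
  } catch (err) {
    if (!warnedBadEmbedding) {
      warnedBadEmbedding = true;
      console.warn(`tryParseEmbedding: skipping unparseable embedding (${(err as Error).message})`);
    }
    return null;
  }
}
```

A rescoring loop can then skip chunks whose embedding parses to null instead of producing NaN scores.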

New tests: test/orphans.test.ts, test/postgres-engine.test.ts,
test/e2e/jsonb-roundtrip.test.ts. doctor.ts and sync.ts existing tests
updated with new check/deadlock assertions.

Upstream's src/core/link-extraction.ts (knowledge-graph layer) is NOT
taken — it's Wave-2 material. All integrated code paths operate on
the already-existing links/timeline_entries schema.

(cherry picked from commit 013b348)

Co-Authored-By: Garry Tan <garry@garrytan.com>