integrate: upstream security + reliability fixes (v0.10.1 → v0.12.3)#31
Open
integrate: upstream security + reliability fixes (v0.10.1 → v0.12.3)#31
Conversation
Pulls forward upstream GBrain's security-wave-3 fix set with the gbrain→pbrain rebrand applied. Original PR by @garagon (garrytan#105-garrytan#109) and bug report by @Hybirdss (garrytan#139); see attribution in CHANGELOG. Fixes: - file_upload arbitrary file read (realpathSync + slug/filename allowlist) closed for remote MCP callers. - Recipe trust boundary is real now: only package-bundled recipes are embedded=true. $PBRAIN_RECIPES_DIR and cwd ./recipes are untrusted and cannot run command/http/string health_checks. - String health_checks blocked for untrusted recipes. - SSRF defense for HTTP health_checks: isInternalUrl() blocks loopback, RFC1918, link-local (AWS metadata), CGNAT, IPv6 loopback; handles hex/octal/decimal encoding bypasses; scheme allowlist; manual redirect following with revalidation. - Prompt-injection hardening for query expansion: XML-boundary prompt + regex sanitization + output validation. - clampSearchLimit(limit, default, cap) takes an explicit cap so list_pages caps at 100 and get_ingest_log caps at 50. Adds: - OperationContext.remote flag (set false by CLI, true by MCP). - 49 new security tests + E2E regression. (cherry picked from commit 7bbfc3e9b3c3d63ca09d3c8cd0e8e3c5c27bf8ae) Co-Authored-By: Garry Tan <garry@garrytan.com> Co-Authored-By: garagon <noreply@github.com>
…rytan#130) Adds the canonical migration runner framework introduced in upstream's v0.11.1 PR (Minions + canonical migration + skillify). Takes only the skeleton — the runner CLI, the registry shell, the shared types, and the preferences/cli-util support files needed by future orchestrators. The actual v0.11.0 orchestrator is Minions-adoption-specific (agent queue setup, AGENTS.md routing, jobs smoke test) and is deliberately NOT taken — PBrain has not adopted Minions. Registry starts empty; subsequent commits in this wave add the JSONB repair orchestrator (v0.12.2) which is the only migration we need right now. The `compareVersions`, `getMigration`, and migration-list plumbing are already in place for future additions. Also wires the matching `pbrain init --migrate-only` branch used by orchestrators to apply schema migrations against an already-configured engine without clobbering the user's engine choice. New commands: pbrain apply-migrations [--list] [--dry-run] [--migration vX.Y.Z] New files: src/commands/apply-migrations.ts src/commands/migrations/index.ts (empty registry) src/commands/migrations/types.ts src/core/cli-util.ts src/core/preferences.ts (cherry picked subset of commit d861336cfa78e4f2ae9b67ecb79da70d0f6bb630) Co-Authored-By: Garry Tan <garry@garrytan.com>
…an#196) Data-correctness hotfix for Postgres-backed brains. Pulls forward the v0.12.2 fix wave from upstream GBrain. PGLite brains were unaffected. Three related Postgres-string-typed-data bugs that PGLite hid: 1. JSONB double-encode (postgres-engine.ts + files.ts): the ${JSON.stringify(value)}::jsonb interpolation pattern made postgres.js v3 stringify again on the wire, storing JSONB columns as quoted string literals. Every frontmatter->>'key' returned NULL on Postgres-backed brains; GIN indexes were inert. Fix: switch to sql.json(value), the postgres.js-native JSONB encoder. Affected: pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata, page_versions.frontmatter. 2. splitBody greedy --- match (markdown.ts): any standalone --- in body content was treated as a timeline separator, truncating wiki imports by up to 83% (reported by @knee5). Fix: require an explicit sentinel (<!-- timeline -->, --- timeline ---, or --- immediately before ## Timeline / ## History). Plain --- is a markdown horizontal rule. 3. parseEmbedding NaN scores (utils.ts): Supabase returned embedding columns as JSON strings; getEmbeddingsByChunkIds yielded NaN query scores. Fix: normalize via parseEmbedding. Also adds /wiki/ subdirectory type inference (analysis, guides, hardware, architecture, writing). New: - pbrain repair-jsonb [--dry-run] [--json] — standalone repair CLI - src/commands/migrations/v0_12_2.ts — orchestrator (schema → repair → verify → record). Registered in migrations/index.ts. - scripts/check-jsonb-pattern.sh — CI grep guard against regressions. Wired into `bun test` via the check:jsonb npm script. - test/e2e/postgres-jsonb.test.ts — E2E round-trip regression. Notes on deferred upstream work: upstream's v0.12.0 orchestrator (knowledge-graph auto-wire) and v0_12_0.ts are NOT registered in this fork's migration registry — PBrain has not adopted the knowledge graph layer. The apply-migrations unit tests have been rewritten to assert the planner's invariants against v0.12.2 instead of upstream's v0.11.0. Original fixes contributed by: - @knee5 (PR garrytan#187 — splitBody, inferType wiki, JSONB triple-fix) - @leonardsellem (PR garrytan#175 — parseEmbedding, getEmbeddingsByChunkIds) (cherry picked from commit c0b621923b641eae0e7d6228e50d9cdaa6bd97ae) Co-Authored-By: Garry Tan <garry@garrytan.com> Co-Authored-By: knee5 <noreply@github.com> Co-Authored-By: leonardsellem <noreply@github.com>
…rytan#198) Adds addLinksBatch / addTimelineEntriesBatch to the BrainEngine interface. Both engines (PGLite + Postgres) implement the batch as a single `INSERT ... SELECT FROM unnest(...) JOIN pages ON CONFLICT DO NOTHING RETURNING 1` statement — 4-5 array-typed bound parameters regardless of batch size, sidestepping the 65535-parameter cap and the postgres-js sql(rows, ...) helper's identifier-escape gotcha. pbrain extract (file-source) now flushes candidates 100 at a time via the batch API instead of one write per link/entry. On large brains this drops wall-clock from "tens of minutes" to "seconds" end-to-end. Schema changes required for ON CONFLICT (from_page_id, to_page_id, link_type) to match a real unique index: - links table UNIQUE constraint widened from 2 columns to 3 (from_page_id, to_page_id, link_type), matching upstream v0.10.3 knowledge-graph layer. addLink's ON CONFLICT clause updated in both engines. Fresh installs get the new constraint from schema.sql / pglite-schema.ts; existing installs pick it up via migration v8. - src/core/migrate.ts picks up upstream's v5-v10 schema migrations. v5-v7 (minion_jobs / inbox / attachments) create empty tables that this fork doesn't use; they're harmless and keep the schema forward-compatible with a future Minions integration. v8-v10 touch the links and timeline_entries tables we already have. Notes on deferred upstream work: - The DB-source `pbrain extract --source db` path remains deferred with the knowledge-graph layer (depends on src/core/link-extraction.ts which we haven't taken). File-source extract is what uses the new batch API. - Upstream's v0.12.0 Knowledge Graph auto-wire orchestrator is NOT registered (src/commands/migrations/v0_12_0.ts deleted). (cherry picked from commit 699db50d6cb3b96c3ba3cea8cd55c78f0f9c3bae) Co-Authored-By: Garry Tan <garry@garrytan.com>
Pulls forward the v0.12.3 reliability fixes — sync deadlock, search-timeout scoping, tryParseEmbedding for search corruption tolerance, pbrain orphans command + find_orphans MCP op, and two new doctor checks (jsonb_integrity, markdown_body_completeness) that point at the repair-jsonb / sync --force remediation. What changed: - src/commands/sync.ts: dropped the outer engine.transaction() wrap so importFromContent's per-file transaction isn't nested. PGLite's _runExclusiveTransaction is non-reentrant, so the inner call used to park on the mutex forever once ~10 files hit the pipeline. - src/core/postgres-engine.ts: searchKeyword / searchVector now scope statement_timeout via sql.begin + SET LOCAL so the GUC can't leak onto a pooled connection and clip unrelated long-running queries. - src/core/utils.ts: new tryParseEmbedding() (returns null + warns once per process on bad input) for search/rescore paths where availability matters more than strictness. parseEmbedding() stays strict for migration/ingest paths. - src/commands/orphans.ts: new. Domain-grouped report of pages with zero inbound wikilinks; --include-pseudo flag, --json, --count. Also wired as find_orphans MCP operation. - src/commands/doctor.ts: +2 reliability checks. jsonb_integrity scans pages.frontmatter, raw_data.data, ingest_log.pages_updated, files.metadata for jsonb_typeof='string' rows (v0.12.0 residue); markdown_body_completeness flags pages with compiled_truth <30% of raw source when raw has multiple H2/H3 boundaries. New tests: test/orphans.test.ts, test/postgres-engine.test.ts, test/e2e/jsonb-roundtrip.test.ts. doctor.ts and sync.ts existing tests updated with new check/deadlock assertions. Upstream's src/core/link-extraction.ts (knowledge-graph layer) is NOT taken — it's Wave-2 material. All integrated code paths operate on the already-existing links/timeline_entries schema. (cherry picked from commit 013b348) Co-Authored-By: Garry Tan <garry@garrytan.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Integrates Wave-1 of the upstream garrytan/gbrain catch-up — 6 upstream PRs merged since our fork point at
b7e3005(v0.10.1), selectively pulled to take security, data-correctness, perf, and reliability fixes while deferring the larger new feature layers (Minions agent orchestration, knowledge graph) to a separate Wave-2 PR.What's in Wave 1
5 commits, each isolated to one upstream fix set with full attribution via
cherry-picked fromtrailers andCo-Authored-Bycredits to the original contributors:2967fd9— Security wave 3 (from upstream security: fix wave 3 — 9 vulns (file_upload, SSRF, recipe trust, prompt injection) garrytan/gbrain#174). 9 vulnerabilities closed:file_uploadarbitrary-file-read, fake recipe trust boundary, string health-check execSync, HTTP SSRF, prompt-injection hardening,list_pages/get_ingest_logmissing caps. Original fixes by @garagon (security: confine file_upload paths to working directory garrytan/gbrain#105-security: clamp list_pages and get_ingest_log limits garrytan/gbrain#109) and @Hybirdss ([High] Arbitrary local file read via file_upload path parameter garrytan/gbrain#139).b65c88c— Migrations runner infrastructure subset (from upstream Minions v7 + v0.11.1 canonical migration + skillify garrytan/gbrain#130). Addspbrain apply-migrations+src/commands/migrations/framework. Registry starts empty; upstream's v0.11.0 Minions-adoption orchestrator is deliberately not registered.a3c3547— JSONB double-encode + splitBody wiki + parseEmbedding (from upstream fix: JSONB double-encode + splitBody wiki + parseEmbedding (v0.12.1) garrytan/gbrain#196). Fixes${JSON.stringify(x)}::jsonbinterpolation bug (Postgres-only data corruption), wiki-article 83% truncation from greedy---match in splitBody, and NaN search scores from string-typed embeddings on Supabase. Addspbrain repair-jsonb, a CI grep guard, and an E2E regression. Original fixes by @knee5 (fix: markdown parsing bugs affecting wiki-style content garrytan/gbrain#187) and @leonardsellem (fix: parse pgvector embeddings before cosine rescoring garrytan/gbrain#175).1830908— Extract N+1 hang fix + batch insert API (from upstream fix(extract+migrate): kill N+1 hang + v0.12.0 migration timeout (v0.12.1) garrytan/gbrain#198). NewaddLinksBatch/addTimelineEntriesBatchmethods on both PGLite and Postgres engines usingINSERT ... SELECT FROM unnest(...) ON CONFLICT DO NOTHING— 4-5 array params regardless of batch size. Dropspbrain extractwall-clock from minutes to seconds on large brains. Widens thelinksUNIQUE constraint from 2 to 3 columns (from_page_id, to_page_id, link_type) to match upstream's multi-type-link schema.df261e0— Reliability wave (from upstream v0.12.3: Reliability wave — sync deadlock, search timeout scoping, wikilinks, orphans garrytan/gbrain#216). Sync deadlock fix (PGLite non-reentrant transaction mutex),statement_timeoutscoped to search viasql.begin+SET LOCAL,tryParseEmbeddingfor search corruption tolerance, newpbrain orphanscommand +find_orphansMCP op, two newdoctorchecks (jsonb_integrity+markdown_body_completeness). Contributed by @sunnnybala and @garagon (upstream community).What's deliberately NOT in this PR (Wave-2 material)
src/commands/jobs.ts,src/commands/skillpack-check.ts,skills/minion-orchestrator/,skills/skillify/,skills/skillpack-check/,src/core/minions/(not present upstream but referenced in docs), the fullsrc/commands/autopilot.tsrewrite, the v0.11.0 migration orchestrator, thepbrain jobsCLI. Conflicts with ourinstall-skillscommand and skill layout; needs separate design review.src/core/link-extraction.ts,src/commands/graph-query.ts, auto-link post-hook input_page, v0.12.0 auto-wire migration orchestrator, 300+ eval fixtures undereval/data/world-v1/, typed-edge traversal. Complements our markdown-first direction but is a large standalone evaluation.src/core/migrate.tsrunner as empty tables — harmless, forward-compatible with a future Minions integration but not used by any code in this PR.Rebrand adaptation
Every integrated hunk was checked for
gbrain/GBRAIN_/~/.gbrainreferences and rebranded topbrain/PBRAIN_/~/.pbrain. Attribution references to upstream GBrain (URLs in CHANGELOG, NOTICE, docs/ATTRIBUTION.md) are preserved intentionally.Test plan
bun test— 1201 unit tests pass, 0 fail, 8 skip. All 34 fails are E2E tests needingDATABASE_URL(expected environment limitation on this run).bun test test/e2e/search-quality.test.ts— 12 pass, 0 fail (PGLite in-memory, no Docker needed).bun run src/cli.ts apply-migrations --list— registry lists v0.12.2 (JSONB repair).scripts/check-jsonb-pattern.sh— passes (no${JSON.stringify(x)}::jsonbinterpolation pattern in src/).grep -r "gbrain\|GBRAIN" src/— only the 2 pre-existing intentional refs (schema.sql comment, init.ts legacy migration).pbrain init --pglite,pbrain apply-migrations --list,pbrain repair-jsonb --dry-run,pbrain orphans,pbrain doctor— confirm all pass cleanly and the v0.12.3 checks (jsonb_integrity, markdown_body_completeness) show green on a fresh brain.Follow-ups
A separate Wave-2 PR should evaluate the deferred items.
CHANGELOG.md[Unreleased]lists what this PR integrated; the bullets point at the historical[0.10.2],[0.12.2],[0.12.1],[0.12.3]sections below for full upstream write-ups.