Summary
extractAndEnrich in src/core/enrichment-service.ts runs a regex over arbitrary text, pulls out anything that looks like a capitalized name, and unconditionally calls engine.putPage() on the owner's brain under people/{slug} or companies/{slug}. There is no confirmation step, no allow-list, no review queue, and the source of the text is whatever skill invoked the call — ingest, meeting-ingestion, idea-ingest, any of the recipes that forward raw email / transcript / pasted content.
The practical shape:
- A meeting transcript, pasted article, or email comes in via one of the ingest skills.
- The skill calls extractAndEnrich(engine, text, sourceSlug) as part of enrichment.
- extractEntities greps for \b[A-Z][a-z]+(?:\s+[A-Z][a-z]+){1,3}\b, accepts any hit, and feeds each to enrichEntity.
- If the entity page doesn't exist, enrichEntity creates it with generateStubContent(name, type, context), where context is a 50-char window of the raw input text, and writes it to the brain as an authoritative page.
So the attacker-shaped case looks like: a hostile email (or a spoofed meeting transcript, or a poisoned CSV attached to a briefing) carrying the sentences
I had lunch with Paul Graham today. He mentioned that Ycombinator Inc. is pivoting to crypto.
ends up creating people/paul-graham and companies/ycombinator-inc pages in the owner's brain, with the sentence copied in as the summary. The owner never typed that. Agents that query the brain later ("what does gbrain say about Paul Graham?") quote it back as ground truth.
This is a data-injection / authz gap, not an RCE. It's latent today because the current in-tree caller set is small, but the skill-driven ingest paths this wires into (meeting-ingestion, idea-ingest, media-ingest) all take external content by design.
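To make the extraction step concrete, here is a minimal sketch that runs the pattern quoted above against the example sentences. The function name extractCandidateNames is a hypothetical stand-in, not the real extractEntities, and the g flag is added only so the loop can scan the whole text.

```typescript
// Pattern as quoted in this issue. Kept as a source string so each call
// builds a fresh RegExp and no lastIndex state leaks between calls.
const ENTITY_PATTERN_SRC = "\\b[A-Z][a-z]+(?:\\s+[A-Z][a-z]+){1,3}\\b";

// Hypothetical stand-in for extractEntities: returns every raw match, unfiltered.
function extractCandidateNames(text: string): string[] {
  const pattern = new RegExp(ENTITY_PATTERN_SRC, "g");
  const found: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(text)) !== null) {
    found.push(match[0]);
  }
  return found;
}

const hostileSample =
  "I had lunch with Paul Graham today. He mentioned that Ycombinator Inc. is pivoting to crypto.";

const candidateNames = extractCandidateNames(hostileSample);
// Two candidates come back: "Paul Graham" and "Ycombinator Inc".
console.log(candidateNames);
```

Nothing in the pattern distinguishes a name the owner wrote from a name an email sender wrote, which is why the write that follows needs its own gate.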
Why it matters
The brain's value is that it's trusted. Everything downstream — search, enrichment chains, agent responses — assumes pages in people/ and companies/ reflect the owner's observations. Once untrusted ingest gets unbounded write access to those namespaces, the trust model collapses silently: the owner can't tell which entries came from their own notes and which came from a sentence that happened to sit in an incoming email.
Secondary effects:
- Slug collisions. slugifyEntity normalizes Tim<Cook> and Tim Cook to the same slug. An attacker who wants to tamper with an existing legitimate page just picks a name that slugifies to its path and ships a sentence that mentions it: enrichEntity sees the page exists, takes the UPDATE branch, and appends a timeline entry quoting the attacker's context (see enrichEntity around lines 85-100; the UPDATE path is silent on who supplied the timeline text). R6 filed this as a separate finding (slugify collision); it's more impactful once F004 is closed but worth mentioning here.
- Regex is greedy. [A-Z][a-z]+ matches Dear John, Best Regards, New York, Bar Mitzvah. The false-positive rate on real email means a brain wired to this function fills with junk entries even without an adversary.
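Both secondary effects are easy to reproduce in isolation. The sketch below checks the four boilerplate phrases against the quoted pattern, then shows how a collision can arise with a typical slugifier. slugifySketch is an assumption written for illustration only; the real slugifyEntity is not shown here and may differ in detail.

```typescript
// Same pattern as quoted in this issue, without the global flag.
const NAME_PATTERN_SRC = "\\b[A-Z][a-z]+(?:\\s+[A-Z][a-z]+){1,3}\\b";

function looksLikeEntity(phrase: string): boolean {
  return new RegExp(NAME_PATTERN_SRC).test(phrase);
}

// Ordinary email boilerplate, none of which is a person or a company.
const boilerplatePhrases = ["Dear John", "Best Regards", "New York", "Bar Mitzvah"];
const falsePositives = boilerplatePhrases.filter(looksLikeEntity);
console.log(falsePositives.length); // 4: every phrase is accepted

// Hypothetical slugifier (NOT the real slugifyEntity): lowercase, collapse
// every run of non-alphanumerics into "-", trim leading/trailing dashes.
function slugifySketch(name: string): string {
  return name
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

const slugFromOddName = slugifySketch("Tim<Cook>");
const slugFromRealName = slugifySketch("Tim Cook");
console.log(slugFromOddName === slugFromRealName); // true: both are "tim-cook"
```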
Proposed approaches
Three shapes, ordered by blast radius. All of them keep the function exported with the same name so existing in-tree callers keep working.
1. Quarantine namespace by default
- extractAndEnrich writes proposed pages under _pending_review/{slug} instead of people/{slug} / companies/{slug}.
- A separate command (gbrain review --pending) lets the owner approve, reject, or merge each proposal into the real namespace.
- Ingest skills are unaffected: they still call extractAndEnrich, but the user is the gate on what becomes authoritative.
Smallest behavior change from the owner's point of view, strongest defense. The review queue is the audit trail.
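A rough sketch of what the quarantine flow could look like, using in-memory maps in place of the brain. Every name here (PendingProposal, proposePage, approvePending, rejectPending) is hypothetical, the real engine API is asynchronous and not reproduced, and the merge case is left out.

```typescript
type EntityNamespace = "people" | "companies";

interface PendingProposal {
  targetPath: string; // where the page would land if approved
  content: string;
  sourceSlug: string; // which ingest path proposed it (the audit trail)
}

// In-memory stand-ins for the authoritative namespace and the review queue.
const authoritativePages = new Map<string, string>();
const pendingProposals = new Map<string, PendingProposal>();

// What extractAndEnrich would do instead of writing to people/ or companies/.
function proposePage(
  namespace: EntityNamespace,
  slug: string,
  content: string,
  sourceSlug: string
): string {
  const pendingPath = `_pending_review/${slug}`;
  pendingProposals.set(pendingPath, {
    targetPath: `${namespace}/${slug}`,
    content,
    sourceSlug,
  });
  return pendingPath;
}

// Owner-driven approval. Refuses to overwrite an existing page, since that
// case would need the merge flow, which this sketch does not cover.
function approvePending(pendingPath: string): boolean {
  const proposal = pendingProposals.get(pendingPath);
  if (proposal === undefined || authoritativePages.has(proposal.targetPath)) {
    return false;
  }
  authoritativePages.set(proposal.targetPath, proposal.content);
  pendingProposals.delete(pendingPath);
  return true;
}

function rejectPending(pendingPath: string): boolean {
  return pendingProposals.delete(pendingPath);
}
```

The point of the shape is that no code path reachable from ingest ever touches authoritativePages; only the owner-invoked approval does.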
2. Require an explicit autocreate: true flag per request
- Default enrichEntity / extractAndEnrich to action: 'skipped' when the page doesn't exist.
- Callers that genuinely want page creation pass { autocreate: true } and accept the risk.
- Skills that take external content (meeting-ingestion, idea-ingest, media-ingest) get the default; skills that take owner-typed input (gbrain new person …, if it exists) set the flag explicitly.
Smaller diff than (1). Downside: the default is a behavior change from today, and every skill that currently relies on auto-create needs a touch. That's the skills surface you asked to keep sensitive — happy to leave the audit to a maintainer-led review rather than a drive-by PR.
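The opt-in default could look roughly like this. It is a sketch under assumptions: enrichEntitySketch, EnrichOptions, and the in-memory brainSketch map are illustrative names, and the real enrichEntity signature is not reproduced here.

```typescript
type EnrichAction = "created" | "updated" | "skipped";

interface EnrichOptions {
  autocreate?: boolean; // absent or false means "never create new pages"
}

// In-memory stand-in for the brain: page path -> accumulated context lines.
const brainSketch = new Map<string, string[]>();

function enrichEntitySketch(
  pagePath: string,
  context: string,
  options: EnrichOptions = {}
): EnrichAction {
  const existing = brainSketch.get(pagePath);
  if (existing !== undefined) {
    // The UPDATE branch is unchanged by this approach.
    existing.push(context);
    return "updated";
  }
  if (options.autocreate !== true) {
    return "skipped"; // new default: external content cannot mint pages
  }
  brainSketch.set(pagePath, [context]);
  return "created";
}

// External-content skill: takes the default, so nothing is created.
const fromEmail = enrichEntitySketch("people/paul-graham", "lunch with Paul Graham");
// Owner-typed input: opts in explicitly.
const fromOwner = enrichEntitySketch("people/paul-graham", "met at demo day", {
  autocreate: true,
});
console.log(fromEmail, fromOwner); // skipped created
```

Note that this only closes page creation; appends to existing pages still go through, which is the timeline-injection case listed as out of scope.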
3. Source-based policy (config-driven)
- Add enrichment.policy in gbrain config: allow | quarantine | deny, per sourceSlug prefix.
- meeting-ingestion → quarantine, manual-entry → allow, etc.
- Ships as config, no code change beyond one read at the top of enrichEntity.
Most flexible, highest cognitive cost (another policy knob the owner has to remember).
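One way the single policy read could work is a longest-prefix lookup on sourceSlug. The rule shape, the resolvePolicy name, and the choice of quarantine as the fallback for unlisted sources are all assumptions made for this sketch, not existing gbrain config.

```typescript
type EnrichmentPolicy = "allow" | "quarantine" | "deny";

interface PolicyRule {
  prefix: string; // matched against the start of sourceSlug
  policy: EnrichmentPolicy;
}

// Hypothetical contents of enrichment.policy, using the examples from this issue.
const configuredRules: PolicyRule[] = [
  { prefix: "meeting-ingestion", policy: "quarantine" },
  { prefix: "manual-entry", policy: "allow" },
];

// Longest matching prefix wins, so a specific rule can override a broad one.
// Sources with no matching rule get the fallback (assumed: quarantine).
function resolvePolicy(
  sourceSlug: string,
  rules: PolicyRule[],
  fallback: EnrichmentPolicy = "quarantine"
): EnrichmentPolicy {
  let bestLength = -1;
  let resolved = fallback;
  for (const rule of rules) {
    if (sourceSlug.startsWith(rule.prefix) && rule.prefix.length > bestLength) {
      bestLength = rule.prefix.length;
      resolved = rule.policy;
    }
  }
  return resolved;
}

console.log(resolvePolicy("meeting-ingestion/standup", configuredRules)); // quarantine
console.log(resolvePolicy("manual-entry", configuredRules)); // allow
console.log(resolvePolicy("ingest/email", configuredRules)); // quarantine (fallback)
```

Defaulting unknown sources to quarantine rather than allow keeps a forgotten config entry from silently reopening the hole.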
PoC
Runtime PoC from the audit (internal, not public):
// Minimal shape — real PoC file at workspace/gbrain/report/evidence/poc-r6-f004-*.ts
const { extractAndEnrich } = await import('./src/core/enrichment-service.ts');
const hostileText = 'I met with Paul Graham. He mentioned Ycombinator Inc. is pivoting.';
const results = await extractAndEnrich(engine, hostileText, 'ingest/email');
// After: engine.getPage('people/paul-graham') returns a page that was never
// authored by the owner, with the hostile sentence as its Summary field.
Nothing in the call chain rejects it. results[0].action === 'created' and the page is now discoverable via searchKeyword('Paul Graham').
What I'd file against if chosen
Happy to send a PR for any of the three approaches. I'd lean (1) because it preserves current behavior for the owner while slamming the door on silent writes — owner still sees the proposal, can accept in one command, and gets an audit log for free. Open to (2) or (3) if you prefer less churn in the review UX.
No PR yet because (a) this is shaped like a product decision more than a bug fix, and (b) the right fix depends on how you want enrichment to feel end-to-end — which touches the skill surface you own directly. Happy to defer, happy to spec out the chosen approach once you pick.
Out of scope
- Timeline entry injection on existing pages (related, deserves its own issue).
- Entity-regex false positives (UX problem, not security).
- Tier auto-escalation based on untrusted source count (tangential; an attacker who can create a page can also inflate its tier, but that's an amplifier, not the root cause).
Related
- (… check-resolvable.ts): same audit round, different finding.
- slugifyEntity is a separate follow-up that becomes easier to exploit if F004 stays open; can be addressed alongside or independently.