Skip to content

[recipes] Add wiki-synthesis — autobiography + email-thread wikis#17

Open
alanshurafa wants to merge 7 commits intomainfrom
contrib/alanshurafa/wiki-synthesis
Open

[recipes] Add wiki-synthesis — autobiography + email-thread wikis#17
alanshurafa wants to merge 7 commits intomainfrom
contrib/alanshurafa/wiki-synthesis

Conversation

@alanshurafa
Copy link
Copy Markdown
Owner

Summary

New recipe under recipes/wiki-synthesis/ that ports ExoCortex's wiki-synthesis work into OB1:

  • scripts/synthesize-wiki.mjs — topic-scoped synthesizer with an autobiography mode that groups thoughts by year and asks an OpenAI-compatible Chat Completions endpoint for second-person biographical prose per year. Extensible catalogue (drop in more topics: career, travel, relationships, etc.).
  • scripts/backfill-gmail-wikis.mjs — resume-safe per-thread wiki generator for Gmail-imported thoughts. Groups by metadata.gmail.thread_id, filters by word-count + message/atom thresholds, writes gmail_wiki thoughts with derived_from edges back to source atoms. Prefers an upsert_thought RPC when present; plain insert fallback.
  • dashboard-snippets/ — optional Next.js Server-Action components for a /wiki index + /wiki/[slug] detail view. Users copy into their own dashboard and wire in auth.

How it differs from entity-wiki

  • entity-wiki (separate PR, branch contrib/alanshurafa/entity-wiki) synthesizes one page per entity and needs the entity-extraction schema.
  • wiki-synthesis (this PR) synthesizes one page per corpus slice (year, topic, email thread) and only requires the core thoughts table, plus optional thought_edges for email-thread provenance.

Both recipes are documented to cross-reference each other.

What it requires

  • Open Brain setup, Node.js 18+, any Chat-Completions-compatible LLM endpoint.
  • Autobiography mode: only the core thoughts table.
  • Email-thread mode: thoughts imported via recipes/email-history-import/ (or compatible importer that populates metadata.gmail.thread_id + metadata.gmail.gmail_id), plus a public.thought_edges table from the Knowledge Graph schema (upstream PR [schemas] Knowledge graph tables and extraction trigger #5).

README calls these prerequisites out explicitly so users know which OB1 layers they need before running the pipeline.

Generalization notes (from the ExoCortex origin)

  • Replaced MCP edge-function client (x-brain-key / open-brain-rest) with direct PostgREST + service role — matches the entity-wiki recipe's env-var contract (OPEN_BRAIN_URL, OPEN_BRAIN_SERVICE_KEY, LLM_BASE_URL, LLM_API_KEY, LLM_MODEL).
  • Swapped Anthropic-direct HTTP calls for the OpenAI-compatible Chat Completions pattern so users can point at OpenRouter/OpenAI/local LLMs.
  • Dropped the claude-cli provider branch from backfill-gmail-wikis.mjs (that was a local-compute workaround, not appropriate for a shared recipe).
  • New env knobs: SUBJECT_NAME (narrator voice), SOURCE_TYPE_FILTER (narrow autobiography corpus), WIKI_OUTPUT_DIR.
  • Stripped hardcoded personal data; the only remaining personal reference is the author metadata, as expected.

Test plan

  • Dry-run the autobiography synthesizer on my own Open Brain (--dry-run --scope year=2024) and confirm the year bucketing works against core thoughts.
  • Generate a single-year autobiography end-to-end and inspect output/wiki/autobiography-2024.md.
  • Dry-run backfill-gmail-wikis.mjs against my imported Gmail corpus and confirm eligibility counts match expectation.
  • Full backfill on a small subset (--limit=5), verify wiki thoughts land with source_type='gmail_wiki' and that thought_edges has derived_from rows back to source atoms.
  • Copy dashboard snippets into my dashboard fork, confirm the /wiki index renders and the Server Action round-trips to the script.

Pre-review fork PR — this is the intake PR for cross-AI review before any upstream PR to NateBJones-Projects/OB1.

…from thoughts

Adds a new recipe that ships two Node scripts for wiki-style synthesis over
the core `thoughts` table, plus optional Next.js dashboard snippets.

- `scripts/synthesize-wiki.mjs` — topic-scoped synthesizer with a built-in
  `autobiography` mode that groups thoughts by year and asks an
  OpenAI-compatible Chat Completions endpoint to produce second-person
  biographical prose per year. Extend the catalogue to add more topics.
- `scripts/backfill-gmail-wikis.mjs` — resume-safe per-thread wiki
  generator for Gmail-imported thoughts. Groups by `metadata.gmail.thread_id`,
  filters by word-count + message/atom thresholds, writes wiki thoughts
  with `derived_from` edges to their source atoms. Prefers an
  `upsert_thought` RPC when present, falls back to plain inserts.
- `dashboard-snippets/` — optional Next.js components (Server Action +
  `/wiki` index + `/wiki/[slug]` detail) to copy into a dashboard.

Generalized from ExoCortex origin: replaced the MCP edge-function client
with direct PostgREST + service role, swapped the Anthropic-direct call
for OpenAI-compatible Chat Completions, removed personal data references,
introduced SUBJECT_NAME and SOURCE_TYPE_FILTER env knobs. Complements the
in-flight entity-wiki recipe — this one does corpus-slice synthesis and
only requires the core `thoughts` table.

README documents the thought_edges / upsert_thought prerequisites so users
know which OB1 layers they need before running the email-thread pipeline.
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 181f48e289

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +115 to +117
const thoughtId = active.id as number;
const newStatus = over.id as string;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Resolve status from drop container, not hovered card

handleDragEnd treats over.id as the new workflow status, but with SortableContext each card is also a droppable target, so over.id is often a thought ID rather than a column key. In those common drops, this sends values like "123" to /api/kanban/update, which fails VALID_STATUSES validation and reverts the move, making drag-and-drop unreliable unless the user drops in empty column space.

Useful? React with 👍 / 👎.

Comment on lines +15 to +17
UPDATE thoughts
SET status = 'new', status_updated_at = now()
WHERE metadata->>'type' IN ('task', 'idea') AND status IS NULL;
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Backfill workflow status from type column

The migration backfill filters on metadata->>'type', but workflow task classification is stored in the top-level type field (and even this schema's README uses WHERE type IN ('task', 'idea')). On existing databases this leaves pre-existing task/idea rows with status IS NULL, so they are excluded from status-filtered workflow queries until users run a manual corrective update.

Useful? React with 👍 / 👎.

…tion

Wrap raw thought content in <entries>...</entries> delimiters and tell the
system+user prompt explicitly to ignore instructions inside it. Captured
thoughts are user data and can contain adversarial text that overrides
the biographer task.
Wrap raw email-thread content in <thread>...</thread> delimiters and
tell the system prompt to treat it as untrusted data. Email bodies
arrive from external senders and may contain adversarial instructions
that override the summarization task.
…idate year

Server Action was passing the host's entire process.env to the
synthesize-wiki child process, leaking every unrelated secret. Pass only
the OB1/LLM-related vars plus the minimal system env the child needs.
Also server-side-validate scope_year against /^(19|20)\d{2}$/ (P3).
Auth guard stays as a placeholder so the snippet remains optional
reference code — the README already warns to add one.
…rrow RPC fallback

P1-2: Wrap the wiki-delete PostgREST filter id in encodeURIComponent so
non-numeric (e.g., uuid) ids with reserved chars don't break the URL.
P1-3: Coerce thrown non-Error values in the retry loop — 'throw 'msg''
used to crash the catch and skip the state-log append.
P2:   Narrow the upsert_thought RPC fallback trigger to the specific
      rpc/upsert_thought 404 signal, not any string with 'not found',
      so auth/permission errors surface instead of silently falling
      through to a direct insert. Also cap the LLM error body at 500
      chars to avoid leaking long provider diagnostics to logs.
…rereqs

Escape SUBJECT_NAME in autobiography frontmatter so names containing
':', quotes, or YAML-reserved chars don't break the frontmatter parse
the dashboard snippet does.

Expand the README's thought_edges prerequisite with column types and
the UNIQUE index needed for the ignore-duplicates edge-upsert header.
Also spell out what users observe when the schema is missing.
@alanshurafa
Copy link
Copy Markdown
Owner Author

Refreshing checks after markdownlint cleanup merged into fork main.

@alanshurafa alanshurafa reopened this Apr 22, 2026
@alanshurafa
Copy link
Copy Markdown
Owner Author

Refreshing checks after fork markdownlint workflow fix.

@alanshurafa alanshurafa reopened this Apr 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant