[recipes] Add wiki-synthesis — autobiography + email-thread wikis#17
[recipes] Add wiki-synthesis — autobiography + email-thread wikis#17alanshurafa wants to merge 7 commits intomainfrom
Conversation
…from thoughts Adds a new recipe that ships two Node scripts for wiki-style synthesis over the core `thoughts` table, plus optional Next.js dashboard snippets. - `scripts/synthesize-wiki.mjs` — topic-scoped synthesizer with a built-in `autobiography` mode that groups thoughts by year and asks an OpenAI-compatible Chat Completions endpoint to produce second-person biographical prose per year. Extend the catalogue to add more topics. - `scripts/backfill-gmail-wikis.mjs` — resume-safe per-thread wiki generator for Gmail-imported thoughts. Groups by `metadata.gmail.thread_id`, filters by word-count + message/atom thresholds, writes wiki thoughts with `derived_from` edges to their source atoms. Prefers an `upsert_thought` RPC when present, falls back to plain inserts. - `dashboard-snippets/` — optional Next.js components (Server Action + `/wiki` index + `/wiki/[slug]` detail) to copy into a dashboard. Generalized from ExoCortex origin: replaced the MCP edge-function client with direct PostgREST + service role, swapped the Anthropic-direct call for OpenAI-compatible Chat Completions, removed personal data references, introduced SUBJECT_NAME and SOURCE_TYPE_FILTER env knobs. Complements the in-flight entity-wiki recipe — this one does corpus-slice synthesis and only requires the core `thoughts` table. README documents the thought_edges / upsert_thought prerequisites so users know which OB1 layers they need before running the email-thread pipeline.
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 181f48e289
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| const thoughtId = active.id as number; | ||
| const newStatus = over.id as string; | ||
|
|
There was a problem hiding this comment.
Resolve status from drop container, not hovered card
handleDragEnd treats over.id as the new workflow status, but with SortableContext each card is also a droppable target, so over.id is often a thought ID rather than a column key. In those common drops, this sends values like "123" to /api/kanban/update, which fails VALID_STATUSES validation and reverts the move, making drag-and-drop unreliable unless the user drops in empty column space.
Useful? React with 👍 / 👎.
| UPDATE thoughts | ||
| SET status = 'new', status_updated_at = now() | ||
| WHERE metadata->>'type' IN ('task', 'idea') AND status IS NULL; |
There was a problem hiding this comment.
Backfill workflow status from
type column
The migration backfill filters on metadata->>'type', but workflow task classification is stored in the top-level type field (and even this schema's README uses WHERE type IN ('task', 'idea')). On existing databases this leaves pre-existing task/idea rows with status IS NULL, so they are excluded from status-filtered workflow queries until users run a manual corrective update.
Useful? React with 👍 / 👎.
…tion Wrap raw thought content in <entries>...</entries> delimiters and tell the system+user prompt explicitly to ignore instructions inside it. Captured thoughts are user data and can contain adversarial text that overrides the biographer task.
Wrap raw email-thread content in <thread>...</thread> delimiters and tell the system prompt to treat it as untrusted data. Email bodies arrive from external senders and may contain adversarial instructions that override the summarization task.
…idate year
Server Action was passing the host's entire process.env to the
synthesize-wiki child process, leaking every unrelated secret. Pass only
the OB1/LLM-related vars plus the minimal system env the child needs.
Also server-side-validate scope_year against /^(19|20)\d{2}$/ (P3).
Auth guard stays as a placeholder so the snippet remains optional
reference code — the README already warns to add one.
…rrow RPC fallback
P1-2: Wrap the wiki-delete PostgREST filter id in encodeURIComponent so
non-numeric (e.g., uuid) ids with reserved chars don't break the URL.
P1-3: Coerce thrown non-Error values in the retry loop — 'throw 'msg''
used to crash the catch and skip the state-log append.
P2: Narrow the upsert_thought RPC fallback trigger to the specific
rpc/upsert_thought 404 signal, not any string with 'not found',
so auth/permission errors surface instead of silently falling
through to a direct insert. Also cap the LLM error body at 500
chars to avoid leaking long provider diagnostics to logs.
…rereqs Escape SUBJECT_NAME in autobiography frontmatter so names containing ':', quotes, or YAML-reserved chars don't break the frontmatter parse the dashboard snippet does. Expand the README's thought_edges prerequisite with column types and the UNIQUE index needed for the ignore-duplicates edge-upsert header. Also spell out what users observe when the schema is missing.
|
Refreshing checks after markdownlint cleanup merged into fork main. |
|
Refreshing checks after fork markdownlint workflow fix. |
Summary
New recipe under
recipes/wiki-synthesis/that ports ExoCortex's wiki-synthesis work into OB1:scripts/synthesize-wiki.mjs— topic-scoped synthesizer with anautobiographymode that groups thoughts by year and asks an OpenAI-compatible Chat Completions endpoint for second-person biographical prose per year. Extensible catalogue (drop in more topics: career, travel, relationships, etc.).scripts/backfill-gmail-wikis.mjs— resume-safe per-thread wiki generator for Gmail-imported thoughts. Groups bymetadata.gmail.thread_id, filters by word-count + message/atom thresholds, writesgmail_wikithoughts withderived_fromedges back to source atoms. Prefers anupsert_thoughtRPC when present; plain insert fallback.dashboard-snippets/— optional Next.js Server-Action components for a/wikiindex +/wiki/[slug]detail view. Users copy into their own dashboard and wire in auth.How it differs from
entity-wikientity-wiki(separate PR, branchcontrib/alanshurafa/entity-wiki) synthesizes one page per entity and needs the entity-extraction schema.wiki-synthesis(this PR) synthesizes one page per corpus slice (year, topic, email thread) and only requires the corethoughtstable, plus optionalthought_edgesfor email-thread provenance.Both recipes are documented to cross-reference each other.
What it requires
thoughtstable.recipes/email-history-import/(or compatible importer that populatesmetadata.gmail.thread_id+metadata.gmail.gmail_id), plus apublic.thought_edgestable from the Knowledge Graph schema (upstream PR [schemas] Knowledge graph tables and extraction trigger #5).README calls these prerequisites out explicitly so users know which OB1 layers they need before running the pipeline.
Generalization notes (from the ExoCortex origin)
x-brain-key/open-brain-rest) with direct PostgREST + service role — matches theentity-wikirecipe's env-var contract (OPEN_BRAIN_URL,OPEN_BRAIN_SERVICE_KEY,LLM_BASE_URL,LLM_API_KEY,LLM_MODEL).claude-cliprovider branch frombackfill-gmail-wikis.mjs(that was a local-compute workaround, not appropriate for a shared recipe).SUBJECT_NAME(narrator voice),SOURCE_TYPE_FILTER(narrow autobiography corpus),WIKI_OUTPUT_DIR.Test plan
--dry-run --scope year=2024) and confirm the year bucketing works against corethoughts.output/wiki/autobiography-2024.md.backfill-gmail-wikis.mjsagainst my imported Gmail corpus and confirm eligibility counts match expectation.--limit=5), verify wiki thoughts land withsource_type='gmail_wiki'and thatthought_edgeshasderived_fromrows back to source atoms./wikiindex renders and the Server Action round-trips to the script.Pre-review fork PR — this is the intake PR for cross-AI review before any upstream PR to
NateBJones-Projects/OB1.