[schemas] Smart ingest pipeline tables #4
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 3abe49a057
UPDATE public.thoughts
SET metadata = jsonb_set(
  coalesce(metadata, '{}'::jsonb),
  '{evidence}',
  v_current_evidence || jsonb_build_object(
Serialize evidence updates to avoid lost writes
`append_thought_evidence` reads `metadata->'evidence'` into `v_current_evidence` and later writes back `v_current_evidence || ...`, so concurrent calls on the same thought can clobber each other: if two workers append different evidence at the same time, the second write can overwrite the first and drop one entry. This breaks the function's stated idempotent accumulation behavior for multi-item ingestion. Fix it by locking the row (`FOR UPDATE`) or by performing the append against the current row value inside a single `UPDATE` expression.
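The race described in this comment can be reproduced in miniature. The sketch below is a toy Python model, not the PL/pgSQL from the migration: a dict stands in for the `thoughts` row, and every function name is hypothetical. It shows why writing back a previously read snapshot drops an entry, while deriving the new value from the current row value does not.

```python
def append_from_snapshot(row, snapshot, entry):
    """Unsafe: write back a value derived from an earlier read of the row."""
    row["evidence"] = snapshot + [entry]


def append_in_place(row, entry):
    """Safe: compute the new value from the row's current value at write time.

    This models what a single UPDATE expression, or a FOR UPDATE row lock
    held across the read and the write, is meant to guarantee.
    """
    row["evidence"] = row["evidence"] + [entry]


def simulate_lost_write():
    row = {"evidence": []}
    # Both workers read before either one writes.
    snapshot_a = list(row["evidence"])
    snapshot_b = list(row["evidence"])
    append_from_snapshot(row, snapshot_a, {"source": "A"})
    append_from_snapshot(row, snapshot_b, {"source": "B"})  # clobbers A
    return row["evidence"]


def simulate_serialized():
    row = {"evidence": []}
    append_in_place(row, {"source": "A"})
    append_in_place(row, {"source": "B"})
    return row["evidence"]
```

In the unsafe interleaving only worker B's entry survives; in the serialized version both entries are kept in order.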
GRANT EXECUTE ON FUNCTION public.append_thought_evidence(bigint, jsonb)
  TO authenticated, anon, service_role;
Restrict SECURITY DEFINER RPC from anon/auth roles
This migration grants EXECUTE on a SECURITY DEFINER function to authenticated and anon, which allows callers to mutate public.thoughts through the function even when row-level access is intended to be service-role-only (as configured in docs/01-getting-started.md with a service-role policy on thoughts). In Supabase deployments where anon keys are client-visible, this exposes an authorization bypass for arbitrary thought_id updates and should be limited to service_role (or enforce caller ownership checks inside the function).
* [recipes] Add repo learning coach recipe * [recipes] Harden repo learning coach sync and reads
…es-Projects#146) * [dashboards] Add Workflow kanban board with drag-and-drop, mobile support, and MCP progress_task tool Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [dashboards] Mobile UX fixes: modal centering, landscape layout, touch drag-and-drop - Fix modal positioning with createPortal to escape DnD transform context - Add phone landscape CSS to hide sidebar and show mobile topbar - Switch to MouseSensor + TouchSensor for proper mobile drag delay - Add touchAction pan-y for scroll + drag coexistence - Add allowedDevOrigins for mobile dev testing - Add suppressHydrationWarning for browser extension compatibility Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [dashboards] Allow pinch-to-zoom on kanban cards Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [schemas] Add workflow status tracking columns for kanban board Adds status and status_updated_at columns to the thoughts table, enabling kanban-style workflow management for task and idea types. Includes migration SQL, backfill for existing thoughts, and partial index. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [dashboards] Add Workflow kanban board with drag-and-drop and mobile support Adds a full kanban board interface for managing task and idea thoughts: - Drag-and-drop between status columns (New/Planning/Active/Review/Done) - Touch-friendly with 200ms hold delay, pinch-to-zoom enabled - Collapsible columns with localStorage persistence - Inline edit modal for status, priority, type, and content - Dashboard summary widget showing active workflow items - Mobile-first responsive layout with full-screen edit on small screens - @dnd-kit for accessible drag-and-drop (mouse + touch sensors) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [dashboards] Add delete button to kanban card edit modal Adds a Delete button in the kanban card modal footer with a confirmation banner before permanently deleting the thought. Wires up a new /api/kanban/delete route and optimistic removal from the board. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * [dashboards] Make delete confirmation a separate popup dialog Replace the inline banner with a standalone centered dialog that overlays on top of the edit modal, with clear title, description, and Cancel/Delete buttons. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [dashboards] Fix deleteThought parsing empty response body The REST API returns an empty body on DELETE, but apiFetch always called res.json() causing a parse error. Inline the fetch so it skips JSON parsing. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Ivan <ivan@openbrain.dev> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
…ng (NateBJones-Projects#141) Syncs Claude Code's local memory saves to Open Brain via mcp__open-brain__capture_thought so memories are accessible from ChatGPT, Claude Desktop, Codex, and any MCP-connected client. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fix skill divergence (NateBJones-Projects#135) * [recipes] Update life-engine schema: user_id TEXT, add weekly_review/cron_state types - Changed user_id from UUID to TEXT across all 5 tables (supports Telegram chat_id as identifier without UUID padding hacks) - Added weekly_review and cron_state to briefing_type check constraint Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Clean up Life Engine: add state table, simplify loop timing, fix skill divergence - Add life_engine_state key-value table for runtime state (cron job ID, sleep schedule) instead of overloading briefing log with cron_state type - Remove cron_state from briefing_type CHECK constraint - Simplify Dynamic Loop Timing from 6 tiers to 4 (15m/30m/60m/one-shot) - Replace duplicate embedded skill in README with pointer to life-engine-skill.md - Add user_responded update logic to Rule 7 for self-improvement engagement tracking - Add timezone note to skill time windows - Fix platform references to include Discord alongside Telegram - Add RLS comment explaining why no row policies are needed - Update metadata.json date Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Harden Life Engine permissions: lead with settings.json allowlist, scope MCP tools - Restructure Step 6 to recommend settings.json allowlist as default (Option A) - Replace broad mcp__open-brain__* and mcp__supabase__* wildcards with specific tool names (search_thoughts, list_thoughts, execute_sql, etc.) 
- Include CronCreate and CronDelete in the default allowlist - Demote --dangerously-skip-permissions to Option D (testing only) - Update Quick Setup and Step 7 launch commands to use settings.json approach - Addresses HIGH finding from security audit Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Add rain forecast to Life Engine morning briefing via Open-Meteo - Add Weather section to skill with Open-Meteo API call (free, no API key) - Include rain windows with time ranges and probability in morning briefing - Default coordinates: Portland, OR (45.52, -122.68), configurable via life_engine_state - Only show rain line when precipitation_probability >= 30% - Update schema comment to document latitude/longitude state keys Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Add Daily Capture, portable customizations, and manual sync rule to Life Engine Backport portable customizations from installed SKILL.md into the recipe: date anchor, database note, user identity, valid briefing types, proactive chat_id, rules 9-14. Add Daily Capture prompt in evening window with capture_thought integration. Add Rule 14 requiring manual sync between recipe and installed skill files. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Fix hallucinated column name: briefings table uses 'content' not 'summary' Add explicit column reference note to prevent the LLM from hallucinating a 'summary' column on life_engine_briefings — the correct column is 'content'. 
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Address PR review: Discord support, migration steps, permission docs Fixes all issues from PR NateBJones-Projects#135 review: - P1: Add Bash(date/curl) and capture_thought to README allowlist examples - P1: Make channel event handling platform-agnostic (Telegram + Discord) in skill Rules 7, 10, 11 and Channel Tools section - P1: Add upgrade migration steps to schema.sql for user_id UUID→TEXT - P2: Add CHECK constraint on delivered_via ('telegram', 'discord') - P2: Add single-user assumption comment on life_engine_state table - Bump version to 1.1.0, update date to 2026-04-01 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [recipes] Broaden Bash permission to Bash(*) — scoped patterns are fragile Scoped Bash patterns like Bash(date *) and Bash(curl -s *api.open-meteo.com*) break when the LLM varies its exact command syntax between runs, causing silent permission blocks during unattended operation. Replace with Bash(*) since Life Engine only uses benign read-only commands (date, curl) and Rule 11 prevents dangerous execution from external triggers. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…teBJones-Projects#125) Replaces the empty stub with a working zero-infrastructure approach using Claude Code scheduled tasks + Open Brain MCP + Gmail MCP. Preserves the Edge Function approach as a planned future option. Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…es-Projects#37) * [recipes] Vercel + Neon + Telegram alternative architecture Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * [fix] Replace local MCP pattern with custom connectors (PR review feedback) Replace claude_desktop_config.json + mcp-remote bridge instructions with Claude Desktop custom connectors UI approach in both Step 8 and the Troubleshooting section, aligning with CONTRIBUTING.md Rule #14. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
…BJones-Projects#171) * [recipes] ChatGPT import v2: multi-thought knowledge extraction Replace 1-3 sentence summarization with structured knowledge extraction that produces 2-5 typed thoughts per conversation (decision, preference, learning, context, brainstorm, reference) with enriched metadata. Key changes to import-chatgpt.py: - Branch resolution via current_node parent-pointer walk - Content type dispatch for 14 export message formats (voice, reasoning, web search, code) - Signal-based filtering replaces regex title matching - Session boundary detection for multi-day conversations - Semantic deduplication via match_thoughts RPC - Re-import handling with update_time/content_hash detection - Embed thought content, not [ChatGPT: title] prefix - --store-conversations for optional conversation history with pyramid summaries - --focus flag with presets (tech, strategy, personal, creative) and custom text - --openrouter-model flag for model selection - --max-words flag to skip oversized conversations (default: 50000) - Robust JSON parsing for non-OpenAI models (Anthropic, Ollama) - Accurate progress display with percentage and skip counts New files: - chatgpt_parser.py: parsing, content dispatch, filtering, session detection - schema.sql: chatgpt_conversations table with pyramid summaries and indexes All existing CLI flags preserved (--dry-run, --model ollama, --after/--before, --limit, --report, --verbose, --raw, --ingest-endpoint). * [recipes] Fix ChatGPT import filtering defaults --------- Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
NateBJones-Projects#160) * [recipes] Local Ollama embeddings — zero-cost alternative to OpenRouter Generate embeddings locally via Ollama and insert into Supabase. Keeps the existing OB1 architecture, only swaps the embedding provider. Five models tested including gte-qwen2-1.5b (1536-dim) which is drop-in compatible with the default Open Brain schema. Includes quality benchmarks comparing discrimination power across all five models. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix markdown lint errors in README Add blank lines around fenced code blocks (MD031) and merge consecutive blockquotes (MD028). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [recipes] Fix local Ollama env loading docs --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
…s-Projects#150) * [docs] Fix MD028 blank line between blockquotes in getting-started guide Removes blank line between WARNING and IMPORTANT blockquotes that was failing markdownlint across all PRs. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Fix claudeception recipe: convert multi-line YAML descriptions to single-line Multi-line descriptions (description: |) break agent routing silently. Nate's March 2026 Skills Standard requires single-line YAML descriptions for reliable semantic matching. Fixed 3 instances: the recipe's own description and 2 template examples. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [recipes] Clean up Claudeception docs formatting --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
…NateBJones-Projects#148) * fix(professional-crm): remove Accept header patch causing SSE reconnect loop The Accept: text/event-stream header patch forced StreamableHTTPTransport into SSE mode on every request. Since Supabase edge functions are stateless, the SSE stream terminates immediately after each response — causing the MCP client to reconnect every ~2 seconds (~43k invocations/day). StreamableHTTPTransport is request/response by design. Removing the patch lets it respond with plain JSON, eliminating the reconnect loop entirely. * fix(professional-crm): force JSON-only Accept header to prevent SSE reconnect loop Removing text/event-stream from the Accept header before it reaches StreamableHTTPTransport prevents it from opening SSE streams. MCP clients send Accept: application/json, text/event-stream per spec -- this is what triggers SSE mode even without the original workaround. JSON-only responses close cleanly, eliminating the boot/shutdown cycle.
…ateBJones-Projects#139) * recipes: add adaptive capture classification with confidence gating * recipes: address review — fix author, OB1 types, add TypeScript implementation * recipes: incorporate GitHub edits to README, classifier prompt, and metadata * [recipes] Tighten adaptive capture setup and threshold updates --------- Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
…ateBJones-Projects#133) * Add update_professional_contact tool to CRM extension Adds the ability to update existing contact fields (name, company, title, email, phone, tags, notes, follow_up_date, etc.) which was proposed in NateBJones-Projects#93 but never implemented. Only provided fields are updated, and the existing updated_at trigger handles timestamping. * Allow clearing follow_up_date by passing null or empty string Fixes the case where a follow-up date, once set, could never be cleared — leaving contacts permanently stuck in get_follow_ups_due. * [extensions] Document contact update tool --------- Co-authored-by: Matt Hallett <matthallett@gmail.com> Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
…es-Projects#161) * Fix pre-existing markdownlint errors across 15 files Add blank lines around headings (MD022), fenced code blocks (MD031), and between adjacent blockquotes (MD028). Fix broken link fragment (MD051) and remove extra blank line (MD012). No content changes. These errors were blocking CI on all open PRs since the lint check runs repo-wide. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [docs] Preserve README links during markdown cleanup --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
…raphics (NateBJones-Projects#85) * [recipes] Infographic Generator: turn research docs into visual infographics Second recipe from @jaredirish. Part of the Open Brain Flywheel (capture-process-visualize loop, see Issue NateBJones-Projects#84). Takes any markdown doc or Open Brain thought cluster and generates professional infographic images via Gemini's free-tier API. Auto-chunks content, writes verbose prompts (300+ words each), generates PNGs with specific colors/layout/typography. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [recipes] Fix broken relative links in infographic-generator README ../brain-dump-processor/ → ../panning-for-gold/ ../auto-capture-protocol/ → ../auto-capture/ Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [recipes] Address review feedback on infographic generator - Sync generate.py with working local version (cleaner error handling, fix --redo display counter bug) - Fix auto-capture link: directory doesn't exist until PR NateBJones-Projects#42 merges, so link to the PR instead of a non-existent directory Note: part.as_image() and gemini-2.5-flash-image are both valid per the official google-genai SDK docs. Reviewer concerns on those were based on outdated information. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * [recipes] Fix infographic redo progress output --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
* [recipes] Add OB-Graph knowledge graph layer Adds graph database functionality for Open Brain using PostgreSQL nodes + edges with recursive CTE traversal. Includes schema, MCP server with 10 tools, and setup documentation. https://claude.ai/code/session_015Z8wCeokTMTdrVMthqzGKJ * [recipes] Clarify OB-Graph deployment setup --------- Co-authored-by: Claude <noreply@anthropic.com>
* [docs] Fix Cursor MCP connection — use native url field, not mcp-remote mcp-remote@latest now attempts OAuth client registration before sending custom headers, which breaks against Open Brain's simple key-based auth. Cursor supports remote MCP servers natively via the url field, so mcp-remote is unnecessary. Changes: - Add dedicated Cursor section to getting-started guide (7.5) and remote-mcp primitive with native url config - Update mcp-remote examples to pass key via ?key= query parameter instead of --header to avoid OAuth discovery issues - Clarify x-brain-key (core) vs x-access-key (extensions) in troubleshooting guides Made-with: Cursor * [primitives] Bring remote MCP docs in line with repo format --------- Co-authored-by: Jonathan Edwards <justfinethanku@gmail.com>
* [skills] Add weekly signal diff skill pack * [skills] Fix markdownlint numbering in weekly signal diff
…rojects#181) * [recipes] Add Bring Your Own Context recipe * [recipes] Fix markdownlint regression in activation README
* [repo] Sweep fix-now backlog issues * [docs] Fix setup-guide markdownlint regression
SECURITY DEFINER function was granted to authenticated/anon, allowing RLS bypass. Now restricted to service_role only. Added FOR UPDATE to prevent concurrent evidence appends from losing writes. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Drop the stale reference to `schemas/enhanced-thoughts/` (deleted on this branch and not actually used by the SQL — the function only touches `thoughts.id` and `thoughts.metadata`). Also update Expected Outcome to reflect the service-role-only grant on `append_thought_evidence` so users don't re-grant it to anon/authenticated by accident. Why: README claimed a prerequisite that 404s on the repo and mis-stated the RPC's trust boundary. Both were latent user-footguns.
Add nullable `user_id uuid` to `ingestion_jobs` and `ingestion_items` via idempotent `ALTER TABLE ... ADD COLUMN IF NOT EXISTS`. A DO block conditionally adds FKs to `auth.users(id) ON DELETE CASCADE` only when Supabase's `auth` schema and `users` table exist, so the migration stays safe on non-Supabase Postgres. Why: without user_id, multi-tenant deployments leak ingestion history across users. Nullable keeps single-tenant stock OB1 working with no data migration and lets RLS policies (added separately) key off auth.uid() = user_id once populated.
Turn on row level security for `ingestion_jobs` and `ingestion_items`, add a `service_role ALL` policy on each (so worker writes still flow), and — conditionally, only when `auth.uid()` exists — add an `authenticated SELECT` policy scoped to `user_id = auth.uid()`. Policies are wrapped in DROP POLICY IF EXISTS / CREATE so the file is still idempotent on re-run. Why: the grant block was already service-role-only, but without RLS there was no backstop if Supabase's schema-level defaults quietly granted `USAGE`/`SELECT` to `anon` or `authenticated`. RLS closes that door. Giving authenticated users a SELECT scoped to their own rows matches the pattern used by the rest of the Open Brain extensions and is a no-op until someone populates `user_id`.
Add partial indexes keyed on `created_at` for rows in the active
lifecycle (`status = 'pending'` on jobs; `status IN ('pending','ready')`
on items). Both use `CREATE INDEX IF NOT EXISTS` so re-running the
migration is a no-op.
Why: the worker polls for the next pending job and for ready items
repeatedly. Without a partial index, every poll becomes a seq scan
against a table whose historical tail of completed rows grows forever.
Partial indexes stay tiny (only live queue rows) and shrink to near
zero when the queue drains.
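A minimal sketch of the partial-index idea, using Python's built-in `sqlite3` as a stand-in for Postgres (SQLite also supports partial indexes and `IF NOT EXISTS`). The index name `ingestion_jobs_pending_idx` and the reduced column set are assumptions made for illustration, not the names used in the migration.

```python
import sqlite3

# Assumed, simplified schema: only the columns the polling query needs.
# The index name is hypothetical.
DDL = """
CREATE TABLE IF NOT EXISTS ingestion_jobs (
    id INTEGER PRIMARY KEY,
    status TEXT NOT NULL,
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS ingestion_jobs_pending_idx
    ON ingestion_jobs (created_at)
    WHERE status = 'pending';
"""


def build_queue():
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executescript(DDL)  # re-running is a no-op thanks to IF NOT EXISTS
    conn.executemany(
        "INSERT INTO ingestion_jobs (status, created_at) VALUES (?, ?)",
        [
            ("complete", "2026-01-01T00:00:00Z"),
            ("pending", "2026-01-03T00:00:00Z"),
            ("pending", "2026-01-02T00:00:00Z"),
        ],
    )
    return conn


def next_pending_job(conn):
    """Return the id of the oldest pending job, or None if the queue is empty."""
    row = conn.execute(
        "SELECT id FROM ingestion_jobs "
        "WHERE status = 'pending' ORDER BY created_at LIMIT 1"
    ).fetchone()
    return row[0] if row else None


def index_names(conn):
    rows = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'index' AND tbl_name = 'ingestion_jobs'"
    ).fetchall()
    return [r[0] for r in rows]
```

Because the index predicate matches the polling query's `WHERE` clause, only live queue rows are indexed; completed rows never enter it.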
Add a Job Claim Semantics section to the README that states the contract explicitly: claim logic lives in the companion Edge Function (`integrations/smart-ingest/`), and any worker that claims a row MUST use `FOR UPDATE SKIP LOCKED`. Include a canonical UPDATE-with-sub-SELECT pattern that pairs with the new partial indexes. Also sync Expected Outcome with the new user_id columns, partial indexes, and RLS policies so the README matches the schema it describes. Why: the schema file is deliberately minimal (no claim RPC), so without this note a downstream author could wire up a plain SELECT-then-UPDATE worker and silently double-process the queue. Putting the contract in the schema README, next to the tables it operates on, keeps the DB layer's requirements discoverable even when the companion Edge Function lives in a separate folder.
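For illustration, the UPDATE-with-sub-SELECT claim could look roughly like the sketch below. This is not the README's canonical text: the column names (`id`, `status`, `created_at`), the target status `'extracting'`, and the helper `claim_next_job` are assumptions. The helper expects a cursor from any PostgreSQL DB-API driver.

```python
# Assumed column names and target status; the real claim query lives in the
# companion Edge Function and may differ.
CLAIM_NEXT_JOB_SQL = """
UPDATE ingestion_jobs
SET status = 'extracting'
WHERE id = (
    SELECT id
    FROM ingestion_jobs
    WHERE status = 'pending'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id;
"""


def claim_next_job(cursor):
    """Execute the claim on a DB-API cursor; return the claimed id or None.

    SKIP LOCKED makes a concurrent worker skip a row another transaction has
    already locked instead of waiting on it, so two workers do not claim the
    same job.
    """
    cursor.execute(CLAIM_NEXT_JOB_SQL)
    row = cursor.fetchone()
    return row[0] if row else None
```

The caller is responsible for committing the surrounding transaction; until then the claimed row stays locked.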
f7b4fd7 to f08c903
Summary
- `ingestion_jobs` and `ingestion_items` tables for tracking the extract-deduplicate-execute lifecycle of bulk text ingestion
- `append_thought_evidence` RPC for idempotent evidence accumulation on thoughts

What's included
- `ingestion_jobs`
- `ingestion_items`
- `ingestion_items_job_idx`
- `append_thought_evidence()`

Design decisions
- `CREATE TABLE IF NOT EXISTS` and `CREATE OR REPLACE FUNCTION` patterns: safe to run multiple times
- `pending → extracting → dry_run_complete → executing → complete`, supporting human-in-the-loop review before mutation
- `input_hash` unique constraint on jobs prevents duplicate processing of the same text
- `append_thought_evidence` uses SHA256 of `(source_label + excerpt + thought_id)` to prevent duplicate evidence entries
- `ON DELETE CASCADE`

Dependencies
- `thoughts` table columns

Test plan
- Paste `schema.sql` into Supabase SQL Editor and run: no errors
- `ingestion_jobs` and `ingestion_items` tables appear in Table Editor
- `append_thought_evidence` function appears in Database > Functions
- `SELECT count(*) FROM ingestion_jobs;` returns 0

🤖 Generated with Claude Code
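Two of the design decisions in the summary, the job lifecycle and the SHA256 evidence key, can be sketched in a few lines of Python. This is a model of the idea rather than the SQL in `schema.sql`. The way the three fields are joined (plain concatenation, UTF-8, no separator), the `hash` field name, and all function names are assumptions.

```python
import hashlib

# Status order as listed in the PR description.
LIFECYCLE = ["pending", "extracting", "dry_run_complete", "executing", "complete"]


def next_status(current):
    """Return the status that follows `current`, or None at the end."""
    i = LIFECYCLE.index(current)
    return LIFECYCLE[i + 1] if i + 1 < len(LIFECYCLE) else None


def evidence_hash(source_label, excerpt, thought_id):
    """SHA256 over source_label + excerpt + thought_id (assumed concatenation)."""
    payload = f"{source_label}{excerpt}{thought_id}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_evidence(evidence, source_label, excerpt, thought_id):
    """Return a new evidence list; a repeated entry leaves the list unchanged."""
    key = evidence_hash(source_label, excerpt, thought_id)
    if any(entry["hash"] == key for entry in evidence):
        return evidence
    return evidence + [
        {"source_label": source_label, "excerpt": excerpt, "hash": key}
    ]
```

Calling `append_evidence` twice with the same arguments yields a single entry, which is the idempotent accumulation the summary describes. The `dry_run_complete` stop before `executing` is where a human review would take place.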