recipes: local-ollama-embeddings — upsert, reembed, content_fingerprint #252
Open
snapsynapse wants to merge 1 commit into NateBJones-Projects:main from
…ed-local.py

- `--upsert`: ON CONFLICT(content_fingerprint) merge-duplicates via Prefer header
- `--reembed`: fetch all existing rows and re-embed them (update embedding column only); useful after switching embedding models. Supports `--reembed-limit N`.
- `--content_fingerprint` passthrough in JSONL input (already in read_thoughts_from_file)
- Refactored `ingest_thought` to accept optional fingerprint + upsert params
- Added `update_embedding(row_id, embedding)` and `fetch_all_rows()` helpers
- `_supa_headers()` extracted to avoid header dict duplication
- Input collection block skipped when `--reembed` is active (no stdin/file required)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Summary

Adds three production features to `embed-local.py` that make the local-embeddings recipe usable as a repeatable seed pipeline:

- `--upsert`: uses `ON CONFLICT (content_fingerprint)` with `resolution=merge-duplicates` via the PostgREST `Prefer` header. Requires a `UNIQUE` constraint on `content_fingerprint`; degrades gracefully if the constraint is absent (falls back to plain insert).
- `--reembed` + `--reembed-limit N`: fetches all existing rows (`id` + `content`), or at most N of them when `--reembed-limit N` is given, re-generates embeddings via Ollama, and PATCHes the `embedding` column. Useful after switching models (e.g. nomic-embed-text → mxbai-embed-large). Does not touch `content` or `metadata`.
- `content_fingerprint` passthrough: JSONL inputs with a `content_fingerprint` key now forward the value to the insert/upsert call (was silently dropped before).

New helpers

- `_supa_headers(prefer)`: extracted from inline dicts to remove duplication
- `update_embedding(row_id, embedding)`: PATCH for the reembed path
- `fetch_all_rows()`: GET `id,content` ordered by `created_at`

Input collection change

When `--reembed` is active, the script skips stdin/file/arg input collection entirely (no `--file` or piped input required).

Test plan

- `python embed-local.py --file thoughts.jsonl --upsert`: confirm ON CONFLICT merge on second run
- `python embed-local.py --reembed --dry-run`: confirm rows fetched, embeddings generated, no DB writes
- `python embed-local.py --reembed --reembed-limit 5`: confirm only 5 rows patched
- `python embed-local.py --file plain.txt`: confirm unchanged plain-insert path still works
- `python embed-local.py` (no args, no stdin): confirm still exits with clear error

🤖 Generated with Claude Code
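For reviewers who want a mental model of the request shapes described above, here is a minimal sketch. It is not the code from this PR. The names `supa_headers`, `build_insert_request`, `build_reembed_patch`, the `thoughts` table, and the URL/key placeholders are all hypothetical, and the use of the `on_conflict` query parameter is an assumption about how the conflict target is passed to PostgREST. The functions only build the URL, headers, and payload, so nothing here touches the network.

```python
from typing import Optional

# Hypothetical placeholders: the script's real configuration names are not shown in this PR.
SUPABASE_URL = "https://example.supabase.co"
SUPABASE_KEY = "service-role-key"
TABLE = "thoughts"  # assumed table name


def supa_headers(prefer: Optional[str] = None) -> dict:
    """Build the shared PostgREST headers, optionally adding a Prefer value."""
    headers = {
        "apikey": SUPABASE_KEY,
        "Authorization": f"Bearer {SUPABASE_KEY}",
        "Content-Type": "application/json",
    }
    if prefer:
        headers["Prefer"] = prefer
    return headers


def build_insert_request(row: dict, upsert: bool = False) -> tuple:
    """Return (url, headers, payload) for a plain insert or an upsert."""
    url = f"{SUPABASE_URL}/rest/v1/{TABLE}"
    if upsert and row.get("content_fingerprint"):
        # Assumption: the conflict target is named via on_conflict, and
        # merge-duplicates asks PostgREST to update the existing row.
        url += "?on_conflict=content_fingerprint"
        return url, supa_headers("resolution=merge-duplicates"), row
    # Without a fingerprint (or without --upsert) this is a plain insert.
    return url, supa_headers("return=minimal"), row


def build_reembed_patch(row_id: int, embedding: list) -> tuple:
    """Return (url, headers, payload) for a PATCH that only sets embedding."""
    url = f"{SUPABASE_URL}/rest/v1/{TABLE}?id=eq.{row_id}"
    return url, supa_headers("return=minimal"), {"embedding": embedding}
```

Keeping request construction separate from the HTTP call is one way to make the `--dry-run` check in the test plan cheap: the same tuples can be printed instead of sent.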