Rebuild Gmail job-email pipeline with deterministic gating and guarded DB updates #449
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 773be3b0a8
  }

  async guardedUpdate({ extraction, match, emailInput, emailDate }) {
    if (extraction.confidence >= 0.85 && match.confidence >= 0.9) {
Permit new records when no application match exists
The write gate in guardedUpdate() requires match.confidence >= 0.9 before entering either update or create paths, but matchApplication() returns { applicationId: null, confidence: 0 } when no prior record matches. That makes the new-record branch effectively unreachable, so genuinely new job-application emails are always sent to review and never inserted automatically, even when extraction confidence is high.
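One possible shape for the fix, as a minimal sketch rather than the PR's actual code: separate the "no candidate at all" case from the "weak candidate" case before applying the match threshold. The helper name `decideWriteAction` and the rule that a null `applicationId` means "create" are assumptions for illustration; the 0.85 and 0.9 thresholds are the ones quoted in the diff.

```javascript
// Hypothetical helper (not in the PR): decide which branch guardedUpdate() takes.
// Thresholds are the ones shown in the diff; the "create" rule is an assumption.
const EXTRACTION_THRESHOLD = 0.85;
const MATCH_THRESHOLD = 0.9;

function decideWriteAction({ extraction, match }) {
  // Low-confidence extractions are never written automatically.
  if (extraction.confidence < EXTRACTION_THRESHOLD) return "review";

  // matchApplication() reports { applicationId: null, confidence: 0 } when
  // nothing matched, so treat that as a new application instead of a weak match.
  if (match.applicationId == null) return "create";

  // A candidate exists: only update it when the match itself is strong.
  return match.confidence >= MATCH_THRESHOLD ? "update" : "review";
}
```

With this split, a high-confidence extraction with no prior record reaches the create path, while an ambiguous match against an existing record still goes to the review queue.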
    if (extraction.confidence >= 0.85 && match.confidence >= 0.9) {
      const updatePayload = {
        role: extraction.job_title || undefined,
        company: extraction.company || "Unknown",
Keep existing company instead of overwriting with "Unknown"
For matched updates, company is always set to extraction.company || "Unknown". In common follow-up emails where the extractor omits company but matching succeeds via threadId/title, this overwrites a valid stored company with "Unknown", which is data loss and can trigger unique-index conflicts because company is unique in ApplicationSchema.
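A minimal sketch of one way to avoid the overwrite, assuming the update applies only the keys present in the payload (for example a `$set`-style partial update): build the payload from the fields the extractor actually returned and leave everything else out. The helper name `buildUpdatePayload` is hypothetical; `job_title`, `company`, and `role` are the field names visible in the diff.

```javascript
// Hypothetical helper (not in the PR): include only fields the extractor
// actually produced, so a missing company never replaces the stored one.
function buildUpdatePayload(extraction) {
  const payload = {};
  if (extraction.job_title) payload.role = extraction.job_title;
  if (extraction.company) payload.company = extraction.company;
  return payload;
}
```

The "Unknown" fallback would then only make sense on the create path, where there is no stored company to preserve.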
Motivation
- watch/history IDs and Pub/Sub payloads so the system processes only mailbox deltas.
- skill-creator guidance to structure a two-stage AI interaction (classification → extraction) with strict JSON schemas and deterministic guards.
Description
- back/src/services/GmailApi.js to add startWatch(...), fetchHistory(...), decodePubSubMessage(...), and extractMessageIdsFromHistory(...) helpers and centralized header construction.
- back/src/services/GeminiApi.js, introducing CLASSIFICATION_SCHEMA and EXTRACTION_SCHEMA, a requestJson(...) helper, region rotation, and raw-AI output logging for debuggability.
- back/src/services/emailParser.js that includes deterministic pre-filter scoring (positive job signals and negative commerce signals), an AI classification gate (confidence >= 0.8), AI extraction, deterministic DB matching order (threadId → exact company → exact title → fuzzy title similarity), and a guarded update that only writes when extraction.confidence >= 0.85 and match.confidence >= 0.9, otherwise queuing the item for review.
- back/src/Tracker.js for /api/applications/scan to accept an optional lastHistoryId and return pipeline results (processed items, lastHistoryId, and reviewQueue) instead of immediately returning a full application query.
Testing
- yarn run build-js which completed successfully (Babel compiled files).
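For reviewers, the deterministic matching order listed in the description (threadId → exact company → exact title → fuzzy title similarity) could look roughly like the sketch below. This is an illustration under stated assumptions, not the code in emailParser.js: the function name `matchApplicationSketch`, the record fields (`id`, `threadId`, `company`, `role`), the per-step confidence values, the 0.6 fuzzy cutoff, and the token-overlap (Jaccard) similarity are all hypothetical. Only the step order and the no-match result `{ applicationId: null, confidence: 0 }` come from this PR.

```javascript
// Hypothetical sketch of the matching order; all names and numbers are assumptions.
function normalize(s) {
  return (s || "").trim().toLowerCase();
}

// Jaccard similarity over whitespace-separated tokens (stand-in for the
// PR's unspecified "fuzzy title similarity").
function titleSimilarity(a, b) {
  const ta = new Set(normalize(a).split(/\s+/).filter(Boolean));
  const tb = new Set(normalize(b).split(/\s+/).filter(Boolean));
  if (ta.size === 0 || tb.size === 0) return 0;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared += 1;
  return shared / (ta.size + tb.size - shared);
}

function matchApplicationSketch(extraction, applications, threadId) {
  // 1. Same Gmail thread as a stored application.
  const byThread = applications.find((a) => threadId && a.threadId === threadId);
  if (byThread) return { applicationId: byThread.id, confidence: 1 };

  // 2. Exact (case-insensitive) company match.
  const company = normalize(extraction.company);
  const byCompany = applications.find((a) => company && normalize(a.company) === company);
  if (byCompany) return { applicationId: byCompany.id, confidence: 0.95 };

  // 3. Exact (case-insensitive) title match.
  const title = normalize(extraction.job_title);
  const byTitle = applications.find((a) => title && normalize(a.role) === title);
  if (byTitle) return { applicationId: byTitle.id, confidence: 0.9 };

  // 4. Best fuzzy title match, reported with its similarity as confidence.
  let best = null;
  let bestScore = 0;
  for (const a of applications) {
    const score = titleSimilarity(extraction.job_title, a.role);
    if (score > bestScore) {
      best = a;
      bestScore = score;
    }
  }
  if (best && bestScore >= 0.6) return { applicationId: best.id, confidence: bestScore };

  // No candidate: same shape the review comments describe for the real matcher.
  return { applicationId: null, confidence: 0 };
}
```

Because each step returns as soon as it finds a candidate, a thread match always wins over a company or title match, and fuzzy results carry a lower confidence that the guarded update can reject.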