[integrations] Chrome extension — Gemini bulk history sync (Phase B/C) by alanshurafa · Pull Request #218 · NateBJones-Projects/OB1

alanshurafa · 2026-04-21T21:17:28Z

Stacks on #214

This PR stacks on #214 (contrib/alanshurafa/chrome-capture-extension). Because #214 is not yet merged, this PR's diff against main includes the base extension commits. Suggested merge order: merge #214 first, then this branch will rebase cleanly on origin/main for a focused second-PR diff.

Summary

Adds bulk history sync for Gemini to the Chrome capture extension introduced in #214. This covers two phases:

Phase B — History backfill. Uses chrome.debugger + Gemini's internal batchexecute RPC to pull historical conversations. Parser tolerates Google's framed protocol variations and the anti-hijack prefix.
Phase C — Sync All UI + orchestrator. A state machine (idle → running → paused → done) with Sync / Resume / Cancel controls, batched progress, interruption recovery, and per-tab health checks.

Auto-sync defaults to OFF, matching the Claude and ChatGPT convention from the base extension. Phase A (ambient live-turn capture) was intentionally dropped — the maintainer previously rejected it as unsustainable, so this contribution skips it entirely.

Naming alignment

All identifiers renamed from the original ExoCortex scheme to match the OB1 base extension:

ExoGemini* → OBGemini*
Storage keys exo_gemini_* → ob_gemini_*

Permissions

Adds debugger and scripting to manifest.json. Both are justified in the README — debugger for capturing batchexecute responses, scripting for injecting the session-token probe. Scope is limited to Gemini origins.

Tests

50 passing (15 new extractor tests covering framed-response edge cases + 35 state-machine tests carried over from the base).

Review trail

Cross-AI reviewed across three rounds (Codex exec + Claude reviewers):

2 P1 fixed: healthcheck race vs. debugger attach; resumeIfInterrupted reentrance.
2 P1 declined with rationale: debugger-attach-on-all-Gemini-tabs is intentional (documented in README); MV3 service-worker termination recovery is covered by existing STALE_HEARTBEAT path.
6 P2 fixed. Round 2 surfaced two self-inflicted regressions from round 1 — both fixed in round 3.

Known follow-ups (non-blocking)

parseFramedResponse micro-perf pass.
optional_host_permissions tightening so Gemini scope is request-time, not install-time.

Chrome MV3 extension that captures AI conversations into Open Brain via the REST API. First-run config screen collects API URL and key (stored in chrome.storage.local). All ExoCortex-specific hardcoded Supabase project URLs removed — extension is fully configurable. Runtime host permissions model documented.

…ted as synced

…policy

…privacy)

…e unused exports

…licit flag not identity check

…lider

…missions and UI

…nal PR URLs

….debugger Adds the Phase B foundation for Gemini bulk backfill. Gemini exposes no public conversation API, so the extension attaches chrome.debugger to gemini.google.com tabs and watches for the one internal RPC that Gemini itself uses to load conversation history (batchexecute rpcids=hNvQHb). On loadingFinished the service worker reads the response body via the debugger protocol, parses Gemini's framed positional-array envelope, and yields one normalized turn per user/assistant exchange. All turns are funneled through the existing processCaptureRequest pipeline so they inherit retry queue, sensitivity filter, fingerprint dedup, and session metrics — no parallel /ingest path. This commit is debugger infrastructure only. Phase C (the Sync All orchestrator that drives per-conversation navigation) lands in a follow-up commit. Live StreamGenerate / ambient capture is NOT ported — the extension's public release deliberately dropped ambient capture, and this port preserves that policy. Manifest adds the minimum permissions needed: `debugger` (attach only to gemini.google.com, observe one RPC pattern) and `scripting` (for the Phase C sidebar enumerator). Version bumps 0.4.0 → 0.5.0. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Adds the Phase C orchestrator that drives full-history backfill. The state machine lives in a pure helper (lib/gemini-sync-state.js) so it can be unit-tested under node --test without a Chrome stub; the orchestrator (background/gemini-sync.js) owns all chrome.* calls. Flow: 1. Enumerate the Gemini sidebar via chrome.scripting.executeScript (scrolls the list until IDs stop growing). 2. Open one background sync tab, drive it through each conversation. 3. Per conversation: register a waiter keyed by the conversation ID, navigate the tab, wait for Phase B to notify via notifyHistoryCaptured(id, totals). 4. Fingerprint dedup at the ingest layer guarantees re-runs are safe (duplicate turns return duplicate_fingerprint / existing). Resilience: - Resumable across MV3 SW restarts via chrome.storage.local state. - User-cancelable at any time; Sync All button relabels to Resume Sync when a run was paused mid-flight. - Tab-health check before every navigation catches Google bot challenges (CAPTCHA / login prompts that redirect off Gemini) and transitions gracefully to a CANCELED paused state instead of burning through the queue with silent timeouts. Anti-bot throttle: - 4–12 s jittered delay between conversations (sub-millisecond precision so whole-second clusters don't fingerprint as a bot). - 20–35 s "reading pause" every 10 conversations to break cadence. Tuned to stay under Google's challenge threshold (earlier uniform 4 s cadence tripped the challenge around conversation 21). Incremental path: - syncIncremental() filters the sidebar against lifetime everSyncedIds and navigates only the delta. Capped at 20 per run so scheduled use stays quiet. - Auto-sync (4-hour cadence, opt-in) drives incremental sync. Tests (35 cases) cover state transitions, pendingIds deduplication and cap enforcement, completion/failure bookkeeping, progress summary, and the waiter registry's resolve/abort/abortAll semantics. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Loads the Phase B debugger and Phase C orchestrator via importScripts, adds GEMINI_SYNC_* message handlers (start, cancel, resume, incremental, status, auto-sync toggle), and drives the Gemini auto-sync alarm on install/startup. Exposes processCaptureRequest on the service-worker global so the Gemini debugger module can funnel history turns through the same ingest pipeline as manual capture. Classic-script function declarations are already global in the SW scope, but the explicit assignment pins the cross-module contract so it doesn't quietly break if the function is ever rewrapped. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Replaces the "Gemini bulk sync not supported" hint with the full Sync All History / Sync New / Cancel button group plus a 4-hour auto-sync toggle. The button auto-labels as "Resume Sync" when a run was paused mid-flight (Google bot challenge, SW restart) so the user can pick up where the state machine left off without re-enumerating. Progress surfaces via a 2-second polling loop that reads GEMINI_SYNC_STATUS, renders percent complete and the captured/dedup counters, and stops the poll as soon as the state machine leaves enumerating/syncing. Paused state shows the failure reason and — when the reason looks like a CAPTCHA — prompts the user to solve it first and then click Resume. The existing shared sync-progress-area (used by Claude/ChatGPT full sync) remains in place unchanged; Gemini's progress lives in its own gemini-sync-progress block so the two flows don't clobber each other. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

README: adds the "Gemini bulk history sync (Phase B/C)" section covering how the debugger-based capture works end-to-end, the "Debugging this browser" banner users will see while syncing, the anti-bot throttling strategy, and how to pause the flow (toggle Gemini off or dismiss the debugger banner). Updates the Supported Sites table row for Gemini, the Usage paragraph, and the Chrome Web Store permission justifications for `debugger` and `scripting`. metadata.json: bumps version 1.0.0 → 1.1.0, adds the `gemini-bulk-sync` tag, updates the `updated` date, and drops the `_todo` field so the current (stricter) metadata schema validates. The TODO it referenced (PR NateBJones-Projects#201 slug) is still explained in the README alongside the prerequisite link. License remains FSL-1.1-MIT. No new runtime dependencies, no binary blobs, no telemetry or third-party hosts. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… false-negatives on debugger attach race On fresh sync, ensureSyncTab creates the Gemini tab synchronously but Phase B's chrome.debugger.attach runs async via chrome.tabs.onUpdated. mainLoop's first iteration called checkSyncTabHealthy immediately and could false-flag the run as paused with "debugger not attached" before attach won the race. Move attach awareness entirely into driveConversation (which already does it via waitForDebuggerAttach with a tolerant 2s budget), so the health check focuses on the real CAPTCHA/navigation-away signal. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…meIfInterrupted resumeIfInterrupted() is called at module load and was setting syncInFlight=true only AFTER an await loadState(). A popup-triggered startSync/resumeSync or alarm-triggered syncIncremental arriving during that async gap would observe syncInFlight=false, slip past the guard, and double-enter mainLoop against the same persisted queue. Claim the lock synchronously at the function top before any await. Release it explicitly when we short-circuit out (no-op / stale-reset paths); the happy path hands off to the existing finally block that already clears the lock after mainLoop completes. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…etion totals double-count Two pure-helper fixes with test coverage: 1. waiter register() silently overwrote an existing slot, so a reentrant driveConversation or stale slot that outlived its setTimeout left the old promise hanging until abortAll. Worse, the old timeout's abort-by-id call could then reject the NEW waiter. Now reject the prior waiter with a clear reason before replacing. 2. recordCompletion added the result totals every call even when the id was already in completedIds. A duplicate notify (retry, late-resolve, or re-sync of the same conversation) inflated captured/dedup/turn counts. Gate the totals fold on first-time completion only. Adds two tests (36, 37) that guard both behaviors. Full suite 37/37 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… alarm + popup Four related fixes across the Phase B/C surface: 1. gemini-debugger.js — reject base64Encoded response bodies (the hNvQHb RPC is always text; base64 means wrong content type or URL misidentification) and cap parseable bodies at 8 MB so a malformed payload can't OOM the service worker. 2. gemini-debugger.js — on empty/malformed extractor output, recover the conversation id from the sync tab's /app/<id> URL and resolve the waiter with zero-result counts. Previously the orchestrator blocked on the full 15s capture timeout even when the response was visibly empty. 3. service-worker.js — Gemini auto-sync alarm now reads status first and skips syncIncremental when the prior run is paused on a Google challenge (state=canceled with pendingIds). Prevents alarm-driven re-runs from re-triggering the CAPTCHA until the user resumes. 4. popup.js — resume-vs-start click dispatch keys off a data-mode attribute set by renderGeminiProgress instead of string-matching the button text. Robust to future copy or localization changes. Tests 37/37 pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…execute parsing Adds 13 tests that exercise extractGeminiHistory against synthesized batchexecute payloads. Guards the shape layers codex called out as under-tested: - XSSI prefix handling and whitespace tolerance - empty / non-history / non-wrb.fr frames return null cleanly - single-turn and multi-turn payloads decode in historyOrder - turns with missing ids / empty prompts drop without aborting the batch - historic timestamp decoding from [seconds, nanos] pair - parseAdaptive length-prefix drift tolerance (+/- 5 bytes) - fuzz: extractor never throws on 7 varieties of garbage input Test total now 50 (was 37). All pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

… + resumeIfInterrupted lock leak Round 2 review caught two bugs I introduced in the round 1 fix batch. 1. gemini-debugger.js — the empty-payload notifyHistoryCaptured (captured:0, total:0) path marked the conversation as completed and added it to everSyncedIds, silently skipping it forever on future incremental syncs. If the payload was just a transient parse failure or future Gemini format change, the user loses that conversation. Revert to letting the orchestrator's 15s capture timeout fire naturally, which routes the id to failedIds instead — user can retry via Sync All after an extractor fix ships. 2. gemini-sync.js — resumeIfInterrupted's finally block checked the local record.state (still SYNCING/ENUMERATING) to decide whether to release syncInFlight. The stale-heartbeat branch updated persisted state but not the local copy, so the finally wrongly assumed we were handing off to the happy path and never cleared the lock. Result: after one stale-interrupted detection, every subsequent startSync, resumeSync, and syncIncremental returned 'sync already running' until the SW terminated. Replace the state-based heuristic with an explicit handedOffToMainLoop flag set only when we actually proceed. Tests 50/50 still pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

alanshurafa · 2026-04-22T17:21:49Z

Refreshing upstream checks after fork-side readiness cleanup.

alanshurafa and others added 24 commits April 17, 2026 23:39

[integrations] Fix REVIEW-CODEX-P1-1: failed ingests no longer persis…

400f5c1

…ted as synced

[integrations] Fix REVIEW-CODEX-P1-2: https + loopback-only endpoint …

c6e754f

…policy

[integrations] Fix BLOCKER: API key/endpoint moved to local storage (…

cfd1798

…privacy)

[integrations] Fix REVIEW-CODEX-P2: remove dead Capture Mode toggle

4bcf3ab

[integrations] Fix REVIEW-CODEX-P3: sensitivity log consistency, prun…

9de1b96

…e unused exports

[integrations] Fix WARNING: onInstalled auto-open on install only

5d4f2f9

[integrations] Fix WARNING: processingFingerprints Set leak

841e28d

[integrations] Document extractor fragility and known limitations

8fe78fe

[integrations] Fix REVIEW-CODEX-2-P2: local-storage fallback uses exp…

5ae9925

…licit flag not identity check

[integrations] Fix REVIEW-CODEX-2-P2: remove dead minResponseLength s…

125b846

…lider

[integrations] Fix REVIEW-CODEX-2-P3: README matches current host per…

60e0159

…missions and UI

[integrations] Fix CI Rule 13: convert broken relative links to exter…

fa80269

…nal PR URLs

github-actions Bot added the integration Contribution: MCP extension or capture source label Apr 21, 2026

alanshurafa closed this Apr 22, 2026

alanshurafa reopened this Apr 22, 2026

[docs] Fix pre-existing markdownlint errors across 8 files

02cd24e

github-actions Bot added the recipe Contribution: step-by-step recipe label Apr 22, 2026

github-actions Bot added the schema Contribution: database extension label Apr 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[integrations] Chrome extension — Gemini bulk history sync (Phase B/C)#218

[integrations] Chrome extension — Gemini bulk history sync (Phase B/C)#218
alanshurafa wants to merge 25 commits intoNateBJones-Projects:mainfrom
alanshurafa:contrib/alanshurafa/chrome-capture-gemini-history

alanshurafa commented Apr 21, 2026

Uh oh!

alanshurafa commented Apr 22, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant