fix: v3.5.7 translator + bridge hardening (6 issues from deep audit)#94

Merged
heznpc merged 1 commit into main from fix/v3.5.7-translator-hardening
Apr 30, 2026

heznpc commented Apr 30, 2026

Summary

Second-pass deep audit on the bridge layer (translator.js 778 lines + page-bridge.js 251 lines + puter.js) found six real issues. The previous content.js-focused audit (#92) didn't cover this surface. Three are correctness bugs at current scale; the rest add resilience as external deps shift.

| # | Severity | Issue | Fix |
|---|----------|-------|-----|
| 1 | 🔴 CRITICAL | Verify-queue tail items leak (translator.js:408) | Extract `_kickVerifyQueue` + self-restart from `.finally` |
| 2 | 🟠 HIGH | IndexedDB cache poisoning: no shape validation | `_isValidTranslation` rejects HTML / length >10× / >95% ASCII for non-Latin |
| 3 | 🟠 HIGH | Bridge-inject single-shot kills tutor | `_injectPageBridgeWithRetry`: 2 retries, exp backoff |
| 4 | 🟠 HIGH | Hardcoded `claude-sonnet-4-6`, no fallback | `_puterChat` wraps all calls with deprecation chain |
| 5 | 🟡 MEDIUM | Lang-switch writes stale-lang text into new page | `_langGeneration` stamp + re-check after Gemini await |
| 6 | 🟡 MEDIUM | `_cacheTranslation` returned before `store.put` committed | Resolve on `tx.oncomplete` |

Why now

The user's concern was that AI, Skilljar, and Anthropic are all moving fast and we have no production telemetry. Issues #2, #3, and #4 are specifically resilience plays.

Verification (local)

  • Tests: 309/309 pass
  • Lint / format / selector health / dicts / bg-sync / glossary / validate — all green
  • Firefox build / bundle build — both pass; bundle 113.9 KB (no measurable size change)

Test plan

  • CI: validate + build + test green
  • Manual on a Skilljar lesson: open a long lesson with Korean active, switch to Japanese mid-translation, confirm no Korean text leaks into Japanese page
  • Manual: throttle network in DevTools (offline once during init), confirm bridge retries successfully and tutor works without reload
  • Manual: trigger a verify storm (translate a 200+ string lesson), confirm progress bar clears, all elements get verified, no spinners stuck

Known follow-ups (NOT in this PR)

  • Nonce-exposure window during bridge script-tag injection (HIGH security/defense-in-depth, theoretical attack, needs careful protocol redesign)
  • YouTube subtitle handler retry-timer leak on rapid lang toggle (separate file, separate review)
  • IDB schema-version migration story (no fields needed yet; future-work)

🤖 Generated with Claude Code

A second-pass audit on translator.js (778 lines) and the page-bridge
protocol surfaced six real issues. None of these would have been caught
by the previous content.js-focused audit. Three are correctness bugs
visible to users at the current scale; the rest add resilience as
external dependencies (Puter.js, model names, Anthropic deprecation
windows) shift under us.

CRITICAL — Verify-queue tail-item race
  Items pushed between `_runVerifyQueue`'s while-loop exit and the
  `.finally()` clearing `_verifyLock` got queued but no new run was
  scheduled — on a quiet page they sat un-verified forever. Extracted
  `_kickVerifyQueue()` and made `.finally()` self-restart if items
  arrived during teardown. Also unified the two duplicate lock-create
  sites (queueGeminiVerify and BRIDGE_READY handler) onto the helper.
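
The self-restart described above can be sketched roughly as follows. This is a minimal model, not the code in translator.js: the class `VerifyQueue`, the method `queueVerify`, and the injected `verifyOne` callback are hypothetical, and only the names `_kickVerifyQueue`, `_runVerifyQueue`, and `_verifyLock` come from the commit message.

```javascript
// Hypothetical model of the verify queue. Method names follow the commit
// message; the bodies are assumptions, not the code in translator.js.
class VerifyQueue {
  constructor(verifyOne) {
    this._queue = [];
    this._verifyLock = null; // promise for the active run, or null when idle
    this._verifyOne = verifyOne;
  }

  queueVerify(item) {
    this._queue.push(item);
    this._kickVerifyQueue();
  }

  // Single place that creates the lock, so every caller behaves the same way.
  _kickVerifyQueue() {
    if (this._verifyLock || this._queue.length === 0) return;
    this._verifyLock = this._runVerifyQueue().finally(() => {
      this._verifyLock = null;
      // Self-restart: pick up items pushed after the loop exited
      // but before the lock was cleared.
      this._kickVerifyQueue();
    });
  }

  async _runVerifyQueue() {
    while (this._queue.length > 0) {
      const item = this._queue.shift();
      try {
        await this._verifyOne(item);
      } catch (err) {
        // One failed item should not strand the rest of the queue.
        console.warn("verify failed", err);
      }
    }
  }
}
```

Because the kick is a no-op while a run holds the lock or the queue is empty, calling it from `.finally()` costs nothing in the common case and only starts a new run when tail items actually exist.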

HIGH — IndexedDB cache poisoning
  `_cacheTranslation` wrote whatever GT/Gemini returned to disk and
  served it for 30 days. A single corrupted response or transient
  proxy error page poisoned the cache. Added `_isValidTranslation`
  that rejects HTML tags, length ratios over 10×, and >95% ASCII for
  non-Latin target languages (typical refusal/error string). Skipped
  payloads silently retry on the next page load.
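
A shape check along these lines could look like the sketch below. The three rejection rules come from the text above; the function signature, the `NON_LATIN_LANGS` set, the tag regex, and the empty-string guard are assumptions made for illustration.

```javascript
// Assumed list of non-Latin target languages; the real list is not given in the PR.
const NON_LATIN_LANGS = new Set(["ko", "ja", "zh", "ru", "ar"]);

// Hypothetical stand-in for _isValidTranslation.
function isValidTranslation(source, translated, targetLang) {
  // Extra guard, not from the PR: only non-empty strings are cacheable.
  if (typeof translated !== "string" || translated.length === 0) return false;

  // Rule 1: reject anything that looks like an HTML tag (e.g. a proxy error page).
  if (/<\/?[a-z][^>]*>/i.test(translated)) return false;

  // Rule 2: reject output more than 10x longer than the source.
  if (translated.length > source.length * 10) return false;

  // Rule 3: for non-Latin targets, >95% ASCII suggests a refusal or error string.
  if (NON_LATIN_LANGS.has(targetLang)) {
    const chars = [...translated]; // iterate by code point
    const ascii = chars.filter((ch) => ch.codePointAt(0) < 128).length;
    if (ascii / chars.length > 0.95) return false;
  }
  return true;
}
```

A caller would skip the cache write when the check fails, so the string is fetched again on the next page load instead of being served from disk for 30 days.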

HIGH — Bridge-injection retry
  `script.onerror` and bridge timeout used to kill AI features for the
  whole tab session — one CDN hiccup, one CSP transient, dead tutor
  until reload. New `_injectPageBridgeWithRetry` does up to 2 retries
  with exponential backoff (500/1000/2000 ms). The
  `skillbridge:bridgeunavailable` banner now only fires after the
  retry budget is exhausted, not on the first failure.
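
The retry loop might be structured like this sketch. The helper name `injectWithRetry`, the injected `injectOnce` and `sleep` functions, and the option names are hypothetical; the delay here simply doubles from a 500 ms base, and how the listed 500/1000/2000 ms values map onto attempts in the real code is an assumption.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Hypothetical stand-in for _injectPageBridgeWithRetry.
// injectOnce(attempt) should resolve when the bridge is ready and reject on
// script.onerror or timeout.
async function injectWithRetry(
  injectOnce,
  { retries = 2, baseDelayMs = 500, sleep = defaultSleep } = {}
) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await injectOnce(attempt);
    } catch (err) {
      lastError = err;
      // Exponential backoff between attempts; no wait after the final failure.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  // Retry budget exhausted: the caller decides how to surface this.
  throw lastError;
}
```

In this shape, the `skillbridge:bridgeunavailable` banner would be raised by the caller in its rejection handler, which is what keeps it from firing on the first failure.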

HIGH — Model-name fallback chain
  All `puter.ai.chat` calls in page-bridge now route through
  `_puterChat`, which catches model-not-found errors and retries once
  with a fallback (`claude-sonnet-4-6` → `4-5`, `claude-opus-4-7` →
  `4-6`, `gemini-2.0-flash` → `1.5-flash`). When Anthropic deprecates
  Sonnet 4.6 (likely within months) the tutor falls back instead of
  500-erroring at the user.
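
One way to express the fallback chain is sketched below. The wrapper name `chatWithFallback`, the injected `chat` function (standing in for `puter.ai.chat`), and the error-matching heuristic are assumptions; the full fallback model names are expanded from the abbreviated pairs in the text above, which is also an assumption.

```javascript
// Fallback pairs from the commit message, expanded to full names (assumed).
const MODEL_FALLBACKS = new Map([
  ["claude-sonnet-4-6", "claude-sonnet-4-5"],
  ["claude-opus-4-7", "claude-opus-4-6"],
  ["gemini-2.0-flash", "gemini-1.5-flash"],
]);

// Assumed heuristic: the real error shape for a missing model is not given.
function isModelNotFound(err) {
  const message = String(err && err.message);
  return /model.*(not found|does not exist|deprecated|unsupported)/i.test(message);
}

// Hypothetical stand-in for _puterChat. `chat` is whatever performs the call.
async function chatWithFallback(chat, prompt, options) {
  try {
    return await chat(prompt, options);
  } catch (err) {
    const fallback = MODEL_FALLBACKS.get(options.model);
    // Only retry for model-not-found errors that have a known fallback.
    if (!fallback || !isModelNotFound(err)) throw err;
    // Retry exactly once; a second failure propagates to the caller.
    return chat(prompt, { ...options, model: fallback });
  }
}
```

Keeping the mapping in one table means a deprecation only requires editing the table, not every call site.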

MEDIUM — Stale-language verify writes
  Verify items now stamp `_langGeneration` at queue time. `_runVerifyQueue`
  filters stale batches, and `_verifySingle` re-checks after the
  Gemini await fence (which can be seconds long) before calling
  `_notifyUpdate`. Without this, a user switching language mid-page
  saw old-language text overwrite their new translation. content.js
  calls `translator.bumpLangGeneration()` from `switchLanguage`.
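
The generation stamp can be illustrated with a small sketch. Only `_langGeneration` and `bumpLangGeneration` are names from the text; the class `LangAwareVerifier`, `makeVerifyItem`, `verifySingle`, and the injected `verifyRemote` and `notifyUpdate` callbacks are hypothetical.

```javascript
// Hypothetical model of the stale-language guard.
class LangAwareVerifier {
  constructor(verifyRemote, notifyUpdate) {
    this._langGeneration = 0;
    this._verifyRemote = verifyRemote; // stands in for the Gemini call
    this._notifyUpdate = notifyUpdate;
  }

  // Called on every language switch.
  bumpLangGeneration() {
    this._langGeneration += 1;
  }

  // Stamp the item with the generation that was current at queue time.
  makeVerifyItem(text) {
    return { text, generation: this._langGeneration };
  }

  // Returns true if the verified text was applied, false if it was stale.
  async verifySingle(item) {
    if (item.generation !== this._langGeneration) return false;
    const verified = await this._verifyRemote(item.text);
    // Re-check after the await: the user may have switched language meanwhile.
    if (item.generation !== this._langGeneration) return false;
    this._notifyUpdate(verified);
    return true;
  }
}
```

The second check is the important one: the first only filters items that were already stale when dequeued, while the await can take seconds and is where a switch is most likely to land.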

MEDIUM — `_cacheTranslation` actually awaits
  Was declared async but returned the moment `store.put()` was queued
  — callers' `await` was a no-op. Now resolves on `tx.oncomplete`.
  Caller timing assumptions (e.g. eviction-then-retry flows from the
  v3.5.6 fix) now hold.
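
Resolving on transaction completion generally looks like the sketch below. The function name `cachePut` and its parameters are hypothetical; the IndexedDB calls (`db.transaction`, `objectStore`, `put`, and the `oncomplete`/`onerror`/`onabort` handlers) are the standard API.

```javascript
// Hypothetical stand-in for _cacheTranslation: the promise settles only once
// the transaction has committed, not when put() is merely queued.
function cachePut(db, storeName, record) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(storeName, "readwrite");
    // Attach handlers before issuing the request.
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
    tx.objectStore(storeName).put(record);
  });
}
```

With this shape, `await cachePut(...)` in a caller means the record is durable before the next step runs, which is what eviction-then-retry flows rely on.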

Tests 309/309 pass; lint, format, selector-health, dicts, bg-sync,
glossary, translate-validate, firefox build, bundle build all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
heznpc merged commit 0ea4a9c into main Apr 30, 2026
3 checks passed
heznpc deleted the fix/v3.5.7-translator-hardening branch April 30, 2026 22:49