fix: v3.5.7 translator + bridge hardening (6 issues from deep audit) by heznpc · Pull Request #94 · heznpc/skillBridge

heznpc · 2026-04-30T22:43:05Z

Summary

Second-pass deep audit on the bridge layer (translator.js 778 lines + page-bridge.js 251 lines + puter.js) found six real issues. The previous content.js-focused audit (#92) didn't cover this surface. Three are correctness bugs at current scale; the rest add resilience as external deps shift.

| # | Severity | Issue | Fix |
|---|----------|-------|-----|
| 1 | 🔴 CRITICAL | Verify-queue tail items leak (translator.js:408) | Extract `_kickVerifyQueue` + self-restart from `.finally` |
| 2 | 🟠 HIGH | IndexedDB cache poisoning: no shape validation | `_isValidTranslation` rejects HTML / length>10× / >95% ASCII for non-Latin |
| 3 | 🟠 HIGH | Bridge-inject single-shot kills tutor | `_injectPageBridgeWithRetry`: 2 retries, exp backoff |
| 4 | 🟠 HIGH | Hardcoded `claude-sonnet-4-6` no fallback | `_puterChat` wraps all calls with deprecation chain |
| 5 | 🟡 MEDIUM | Lang-switch writes stale-lang text into new page | `_langGeneration` stamp + re-check after Gemini await |
| 6 | 🟡 MEDIUM | `_cacheTranslation` returned before `store.put` committed | Resolve on `tx.oncomplete` |
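
The shape check in row 2 can be illustrated with a minimal sketch. This is not the code in translator.js: the function name `isValidTranslation`, its parameters, the `NON_LATIN_TARGETS` list, and the empty-string rule are assumptions made for illustration. Only the three rejection criteria (HTML tags, length over 10×, more than 95% ASCII for non-Latin targets) come from the PR.

```javascript
// Hypothetical sketch; the real _isValidTranslation may differ in detail.
// Assumed list of target languages written in non-Latin scripts.
const NON_LATIN_TARGETS = new Set(["ko", "ja", "zh", "ru", "ar", "hi", "th"]);

function isValidTranslation(source, translated, targetLang) {
  // Assumption: empty or non-string payloads are never worth caching.
  if (typeof translated !== "string" || translated.trim() === "") return false;

  // Reject anything containing an HTML tag (for example a proxy error page).
  if (/<\/?[a-z][^>]*>/i.test(translated)) return false;

  // Reject output more than 10x longer than the source text.
  if (translated.length > source.length * 10) return false;

  // For non-Latin targets, almost-pure ASCII is probably a refusal or error string.
  if (NON_LATIN_TARGETS.has(targetLang)) {
    const chars = [...translated];
    const ascii = chars.filter((c) => c.codePointAt(0) < 128).length;
    if (ascii / chars.length > 0.95) return false;
  }
  return true;
}
```

A caller would run this check before writing to IndexedDB and skip the write on `false`, so a bad response is retried on the next page load instead of being served from cache.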

Why now

The user's concern was: AI/Skilljar/Anthropic are all moving fast and we have no production telemetry. Issues #2, #3, #4 are specifically resilience plays:

#2: any GT/Puter regression that returns garbage no longer poisons the cache for 30 days
#3: Puter.js CDN jitter no longer kills the tutor for the session
#4: when Anthropic deprecates Sonnet 4.6 (likely in coming months), the tutor falls back to 4-5 instead of erroring at users
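
The model fallback in #4 can be sketched roughly as below. The names `MODEL_FALLBACKS`, `looksLikeModelNotFound`, and `chatWithFallback` are hypothetical, and `chatFn` stands in for `puter.ai.chat` so the sketch runs without Puter.js loaded. The full fallback model names are expanded from the abbreviated pairs in the commit message, and the error-matching heuristic is a guess, since the PR does not say how a model-not-found error is reported.

```javascript
// Assumed mapping, expanded from the pairs named in the commit message.
const MODEL_FALLBACKS = {
  "claude-sonnet-4-6": "claude-sonnet-4-5",
  "claude-opus-4-7": "claude-opus-4-6",
  "gemini-2.0-flash": "gemini-1.5-flash",
};

// Hypothetical heuristic for spotting a model-not-found style error.
function looksLikeModelNotFound(err) {
  const msg = String((err && err.message) || err).toLowerCase();
  return msg.includes("model") && (msg.includes("not found") || msg.includes("deprecated"));
}

// chatFn is injected: any async (prompt, options) => result function.
async function chatWithFallback(chatFn, prompt, options) {
  try {
    return await chatFn(prompt, options);
  } catch (err) {
    const fallback = MODEL_FALLBACKS[options.model];
    // Unrelated errors, or models without a fallback, propagate unchanged.
    if (!fallback || !looksLikeModelNotFound(err)) throw err;
    // Retry exactly once with the fallback model; a second failure propagates.
    return chatFn(prompt, { ...options, model: fallback });
  }
}
```

Retrying only once and only on model-not-found errors keeps network failures and rate limits visible to the caller instead of masking them behind a silent model switch.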

Verification (local)

Tests: 309/309 pass
Lint / format / selector health / dicts / bg-sync / glossary / validate — all green
Firefox build / bundle build — both pass; bundle 113.9 KB (no measurable size change)

Test plan

CI: validate + build + test green
Manual on a Skilljar lesson: open a long lesson with Korean active, switch to Japanese mid-translation, confirm no Korean text leaks into Japanese page
Manual: throttle network in DevTools (offline once during init), confirm bridge retries successfully and tutor works without reload
Manual: trigger a verify storm (translate a 200+ string lesson), confirm progress bar clears, all elements get verified, no spinners stuck
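
The bridge-retry behaviour exercised by the network-throttling check can be sketched as follows. This is a minimal illustration, not the code in `_injectPageBridgeWithRetry`: `withRetry`, `attempt`, `baseDelayMs`, and the injectable `sleep` are hypothetical names, and the delay schedule (500 ms, then 1000 ms for two retries) is one reading of the backoff values quoted in the commit message.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// attempt: async function that resolves on success and rejects on failure.
// Here it stands in for a single bridge script-tag injection.
async function withRetry(attempt, { retries = 2, baseDelayMs = 500, sleep = defaultSleep } = {}) {
  let lastError;
  for (let i = 0; i <= retries; i++) {
    try {
      return await attempt(i);
    } catch (err) {
      lastError = err;
      // Exponential backoff: wait 500 ms before retry 1, 1000 ms before retry 2.
      if (i < retries) await sleep(baseDelayMs * 2 ** i);
    }
  }
  // Retry budget exhausted: only now would the caller show an "unavailable" banner.
  throw lastError;
}
```

Passing `sleep` as an option makes the backoff schedule checkable in a unit test without real timers.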

Known follow-ups (NOT in this PR)

Nonce-exposure window during bridge script-tag injection (HIGH security/defense-in-depth, theoretical attack, needs careful protocol redesign)
YouTube subtitle handler retry-timer leak on rapid lang toggle (separate file, separate review)
IDB schema-version migration story (no fields needed yet; future-work)

🤖 Generated with Claude Code

A second-pass audit on translator.js (778 lines) and the page-bridge protocol surfaced six real issues. None of these would have been caught by the previous content.js-focused audit. Three are correctness bugs visible to users at the current scale; the rest add resilience as external dependencies (Puter.js, model names, Anthropic deprecation windows) shift under us.

CRITICAL: Verify-queue tail-item race

Items pushed between `_runVerifyQueue`'s while-loop exit and the `.finally()` clearing `_verifyLock` got queued but no new run was scheduled. On a quiet page they sat un-verified forever. Extracted `_kickVerifyQueue()` and made `.finally()` self-restart if items arrived during teardown. Also unified the two duplicate lock-create sites (queueGeminiVerify and BRIDGE_READY handler) onto the helper.

HIGH: IndexedDB cache poisoning

`_cacheTranslation` wrote whatever GT/Gemini returned to disk and served it for 30 days. A single corrupted response or transient proxy error page poisoned the cache. Added `_isValidTranslation` that rejects HTML tags, length ratios over 10×, and >95% ASCII for non-Latin target languages (typical refusal/error string). Skipped payloads silently retry on the next page load.

HIGH: Bridge-injection retry

`script.onerror` and bridge timeout used to kill AI features for the whole tab session: one CDN hiccup, one CSP transient, dead tutor until reload. New `_injectPageBridgeWithRetry` does up to 2 retries with exponential backoff (500/1000/2000 ms). The `skillbridge:bridgeunavailable` banner now only fires after the retry budget is exhausted, not on the first failure.

HIGH: Model-name fallback chain

All `puter.ai.chat` calls in page-bridge now route through `_puterChat`, which catches model-not-found errors and retries once with a fallback (`claude-sonnet-4-6` → `4-5`, `claude-opus-4-7` → `4-6`, `gemini-2.0-flash` → `1.5-flash`). When Anthropic deprecates Sonnet 4.6 (likely within months) the tutor falls back instead of 500-erroring at the user.

MEDIUM: Stale-language verify writes

Verify items now stamp `_langGeneration` at queue time. `_runVerifyQueue` filters stale batches, and `_verifySingle` re-checks after the Gemini await fence (which can be seconds long) before calling `_notifyUpdate`. Without this, a user switching language mid-page saw old-language text overwrite their new translation. content.js calls `translator.bumpLangGeneration()` from `switchLanguage`.

MEDIUM: `_cacheTranslation` actually awaits

Was declared async but returned the moment `store.put()` was queued, so callers' `await` was a no-op. Now resolves on `tx.oncomplete`. Caller timing assumptions (e.g. eviction-then-retry flows from the v3.5.6 fix) now hold.

Tests: 309/309 pass; lint, format, selector-health, dicts, bg-sync, glossary, translate-validate, firefox build, bundle build all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
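
The verify-queue tail-item fix described in the commit message can be illustrated with a small sketch. All names here (`VerifyQueue`, `enqueue`, `kick`, `run`, `verifyOne`) are hypothetical; only the shape of the fix, a single kick helper that `.finally()` calls again after clearing the lock, is taken from the PR. Swallowing per-item errors is an added assumption to keep the sketch short.

```javascript
// Hypothetical sketch of a self-restarting queue; not the translator.js code.
class VerifyQueue {
  constructor(verifyOne) {
    this.items = [];
    this.lock = null; // promise of the in-flight run, or null when idle
    this.verifyOne = verifyOne;
  }

  enqueue(item) {
    this.items.push(item);
    this.kick();
  }

  // Single place where a run is started, used by enqueue and by .finally().
  kick() {
    if (this.lock || this.items.length === 0) return;
    this.lock = this.run().finally(() => {
      this.lock = null;
      // Items pushed after the loop exited but before the lock cleared
      // would otherwise wait forever; restart so they are drained too.
      this.kick();
    });
  }

  async run() {
    while (this.items.length > 0) {
      const item = this.items.shift();
      try {
        await this.verifyOne(item);
      } catch (err) {
        // Assumption: one failed item should not stall the rest of the queue.
      }
    }
  }
}
```

Without the `this.kick()` call inside `.finally()`, an `enqueue` that lands in the gap sees a non-null lock, returns early, and nothing ever picks the item up.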

heznpc merged commit 0ea4a9c into main Apr 30, 2026
3 checks passed

heznpc deleted the fix/v3.5.7-translator-hardening branch April 30, 2026 22:49
