fix: use word-boundary regex for geo-tagging keyword matching #330
princelevant wants to merge 1184 commits into koala73:main from
Conversation
t() always returns a string (key itself if missing), so || 'English' fallbacks were unreachable dead code.
t() always returns a string, so || 'English' fallbacks were unreachable. Removed all 15 instances.
Main variant: NHK World + Nikkei Asia in asia category. Finance variant: Nikkei Asia in markets category. Added asia.nikkei.com to RSS proxy allowlist.
…keys - CommunityWidget: add DOM check to prevent duplicate widgets on repeated loadNews() calls - RuntimeConfigPanel: compare t() result against key path to suppress missing help translations
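The two `t()` commits above rest on one behavior: a missing key comes back as the key itself, so `||` fallbacks never fire and "is this translated?" has to be answered by comparing against the key. A minimal sketch of that idea follows; the `translations` table and the `helpText` helper are hypothetical stand-ins, not the project's actual i18n code.

```typescript
// Hypothetical translation table standing in for the loaded locale bundle.
const translations: Record<string, string> = {
  'panel.title': 'World Monitor',
};

// Assumed behavior (per the commit message): t() returns the key itself when missing.
function t(key: string): string {
  return translations[key] ?? key;
}

// Because t() never returns an empty value for a non-empty key,
// `t(key) || 'English'` can never reach its right-hand side.
// To detect a missing translation, compare the result with the key path instead.
function helpText(key: string): string | null {
  const value = t(key);
  return value === key ? null : value;
}
```

With this shape, a caller can hide a help tooltip when `helpText()` returns `null` rather than showing a raw key such as `settings.help.foo`.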
…glish + Linux AppImage support (koala73#100)

## Summary
- Full i18n system with 14 locales: en, fr, de, es, it, pl, pt, nl, sv, ru, ar (RTL), zh, ja, all at 1132-key parity
- Eliminated ~110 hardcoded English strings across 50+ source files, replaced with `t()` calls
- RTL support for Arabic with proper regional code normalization (ar-SA → ar)
- Dead English fallback literals (`t() || 'English'`) removed from all components
- Community discussion floating widget (localized)
- Linux AppImage desktop build support
- Proper noun heuristic fallback for trending keywords when ML unavailable

## Key changes
- **New**: `src/services/i18n.ts`: i18next setup with language detection, RTL, locale switching
- **New**: 13 locale JSON files (1132 keys each) in `src/locales/`
- **New**: `src/styles/rtl-overrides.css` + `src/styles/lang-switcher.css`
- **Modified**: 50+ components/services to use `t()` instead of hardcoded strings
- **Modified**: `.github/workflows/build-desktop.yml`: Linux CI matrix
- **Modified**: `scripts/desktop-package.mjs` + `download-node.sh`: Linux target support

## Test plan
- [ ] Verify language switcher shows all 14 languages
- [ ] Switch to Arabic: confirm `dir="rtl"` on `<html>`, layout mirrors
- [ ] Switch to Japanese: confirm all panel labels, tooltips, popups render in Japanese
- [ ] Switch to French: confirm no English leaks in panels, modals, map legend
- [ ] Verify `{{count}}` interpolation works in timeAgo strings
- [ ] Verify `tsc --noEmit` passes (confirmed locally)
- [ ] Test community widget dismiss/localStorage persistence
PR koala73#97 only hid the badge itself but the SignalModal kept auto-opening on new signals. Gate all 5 automatic signalModal.show() calls behind findingsBadge.isEnabled() so disabling Intelligence Findings also suppresses the full-screen popup overlay. Closes koala73#89 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add lasillavacia.com RSS feed to improve Latin American political coverage. Independent Colombian investigative outlet covering governance, armed conflict, and regional power dynamics. Ref koala73#96 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Adds [La Silla Vacía](https://www.lasillavacia.com) RSS feed (`/rss`) to the `latam` feed category
- Adds source tier entry (Tier 3, specialty/investigative)
- Colombian independent outlet covering political power structures, governance, and armed conflict

Ref koala73#96

## Test plan
- [ ] Verify feed loads in LATAM news panel (content is in Spanish)
- [ ] Confirm no duplicate or broken entries in feed list

🤖 Generated with [Claude Code](https://claude.com/claude-code)
## Summary
- PR koala73#97 hid the badge but the `SignalModal` kept auto-opening on new signals; this is what the reporter was still seeing
- Gates all 5 automatic `this.signalModal?.show()` calls behind `this.findingsBadge?.isEnabled()` so disabling Intelligence Findings also suppresses the full-screen popup overlay and sounds
- Signal history is still recorded (`addToSignalHistory`) even when popup is suppressed, so re-enabling the toggle shows them

Closes koala73#89

## Test plan
- [x] Disable Intelligence Findings via PANELS toggle or right-click
- [x] Wait for signal refresh cycle; no full-screen popup should appear
- [x] Re-enable → popups resume on next signal detection
- [x] Build succeeds with no type errors

🤖 Generated with [Claude Code](https://claude.com/claude-code)
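The gating described in this commit can be sketched as below. The `SignalUi` interface and `handleNewSignals` function are hypothetical names used only for illustration; the actual change guards `this.signalModal?.show()` call sites in the app class.

```typescript
// Hypothetical, simplified view of the collaborators involved.
interface SignalUi {
  isFindingsEnabled: () => boolean;          // stands in for findingsBadge.isEnabled()
  showModal: (signals: string[]) => void;    // stands in for signalModal.show()
  addToHistory: (signals: string[]) => void; // stands in for addToSignalHistory()
}

function handleNewSignals(signals: string[], ui: SignalUi): void {
  if (signals.length === 0) return;
  // History is recorded regardless of the toggle, so re-enabling shows past signals.
  ui.addToHistory(signals);
  // The full-screen popup only opens when Intelligence Findings is enabled.
  if (ui.isFindingsEnabled()) {
    ui.showModal(signals);
  }
}
```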
Initializes @sentry/browser early in main.ts with environment detection (production/preview/development). Disabled on localhost and Tauri desktop. Traces sampled at 10%.
Resolve instead of reject when the script fails to load (ad blocker, network issue). Guard initializePlayer against missing YT.Player. Prevents noisy unhandled rejection errors in Sentry.
…timeout, WebGL context loss, RSS 403s - storage.ts: add withTransaction() retry wrapper for IndexedDB InvalidStateError on iOS/Safari tab backgrounding - usa-spending.ts: add 20s AbortController timeout to prevent Safari "Load failed" on stalled POST - App.ts: add catch to runGuarded() to prevent unhandled rejections from task runner - main.ts: add Sentry ignoreErrors for WebGL context loss and ResizeObserver loop - DeckGLMap.ts: add webglcontextlost/restored handlers for graceful GPU recovery - feeds.ts: route rsshub.app feeds (NHK, MIIT, MOFCOM) through Railway proxy, switch Nikkei Asia and ECFR to Google News proxy - finance.ts: switch Nikkei Asia to Google News proxy, remove unused railwayRss helper
… extensions) - Add NotAllowedError, InvalidAccessError, importScripts to Sentry ignoreErrors - Add global unhandledrejection handler for YouTube IFrame API autoplay blocks - Add onError handler to deck.gl MapboxOverlay for internal render-cycle races
- withTransaction now returns undefined instead of throwing when InvalidStateError persists after retry (transient browser event) - Add .catch() to fire-and-forget cleanOldSnapshots() call
- Add beforeSend filter to drop minified 1-3 char library errors (e.g., "vd") - Filter transient network errors (Load failed, Failed to fetch, cancelled) - Filter browser extension errors (runtime.sendMessage, Java object is gone) - Filter non-Error promise rejections and SVG image load failures - Filter MapLibre imageManager null ref during WebGL context restore - Reset YouTube API promise on load failure to allow retry on next init - Move USASpending timeout cleanup to finally block - Log snapshot cleanup errors instead of silently swallowing
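As a rough illustration of the first two filters in this commit, the predicate below classifies an error message as noise. The function name `isNoiseMessage` and the exact patterns are assumptions for this sketch, not the patterns used in `main.ts`.

```typescript
// Assumed pattern for transient network failures named in the commit message.
const TRANSIENT_NETWORK = /^(TypeError: )?(Load failed|Failed to fetch|cancelled)/;

// Assumed pattern for minified 1-3 character library errors such as "vd".
const MINIFIED_NAME = /^[A-Za-z_$]{1,3}$/;

function isNoiseMessage(message: string | undefined): boolean {
  if (!message) return false;
  const trimmed = message.trim();
  return MINIFIED_NAME.test(trimmed) || TRANSIENT_NETWORK.test(trimmed);
}
```

In a Sentry `beforeSend` callback, returning `null` drops the event, so a filter of this kind would return `null` when the predicate is true and the original event otherwise.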
…variants Browser extensions intercept window.fetch causing "Failed to fetch (gamma-api.polymarket.com)" to leak as unhandled rejection. Remove the $ anchor so the pattern matches any suffix.
… noise filters Prevent getProjection null crash when WebGL context is lost by tracking webglLost flag and skipping all setProps/layer rebuild calls until restored. Add ignoreErrors for IndexedDB iOS kills, Twitter WebView injection, and CSP unsafe-eval from extensions.
…List guards - toggleFullscreen: use void .catch() for Promise-based requestFullscreen/ exitFullscreen + webkit prefix fallback for iOS Safari (WORLDMONITOR-11/13) - Narrow /^TypeError: Failed to fetch/ to exact match (was suppressing real API failures). Move module-import-failed to beforeSend with extension/ webview context check instead of blanket ignore (WORLDMONITOR-15) - Guard classList?.contains and target.closest?. on event targets that may not be Elements (WORLDMONITOR-Z/10) - Add noise filters: Fullscreen request denied, requestFullscreen, vc_text_indicators_context (WORLDMONITOR-12)
…er, IndexedDB write-drop - webkitRequestFullscreen returns void (not Promise) on Safari — use try/catch instead of .catch() to avoid undefined.catch() throw - Module-import beforeSend filter: only suppress when stack frames originate from browser extensions, not by URL domain check - withTransaction: throw on readwrite InvalidStateError after retry instead of silently returning undefined (prevents write-drop)
…ections - Wrap updateBaseline() in try/catch inside loadNewsCategory and intel path so IndexedDB write failures don't delete successfully fetched and rendered news data (P1) - Add .catch() to saveCurrentSnapshot() initial call and setInterval callback to prevent unhandled promise rejections from IndexedDB readwrite failures (P2)
… WebGL link errors - LiveNewsPanel: player.mute/unMute may not exist before onReady (WORLDMONITOR-16) - main.ts: add /Program failed to link/ noise filter (WORLDMONITOR-18)
…GLSL error signal
…tion probes (koala73#296) Sidecar validation probes were missing User-Agent headers, causing Cloudflare-fronted APIs (e.g. Wingbits) to return 403 which was incorrectly treated as an auth rejection. Added CHROME_UA to all 13 probes and isCloudflare403() helper to soft-pass CDN blocks.
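A CDN-block check of the kind this commit describes might look like the sketch below. The marker strings and the name `looksLikeCloudflareBlock` are hypothetical; the sketch also follows the stricter rule from a later commit in this PR, which requires two or more markers before treating a 403 as a Cloudflare challenge.

```typescript
// Hypothetical marker strings; the real helper's markers are not shown in this PR page.
const CHALLENGE_MARKERS = ['cf-ray', 'just a moment', 'challenge-platform', 'attention required'];

function looksLikeCloudflareBlock(httpStatus: number, body: string): boolean {
  if (httpStatus !== 403) return false;
  const lower = body.toLowerCase();
  const hits = CHALLENGE_MARKERS.filter((marker) => lower.includes(marker)).length;
  // Requiring 2+ markers avoids treating a genuine auth rejection as a CDN block.
  return hits >= 2;
}
```

A probe could then soft-pass when this returns `true` and report an auth failure only for other 403 responses.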
…ries (koala73#299) Models like DeepSeek-R1 and QwQ output chain-of-thought as plain text even with think:false. This caused summaries like "We need to summarize the top story..." instead of actual news content. - Remove message.reasoning fallback that used thinking tokens as summary - Extend tag stripping to <|thinking|>, <reasoning>, <reflection> formats - Add hasReasoningPreamble() to reject task narration and prompt echoes - Gate reasoning detection to brief/analysis modes (translate unaffected) - Bump CACHE_VERSION v3→v4 to invalidate polluted cached summaries - Add 28 unit tests covering all edge cases
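The two defenses named here, tag stripping and preamble rejection, can be sketched as follows. The tag list, the preamble patterns, and the name `stripReasoningTags` are assumptions for illustration; only `hasReasoningPreamble` is named in the commit, and its real rules are not shown.

```typescript
// Assumed subset of tag names; the commit also mentions a <|thinking|> format not covered here.
const REASONING_TAGS = ['think', 'reasoning', 'reflection'];

function stripReasoningTags(text: string): string {
  let out = text;
  for (const tag of REASONING_TAGS) {
    // Non-greedy match so separate tagged blocks are removed independently.
    out = out.replace(new RegExp(`<${tag}>[\\s\\S]*?</${tag}>`, 'gi'), '');
  }
  return out.trim();
}

// Assumed patterns for task narration, modeled on the "We need to summarize..." example.
const PREAMBLE = /^(we need to|i need to|let me|the user (wants|asks))\b/i;

function hasReasoningPreamble(summary: string): boolean {
  return PREAMBLE.test(summary.trim());
}
```

A summary that still trips `hasReasoningPreamble()` after stripping would be rejected rather than cached.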
…oala73#285) * fix: sync YouTube live panel mute state with native player controls * fix: harden YouTube embed mute sync (postMessage origin, interval cleanup, DRY destroy) --------- Co-authored-by: Elie Habib <elie.habib@gmail.com>
* test: add Playwright e2e tests for flushStaleRefreshes 4 tests covering: stale services flushed on tab focus (hidden > interval), no-op when hiddenSince is 0, skips non-stale services (hidden < interval), and 150ms stagger between re-triggered services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: convert flushStaleRefreshes to fast unit test, fix timeout leaks and timing flakiness - Move from Playwright e2e to Node.js unit test (tests/ dir) - Add source contract tests to detect if App.ts method signature drifts - Clean up all timeouts in afterEach to prevent leaks - Assert ordering + minimum gaps instead of absolute time windows (CI-safe) - Add assertions for refreshTimeoutIds state after flush - Add test for non-stale service timeout preservation * test: make flush stale refresh tests deterministic --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Elie Habib <elie.habib@gmail.com>
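The behavior these tests cover can be expressed as a pure planning step, which is easier to test deterministically than real timers. This is a sketch under assumptions: `planStaleFlush`, its types, and the choice to start the first refresh at a delay of 0 are hypothetical and not taken from `App.ts`.

```typescript
interface RefreshService { id: string; intervalMs: number }
interface PlannedRefresh { id: string; delayMs: number }

const STAGGER_MS = 150; // gap between re-triggered services, per the test description

function planStaleFlush(services: RefreshService[], hiddenSince: number, now: number): PlannedRefresh[] {
  // hiddenSince === 0 means the tab was never hidden: nothing to flush.
  if (hiddenSince === 0) return [];
  const hiddenForMs = now - hiddenSince;
  return services
    .filter((service) => hiddenForMs > service.intervalMs) // only stale services
    .map((service, index) => ({ id: service.id, delayMs: index * STAGGER_MS }));
}
```

A caller would then schedule each planned refresh with `setTimeout(run, delayMs)` and keep the returned timer ids for cleanup.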
…koala73#302) * fix: harden desktop embed messaging and secret validation * fix: harden embed postMessage origin check and add custom channel validation Security: - Block wildcard parentOrigin from query params (server-side sanitizer) - Validate e.origin on incoming postMessage commands in embed - Remove misleading asset: protocol from allowed list - Require 2+ markers for Cloudflare challenge detection (drop overly broad 'cloudflare' marker) - Add ordering comment on isAuthFailure vs isCloudflareChallenge403 - Strengthen embed test assertions with regex + wildcard rejection test Channel validation: - Validate YouTube handle format (@<3-30 chars>) before adding - Verify channel exists on YouTube via /api/youtube/live before adding - Show "Verifying…" loading state, red border on invalid, offline tolerance - Return channelExists flag from /api/youtube/live endpoint
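The handle format check (`@` followed by 3 to 30 characters) could be sketched like this. The allowed character class is an assumption, since the commit only states the length rule, and `isValidHandleFormat` is a hypothetical name.

```typescript
// "@" plus 3-30 characters; letters, digits, dot, underscore and hyphen are assumed to be allowed.
const HANDLE_PATTERN = /^@[A-Za-z0-9._-]{3,30}$/;

function isValidHandleFormat(input: string): boolean {
  return HANDLE_PATTERN.test(input.trim());
}
```

Only handles passing this local check would go on to the existence check against `/api/youtube/live`.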
* Simplify RSS freshness update to static import * Refine vendor chunking for map stack in Vite build * Patch transitive XML parser vulnerability via npm override * Shim Node child_process for browser bundle warnings * Filter known onnxruntime eval warning in Vite build * test: add loaders XML/WMS parser regression coverage * chore: align fast-xml-parser override with merged dependency set --------- Co-authored-by: Elie Habib <elie.habib@gmail.com>
…73#306) - Add levels, trends, fallback keys to top-level countryBrief in en/el/th/vi locales (fixes raw key display in intelligence brief and header badge) - Add Export PDF option to country brief dropdown using scoped print dialog - Add exportPdf i18n key to all 17 locale files
…73#308) - Add levels, trends, fallback keys to top-level countryBrief in en/el/th/vi locales (fixes raw key display in intelligence brief and header badge) - Add Export PDF option to country brief dropdown using scoped print dialog - Add exportPdf i18n key to all 17 locale files
…lity (koala73#313) WKWebView (Tauri macOS) doesn't support HTML5 Drag and Drop API. Replace draggable/dragstart/dragover with mousedown/mousemove/mouseup across panel grid reorder, live channel tabs, and channel settings. Uses elementFromPoint with same-row detection for accurate horizontal and vertical drag positioning.
…ala73#315) - Add panelDragCleanupHandlers to remove document listeners on destroy - Suppress channel click/edit after drag-end to prevent accidental actions
…oala73#316) Adds ignoreErrors patterns for Worker constructor, Facebook in-app browser, UC Browser, duplicate custom elements, WebGPU device limits, and stale container. Extends beforeSend to suppress TypeErrors from deck-stack chunk (same pattern as maplibre map chunk).
) * feat: add AI analysis settings popup to Insights panel (web-only) Add a gear icon to the AI Insights panel header that opens a settings popup giving web users explicit control over the AI analysis pipeline. Users can now toggle cloud AI (Groq/OpenRouter) and browser local model independently, with a static CTA for Ollama desktop support. - New ai-flow-settings.ts state layer with localStorage persistence - SummarizeOptions param added to generateSummary() (backward-compatible) - InsightsPanel: gear icon, disabled state, generation token for races - AiFlowPopup: toggles, 250MB warning, status footer, Ollama CTA - Remove mlWorker.isAvailable gate in App.ts for cloud-only mode - CSS: popup, toggles, status indicators, disabled state - i18n: 16 new keys across all 17 locale files with translations https://claude.ai/code/session_01AgLDUybKNri83vgZQNC3HF * fix: reset brief cache on settings change, remove dead code in popup - Reset cachedBrief and lastBriefUpdate in onAiFlowChanged() so new provider settings take effect immediately instead of being blocked by the 2-minute cooldown with a stale (possibly null) cached brief - Remove unused isAnyAiProviderEnabled() import and dead `void any` in AiFlowPopup.updateStatus() https://claude.ai/code/session_01AgLDUybKNri83vgZQNC3HF * fix: invalidate insights brief cache on AI flow changes --------- Co-authored-by: Claude <noreply@anthropic.com>
…ala73#317) Adds islandtimes.org/feed/ to the asia region feeds and allowlists the domain in the RSS proxy.
…re source regions (koala73#319) Replace 4 scattered settings UIs (gear popup, panels modal, sources modal, language dropdown) with a single 3-tab modal (General/Panels/Sources). Sources tab features region pills that dynamically adapt per variant: - Full: Worldwide, US, Europe, Middle East, Africa, Latin America, Asia-Pacific, Topical, Intelligence - Tech: Tech News, AI & ML, Startups & VC, Regional Ecosystems, Developer, Cybersecurity, Policy & Research, Media & Podcasts - Finance: Markets & Analysis, Fixed Income & FX, Commodities, Crypto & Digital, Central Banks & Economy, Deals & Corporate, Financial Regulation, Gulf & MENA Also reclassifies full-variant feeds: splits monolithic politics into politics (worldwide), us, and europe; redistributes misplaced sources. Additional fixes: - Variant switcher works on localhost via localStorage (no multiple dev servers) - mapNewsFlash toggle no longer triggers expensive AI re-analysis - Remove dead intel-findings toggle from desktop settings window - LiveNewsPanel uses shared SITE_VARIANT (respects localStorage override)
…3#324) Keyword matching across the geo-tagging pipeline used String.includes() (substring matching), causing false positives like "assad" matching inside "ambassador" and tagging unrelated articles to Syria. Replaced all instances with word-boundary regex (\b...\b) for accurate matching. Also replaced the ambiguous 3-char "hts" keyword (matched "rights", "fights", etc.) with unambiguous "tahrir al-sham" / "hayat tahrir". Fixes koala73#324 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
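The difference between substring matching and word-boundary matching is easy to see in a small sketch. `includesKeyword` is named in this PR, but the signature and body below are assumptions, and `escapeRegExp` is a hypothetical helper.

```typescript
// Escape regex metacharacters so keywords are matched literally.
function escapeRegExp(text: string): string {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Word-boundary match: the keyword must not be embedded inside a longer word.
function includesKeyword(title: string, keyword: string): boolean {
  return new RegExp(`\\b${escapeRegExp(keyword)}\\b`, 'i').test(title);
}

// Substring matching produces the false positive described above.
const substringHit = 'french ambassador recalled'.includes('assad');
// Word-boundary matching does not.
const boundaryHit = includesKeyword('French ambassador recalled', 'assad');
```

One trade-off of this approach is that it builds a `RegExp` per keyword per call, which is the allocation cost the later tokenization revision avoids.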
|
@princelevant is attempting to deploy a commit to the Elie Team on Vercel. A member of the Team first needs to authorize it. |
|
Lovely |
Plan vs Implementation Review

Thanks for tackling #324! The core goal (fixing substring false positives) is right, but the implementation diverges from the approved plan in ways that introduce new issues. Here's a detailed comparison.

Approach Mismatch

The approved plan uses tokenization-based exact word matching (

Issues

What the PR Gets Right

Recommended Changes

Per the approved plan (

The plan file has the full |
…73#324) Replace word-boundary regex with tokenization + Set lookups per approved plan: - Create src/utils/keyword-match.ts as single source of truth - Tokenize titles once, O(1) Set.has() per keyword (no RegExp allocations) - Restore 'hts' keyword for Damascus (safe with tokenization) - Revert shared includesKeyword() in analysis-constants.ts - Remove 'us ' trailing-space hack and bare 'house' from DC keywords - Add tech-hub-index.ts to scope (was missing) - Add integration tests for inferGeoHubsFromTitle flow Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
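The "tokenize once, then `Set.has()` per keyword" idea can be sketched as below. The `Hub` shape, `tokenize`, and `inferHubs` are hypothetical simplifications for illustration and are not the contents of `src/utils/keyword-match.ts`.

```typescript
interface Hub { id: string; keywords: string[] }

// Split on whitespace and trim punctuation from both ends of each token.
function tokenize(title: string): Set<string> {
  const words = new Set<string>();
  for (const raw of title.toLowerCase().split(/\s+/)) {
    const cleaned = raw.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, '');
    if (cleaned) words.add(cleaned);
  }
  return words;
}

function inferHubs(title: string, hubs: Hub[]): string[] {
  const words = tokenize(title); // tokenized once per title, reused for every hub
  return hubs
    .filter((hub) => hub.keywords.some((keyword) => words.has(keyword)))
    .map((hub) => hub.id);
}
```

Because matching is by whole token, a short keyword such as `hts` no longer collides with "rights" or "fights", which is why it can be restored.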
|
Hey @koala73 — this is a great initiative and I'm happy to contribute early on. The impact is huge. Thank you for the quick and prompt responses! Here's the fix based on your feedback: Changes in this revision:
Let me know if anything else needs adjusting. Yalla! 🚀 — KT |
|
Hey @princelevant, great improvement switching to tokenization! The architecture is now aligned with the approved plan:

One critical issue remaining before we can merge:

🔴 CRITICAL: Possessive forms produce false negatives

The tokenizer splits on

The approved plan specified compound + sub-part decomposition to handle this. After adding each cleaned token, split on

```typescript
export function tokenizeForMatch(title: string): TokenizedTitle {
  const lower = title.toLowerCase();
  const words = new Set<string>();
  const ordered: string[] = [];
  for (const raw of lower.split(/\s+/)) {
    const cleaned = raw.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, '');
    if (!cleaned) continue;
    words.add(cleaned); // "assad's" as compound
    ordered.push(cleaned);
    for (const part of cleaned.split(/[^a-z0-9]+/)) {
      if (part) words.add(part); // "assad", "s" as sub-parts
    }
  }
  return { words, ordered };
}
```

This gives

Please also add test cases for possessives; that's how this slipped through:

```typescript
it('"assad" matches "Assad\'s forces advance"', () => {
  assert.equal(matchesAnyKeyword("Assad's forces advance in Idlib", ['assad']), true);
});
```

🟡 Minor items
Everything else looks solid — the keyword data fixes in |
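To make the possessive case concrete, here is a self-contained sketch that pairs the reviewer's tokenizer with a matcher. The `TokenizedTitle` shape is inferred from the snippet's return value; the body of `matchesAnyKeyword` and the `containsPhrase` helper (including the contiguous-run handling of multi-word keywords) are assumptions, not the PR's actual implementation.

```typescript
interface TokenizedTitle { words: Set<string>; ordered: string[] }

function tokenizeForMatch(title: string): TokenizedTitle {
  const words = new Set<string>();
  const ordered: string[] = [];
  for (const raw of title.toLowerCase().split(/\s+/)) {
    const cleaned = raw.replace(/^[^a-z0-9]+|[^a-z0-9]+$/g, '');
    if (!cleaned) continue;
    words.add(cleaned);            // compound form, e.g. "assad's"
    ordered.push(cleaned);
    for (const part of cleaned.split(/[^a-z0-9]+/)) {
      if (part) words.add(part);   // sub-parts, e.g. "assad" and "s"
    }
  }
  return { words, ordered };
}

// Assumed handling of multi-word keywords: look for a contiguous run of tokens.
function containsPhrase(ordered: string[], phrase: string[]): boolean {
  for (let start = 0; start + phrase.length <= ordered.length; start++) {
    if (phrase.every((token, offset) => ordered[start + offset] === token)) return true;
  }
  return false;
}

function matchesAnyKeyword(title: string, keywords: string[]): boolean {
  const { words, ordered } = tokenizeForMatch(title);
  return keywords.some((keyword) => {
    const parts = keyword.toLowerCase().split(/\s+/).filter(Boolean);
    if (parts.length === 0) return false;
    if (parts.length === 1) return words.has(parts[0] ?? '');
    return containsPhrase(ordered, parts);
  });
}
```

One side effect worth noting: sub-part decomposition also adds fragments such as `s` to the word set, so very short keywords would need care.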
Summary
- Replace `String.includes()` with word-boundary regex (`\b...\b`) across the entire geo-tagging pipeline to prevent substring false positives
- Replace the ambiguous `"hts"` keyword (matched "rights", "fights", etc.) with `"tahrir al-sham"` / `"hayat tahrir"`

Problem
When zooming into Syria on the map, unrelated articles (e.g. French politics mentioning "ambassador") appeared at Syria's coordinates. The keyword `"assad"` matched as a substring inside `"ambassador"`, and `"hts"` matched inside `"rights"`, `"fights"`, `"flights"`, etc.

Root cause: keywords >= 5 characters used `titleLower.includes(keyword)` instead of word-boundary regex.

Files changed
- `src/services/geo-hub-index.ts`
- `src/components/DeckGLMap.ts`: `\b` regex
- `src/components/Map.ts`
- `src/App.ts`: `\b` regex
- `src/services/entity-index.ts`: `\b` regex
- `src/services/country-instability.ts`: `\b` regex
- `src/services/story-data.ts`: `\b` regex
- `src/services/related-assets.ts`: `\b` regex
- `src/utils/analysis-constants.ts`: `includesKeyword()` utility uses `\b` regex
- `src/config/geo.ts`: `"hts"` with `"tahrir al-sham"` / `"hayat tahrir"`
- `tests/geo-keyword-matching.test.mjs`

Test plan
- `vite build` passes clean

Fixes #324
-KT
🤖 Generated with Claude Code