fix: vocabulary replacement fails for English words adjacent to CJK characters by sysalpha01 · Pull Request #93 · amicalhq/amical

sysalpha01 · 2026-02-09T21:49:34Z

Summary

- Fix vocabulary replacement regex that fails when English words appear adjacent to CJK (Japanese/Chinese/Korean) characters.
- Change \p{L} to \p{Script=Latin} in the word boundary lookahead/lookbehind.
- Latin word boundary protection still works (e.g., "apple" in "pineapple" is not replaced).

Problem

The word boundary regex in applyTextReplacements() uses \p{L} (all Unicode letters) in its negative lookbehind and lookahead. Because \p{L} also matches CJK characters, an English word directly adjacent to Japanese text (e.g., Xavix in Xavixの設定) is never replaced: the neighboring CJK character counts as a letter, so the lookahead rejects the match.
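To make the cause concrete, here is a minimal sketch that classifies a few sample characters under both property classes. The sample list and variable names are illustrative assumptions, not code from the repository. Note that digits match neither class, since they belong to the Common script.

```typescript
// Build the patterns with the RegExp constructor; the "u" flag enables \p{...}.
const anyLetter = new RegExp("\\p{L}", "u");
const latinLetter = new RegExp("\\p{Script=Latin}", "u");

// Illustrative samples: ASCII letter, accented Latin letter, Hiragana, Kanji, digit.
const samples = ["a", "é", "の", "設", "1"];

const classification = samples.map((ch) => ({
  ch,
  isLetter: anyLetter.test(ch), // true for any Unicode letter, CJK included
  isLatin: latinLetter.test(ch), // true only for Latin-script characters
}));

console.table(classification);
```

In this sketch の and 設 are letters under \p{L} but not under \p{Script=Latin}, which is why only the former blocks a match next to Xavix.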

Example

With vocabulary entry Xavix → ZABBIX:

Input: Xavixの設定
Before fix: Xavixの設定 (no replacement - の matches \p{L})
After fix: ZABBIXの設定 (correctly replaced - の doesn't match \p{Script=Latin})

Changes

apps/desktop/src/utils/text-replacement.ts:

- Replace \p{L} with \p{Script=Latin} in the word boundary regex.
- This ensures only Latin script characters are considered for word boundaries.
- CJK characters adjacent to English words no longer block replacement.
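The effect of the change can be sketched as follows. This is a simplified stand-in written for illustration, not the actual contents of text-replacement.ts: the helper names `escapeRegExp` and `replaceWithBoundary` and the `boundaryClass` parameter are assumptions, and details such as case handling are left out.

```typescript
// Escape regex metacharacters so the vocabulary term is matched literally.
function escapeRegExp(term: string): string {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: `boundaryClass` is the character class used in the
// negative lookbehind/lookahead placed around the term.
function replaceWithBoundary(
  text: string,
  from: string,
  to: string,
  boundaryClass: string,
): string {
  const pattern = new RegExp(
    "(?<!" + boundaryClass + ")" + escapeRegExp(from) + "(?!" + boundaryClass + ")",
    "gu",
  );
  // A replacer function keeps "$" sequences in `to` from being interpreted.
  return text.replace(pattern, () => to);
}

const withAllLetters = replaceWithBoundary("Xavixの設定", "Xavix", "ZABBIX", "\\p{L}");
const withLatinOnly = replaceWithBoundary("Xavixの設定", "Xavix", "ZABBIX", "\\p{Script=Latin}");

console.log(withAllLetters); // Xavixの設定 (の counts as a letter, match rejected)
console.log(withLatinOnly); // ZABBIXの設定
```

Passing the boundary class as a parameter is only a device for comparing the old and new behavior side by side; the PR itself describes a direct substitution in the existing pattern.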

Test Plan

- Verified Xavixの設定 → ZABBIXの設定 works correctly.
- Verified pineapple does not have apple replaced (Latin boundary still works).
- Verified pure CJK replacements (e.g., ザビックス → ZABBIX) still work.
- Tested with Amical Cloud transcription on Windows.
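The manual checks above could be captured as a small table-driven regression check. The sketch below is an assumption-laden illustration: `replaceTerm` is a hypothetical stand-in for the word-boundary logic, it routes every case through one pattern rather than modeling any separate CJK handling in the real function, and the third input string is made up for the example.

```typescript
// Simplified stand-in for the word-boundary logic; not the project's real code.
function replaceTerm(text: string, from: string, to: string): string {
  const escaped = from.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const pattern = new RegExp(
    "(?<!\\p{Script=Latin})" + escaped + "(?!\\p{Script=Latin})",
    "gu",
  );
  return text.replace(pattern, () => to);
}

// Each case: [input, term to find, replacement, expected output].
const cases: Array<[string, string, string, string]> = [
  ["Xavixの設定", "Xavix", "ZABBIX", "ZABBIXの設定"], // English word next to CJK
  ["pineapple", "apple", "ZABBIX", "pineapple"], // partial Latin match stays protected
  ["ザビックスの設定", "ザビックス", "ZABBIX", "ZABBIXの設定"], // pure CJK term
];

const failures = cases.filter(
  ([input, from, to, expected]) => replaceTerm(input, from, to) !== expected,
);

console.log(failures.length === 0 ? "all cases pass" : failures);
```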

Summary by CodeRabbit

Bug Fixes
- Text replacement now properly handles mixed-script text, including CJK characters alongside Latin characters.

…haracters

The word boundary regex used \p{L} (all Unicode letters) which includes CJK characters, preventing replacement of English words when they appear next to Japanese/Chinese/Korean text (e.g., "Xavixの設定" would not replace "Xavix" because "の" matched \p{L} in the lookahead).

Changed \p{L} to \p{Script=Latin} so the word boundary check only considers Latin script characters. This allows vocabulary replacements to work correctly in CJK contexts while still preventing partial matches within Latin words (e.g., "apple" in "pineapple" is still protected).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai · 2026-02-09T21:49:59Z

📝 Walkthrough

The change modifies the word-boundary regex in the non-CJK branch of applyTextReplacements to use Latin-script boundaries instead of broader alphabetic/numeric boundaries. This allows CJK characters to sit directly next to matched text while still preventing unintended matches inside longer Latin words.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Text Replacement Logic `apps/desktop/src/utils/text-replacement.ts` | Modified word-boundary regex from `\p{L}\p{N}` to `[a-zA-Z0-9]` in non-CJK branch, narrowing boundary checks to Latin script and allowing CJK character adjacency. Updated comments to reflect Latin-script focus. |

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

amicalhq/amical#81: Introduced the original applyTextReplacements logic that this PR directly modifies.

Suggested reviewers

haritabh-z01

Poem

🐰 A boundary drawn with Latin grace,
No CJK chars out of place,
"Xavixの設定" now sings true,
Word edges sharp, regex anew! ✨

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main fix: it addresses a specific bug where vocabulary replacement fails when English words are adjacent to CJK characters, which aligns with the primary change in the regex modification. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

No actionable comments were generated in the recent review. 🎉

sysalpha01 wants to merge 1 commit into amicalhq:main from sysalpha01:fix/cjk-vocabulary-replacement
