test(chinese): Attempt at adding tests for the chinese regex match #73

Open
markscamilleri wants to merge 1 commit into sawhney17:main from markscamilleri:add-test-for-chinese

markscamilleri (Contributor) commented Jan 21, 2024

Problem

I noticed while testing #72 that this piece of code:

if (page.match(/^[\u4e00-\u9fa5]{0,}$/gm)) {
  content = content.replaceAll(
    chineseRegex,
    parseAsTags ? `#${page}` : `[[${page}]]`
  );
  needsUpdate = true;
}

was not actually tested anywhere.

Solution

This is my attempt at adding a couple of tests for this section of code.

Apologies and Disclaimer

However, I am not a native Chinese, Japanese, Korean or Vietnamese speaker, and this was a best guess based on the official Unicode table and the Cambridge English <-> Chinese (Simplified) dictionary. If this is not right for any reason, please feel free to edit the PR and/or leave feedback here. The aim is to keep regressions from being introduced in the future.

Question

I also noticed that the Unicode tables go all the way to \u9fff for CJKV characters/ideographs. Should we expand the scope of chineseRegex to match this?
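To make the gap concrete, here is a small sketch comparing the two upper bounds. The names `currentRange` and `widerRange` are illustrative only, and the patterns are simplified (no flags, `+` instead of `{0,}`) so that `test` can be called repeatedly without `lastIndex` side effects.

```javascript
// Current upper bound from the snippet vs. the wider bound asked about above.
// Both names are illustrative; neither exists in the plugin.
const currentRange = /^[\u4e00-\u9fa5]+$/;
const widerRange = /^[\u4e00-\u9fff]+$/;

// U+9FA6 is the first code point past the current upper bound.
const sample = "\u9fa6";

console.log(currentRange.test(sample)); // false
console.log(widerRange.test(sample));   // true

// U+9FA5 itself is covered by both ranges.
console.log(currentRange.test("\u9fa5"), widerRange.test("\u9fa5")); // true true
```

Under this sketch, a page named with any character from U+9FA6 to U+9FFF would be skipped by the current range but picked up by the wider one.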


the-homeless-god commented Feb 23, 2025

Hi @markscamilleri

I've created a fork that adds support for more languages (Russian, Chinese, Korean, Japanese, German):

https://github.com/the-homeless-god/logseq-automatic-linker-international

Feel free to take a look.

It looks like the original repo has not been updated in a long time.
