feature(search): support searchNormalize for Greek and Cyrillic characters #466
Replace the previous NON_WORD_REGEX with a COMBINING_MARKS_REGEX (\u0300-\u036f) so normalizeString only removes Unicode combining diacritical marks after NFD normalization. This preserves valid characters (letters, digits, punctuation) instead of stripping all non-word characters.
Replace the normalization regex to strip Unicode combining marks (\u0300-\u036f) so searchNormalize correctly handles Greek and Cyrillic diacritics (e.g. Ένα, ё, й). Update the minified build accordingly. Add example initializations for Greek and Cyrillic selects in docs/assets/script.js and add Cypress E2E tests (cypress/e2e/examples.cy.ts) that verify search behavior with searchNormalize true/false and a regression check for Latin diacritics. Also update docs/examples.md to reflect the new examples.
Regenerate distribution artifacts: update dist/virtual-select.js, dist/virtual-select.min.js, and dist-archive/virtual-select-1.1.5.min.js. This updates the built/minified output to include recent changes from the source (no source code logic changes in this commit).
Bump multiple dev dependencies (Babel toolchain, babel-loader, css-loader, autoprefixer, cypress, cypress-real-events, sass, sass-loader, stylelint, webpack, webpack-cli, filemanager-webpack-plugin, postcss-loader, ts-api-utils/TypeScript, etc.). package-lock.json regenerated to lock the updated versions.
Pull request overview
This PR updates the search normalization logic to support Greek and Cyrillic text when searchNormalize: true, and adds docs/examples + Cypress coverage to validate the behavior.
Changes:
- Update normalizeString() to strip Unicode combining marks after NFD normalization (instead of stripping non-ASCII “non-word” chars).
- Add documentation examples for Greek/Cyrillic normalization and wire them into the docs demo script.
- Add Cypress E2E coverage for Greek/Cyrillic normalization and Latin-diacritics regression checks.
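To make the first change in this list concrete, here is a minimal sketch comparing the two regexes quoted in the commits. The function names normalizeOld and normalizeNew are illustrative only and are not part of the library; the real logic lives in normalizeString().

```javascript
// Old behaviour: \w only matches [A-Za-z0-9_], so every non-Latin letter is removed.
const NON_WORD_REGEX = /[^\w]/g;
// New behaviour (first iteration in this PR): remove only combining diacritical marks.
const COMBINING_MARKS_REGEX = /[\u0300-\u036f]/g;

// Hypothetical helper reproducing the previous normalization.
function normalizeOld(text) {
  return text.normalize('NFD').replace(NON_WORD_REGEX, '');
}

// Hypothetical helper reproducing the updated normalization.
function normalizeNew(text) {
  return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
}

console.log(JSON.stringify(normalizeOld('Ένα')));  // "" (the whole Greek label is erased)
console.log(JSON.stringify(normalizeNew('Ένα')));  // "Ενα" (only the accent is removed)
console.log(JSON.stringify(normalizeOld('café'))); // "cafe"
console.log(JSON.stringify(normalizeNew('café'))); // "cafe" (Latin behaviour is unchanged)
```

The empty string from the old helper is why no Greek or Cyrillic option could match while searchNormalize was enabled.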
Reviewed changes
Copilot reviewed 8 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/utils/utils.js | Updates the normalization regex used by searchNormalize. |
| package.json | Updates devDependencies (build/test tooling) and adds ts-api-utils. |
| docs/examples.md | Adds a new Greek/Cyrillic “searchNormalize” example section. |
| docs/assets/virtual-select.js | Updates built docs asset to reflect new normalization logic. |
| docs/assets/script.js | Initializes new Greek/Cyrillic example selects in the docs demo page. |
| dist/virtual-select.js | Updates distributed (unminified) build with new normalization logic. |
| dist/virtual-select.min.js | Updates distributed minified build with new normalization logic. |
| cypress/e2e/examples.cy.ts | Adds E2E tests for Greek/Cyrillic searchNormalize and Latin regression. |
| .github/PULL_REQUEST_TEMPLATE.md | Adds a PR template for future contributions. |
| .claude/settings.local.json | Adds Claude tooling permissions config. |
Add two examples to docs/examples.md demonstrating VirtualSelect configured with searchNormalize: false for Greek and Cyrillic option sets. These examples show search enabled with option descriptions while preserving original character forms, complementing the existing normalized-search examples.
Replace the explicit range /[\u0300-\u036f]/g with the Unicode property escape /\p{M}/gu to strip combining marks. This broadens matching to all Unicode combining marks (not just U+0300–U+036F) while preserving NFD normalization. Note: requires RegExp Unicode property escape support (ES2018+).
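The difference between the explicit range and the property escape can be checked with a short sketch. The helper names below are hypothetical, and the Arabic sample is written with escapes so the code points are unambiguous; it is an illustration, not the library's code.

```javascript
// The explicit range from the earlier commit vs. the property escape from this one.
const RANGE_REGEX = /[\u0300-\u036f]/g;
const PROPERTY_REGEX = /\p{M}/gu; // needs Unicode property escape support (ES2018+)

// Hypothetical helpers that differ only in the regex they apply after NFD.
const stripWithRange = (text) => text.normalize('NFD').replace(RANGE_REGEX, '');
const stripWithProperty = (text) => text.normalize('NFD').replace(PROPERTY_REGEX, '');

// Arabic word carrying vowel marks in the U+064B-U+0651 block.
const vocalized = '\u0645\u064F\u0631\u064E\u062D\u064E\u0651\u0628\u0627\u064B';
const bare = '\u0645\u0631\u062D\u0628\u0627';

console.log(stripWithRange(vocalized) === bare);    // false: the marks lie outside U+0300-U+036F
console.log(stripWithProperty(vocalized) === bare); // true: \p{M} covers them

// For Cyrillic the two agree, because the breve of Й decomposes to U+0306.
console.log(stripWithRange('Йогурт') === stripWithProperty('Йогурт')); // true
```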
Regenerate built/minified bundles for Virtual Select. Updated dist/virtual-select.js, dist/virtual-select.min.js, dist-archive/virtual-select-1.1.5.min.js and the corresponding docs/assets copies so the committed distribution and documentation assets are in sync with the latest build.
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 4 comments.
| const NON_WORD_REGEX = /[^\w]/g; | ||
| return text.normalize('NFD').replace(NON_WORD_REGEX, ''); | ||
| const COMBINING_MARKS_REGEX = /\p{M}/gu; | ||
| return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, ''); |
This changes normalizeString() semantics from “strip all non-\w chars” (including punctuation/whitespace) to “strip only combining marks”. That can regress search matching for labels/descriptions containing punctuation or separators (e.g., co-op no longer normalizes to coop, so searching coop may stop matching when searchNormalize: true). If the intent is to keep prior behavior while fixing non-Latin scripts, consider removing combining marks and filtering out non-letter/number characters in a Unicode-aware way (e.g., keep \p{L}\p{N}_ across scripts), or explicitly document that punctuation is now preserved and update search expectations accordingly.
| return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, ''); | |
| const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu; | |
| return text | |
| .normalize('NFD') | |
| .replace(COMBINING_MARKS_REGEX, '') | |
| .replace(NON_WORD_CHARS_REGEX, ''); |
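The trade-off raised in this comment can be seen by running both variants side by side. This is a sketch under the assumption that the two regexes behave as quoted in the suggestion; the function names stripMarksOnly and stripMarksAndPunctuation are made up for the comparison.

```javascript
const COMBINING_MARKS_REGEX = /\p{M}/gu;
// Unicode-aware counterpart of /[^\w]/g: keeps letters, numbers and underscore in any script.
const NON_WORD_CHARS_REGEX = /[^\p{L}\p{N}_]/gu;

// Behaviour adopted in the PR: only combining marks are removed.
function stripMarksOnly(text) {
  return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
}

// Behaviour proposed in the review suggestion: marks, punctuation and whitespace are removed.
function stripMarksAndPunctuation(text) {
  return stripMarksOnly(text).replace(NON_WORD_CHARS_REGEX, '');
}

console.log(stripMarksOnly('co-op'));               // co-op
console.log(stripMarksAndPunctuation('co-op'));     // coop
console.log(stripMarksOnly('Việt Nam'));            // Viet Nam
console.log(stripMarksAndPunctuation('Việt Nam'));  // VietNam
```

Note that the suggested variant also drops spaces, so a multi-word label and a query typed with a space both collapse to the same run of letters.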
| it('finds Greek option when searching with accent (exact match)', () => { | ||
| cy.open(id).search('Ένα').checkFirstOption('Ένα'); | ||
| }); | ||
|
|
||
| it('finds Greek option when searching without accent (normalized)', () => { | ||
| cy.getVs(id).search('Ενα').checkFirstOption('Ένα'); | ||
| }); |
Several tests appear to rely on inter-test state (e.g., the first test opens the dropdown but doesn’t close it, and subsequent tests call cy.getVs(id).search(...) without clearly re-opening). This can make the suite order-dependent and flaky. Prefer isolating each it by opening/closing within the test, or use beforeEach to cy.open(id) and afterEach to close/reset, so each case runs from a known UI state.
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 2 comments.
Move COMBINING_MARKS_REGEX out of Utils.normalizeString and declare it at the top of src/utils/utils.js so the regex isn't recreated on each call. No functional change; normalizeString now uses the shared, precompiled constant for better clarity and minor performance improvement.
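A minimal sketch of the hoisted layout follows. It assumes Utils is a class with a static normalizeString method, which matches the Utils.normalizeString name used in this commit message but is not copied from src/utils/utils.js.

```javascript
// Module scope: the regex object is created once, when the module is evaluated.
const COMBINING_MARKS_REGEX = /\p{M}/gu;

// Assumed shape of the utility class; only the method relevant here is shown.
class Utils {
  static normalizeString(text) {
    // Reuses the shared constant instead of building a new RegExp per call.
    return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
  }
}

console.log(Utils.normalizeString('Йогурт')); // Иогурт
console.log(Utils.normalizeString('Йогурт')); // Иогурт (same result on repeated calls)
```

Sharing a regex with the g flag is safe in this case because String.prototype.replace resets lastIndex before matching; the same constant would be stateful if it were used with test() or exec().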
Pull request overview
Copilot reviewed 7 out of 12 changed files in this pull request and generated 5 comments.
|
|
||
| describe('Greek search with searchNormalize: false', () => { | ||
| const id = 'greek-search-no-normalize-select'; | ||
|
|
These new describe blocks are missing the usual go to section setup, making them order-dependent (they only work if a previous test already navigated to the correct docs section). To keep the suite robust and consistent with the rest of the file, add a cy.goToSection(...) in a beforeEach/first it for this block (and the other new blocks below).
| beforeEach(() => { | |
| cy.goToSection('Greek and Cyrillic search normalize'); | |
| }); |
| describe('Cyrillic search with searchNormalize: true', () => { | ||
| const id = 'cyrillic-search-select'; | ||
|
|
||
| it('finds Cyrillic option with ё (exact match)', () => { | ||
| cy.open(id).search('Ёжик').checkFirstOption('Ёжик'); | ||
| }); | ||
|
|
This describe block also lacks a cy.goToSection(...) setup step, so it implicitly relies on prior tests having navigated to the section and rendered the selects. Add an explicit navigation step (or a shared beforeEach) so the block can run independently.
| describe('Cyrillic search with searchNormalize: false', () => { | ||
| const id = 'cyrillic-search-no-normalize-select'; | ||
|
|
||
| it('finds Cyrillic option with ё (exact lowercase match)', () => { | ||
| cy.open(id).search('Ёжик').checkFirstOption('Ёжик'); | ||
| }); |
This describe block also doesn't navigate to the docs section before interacting with the select. Add cy.goToSection('Greek and Cyrillic search normalize') (or a shared beforeEach) to avoid test-order coupling.
| "eslint-plugin-import": "^2.32.0", | ||
| "eslint-plugin-sonarjs": "^3.0.4", | ||
| "filemanager-webpack-plugin": "^9.0.1", | ||
| "filemanager-webpack-plugin": "^10.0.1", |
filemanager-webpack-plugin@^10.0.1 declares engines.node >=22.14.0 (per lockfile) while the release workflow uses Node 20. This is likely to break npm ci/npm run build during publishing. Either keep filemanager-webpack-plugin on a Node-20-compatible major (e.g. v9) or bump the workflow Node version (and document the new minimum Node requirement).
| "filemanager-webpack-plugin": "^10.0.1", | |
| "filemanager-webpack-plugin": "^9.0.0", |
| key: "normalizeString", | ||
| value: function normalizeString(text) { | ||
| var NON_WORD_REGEX = /[^\w]/g; | ||
| return text.normalize('NFD').replace(NON_WORD_REGEX, ''); | ||
| var COMBINING_MARKS_REGEX = /(?:[\u0300-\u036F\u0483-\u0489\u0591-\u05BD\u05BF\u05C1\u05C2\u05C4\u05C5\u05C7\u0610-\u061A\u064B-\u065F\u0670\u06D6-\u06DC\u06DF-\u06E4\u06E7\u06E8\u06EA-\u06ED\u0711\u0730-\u074A\u07A6-\u07B0\u07EB-\u07F3\u07FD\u0816-\u0819\u081B-\u0823\u0825-\u0827\u0829-\u082D\u0859-\u085B\u0897-\u089F\u08CA-\u08E1\u08E3-\u0903\u093A-\u093C\u093E-\u094F\u0951-\u0957\u0962\u0963\u0981-\u0983\u09BC\u09BE-\u09C4\u09C7\u09C8\u09CB-\u09CD\u09D7\u09E2\u09E3\u09FE\u0A01-\u0A03\u0A3C\u0A3E-\u0A42\u0A47\u0A48\u0A4B-\u0A4D\u0A51\u0A70\u0A71\u0A75\u0A81-\u0A83\u0ABC\u0ABE-\u0AC5\u0AC7-\u0AC9\u0ACB-\u0ACD\u0AE2\u0AE3\u0AFA-\u0AFF\u0B01-\u0B03\u0B3C\u0B3E-\u0B44\u0B47\u0B48\u0B4B-\u0B4D\u0B55-\u0B57\u0B62\u0B63\u0B82\u0BBE-\u0BC2\u0BC6-\u0BC8\u0BCA-\u0BCD\u0BD7\u0C00-\u0C04\u0C3C\u0C3E-\u0C44\u0C46-\u0C48\u0C4A-\u0C4D\u0C55\u0C56\u0C62\u0C63\u0C81-\u0C83\u0CBC\u0CBE-\u0CC4\u0CC6-\u0CC8\u0CCA-\u0CCD\u0CD5\u0CD6\u0CE2\u0CE3\u0CF3\u0D00-\u0D03\u0D3B\u0D3C\u0D3E-\u0D44\u0D46-\u0D48\u0D4A-\u0D4D\u0D57\u0D62\u0D63\u0D81-\u0D83\u0DCA\u0DCF-\u0DD4\u0DD6\u0DD8-\u0DDF\u0DF2\u0DF3\u0E31\u0E34-\u0E3A\u0E47-\u0E4E\u0EB1\u0EB4-\u0EBC\u0EC8-\u0ECE\u0F18\u0F19\u0F35\u0F37\u0F39\u0F3E\u0F3F\u0F71-\u0F84\u0F86\u0F87\u0F8D-\u0F97\u0F99-\u0FBC\u0FC6\u102B-\u103E\u1056-\u1059\u105E-\u1060\u1062-\u1064\u1067-\u106D\u1071-\u1074\u1082-\u108D\u108F\u109A-\u109D\u135D-\u135F\u1712-\u1715\u1732-\u1734\u1752\u1753\u1772\u1773\u17B4-\u17D3\u17DD\u180B-\u180D\u180F\u1885\u1886\u18A9\u1920-\u192B\u1930-\u193B\u1A17-\u1A1B\u1A55-\u1A5E\u1A60-\u1A7C\u1A7F\u1AB0-\u1ADD\u1AE0-\u1AEB\u1B00-\u1B04\u1B34-\u1B44\u1B6B-\u1B73\u1B80-\u1B82\u1BA1-\u1BAD\u1BE6-\u1BF3\u1C24-\u1C37\u1CD0-\u1CD2\u1CD4-\u1CE8\u1CED\u1CF4\u1CF7-\u1CF9\u1DC0-\u1DFF\u20D0-\u20F0\u2CEF-\u2CF1\u2D7F\u2DE0-\u2DFF\u302A-\u302F\u3099\u309A\uA66F-\uA672\uA674-\uA67D\uA69E\uA69F\uA6F0\uA6F1\uA802\uA806\uA80B\uA823-\uA827\uA82C\uA880\uA881\uA8B4-\uA8C5\uA8E0-\uA8F1\uA8FF\uA926-\uA92D\uA947-\uA953\uA980-\uA983\uA9B3-\uA9C0\uA9E5\uAA2
9-\uAA36\uAA43\uAA4C\uAA4D\uAA7B-\uAA7D\uAAB0\uAAB2-\uAAB4\uAAB7\uAAB8\uAABE\uAABF\uAAC1\uAAEB-\uAAEF\uAAF5\uAAF6\uABE3-\uABEA\uABEC\uABED\uFB1E\uFE00-\uFE0F\uFE20-\uFE2F]|\uD800[\uDDFD\uDEE0\uDF76-\uDF7A]|\uD802[\uDE01-\uDE03\uDE05\uDE06\uDE0C-\uDE0F\uDE38-\uDE3A\uDE3F\uDEE5\uDEE6]|\uD803[\uDD24-\uDD27\uDD69-\uDD6D\uDEAB\uDEAC\uDEFA-\uDEFF\uDF46-\uDF50\uDF82-\uDF85]|\uD804[\uDC00-\uDC02\uDC38-\uDC46\uDC70\uDC73\uDC74\uDC7F-\uDC82\uDCB0-\uDCBA\uDCC2\uDD00-\uDD02\uDD27-\uDD34\uDD45\uDD46\uDD73\uDD80-\uDD82\uDDB3-\uDDC0\uDDC9-\uDDCC\uDDCE\uDDCF\uDE2C-\uDE37\uDE3E\uDE41\uDEDF-\uDEEA\uDF00-\uDF03\uDF3B\uDF3C\uDF3E-\uDF44\uDF47\uDF48\uDF4B-\uDF4D\uDF57\uDF62\uDF63\uDF66-\uDF6C\uDF70-\uDF74\uDFB8-\uDFC0\uDFC2\uDFC5\uDFC7-\uDFCA\uDFCC-\uDFD0\uDFD2\uDFE1\uDFE2]|\uD805[\uDC35-\uDC46\uDC5E\uDCB0-\uDCC3\uDDAF-\uDDB5\uDDB8-\uDDC0\uDDDC\uDDDD\uDE30-\uDE40\uDEAB-\uDEB7\uDF1D-\uDF2B]|\uD806[\uDC2C-\uDC3A\uDD30-\uDD35\uDD37\uDD38\uDD3B-\uDD3E\uDD40\uDD42\uDD43\uDDD1-\uDDD7\uDDDA-\uDDE0\uDDE4\uDE01-\uDE0A\uDE33-\uDE39\uDE3B-\uDE3E\uDE47\uDE51-\uDE5B\uDE8A-\uDE99\uDF60-\uDF67]|\uD807[\uDC2F-\uDC36\uDC38-\uDC3F\uDC92-\uDCA7\uDCA9-\uDCB6\uDD31-\uDD36\uDD3A\uDD3C\uDD3D\uDD3F-\uDD45\uDD47\uDD8A-\uDD8E\uDD90\uDD91\uDD93-\uDD97\uDEF3-\uDEF6\uDF00\uDF01\uDF03\uDF34-\uDF3A\uDF3E-\uDF42\uDF5A]|\uD80D[\uDC40\uDC47-\uDC55]|\uD818[\uDD1E-\uDD2F]|\uD81A[\uDEF0-\uDEF4\uDF30-\uDF36]|\uD81B[\uDF4F\uDF51-\uDF87\uDF8F-\uDF92\uDFE4\uDFF0\uDFF1]|\uD82F[\uDC9D\uDC9E]|\uD833[\uDF00-\uDF2D\uDF30-\uDF46]|\uD834[\uDD65-\uDD69\uDD6D-\uDD72\uDD7B-\uDD82\uDD85-\uDD8B\uDDAA-\uDDAD\uDE42-\uDE44]|\uD836[\uDE00-\uDE36\uDE3B-\uDE6C\uDE75\uDE84\uDE9B-\uDE9F\uDEA1-\uDEAF]|\uD838[\uDC00-\uDC06\uDC08-\uDC18\uDC1B-\uDC21\uDC23\uDC24\uDC26-\uDC2A\uDC8F\uDD30-\uDD36\uDEAE\uDEEC-\uDEEF]|\uD839[\uDCEC-\uDCEF\uDDEE\uDDEF\uDEE3\uDEE6\uDEEE\uDEEF\uDEF5]|\uD83A[\uDCD0-\uDCD6\uDD44-\uDD4A]|\uDB40[\uDD00-\uDDEF])/g; | ||
| return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, ''); |
This function defines COMBINING_MARKS_REGEX inside normalizeString(), which will re-create the large RegExp on every call. Since docs search also runs per keystroke, consider hoisting the regex to module scope in this built asset to avoid unnecessary recompilation.
Replace focused Greek/Cyrillic test suite with a broader multi-language search normalize suite in cypress/e2e/examples.cy.ts (renamed IDs/sections and added many language cases and negative tests). Rebuild/minify output and documentation assets were updated accordingly: dist/, dist-archive/ and docs/assets/* and docs/examples.md reflect the changes. These updates expand coverage for search normalization behavior and sync compiled artifacts and docs with the new test/content changes.
Add multi-language search demos and end-to-end tests to cover search normalization behavior across different scripts. Changes include:
- cypress/e2e/examples.cy.ts: Add Cypress tests for multi-language variants (tags and popup) with searchNormalize true/false, validating matches for diacritics, Cyrillic, and CJK inputs and tag behavior.
- docs/assets/script.js: Initialize new VirtualSelect instances for the added demo elements (#multi-language-tags-search-select, #multi-language-tags-search-no-normalize-select, #multi-language-popup-search-select, #multi-language-popup-search-no-normalize-select).
- docs/examples.md: Add documentation and example initialization snippets for the new multi-language tags and popup demos, and move/update the note about Thai and Japanese combining marks.
These additions ensure consistent behavior is demonstrated and tested for diacritic-insensitive vs exact matching across multiple scripts.
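The distinction between diacritic-insensitive and exact matching can be sketched as follows. The matches helper is hypothetical and uses a simple lowercase substring test, which may differ from the library's actual matcher; only the normalization step mirrors what this PR describes.

```javascript
const COMBINING_MARKS_REGEX = /\p{M}/gu;

function normalizeString(text) {
  return text.normalize('NFD').replace(COMBINING_MARKS_REGEX, '');
}

// Hypothetical matcher: case-insensitive substring search, optionally normalized.
function matches(label, query, searchNormalize) {
  let haystack = label.toLowerCase();
  let needle = query.toLowerCase();
  if (searchNormalize) {
    haystack = normalizeString(haystack);
    needle = normalizeString(needle);
  }
  return haystack.includes(needle);
}

console.log(matches('München', 'Munchen', true));  // true: ü decomposes to u + U+0308
console.log(matches('München', 'Munchen', false)); // false: exact matching keeps the umlaut
console.log(matches('東京', '東京', false));        // true: exact text always matches
console.log(matches('Größe', 'Grosse', true));     // false: ß has no NFD decomposition
```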
Issue number: resolves #279
What is the current behavior?
- The normalizeString utility used the regex /[^\w]/g to strip non-word characters after NFD decomposition.
- The \w character class in JavaScript only matches [a-zA-Z0-9_], so every non-Latin script (Greek, Cyrillic, Vietnamese, Chinese, Japanese, Korean, Arabic, Thai, …) was treated as non-word and stripped entirely during normalization. This made searchNormalize: true completely broken for non-Latin scripts: labels became empty strings, so nothing could be matched.
What is the new behavior?
- The regex changes from /[^\w]/g to /\p{M}/gu, which uses the Unicode property escape for combining marks (category M).
Language coverage
searchNormalize: true now works correctly for a single dropdown containing options across many writing systems:
| Characters | Option labels | Accent-stripped query | Notes |
|---|---|---|---|
| | Crème brûlée, Niño | creme, nino | |
| | München, Mädchen, Köln | Munchen, Madchen, Koln | |
| ß | Größe | Grosse | ß is an atomic letter (no NFD decomposition) |
| å | Ålesund | Alesund | |
| ø, æ | Bjørn, Tromsø | Bjorn, Tromso | ø and æ are atomic letters |
| | Göteborg, Malmö | Goteborg, Malmo | |
| | Jyväskylä, Hämeenlinna | Jyvaskyla, Hameenlinna | |
| | Ένα | Ενα | |
| | Ёжик, Йогурт | Ежик, Иогурт | |
| | Việt Nam, Hà Nội | Viet Nam, Ha Noi | |
| | مُرَحَّباً | مرحبا | |
| | 서울, 한국어 | | |
| | 北京, 你好 | | |
| | 東京, カタカナ | | |
Performance optimization
- The COMBINING_MARKS_REGEX constant is now defined at module scope instead of being re-created inside normalizeString() on every call.
- The build transpiles /\p{M}/gu into a ~2 KB character-class regex. Re-compiling that pattern on each keystroke during search was unnecessarily expensive.
Documentation and examples
- Demos include searchNormalize: true and searchNormalize: false variants for direct comparison.
- Updated docs/examples.md, docs/assets/script.js, and the table of contents.
Tests
Added two comprehensive Cypress describe blocks against the unified multi-language dropdown:
- Multi-language search with searchNormalize: true (~25 specs) covers all listed scripts, including positive cases (e.g. Munchen → München, Goteborg → Göteborg, Jyvaskyla → Jyväskylä, Ежик → Ёжик, Viet Nam → Việt Nam, مرحبا → مُرَحَّباً) and the documented atomic-letter limitations (Grosse does NOT match Größe; Bjorn does NOT match Bjørn).
- Multi-language search with searchNormalize: false verifies that exact matches succeed (Greek/Cyrillic/Chinese/Japanese/Korean exact text) and that accent-stripped queries correctly find no options across all scripts.
- Latin diacritic matches (brulee → brûlée, cafe → café, nino → niño) preserved.
Does this introduce a breaking change?
Validations
Ran regression scenarios in the documentation using the branch - ✅

Run automated tests - ✅