refactor(web): adds unique identifier to transform-tokenization subsets 🚂 #15094

jahorton · 2025-11-04T16:52:31Z

🚧

This addition will allow us to clearly and cleanly indicate transforms that are two (or more) halves of the same original whole. It is notably more selective than just the original transition ID and is better suited for indicating split-transform cases.
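As a rough illustration of the idea (not the code in this PR), the sketch below tags every piece split from one transform with a shared subset ID, while pieces from other alternatives for the same keystroke share only the transition ID. The names `TransformPiece`, `subsetId`, `splitTransform`, and `areHalvesOfSameWhole` are hypothetical and chosen only for this example.

```typescript
// Hypothetical shapes for illustration; these are not the engine's actual types.
interface Transform {
  insert: string;
  deleteLeft: number;
}

interface TransformPiece {
  transform: Transform;
  transitionId: number; // shared by every alternative produced for one keystroke
  subsetId: number;     // shared only by pieces split from the same original transform
}

let nextSubsetId = 0;

// Splits the inserted text at `splitAt` (a UTF-16 code-unit offset) and tags
// both pieces with one freshly allocated subset ID.
function splitTransform(original: Transform, transitionId: number, splitAt: number): TransformPiece[] {
  const subsetId = nextSubsetId++;
  const head: Transform = { insert: original.insert.slice(0, splitAt), deleteLeft: original.deleteLeft };
  const tail: Transform = { insert: original.insert.slice(splitAt), deleteLeft: 0 };
  return [head, tail].map((transform) => ({ transform, transitionId, subsetId }));
}

// The transition ID alone cannot answer this question; the subset ID can.
function areHalvesOfSameWhole(a: TransformPiece, b: TransformPiece): boolean {
  return a.subsetId === b.subsetId;
}
```

Two alternatives for the same keystroke would carry the same `transitionId` but different `subsetId` values, which is the extra selectivity described above.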

Build-bot: skip build:web
Test-bot: skip

keymanapp-test-bot · 2025-11-04T16:52:36Z

User Test Results

Test specification and instructions

User tests are not required

jahorton · 2025-11-04T17:05:44Z

web/src/engine/predictive-text/worker-thread/src/main/correction/tokenization-subsets.ts

+  // check... this.
+


Suggested change

// check... this.

Oops.

This changes SearchPath to construct new instances whenever the path is extended, treating SearchPath as an immutable portion of the search graph that may be referenced by path extensions (which are themselves new instances of SearchPath). This, in turn, removes the need to clone SearchPath instances when new input is received for an incoming token; the original path's representation will remain unchanged and may also be reused by a new instance extending the graph for the newly-received input. Relates-to: #14445 Build-bot: skip build:web Test-bot: skip
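A minimal sketch of the extend-without-clone pattern described in this commit message, assuming a much simpler node than the real SearchPath. The class `ImmutablePath` and its members are hypothetical stand-ins, not the engine's API.

```typescript
// Hypothetical, simplified stand-in for SearchPath; for illustration only.
class ImmutablePath {
  readonly parent: ImmutablePath | null;
  readonly input: string | null;

  constructor(parent: ImmutablePath | null, input: string | null) {
    this.parent = parent;
    this.input = input;
  }

  // Extending never mutates `this`; it returns a new node that references it.
  extend(input: string): ImmutablePath {
    return new ImmutablePath(this, input);
  }

  // Walks back toward the root to recover the full input sequence.
  get inputs(): string[] {
    const result: string[] = [];
    let node: ImmutablePath | null = this;
    while (node !== null) {
      if (node.input !== null) {
        result.unshift(node.input);
      }
      node = node.parent;
    }
    return result;
  }
}

const rootPath = new ImmutablePath(null, null);
const pathCa = rootPath.extend('c').extend('a');
// Both extensions share `pathCa` as their parent; no cloning is needed.
const pathCat = pathCa.extend('t');
const pathCan = pathCa.extend('n');
```

Because `pathCa` is never modified, it can safely be the parent of any number of extensions, which is the property that makes cloning unnecessary.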

This new interface is being added in preparation for efficient multi-tokenization correction-search. SearchPath has been modified to implement it, and a new type (SearchCluster) will be added in the near future as an additional implementing type. Build-bot: skip build:web Test-bot: skip

When we start supporting more than one "space" for correction-searches, we may need to know which "space" (tokenization) a suggestion arose from. This way, we have a path forward for applying tokenization-dependent behaviors that may be required. Build-bot: skip build:web Test-bot: skip
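One way this could look, sketched under the assumption that each suggestion simply carries the ID of the space it came from. `TaggedSuggestion`, `spaceId`, and `groupBySpace` are hypothetical names for this example only.

```typescript
// Hypothetical shape: a suggestion tagged with the ID of its originating search space.
interface TaggedSuggestion {
  text: string;
  spaceId: number;
}

// Groups suggestion text by originating space so that tokenization-dependent
// handling can later be applied per space.
function groupBySpace(suggestions: TaggedSuggestion[]): Map<number, string[]> {
  const groups = new Map<number, string[]>();
  for (const suggestion of suggestions) {
    const bucket = groups.get(suggestion.spaceId) ?? [];
    bucket.push(suggestion.text);
    groups.set(suggestion.spaceId, bucket);
  }
  return groups;
}
```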

…rvation transform

Investigation into the code being removed showed that it mostly triggered whenever an empty transform appeared in the input: during a context reset, at initialization, or when a deadkey is typed. (Deadkeys aren't sent to the pred-text engine.) This is generally not a common case within the engine, and there is other filtering that helps prevent duplicate search results. Consider a token that begins tagged with an empty transform. When the search performs an 'insertion' to better look up words down a lexical path, the 'insertion' is identical whether it occurs before or after the empty transform. Similarly, insertions after a deletion appear no different than substitutions (or the same insertion before the deletion, then the deletion itself). The code likely didn't gain us much, and it likely carried some performance overhead: it built a large object that required memory, and its lookups involved constructing and processing strings for hashing. Build-bot: skip build:web Test-bot: skip

As an upcoming goal is to introduce a new SearchSpace type that will assist with context-caching across multiple tokenizations, it is wise to generalize SearchPath and the functions utilizing it to accept any SearchSpace-implementing type as its parent. Build-bot: skip build:web Test-bot: skip
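The generalization can be sketched as follows, assuming a deliberately tiny interface. The members `spaceId` and `inputCount`, the classes `RootSketch`, `PathSketch`, and `ClusterSketch`, and the rule that a cluster groups spaces with equal input counts are all assumptions made for this example; the actual SearchSpace interface and the planned second implementing type may look quite different.

```typescript
// Hypothetical interface; member names are illustrative only.
interface SearchSpaceSketch {
  readonly spaceId: number;
  readonly inputCount: number;
}

// A root with no inputs, so that every path has some parent to reference.
class RootSketch implements SearchSpaceSketch {
  readonly spaceId: number;
  readonly inputCount = 0;
  constructor(spaceId: number) {
    this.spaceId = spaceId;
  }
}

// The parent is typed as the interface, not as PathSketch, so any implementing
// type may serve as the parent.
class PathSketch implements SearchSpaceSketch {
  readonly spaceId: number;
  readonly parent: SearchSpaceSketch;
  readonly inputCount: number;
  constructor(spaceId: number, parent: SearchSpaceSketch) {
    this.spaceId = spaceId;
    this.parent = parent;
    this.inputCount = parent.inputCount + 1;
  }
}

// Assumption: a cluster groups spaces that have consumed the same number of inputs.
class ClusterSketch implements SearchSpaceSketch {
  readonly spaceId: number;
  readonly members: readonly SearchSpaceSketch[];
  readonly inputCount: number;
  constructor(spaceId: number, members: readonly SearchSpaceSketch[]) {
    const first = members[0];
    if (first === undefined) {
      throw new Error('a cluster needs at least one member');
    }
    if (members.some((m) => m.inputCount !== first.inputCount)) {
      throw new Error('all members must have the same input count');
    }
    this.spaceId = spaceId;
    this.members = members;
    this.inputCount = first.inputCount;
  }
}
```

With the parent typed as the interface, a path can extend either another path or a cluster without any change to the path type itself.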

…le spaces Build-bot: skip build:web Test-bot: skip

…irectly

Once we start considering alternate tokenization schemes, we'll want to note how each potential input path aligns with the original input keystrokes. As there can be multiple paths to land within the same token, it's best to store this data on the SearchPath objects instead. This becomes especially relevant when considering token splits and merges, which will be the next follow-ups. Build-bot: skip build:web Test-bot: skip
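A small sketch of why the alignment data belongs on the path rather than the token, assuming a path records the inclusive range of keystroke indices it consumed. `AlignedPath`, `rangeKey`, and `groupByRange` are hypothetical names for this example and are not meant to mirror the real `.sourceRangeKey` definition.

```typescript
// Hypothetical shape: a path that remembers which original keystrokes it covers.
interface AlignedPath {
  token: string;
  firstKeystroke: number; // inclusive index into the original keystroke sequence
  lastKeystroke: number;  // inclusive
}

// Builds a simple string key identifying the covered keystroke range.
function rangeKey(path: AlignedPath): string {
  return `${path.firstKeystroke}-${path.lastKeystroke}`;
}

// Groups paths by the keystroke range they cover; paths reaching the same
// token text through different ranges stay distinguishable.
function groupByRange(paths: AlignedPath[]): Map<string, AlignedPath[]> {
  const groups = new Map<string, AlignedPath[]>();
  for (const path of paths) {
    const key = rangeKey(path);
    const bucket = groups.get(key) ?? [];
    bucket.push(path);
    groups.set(key, bucket);
  }
  return groups;
}
```

In this sketch, two paths that both produce the token "can" but consume keystrokes 0-2 and 0-3 respectively land in different groups, which is the distinction a token-level record would lose.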

…SearchSpace Build-bot: skip build:web Test-bot: skip

…tor/web/multi-space-correction-search

…or/web/manage-true-inputs-on-path

…web/relocate-source-range-key

…operties

This addition will allow us to clearly and cleanly indicate transforms that are two (or more) halves of the same original whole. It is notably more selective than just the original transition ID and is better suited for indicating split-transform cases. Build-bot: skip build:web Test-bot: skip

…input

…or/web/manage-true-inputs-on-path

…web/relocate-source-range-key

…/rename-inputsource-as-pathinputprops

…efactor/web/transform-tokenization-subset-ids

github-project-automation bot added this to Keyman Nov 4, 2025

github-project-automation bot moved this to Todo in Keyman Nov 4, 2025

keymanapp-test-bot bot added the epic-autocorrect label Nov 4, 2025

keymanapp-test-bot bot changed the title ~~refactor(web): adds unique identifier to transform-tokenization subsets~~ refactor(web): adds unique identifier to transform-tokenization subsets 🚂 Nov 4, 2025

github-actions bot added web/ web/predictive-text/ labels Nov 4, 2025

keymanapp-test-bot bot added this to the A19S15 milestone Nov 4, 2025

github-actions bot added the refactor label Nov 4, 2025

jahorton commented Nov 4, 2025


jahorton force-pushed the refactor/web/transform-tokenization-subset-ids branch 2 times, most recently from d13ebc8 to 1d113f5 Compare November 5, 2025 16:32

keyman-server modified the milestones: A19S15, A19S16 Nov 8, 2025

jahorton added 15 commits November 10, 2025 12:10

change(web): use search-space ID to find matching tokenization, prese…

d2d5c67

…rvation transform

change(web): add polish to utility method SearchPath.hasInputs()

7335de1

feat(web): start SearchPath unit testing

c28bb20

refactor(web): convert main correction-search method to accept multip…

f4ad85c

…le spaces Build-bot: skip build:web Test-bot: skip

change(web): rework suggestion-alignment helper to use tokenization d…

9507c71

…irectly

change(web): simplify SearchPath constructor use

6974322

fix(web): assert matching transition IDs in SearchPath constructor

21b8620

change(web): enhance SearchPath unit tests

3ab72c7

refactor(web): relocate definition of .sourceRangeKey to SearchPath, …

20c6f10

…SearchSpace Build-bot: skip build:web Test-bot: skip

jahorton force-pushed the refactor/web/complex-search-space-reuse branch from ca906dd to 316abe3 Compare November 10, 2025 20:51

jahorton force-pushed the refactor/web/transform-tokenization-subset-ids branch from 1d113f5 to e97c968 Compare November 10, 2025 21:02

jahorton changed the base branch from refactor/web/complex-search-space-reuse to refactor/web/relocate-source-range-key November 10, 2025 21:03

jahorton added 8 commits November 11, 2025 10:11

change(web): SearchPath field cleanup

87d386a

docs(web): add SearchSpace.parents documentation

d685e34

Merge branch 'refactor/web/use-interface-as-search-parent' into refac…

04071a1

…tor/web/multi-space-correction-search

Merge branch 'refactor/web/multi-space-correction-search' into refact…

7a82951

…or/web/manage-true-inputs-on-path

docs(web): document TokenInputSource and its members

31501bf

Merge branch 'refactor/web/manage-true-inputs-on-path' into refactor/…

3b45f3d

…web/relocate-source-range-key

change(web): renames and restructures TokenInputSource as PathInputPr…

ba33b9f

…operties

change(web): rename .sourceIdentifiers as .inputSegments

2d014a3

jahorton mentioned this pull request Nov 11, 2025

change(web): rename TokenInputSource as PathInputProperties + restructure it 🚂 #15140

Draft

jahorton force-pushed the refactor/web/transform-tokenization-subset-ids branch from e97c968 to 0792377 Compare November 11, 2025 21:41

jahorton added 5 commits November 12, 2025 13:41

fix(web): do not preserve taillessTrueKeystroke when receiving empty …

851fb77

…input

Merge branch 'refactor/web/multi-space-correction-search' into refact…

807c09d

…or/web/manage-true-inputs-on-path

Merge branch 'refactor/web/manage-true-inputs-on-path' into refactor/…

7831a6d

…web/relocate-source-range-key

Merge branch 'refactor/web/relocate-source-range-key' into change/web…

ef1b84a

…/rename-inputsource-as-pathinputprops

Merge branch 'change/web/rename-inputsource-as-pathinputprops' into r…

fb4b036

…efactor/web/transform-tokenization-subset-ids

keyman-server modified the milestones: A19S16, A19S17 Nov 22, 2025

keyman-server modified the milestones: A19S17, A19S18 Dec 6, 2025

keyman-server modified the milestones: A19S18, A19S19 Dec 21, 2025

keyman-server modified the milestones: A19S19, A19S20 Jan 3, 2026

jahorton force-pushed the refactor/web/relocate-source-range-key branch from 7831a6d to 346f737 Compare January 9, 2026 21:56
