refactor(web): adds unique identifier to transform-tokenization subsets 🚂 #15094
Draft
jahorton wants to merge 29 commits into refactor/web/relocate-source-range-key from refactor/web/transform-tokenization-subset-ids
+1,927
−821
User Test Results
Test specification and instructions
User tests are not required
jahorton (Contributor, Author) commented Nov 4, 2025
Comment on lines 173 to 174
// check... this.
Suggested change
// check... this.
Oops.
d13ebc8 to 1d113f5
This changes SearchPath to construct a new instance whenever the path is extended. Each SearchPath is treated as an immutable portion of the search graph that path extensions (themselves new SearchPath instances) may reference. This, in turn, removes the need to clone SearchPath instances when new input is received for an incoming token: the original path's representation remains unchanged and may be reused by a new instance that extends the graph for the newly received input.
Relates-to: #14445
Build-bot: skip build:web
Test-bot: skip
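The extend-without-cloning idea in this commit message can be illustrated with a minimal sketch. `PathNode`, `extend`, and `inputs` are hypothetical names chosen for this example; they are not the engine's actual SearchPath API, and real paths carry far more state than a single string per step.

```typescript
// Hypothetical, simplified stand-in for SearchPath; not the engine's actual class.
class PathNode {
  constructor(
    readonly parent: PathNode | null,
    readonly input: string | null // input consumed by this node; null at the root
  ) {}

  // Extending never mutates `this`; it returns a new node that references it.
  extend(input: string): PathNode {
    return new PathNode(this, input);
  }

  // Walks parent links back to the root to rebuild the full input sequence.
  inputs(): string[] {
    const result: string[] = [];
    for (let node: PathNode | null = this; node; node = node.parent) {
      if (node.input !== null) {
        result.unshift(node.input);
      }
    }
    return result;
  }
}

const root = new PathNode(null, null);
const th = root.extend('t').extend('h');

// Two different extensions share `th` as their parent; no clone is needed.
const the = th.extend('e');
const tha = th.extend('a');
```

Because `th` is never modified, both `the` and `tha` can safely hold a reference to it, which is the property that makes cloning unnecessary in this sketch.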
This new interface is being added in preparation for efficient multi-tokenization correction-search. SearchPath has been modified to implement it, and a new type (SearchCluster) will be added in the near future as an additional implementing type.
Build-bot: skip build:web
Test-bot: skip
When we start supporting more than one "space" for correction-searches, we may need to know which "space" (tokenization) a suggestion arose from. This way, we have a path forward for applying tokenization-dependent behaviors that may be required.
Build-bot: skip build:web
Test-bot: skip
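As a rough illustration of tagging suggestions with their originating space, the sketch below groups results by a space identifier. The `TaggedSuggestion` type, its `spaceId` field, and `groupBySpace` are assumptions made for this example; the page does not show how the engine actually stores or names this identifier.

```typescript
// Hypothetical types for illustration only.
interface TaggedSuggestion {
  text: string;
  spaceId: number; // assumed field name: which search space (tokenization) produced this
}

// Buckets suggestion text by the space that produced it, so that
// tokenization-dependent handling could be applied per bucket.
function groupBySpace(suggestions: TaggedSuggestion[]): Record<number, string[]> {
  const groups: Record<number, string[]> = {};
  for (const s of suggestions) {
    if (!groups[s.spaceId]) {
      groups[s.spaceId] = [];
    }
    groups[s.spaceId].push(s.text);
  }
  return groups;
}

const grouped = groupBySpace([
  { text: 'cannot', spaceId: 0 },
  { text: 'can not', spaceId: 1 },
  { text: 'canon', spaceId: 0 }
]);
```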
…rvation transform
Upon investigation, the code being removed mostly triggered whenever an empty transform appeared in the input: during a context reset, at initialization, or when a deadkey is typed. (Deadkeys aren't sent to the pred-text engine.) This is not a common case within the engine, and other filtering exists that helps prevent duplicate search results.
Consider a token that begins tagged with an empty transform. When the search performs an 'insertion' to better look up words down a lexical path, the 'insertion' is identical whether it comes before or after the empty transform. Similarly, an insertion after a deletion appears no different from a substitution (or from the same insertion before the deletion, followed by the deletion itself).
The code likely didn't gain us much, and it likely carried some performance overhead: it built a large object that required memory, plus lookup time spent constructing and processing strings for hashing.
Build-bot: skip build:web
Test-bot: skip
An upcoming goal is to introduce a new SearchSpace type that will assist with context-caching across multiple tokenizations, so it is wise to generalize SearchPath, and the functions that use it, to accept any SearchSpace-implementing type as a path's parent.
Build-bot: skip build:web
Test-bot: skip
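One way to picture the generalization described here is a path whose parent is typed by the shared interface instead of by the concrete path class. Everything below is a hypothetical sketch: the member `inputCount`, the classes `RootSpace`, `PathLike`, and `ClusterLike`, and the behavior assigned to the cluster are assumptions, since the actual SearchSpace members and SearchCluster design are not shown on this page.

```typescript
// Hypothetical interface; the real SearchSpace members are not shown here.
interface SpaceLike {
  inputCount(): number; // assumed member: how many inputs this space has consumed
}

// An empty starting point with no inputs.
class RootSpace implements SpaceLike {
  inputCount(): number {
    return 0;
  }
}

// A path extends any SpaceLike parent by exactly one input.
class PathLike implements SpaceLike {
  constructor(readonly parent: SpaceLike) {}

  inputCount(): number {
    return this.parent.inputCount() + 1;
  }
}

// Assumed stand-in for a cluster: groups spaces that consumed the same inputs.
class ClusterLike implements SpaceLike {
  constructor(readonly members: SpaceLike[]) {}

  inputCount(): number {
    return this.members.length > 0 ? this.members[0].inputCount() : 0;
  }
}

// A path may now extend a cluster just as it would extend another path.
const firstStep = new PathLike(new RootSpace());
const cluster = new ClusterLike([firstStep]);
const secondStep = new PathLike(cluster);
```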
…le spaces
Build-bot: skip build:web
Test-bot: skip
Once we start considering alternate tokenization schemes, we'll want to note how each potential input path aligns with the original input keystrokes. As there can be multiple paths to land within the same token, it's best to store this data on the SearchPath objects instead. This becomes especially relevant when considering token splits and merges, which will be the next follow-ups.
Build-bot: skip build:web
Test-bot: skip
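A minimal sketch of keeping keystroke alignment on the path itself, assuming each path step records the index of the original keystroke it consumed. `TrackedPath`, `keystrokeIndex`, and `sourceRange` are hypothetical names for this example and do not reflect the engine's actual fields.

```typescript
// Inclusive range of original keystroke indices; hypothetical shape.
interface SourceRange {
  start: number;
  end: number;
}

// Hypothetical path node that remembers which keystroke it consumed.
class TrackedPath {
  constructor(
    readonly parent: TrackedPath | null,
    readonly keystrokeIndex: number | null // null for a token's starting node
  ) {}

  extend(keystrokeIndex: number): TrackedPath {
    return new TrackedPath(this, keystrokeIndex);
  }

  // Reports which span of original keystrokes this path covers.
  sourceRange(): SourceRange | null {
    let start: number | null = null;
    let end: number | null = null;
    for (let node: TrackedPath | null = this; node; node = node.parent) {
      if (node.keystrokeIndex === null) {
        continue;
      }
      if (end === null) {
        end = node.keystrokeIndex; // first hit while walking back is the latest keystroke
      }
      start = node.keystrokeIndex;
    }
    return start === null || end === null ? null : { start, end };
  }
}

// One tokenization keeps keystrokes 0..4 in a single token...
const whole = new TrackedPath(null, null).extend(0).extend(1).extend(2).extend(3).extend(4);

// ...while another splits them into two tokens after keystroke 1.
const left = new TrackedPath(null, null).extend(0).extend(1);
const right = new TrackedPath(null, null).extend(2).extend(3).extend(4);
```

With the range stored per path, two paths that land in the same token by different routes can each report their own alignment.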
…SearchSpace
Build-bot: skip build:web
Test-bot: skip
ca906dd to 316abe3
1d113f5 to e97c968
…tor/web/multi-space-correction-search
…or/web/manage-true-inputs-on-path
…web/relocate-source-range-key
This addition will allow us to clearly and cleanly indicate transforms that are two (or more) halves of the same original whole. It is notably more selective than just the original transition ID and is better suited for indicating split-transform cases.
Build-bot: skip build:web
Test-bot: skip
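To make the distinction between the two identifiers concrete, here is a hedged sketch in which every candidate transform for one keystroke shares a transition ID, while only the pieces split from the same original share a subset ID. `TransformPiece`, `subsetId`, and `makeSplitter` are hypothetical, and splitting on a single space is only a stand-in for real tokenization.

```typescript
// Hypothetical shape for one piece of a split transform.
interface TransformPiece {
  insert: string;
  transitionId: number; // shared by every candidate transform for one keystroke
  subsetId: number;     // shared only by pieces split from the same original
}

// Returns a splitter that hands out a fresh subset ID per original transform.
function makeSplitter(): (insert: string, transitionId: number) => TransformPiece[] {
  let nextSubsetId = 0;
  return function split(insert: string, transitionId: number): TransformPiece[] {
    const subsetId = nextSubsetId++;
    // Stand-in tokenization: break the inserted text at single spaces.
    return insert.split(' ').map((part) => ({ insert: part, transitionId, subsetId }));
  };
}

// True only when both pieces came from the same original transform.
function sameOriginal(a: TransformPiece, b: TransformPiece): boolean {
  return a.subsetId === b.subsetId;
}

const split = makeSplitter();

// Two alternative transforms proposed for the same keystroke (transition 7).
const firstPieces = split('can not', 7);
const secondPieces = split('can t', 7);
```

In this sketch, `firstPieces[0]` and `secondPieces[0]` share a transition ID yet are distinguishable by subset ID, which is the extra selectivity the description refers to.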
e97c968 to 0792377
…or/web/manage-true-inputs-on-path
…web/relocate-source-range-key
…/rename-inputsource-as-pathinputprops
…efactor/web/transform-tokenization-subset-ids
7831a6d to 346f737
🚧
This addition will allow us to clearly and cleanly indicate transforms that are two (or more) halves of the same original whole. It is notably more selective than just the original transition ID and is better suited for indicating split-transform cases.
Build-bot: skip build:web
Test-bot: skip