From f3789bbb3a35cbad282ca577714f4a66e179b75b Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Mon, 8 Dec 2025 16:18:19 -0600 Subject: [PATCH 01/11] docs(web): start alternate doc using different explanation strategy --- .../docs/correction-search-graph.md | 139 ++++++++++++++++++ 1 file changed, 139 insertions(+) create mode 100644 web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md new file mode 100644 index 00000000000..3f6da40100c --- /dev/null +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -0,0 +1,139 @@ + +# The Correction-Search dynamic graph search-space + +The Keyman predictive-text correction-search process is designed to consider all +of the most-likely possible input corrections when suggesting words from the +active lexical-model. To do so, it dynamically builds portions of the search +graph as needed to generate corrections to the most recent token in the context. +This token lies immediately before the caret. + +There is one major, notable simplifying assumption in the current +text-correction design: we assume that each keystroke's `Transform` is 100% +independent from the `Transform` selected for every other keystroke. This +assumption is, of course, invalid: the output of keystroke A may selectively +establish the context needed for a Keyman keyboard rule matched by one or more +keys in keystroke B. Efforts to address this limitation are considered +out-of-scope at this time and will be addressed later in a future epic - +epic/true-correction - documented as issue #14709. + +## Graph and Node Properties + +### The overall graph + +When viewed at low level, the graph generally takes on the form of a search +tree. There is a single, common root node, with no input received or processed. 
+Each possible corrective-edit and/or keystroke replacement edit acts as an edge; +this edge then leads to a new node which represents a correction prefix - the +full text produced by the selected keystrokes and edits traversed on the graph. + +In formal graph language, we should first note the following properties: +- The graph is directed: all keystrokes and edits happen in a specific ordering +- The graph is acyclic: there is no way to revisit a node more than once. + - Even _if_ text deletions occurred that restored the text itself to match + an earlier state, the new node would represent the additional keystrokes + and thus not match the corresponding earlier node. +- The graph may not be, strictly speaking, a tree. + - It is possible for the net result of two or more paths through the graph to + produce the same text output from the same set of keystrokes. + - The different paths may incur different costs. + - To be a tree requires that each node may only be connected to the root via + a single path. + +### The individual nodes + +The root node of the correction-search dynamic graph represents the empty token ``. +This token has no text content and represents no keystrokes. + +Other nodes are reached by treating possible outputs from incoming keystrokes as +edges on the graph. Noting the source `Transform` and its keystroke of origin, we +can generate a valid child node to represent the corresponding correction-search +prefix. + +#### Critical node properties + +1. As future incoming `Transform`s may include `.deleteLeft` components, it is +important to note the represented codepoint length of the prefix. +- It does not make sense to represent a node of negative length. Should this + result, we should throw away the token and start editing its predecessor + instead. +- A node of zero length may be considered to "throw away" and replace the + corresponding context token with a new empty token. + +2. 
As the whole point of correction-search is to generate valid corrections for +the text, it is important to remember the range of input represented by any +generated correction. + +- When the represented range matches that of the range represented by the +current active context text, corrections may be applied safely without +side-effects. + +- Should the represented range differ from the range represented by the current +active context text, corrections to other tokens may be required; we do not wish +to either forget or to duplicate portions of the user's input. + - E.g: if a whitespace typo occurs mid-word, the user's expected correction + might need to correct the current token, the whitespace's token, and the + prior piece of the word into a single token. + - The range for correction is determined by comparing the active context + token ranges with that of the correction's source, which may have never + added the wordbreak. + +## Correction-Search and Dynamic Programming + +With a few tweaks and restrictions, we can use dynamic programming techniques to +facilitate our correction-search processes. First, note the assumption stated +earlier: + +> [...] we assume that each keystroke's `Transform` is 100% independent from the +`Transform` selected for every other keystroke. + +Therefore, we can find the cost of selecting a correction by using a +[divide-and-conquer +strategy](https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm) for the +correction-search path: +1. Find the cost of the path to which the incoming edit or keystroke + `Transform` will apply. +2. Modify by the cost of correcting the current keystroke with the specified + `Transform` or applying a keystroke-level edit. + +### Optimal Substructure + +Let us examine the effects of adding different types of keystroke edits to the +correction-search path. + +When adding an `insert` edit, we do not add data for an additional keystroke. 
+Instead, we look more deeply into the lexicon based on the current path's +prefix, extending the 'match' dimension of the search path with a cost penalty. +Thus, cost will only _increase_ for `insert` edit operations. + +When adding a `delete` edit, we do add data for an additional keystroke, but opt +not to include its `Transform` whatsoever. This extends the 'input' dimension +of the search path, paying a cost penalty to do so. Thus, cost will only +_increase_ for `delete` edit operations. + +When adding a `match` or `substitute` edit without specified left-deletions, we +add data for an additional keystroke, including a corresponding `Transform` +while also looking more deeply into the lexicon in a manner that matches the +input. For this case, no cost penalty is incurred for the 'match' component +for `match` edits, though one _is_ applied for `substitute` edits. A small +cost penalty corresponding to the selected `Transform`'s probability is applied +either way. **So long** as the applied 'input' `Transform` has no specified +left-deletions, the total cost will remain flat or increase - it will not +decrease. Both the 'input' and 'match' dimensions of the search path are +extended by `match` and `substitute` edit operations. + +When adding a `match` or `substitute` edit **with** specified left-deletions, it +is possible for a naive implementation to perform a reduction in total cost. +Deleting portions of the 'input' (and corresponding sections of the 'match' +dimension) will reduce the path to a simpler state, generally of lower cost. +This is the sole case that may currently invalidate the dynamic programming +requirement of "optimal substructure". (See #14366.) With further time +investment, we should be able to develop and implement a strategy to restore +this condition even for such cases. 
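The monotonic-cost argument above can be illustrated with a small sketch. Everything named here is an assumption made for illustration and is not taken from the engine: the `Edit` type, the flat `EDIT_PENALTY`, and the choice to model a `Transform` of probability `p` as costing `-ln(p)`. Left-deletions are deliberately left out, since they are the case the text flags as problematic.

```typescript
// Hypothetical edit model; these names are illustrative, not the engine's API.
type Edit =
  | { kind: 'insert' }
  | { kind: 'delete' }
  | { kind: 'match'; transformProbability: number }
  | { kind: 'substitute'; transformProbability: number };

// Assumed flat penalty for any edit that is not a clean match.
const EDIT_PENALTY = 1.0;

// Assumption: a Transform with probability p (0 < p <= 1) costs -ln(p) >= 0.
function edgeCost(edit: Edit): number {
  switch (edit.kind) {
    case 'insert': // extends the 'match' dimension only
    case 'delete': // extends the 'input' dimension only
      return EDIT_PENALTY;
    case 'match': // extends both dimensions, no 'match' penalty
      return -Math.log(edit.transformProbability);
    case 'substitute': // extends both dimensions, with a 'match' penalty
      return EDIT_PENALTY - Math.log(edit.transformProbability);
  }
}

// Running total after each edit; index 0 is the root (cost 0).
function pathCosts(edits: Edit[]): number[] {
  const costs: number[] = [0];
  for (const edit of edits) {
    costs.push(costs[costs.length - 1] + edgeCost(edit));
  }
  return costs;
}

const runningCosts = pathCosts([
  { kind: 'match', transformProbability: 0.8 },
  { kind: 'insert' },
  { kind: 'substitute', transformProbability: 0.5 },
  { kind: 'delete' },
]);
const neverDecreases = runningCosts.every(
  (cost, i) => i === 0 || cost >= runningCosts[i - 1]
);
console.log(runningCosts, neverDecreases);
```

Under these assumptions every edge cost is non-negative, so extending a path can never make it cheaper. That is the property a lowest-cost-first search relies on, and it is exactly what a left-deleting `Transform` could violate.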
+ +### Overlapping Subproblems + + + + +## < header goes here > + From 585e89196f3421ae4bfad08f68f51cef48d9a7ee Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Wed, 10 Dec 2025 15:50:36 -0600 Subject: [PATCH 02/11] docs(web): add doc updates --- .../docs/correction-search-graph.md | 217 +++++++++++++++--- 1 file changed, 190 insertions(+), 27 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 3f6da40100c..16cc08d96b1 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -1,5 +1,4 @@ - -# The Correction-Search dynamic graph search-space +# Correction-Search as Graph Path-Finding The Keyman predictive-text correction-search process is designed to consider all of the most-likely possible input corrections when suggesting words from the @@ -16,15 +15,38 @@ keys in keystroke B. Efforts to address this limitation are considered out-of-scope at this time and will be addressed later in a future epic - epic/true-correction - documented as issue #14709. -## Graph and Node Properties +## Graph Structure -### The overall graph +### The Root Node -When viewed at low level, the graph generally takes on the form of a search -tree. There is a single, common root node, with no input received or processed. -Each possible corrective-edit and/or keystroke replacement edit acts as an edge; -this edge then leads to a new node which represents a correction prefix - the -full text produced by the selected keystrokes and edits traversed on the graph. +When viewed at low level, the correction-search process generally takes on the +form of a search tree. There is a single, common root node, with no input +received or processed. The root node of the correction-search dynamic graph +represents the empty token ``. 
+ +### Edges: Edits and Keystrokes + +Each possible corrective-edit and/or keystroke replacement edit acts as a +directed edge connecting an existing path (or the root node) to a node pairing +one potential edit sequence for correctable keystrokes to valid correction +targets found within the active lexical model data source. + +As each incoming keystroke may provide fat-finger alternative output +`Transform`s, each such `Transform` represents a group of edges on the search graph, +outbound from each node representing a possible accumulation of prior input. + +Suppose two keystrokes have been received for a word: +- Keystroke 1 may output any of `['a', 'e', 'i']`. +- Keystroke 2 may output any of `['t', 'n']`. + +For such a case, without loss of generality, there are three separate edges +corresponding to keystroke 2's `t` output: +- `'a' + 't'` => a node representing `'at'` +- `'e' + 't'` => a node representing `'et'` +- `'i' + 't'` => a node representing `'it'` + - A similar three exist for the `n` output. + +### Resulting Graph Properties In formal graph language, we should first note the following properties: - The graph is directed: all keystrokes and edits happen in a specific ordering @@ -39,20 +61,11 @@ In formal graph language, we should first note the following properties: - To be a tree requires that each node may only be connected to the root via a single path. -### The individual nodes - -The root node of the correction-search dynamic graph represents the empty token ``. -This token has no text content and represents no keystrokes. - -Other nodes are reached by treating possible outputs from incoming keystrokes as -edges on the graph. Noting the source `Transform` and its keystroke of origin, we -can generate a valid child node to represent the corresponding correction-search -prefix. - -#### Critical node properties +### Important Node Properties 1. 
As future incoming `Transform`s may include `.deleteLeft` components, it is important to note the represented codepoint length of the prefix. + - It does not make sense to represent a node of negative length. Should this result, we should throw away the token and start editing its predecessor instead. @@ -70,6 +83,7 @@ side-effects. - Should the represented range differ from the range represented by the current active context text, corrections to other tokens may be required; we do not wish to either forget or to duplicate portions of the user's input. + - E.g: if a whitespace typo occurs mid-word, the user's expected correction might need to correct the current token, the whitespace's token, and the prior piece of the word into a single token. @@ -77,6 +91,12 @@ to either forget or to duplicate portions of the user's input. token ranges with that of the correction's source, which may have never added the wordbreak. +3. For the purposes of correction-search, we do not need separate nodes to +distinguish between two different correction paths that result in the same net +effects... so long as the text results from the same applied user-input range. +What matters is that _a_ valid path to the correction exists. See ["Overlapping +Subproblems"](#overlapping-subproblems) below for further implications of this. + ## Correction-Search and Dynamic Programming With a few tweaks and restrictions, we can use dynamic programming techniques to @@ -90,6 +110,7 @@ Therefore, we can find the cost of selecting a correction by using a [divide-and-conquer strategy](https://en.wikipedia.org/wiki/Divide-and-conquer_algorithm) for the correction-search path: + 1. Find the cost of the path to which the incoming edit or keystroke `Transform` will apply. 2. Modify by the cost of correcting the current keystroke with the specified @@ -121,19 +142,161 @@ left-deletions, the total cost will remain flat or increase - it will not decrease. 
Both the 'input' and 'match' dimensions of the search path are extended by `match` and `substitute` edit operations. +#### Optimal Substructure with Deletes + When adding a `match` or `substitute` edit **with** specified left-deletions, it is possible for a naive implementation to perform a reduction in total cost. Deleting portions of the 'input' (and corresponding sections of the 'match' -dimension) will reduce the path to a simpler state, generally of lower cost. -This is the sole case that may currently invalidate the dynamic programming -requirement of "optimal substructure". (See #14366.) With further time -investment, we should be able to develop and implement a strategy to restore -this condition even for such cases. +dimension) will reduce the path to a simpler state - and generally speaking, to +one of lower cost. This is the sole case that may currently invalidate the +[dynamic programming requirement of "optimal +substructure"](https://en.wikipedia.org/wiki/Dynamic_programming#Computer_science). +(See #14366.) With further time investment, we should be able to develop and +implement a strategy to meet the conditions for optimal substructure even for +such cases. ### Overlapping Subproblems - +Looking at the correction-search graph naively, one may think that +correction-search can be fully handled by a divide-and-conquer approach. This +is sadly not the case; it is possible to reach the same intermediate node (see +point 3 under ["Important Node Properties"](#important-node-properties)) via +multiple paths. + +Consider the following case: +- Keystroke 1: produces a distribution with inserts ['t', 'th'] +- Keystroke 2: produces a distribution with inserts ['he', 'e'] + +Taking the first insert from each keystroke yields 'the', which matches the +result of taking the second insert from each. This yields two different paths +to the same node: +- Text: `the` +- Text length: 3 +- Built from the same portions of the same keystrokes - just different 'samples'. 
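The `t`/`th` and `he`/`e` case above can be reproduced with a short enumeration. This is only a sketch: the `collectPaths` helper is hypothetical, it handles plain inserts with no left-deletions, and it ignores probabilities and the lexicon entirely.

```typescript
// Each keystroke offers several alternative inserts (left-deletions omitted).
const keystrokeInserts: string[][] = [
  ['t', 'th'], // Keystroke 1
  ['he', 'e'], // Keystroke 2
];

// Hypothetical helper: maps each resulting text to every sequence of
// per-keystroke samples that produces it.
function collectPaths(distributions: string[][]): Map<string, string[][]> {
  // Build the full cross product, one keystroke at a time.
  let partials: string[][] = [[]];
  for (const alternatives of distributions) {
    const extended: string[][] = [];
    for (const partial of partials) {
      for (const alternative of alternatives) {
        extended.push([...partial, alternative]);
      }
    }
    partials = extended;
  }

  // Group complete sample sequences by the text they produce.
  const byText = new Map<string, string[][]>();
  for (const path of partials) {
    const text = path.join('');
    const existing = byText.get(text) || [];
    existing.push(path);
    byText.set(text, existing);
  }
  return byText;
}

const pathsByText = collectPaths(keystrokeInserts);
pathsByText.forEach((paths, text) => {
  console.log(text, JSON.stringify(paths));
});
```

Four sample sequences collapse into three distinct texts, because `'t' + 'he'` and `'th' + 'e'` both arrive at `the` after consuming the same two keystrokes.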
+ +Any further correction prefixes built from either node don't care _which_ node +is the parent; the extended prefix will be built in exactly the same way, with +exactly the same increase in path cost. + +From a probabilistic / statistics standpoint, the true "correct" thing to do +would be to sum up all occurrences of "the same" node, resulting in a +higher-probability mass (and thus, a lower cost path to the singleton version of +the node). However, this is difficult to do computationally during an optimized +search. There is always the chance that another unreached variant of the path +exists. + +As we wish to find the highest probability (lowest cost) paths (correction +prefixes) and return them quickly and efficiently, a greedy approach - one in +which we _don't_ attempt to accumulate node probabilities - makes the process +far simpler - and _also_ meets the condition for [dynamic programming's +"overlapping subproblems" +constraint](https://en.wikipedia.org/wiki/Dynamic_programming#Computer_science). +When iterating through nodes from lowest-cost to highest cost, once any path to +a valid correction prefix is found, we can return it immediately. + +# The Dynamic Search-Graph + +In the sections above, the following mappings have been established: +- Prefixes for text correction/prediction can be mapped to graph nodes. +- Specific `Transform`s (from incoming keystroke `Transform` distributions) can + be mapped to graph edges. +- `insert` and `delete` edits can _also_ be mapped to graph edges. + +As established in the sections above, individual graph nodes uniquely represent +possible prefixes for text-corrections for specific input ranges. +- Once a lowest-cost path to a prefix has been found, we disregard other, + higher-cost paths that also arrive there. +- The same prefix may be supported by a different node when the portion of input + the second node represents differs. 
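One way to picture "keep the lowest-cost path, disregard the rest" is a lowest-cost-first search that settles each node once. The sketch below is not the engine's implementation and rests on several assumptions: the `Alternative` and `FrontierEntry` shapes are invented for illustration, costs are modeled as `-ln(probability)`, `insert` and `delete` edits and lexicon checks are omitted, and a re-sorted array stands in for a real priority queue.

```typescript
// Hypothetical shapes for illustration only.
interface Alternative { insert: string; probability: number; }
interface FrontierEntry { text: string; keystrokesUsed: number; cost: number; }

function cheapestPrefixes(distributions: Alternative[][]): FrontierEntry[] {
  const frontier: FrontierEntry[] = [{ text: '', keystrokesUsed: 0, cost: 0 }];
  const settled = new Set<string>();
  const results: FrontierEntry[] = [];

  while (frontier.length > 0) {
    // A sorted array stands in for a priority queue to keep the sketch short.
    frontier.sort((a, b) => a.cost - b.cost);
    const current = frontier.shift()!;

    // A node is identified by its text AND the input range it represents.
    const key = `${current.keystrokesUsed}:${current.text}`;
    if (settled.has(key)) {
      continue; // a cheaper path already reached this node
    }
    settled.add(key);

    if (current.keystrokesUsed === distributions.length) {
      results.push(current); // all keystrokes processed
      continue;
    }

    for (const alternative of distributions[current.keystrokesUsed]) {
      frontier.push({
        text: current.text + alternative.insert,
        keystrokesUsed: current.keystrokesUsed + 1,
        cost: current.cost - Math.log(alternative.probability),
      });
    }
  }
  return results;
}

const found = cheapestPrefixes([
  [{ insert: 't', probability: 0.6 }, { insert: 'th', probability: 0.4 }],
  [{ insert: 'he', probability: 0.3 }, { insert: 'e', probability: 0.7 }],
]);
found.forEach((entry) => console.log(entry.text, entry.cost.toFixed(3)));
```

With these made-up probabilities, `the` is reported once, via the cheaper `'th' + 'e'` route; the `'t' + 'he'` route reaches an already-settled node and is dropped rather than summed in. Because the node key includes the keystroke count, the same text reached from a different input range would still be treated as a separate node.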
+ +We have also established that the paths we obtain for valid corrections via +pathfinding on the search graph have overlapping subproblems and exhibit optimal +substructure (currently, with caveats). As a result, we can use dynamic +programming techniques to optimize correction-search. Once the [delete-left +issue](#optimal-substructure-with-deletes) is resolved, the caveats will +disappear, providing a truly optimal solution. + +The use of dynamic programming, along with the mappings and properties +established above, will win us the following benefits: +- The results of a correction-search may be _directly_ reused for future + searches after a new keystroke is added to the same correction-search space as + existing, pre-solved subproblems of the extended correction-search space. +- Keystrokes will always be processed - and processed in the same order they + were received. + - They may be "processed" via a `delete` edit. + +## Graph Topology + +Let us now more thoroughly examine the properties of valid search paths on the correction-search graph. + +Once a search path on the correction-search graph reaches a node, extensions of that path will never return to any of the following: +- the destination node's ancestors +- to nodes corresponding to any other `Transform` from keystrokes already + processed +- to nodes corresponding to edits applied before any keystrokes already + processed + +In formal graph language, we can use these properties to define subsets of nodes +as graph +[_modules_](https://en.wikipedia.org/wiki/Modular_decomposition#Modules). We +can then use these modules to define [quotient +graphs](https://en.wikipedia.org/wiki/Quotient_graph) for the correction-search +graph's nodes and edges in order to better clarify its structure. + + + +### Keystroke-Based Modules + +Let us start with a simplified case - one without 'insert' or 'delete' edits. +Instead, the only edges result from correcting keystrokes and matching them +against the lexicon. 
+ +TODO: simplified keystroke-by-keystroke graph - single char outputs only + +Root node +-> keystroke 1 nodes (multiple, within same module) +-> keystroke 2 nodes (multiple, within same module) +-> keystroke 3 nodes (multiple, within same module), etc + +TODO: two-layer simplified keystroke-by-keystroke module graph: one + two char modules, grouping them based on final codepoint length +- may wish to do 'insert' edits at this level? +TODO: simplified keystroke-by-keystroke graph - one + two char modules (no longer showing inner nodes) & the variations in total codelength @ each layer +- may wish to do 'delete' edits at this level? + +### Placing Edit Operations + +TODO: show the general pattern for edit-operation support within each keystroke module. +- may need to additionally show how (insert) edits transition codepoint length +- may need to additionally show how (delete) edits transition keystroke, but not codepoint length + +# Correction-Search Implementation + +## The `SearchNode` Class + +The `SearchNode` class of the predictive-text engine represents one traversed path. 
+ +- graph does not actually build nodes + - we keep 'em virtual +- SearchNode: + - traverses the path + - also represents the current path tail node / state + - helps resolve the "overlapping subproblems" aspect + -## < header goes here > +## The `SearchSpace` type + +Represents one of the modules defined above (codepoint length + keystrokes represented) + +Shift module definitions: now define modules for "from root through to keystroke K with codepoint length N" + +## The `SearchPath` type + +Represents the search-graph subspace corresponding to a single inbound quotient graph path to a single quotient graph module +- "inbound path" = single parent quotient-graph module (optimal subproblem) to the destination quotient-graph module + +## The `SearchCluster` type + +Represents the search-graph subspace corresponding to ALL inbound paths to a single quotient graph module +Is a superset of SearchPath. \ No newline at end of file From e119fe5004082674e9e49cd32ab9a6e515022971 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Thu, 11 Dec 2025 16:37:22 -0600 Subject: [PATCH 03/11] docs(web): adds initial quotient-graph visualizations --- .../docs/correction-search-graph.md | 243 +++++++++++++++++- 1 file changed, 231 insertions(+), 12 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 16cc08d96b1..9a58639dd54 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -251,23 +251,242 @@ Let us start with a simplified case - one without 'insert' or 'delete' edits. Instead, the only edges result from correcting keystrokes and matching them against the lexicon. 
-TODO: simplified keystroke-by-keystroke graph - single char outputs only - -Root node --> keystroke 1 nodes (multiple, within same module) --> keystroke 2 nodes (multiple, within same module) --> keystroke 3 nodes (multiple, within same module), etc - -TODO: two-layer simplified keystroke-by-keystroke module graph: one + two char modules, grouping them based on final codepoint length -- may wish to do 'insert' edits at this level? -TODO: simplified keystroke-by-keystroke graph - one + two char modules (no longer showing inner nodes) & the variations in total codelength @ each layer -- may wish to do 'delete' edits at this level? +For a first example, suppose we have the following scenario: +- Keystroke 1: outputs one of the following: + - `{insert: 'a', deleteLeft: 0}` + - `{insert: 'b', deleteLeft: 0}` +- Keystroke 2: outputs one of the following: + - `{insert: 'c', deleteLeft: 0}` + - `{insert: 'd', deleteLeft: 0}` +- Keystroke 3: outputs one of the following: + - `{insert: 'e', deleteLeft: 0}` + - `{insert: 'f', deleteLeft: 0}` + +Assuming that all possible combinations are valid prefixes, correction-search's +graph would then expand as follows: + +```mermaid +--- +title: Low-level Correction-Search graph expansion +--- +flowchart LR; + subgraph Start + start{Empty token} + end + + subgraph Keystroke 1: a or b + start --> a + start --> b + end + + subgraph Keystroke 2: c or d + a --> ac + a --> ad + b --> bc + b --> bd + end + + subgraph Keystroke 3: e or f + ac --> ace + ac --> acf + ad --> ade + ad --> adf + bc --> bce + bc --> bcf + bd --> bde + bd --> bdf + end +``` + +The figure above represents a [**quotient +graph**](https://en.wikipedia.org/wiki/Quotient_graph) of the search space for +this example case. +- Note how there is a clear ordering of events and how the correction-search +process goes through exactly four nodes in this scenario - the only point of +differentiation is _which four_. 
+- We know correction-search will go through up to one node from each column for +any path, and _exactly_ one for any completed path. + +Furthermore, each _column_ represents a [**modular +partition**](https://en.wikipedia.org/wiki/Modular_decomposition#Modular_quotients_and_factors) +of the graph. +- Each column, then, represents a graph + [**module**](https://en.wikipedia.org/wiki/Modular_decomposition#Modules) + while also being a partition of the graph. +- Note that for every node on the graph not in a module (column), each other + node _either_ connects to _all_ of that module's nodes or to _none_ of them. + +The graph can thus be condensed as follows: + +```mermaid +--- +title: Correction-Search graph expansion - Condensed +--- +flowchart LR; + start{Empty token} + start --> a(Keystroke 1: a or b) + a --> b(Keystroke 2: c or d) + b --> c(Keystroke 3: e or f) +``` + +### Handling Complex Transforms + +Let us now examine a case with a bit more complexity. Suppose we have the +following scenario: +- Keystroke 1: outputs one of the following: + - `{insert: 'a', deleteLeft: 0}` + - `{insert: 'b', deleteLeft: 0}` + - `{insert: 'cd', deleteLeft: 0}` +- Keystroke 2: outputs one of the following: + - `{insert: 'e', deleteLeft: 0}` + - `{insert: 'f', deleteLeft: 0}` + - `{insert: 'gh', deleteLeft: 0}` +- Keystroke 3: outputs one of the following: + - `{insert: 'i', deleteLeft: 0}` + - `{insert: 'jk', deleteLeft: 0}` + - `{insert: 'l', deleteLeft: 1}` + +Assuming that all possible combinations are valid prefixes, correction-search's +graph would then expand as follows: + +```mermaid +--- +title: Heterogenous keystroke correction-search graph (expanded) +config: + flowchart: + curve: basis +--- +flowchart LR; + subgraph Start + start{Empty token} + end + + subgraph After Key 1 + subgraph Codepoint length 1 + start --> a + start --> b + end + + subgraph Codepoint length 2 + start --> cd + end + end + + subgraph After Key 2 + subgraph Codepoint length 2 + a --> ae + a --> 
af + b --> be + b --> bf + end + + subgraph Codepoint length 3 + a --> agh + b --> bgh + cd --> cde + cd --> cdf + end + + subgraph Codepoint length 4 + cd --> cdgh + end + end + + subgraph After Key 3 + subgraph Codepoint length 2 + ae ----> al + af ----> al + be ----> bl + bf ----> bl + end + + subgraph Codepoint length 3 + ae ----> aei + af ----> afi + be ----> bei + bf ----> bfi + agh ----> agl + bgh ----> bgl + cde ----> cdl + cdf ----> cdl + end + + subgraph Codepoint length 4 + ae ----> aejk + af ----> afjk + be ----> bejk + bf ----> bfjk + agh ----> aghi + bgh ----> bghi + cde ----> cdei + cdf ----> cdfi + cdgh ----> cdgl + end + + subgraph Codepoint length 5 + agh ----> aghjk + bgh ----> bghjk + cde ----> cdejk + cdf ----> cdfjk + cdgh ---> cdghi + end + + subgraph Codepoint length 6 + cdgh ----> cdghjk + end + end +``` + +In its condensed view, we get... + + +```mermaid +--- +title: Heterogenous keystroke correction-search graph (condensed) +--- +flowchart LR; + subgraph Start + start{Empty token} + end + + subgraph After Key 1 + start --> K1C1(Codepoint length 1) + start --> K1C2(Codepoint length 2) + end + + subgraph After Key 2 + K1C1 -- [e, f] --> K2C2(Codepoint length 2) + + K1C1 -- [gh] --> K2C3(Codepoint length 3) + K1C2 -- [e, f] --> K2C3 + + K1C2 -- [gh] --> K2C4(Codepoint length 4) + end + + subgraph After Key 3 + K2C2 -- [ -1 + l ] --> K3C2(Codepoint length 2) + + K2C2 -- [i] --> K3C3(Codepoint length 3) + K2C3 -- [ -1 + l ] --> K3C3 + + K2C2 -- [jk] --> K3C4(Codepoint length 4) + K2C3 -- [i] --> K3C4 + K2C4 -- [ -1 + l ] --> K3C4 + + K2C3 -- [jk] --> K3C5(Codepoint length 5) + K2C4 -- [i] --> K3C5 + + K2C4 -- [jk] --> K3C6(Codepoint length 6) + end +``` ### Placing Edit Operations TODO: show the general pattern for edit-operation support within each keystroke module. 
-- may need to additionally show how (insert) edits transition codepoint length - may need to additionally show how (delete) edits transition keystroke, but not codepoint length + - is relatively straightforward: skip a key, but do not change codepoint length +- may need to additionally show how (insert) edits transition codepoint length + - TODO: FIX? May need to jump to a NEW module reflecting the new codepoint length. # Correction-Search Implementation From 7edf022dee20f1d42a91ec04ad95454cd87da8e5 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Fri, 12 Dec 2025 16:19:25 -0600 Subject: [PATCH 04/11] docs(web): adds 'insert', 'delete' quotient graphs --- .../docs/correction-search-graph.md | 124 ++++++++++++++++-- 1 file changed, 110 insertions(+), 14 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 9a58639dd54..9a0fd767588 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -450,8 +450,8 @@ flowchart LR; end subgraph After Key 1 - start --> K1C1(Codepoint length 1) - start --> K1C2(Codepoint length 2) + start -- [a, b] --> K1C1(Codepoint length 1) + start -- [cd] --> K1C2(Codepoint length 2) end subgraph After Key 2 @@ -480,13 +480,106 @@ flowchart LR; end ``` +Note that this graph _itself_ has an implied modular partition, with modules for +each keystroke containing submodules for each codepoint length resulting from +following the search path through to that node. + +We maintain the graph in this manner in order to properly handle left-deletions +for all cases. Should a later left-deletion erase _all_ of the search path's +codepoint length, or worse - go negative - there will be special handling +required. 
For cases where the left-deletions exceed currently-modeled codepoint +length, the most straightforward model for excess left-deletions is to edit and +correct text that lands before the caret after the final left-deletion is +applied. + ### Placing Edit Operations -TODO: show the general pattern for edit-operation support within each keystroke module. -- may need to additionally show how (delete) edits transition keystroke, but not codepoint length - - is relatively straightforward: skip a key, but do not change codepoint length -- may need to additionally show how (insert) edits transition codepoint length - - TODO: FIX? May need to jump to a NEW module reflecting the new codepoint length. +`insert` edits increase the codepoint length of the represented path, but do not +include data from extra keystrokes. Thus, the search-graph with `insert` edits, +for the first two keys, may be visualized as follows: + + + +```mermaid +--- +title: Heterogenous keystroke correction-search graph (with 'insert' edits) +config: + flowchart: + htmlLabels: false +--- +flowchart TD; + subgraph Start + start{Empty token} -- _insert_ --> SC1(Codepoint length 1) + SC1 -- _insert_ --> SC2@{ shape: processes, label: "..." } + end + + subgraph After Key 1 + start -- [a, b] --> K1C1(Codepoint length 1) + start -- [cd] --> K1C2(Codepoint length 2) + K1C1 -- _insert_ --> K1C2(Codepoint length 2) + SC1 -- [a, b] --> K1C2 + SC1 -- [cd] --> K1C3(Codepoint length 3) + K1C2 -- _insert_ --> K1C3 + K1C3 -- _insert_ --> K1C4@{ shape: processes, label: "..." } + end + + subgraph After Key 2 + K1C1 -- [e, f] --> K2C2(Codepoint length 2) + + K1C1 -- [gh] --> K2C3(Codepoint length 3) + K2C2 -- _insert_ --> K2C3 + + K1C2 -- [e, f] --> K2C3 + + K1C2 -- [gh] --> K2C4(Codepoint length 4) + K2C3 -- _insert_ --> K2C4 + K1C3 -- [e, f] --> K2C4 + + K1C3 -- [gh] --> K2C5(Codepoint length 5) + K2C4 -- _insert_ --> K2C5 + + K2C5 -- _insert_ --> K2C6@{ shape: processes, label: "..." 
} + end +``` + +`delete` edits increase the range of represented keystrokes, but do not increase +the codepoint length of resulting suggestions. Thus, the search-graph with +`delete` edits, for the first two keys, may be visualized as follows: + +```mermaid +--- +title: Heterogenous keystroke correction-search graph (with 'delete' edits) +config: + flowchart: + htmlLabels: false +--- +flowchart LR; + subgraph Start + start{Empty token} + end + + subgraph After Key 1 + start -- _delete_ --> K1C0(Codepoint length 0) + start -- [a, b] --> K1C1(Codepoint length 1) + start -- [cd] --> K1C2(Codepoint length 2) + end + + subgraph After Key 2 + K1C0 -- _delete_ --> K2C0(Codepoint length 0) + K1C0 -- [e, f] --> K2C1(Codepoint length 1) + K1C0 -- [gh] --> K2C2(Codepoint length 2) + + K1C1 -- _delete_ --> K2C1 + K1C1 -- [e, f] --> K2C2 + K1C1 -- [gh] --> K2C3(Codepoint length 3) + + K1C2 -- _delete_ --> K2C2 + K1C2 -- [e, f] --> K2C3 + K1C2 -- [gh] --> K2C4(Codepoint length 4) + end +``` + + # Correction-Search Implementation @@ -501,21 +594,24 @@ The `SearchNode` class of the predictive-text engine represents one traversed pa - also represents the current path tail node / state - helps resolve the "overlapping subproblems" aspect - - ## The `SearchSpace` type -Represents one of the modules defined above (codepoint length + keystrokes represented) +Represents one of the modules defined above (codepoint length + keystrokes +represented) -Shift module definitions: now define modules for "from root through to keystroke K with codepoint length N" +Shift module definitions: now define modules for "from root through to +keystroke K with codepoint length N" ## The `SearchPath` type -Represents the search-graph subspace corresponding to a single inbound quotient graph path to a single quotient graph module -- "inbound path" = single parent quotient-graph module (optimal subproblem) to the destination quotient-graph module +Represents the search-graph subspace corresponding to a 
single inbound quotient +graph path to a single quotient graph module +- "inbound path" = single parent quotient-graph module (optimal subproblem) to + the destination quotient-graph module ## The `SearchCluster` type -Represents the search-graph subspace corresponding to ALL inbound paths to a single quotient graph module +Represents the search-graph subspace corresponding to ALL inbound paths to a +single quotient graph module Is a superset of SearchPath. \ No newline at end of file From 8b3356bfb4ed9e9bde64719fcc6f7b1060f57aa0 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Mon, 15 Dec 2025 14:47:06 -0600 Subject: [PATCH 05/11] docs(web): completes first draft of the alternate approach for correction-search documentation --- .../docs/correction-search-graph.md | 222 +++++++++++++++--- 1 file changed, 195 insertions(+), 27 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 9a0fd767588..45823901872 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -437,8 +437,12 @@ flowchart LR; end ``` -In its condensed view, we get... +Note that each member of the "keystroke count" set of modules (i.e., each column) +consists of one or more sets of entries of specific codepoint lengths. It +is reasonable to consider each such subset (of equal codepoint length + +processed keystroke count) as its own module. +In this graph's condensed view, we get... ```mermaid --- @@ -480,9 +484,12 @@ flowchart LR; end ``` -Note that this graph _itself_ has an implied modular partition, with modules for -each keystroke containing submodules for each codepoint length resulting from -following the search path through to that node. 
+Note that this quotient graph has an implied modular partition, with modules for +each keystroke containing (condensed) submodules for each codepoint length +resulting from following the search path through to that node. These condensed +submodules may represent multiple different internal nodes, each reachable by +slightly different paths that all exhibit the same critical qualities: they +produce the same codepoint length with the same set of processed keystrokes. We maintain the graph in this manner in order to properly handle left-deletions for all cases. Should a later left-deletion erase _all_ of the search path's @@ -498,11 +505,12 @@ applied. include data from extra keystrokes. Thus, the search-graph with `insert` edits, for the first two keys, may be visualized as follows: - + ```mermaid --- -title: Heterogenous keystroke correction-search graph (with 'insert' edits) +title: Condensed graph with 'insert' edits config: flowchart: htmlLabels: false @@ -548,7 +556,7 @@ the codepoint length of resulting suggestions. Thus, the search-graph with ```mermaid --- -title: Heterogenous keystroke correction-search graph (with 'delete' edits) +title: Condensed graph with 'delete' edits config: flowchart: htmlLabels: false @@ -579,39 +587,199 @@ flowchart LR; end ``` - - # Correction-Search Implementation ## The `SearchNode` Class -The `SearchNode` class of the predictive-text engine represents one traversed path. +The `SearchNode` class of the predictive-text engine represents the progress +taken on one path through the correction-search graph's nodes and edges. To be +clear, this is on the _expanded_ path, thus on the level of individual nodes and +edges that are implied by the condensed versions of the graph. When starting +the correction-search for a new token, a `SearchNode` representing an empty-text +correction root, with no contributing keystrokes, is constructed. 
+ +The `SearchNode` class provides the following methods that may be used to +traverse graph edges and extend the search path during the correction-search +process: +- `buildDeletionEdges()` +- `buildInsertionEdges()` +- `buildSubstitutionEdges()` + +Instances of `SearchNode` that result from the use of the methods above +represent the complete path taken to reach the _expanded_ graph node they +represent _and_ the node itself. As it is possible for the node to be reached +by different paths, the `.resultKey` property may be used to determine if this +has occurred at a lower path cost. Should this occur, the instance may be +discarded, as the optimal version has already been evaluated. + +`SearchNode` also provides the property `.currentCost`, which calculates the +cost of the path traversed to reach it via the necessary sequence of calls to +the methods above. This may then be used to queue all nodes for a classical +graph search (such as [Dijkstra's +algorithm](https://en.wikipedia.org/wiki/Dijkstra%27s_algorithm)) via priority +queue. + +The trick, then, is to track what each `SearchNode` represents and to ensure +that the methods above are utilized correctly when expanding the search graph. +For this, we need a way to track each node's "path state" during the search +process. To do so, we turn to the quotient-graph representations above and +leverage each node's association with specific modules. 
-- graph does not actually build nodes - - we keep 'em virtual -- SearchNode: - - traverses the path - - also represents the current path tail node / state - - helps resolve the "overlapping subproblems" aspect + ## The `SearchSpace` type -Represents one of the modules defined above (codepoint length + keystrokes -represented) +Let us examine the manner in which correction-search nodes are visited and +search paths are built: -Shift module definitions: now define modules for "from root through to -keystroke K with codepoint length N" +```mermaid +--- +title: Condensed correction-search graph with edit operations +--- +flowchart LR; + subgraph Start + start{Empty token} -- _insert_ --> SC1(SC1: Codepoint length 1) + SC1 -- _insert_ --> SC2@{ shape: processes, label: "..." } + end + + subgraph After Key 1 + start -- _delete_ --> K1C0(K1C0: Codepoint length 0) + + SC1 -- _delete_ --> K1C1 + start -- [a, b] --> K1C1(K1C1: Codepoint length 1) + K1C0 -- _insert_ --> K1C1 + + SC1 -- [a, b] --> K1C2 + start -- [cd] --> K1C2(K1C2: Codepoint length 2) + K1C1 -- _insert_ --> K1C2 + SC2 -.-> K1C2 + + + SC1 -- [cd] --> K1C3(K1C3: Codepoint length 3) + K1C2 -- _insert_ --> K1C3 + SC2 -.-> K1C3 + + K1C3 -- _insert_ --> K1C4@{ shape: processes, label: "..." 
} + SC2 -.-> K1C4 + end + + subgraph After Key 2 + K1C0 -- _delete_ --> K2C0(K2C0: Codepoint length 0) + + K1C1 -- _delete_ --> K2C1 + K1C0 -- [e, f] --> K2C1(K2C1: Codepoint length 1) + K2C0 -- _insert_ --> K2C1 + + K1C2 -- _delete_ --> K2C2 + K1C1 -- [e, f] --> K2C2 + K1C0 -- [gh] --> K2C2(K2C2: Codepoint length 2) + K2C1 -- _insert_ --> K2C2 + + K1C3 -- _delete_ --> K2C3 + K1C2 -- [e, f] --> K2C3 + K1C1 -- [gh] --> K2C3(K2C3: Codepoint length 3) + K2C2 -- _insert_ --> K2C3 + + K1C4 -- _delete_ --> K2C4 + K1C3 -- [e, f] --> K2C4 + K1C2 -- [gh] --> K2C4(K2C4: Codepoint length 4) + K2C3 -- _insert_ --> K2C4 + K1C4 -.-> K2C4 + + K1C3 -- [gh] --> K2C5(K2C5: Codepoint length 5) + K2C4 -- _insert_ --> K2C5 + K1C4 -.-> K2C5 + + K2C5 -- _insert_ --> K2C6@{ shape: processes, label: "..." } + K1C4 -.-> K2C6 + end +``` + +As a search path is extended by the operations above, its evaluated state will +correspond to one of the submodules in the graph above. That submodule then +determines which operations and keystroke data may be used to further extend the +search path in order to yield a more complete correction-search search-path +result. + +As the way paths are extended is dependent upon which submodule contains their +final node, the `SearchSpace` interface exists to model individual submodules, +manage the extension of paths that pass through them, and cache intermediate +calculations for future reuse. ## The `SearchPath` type -Represents the search-graph subspace corresponding to a single inbound quotient -graph path to a single quotient graph module -- "inbound path" = single parent quotient-graph module (optimal subproblem) to - the destination quotient-graph module +The transition from one submodule to another is marked by specific edge types +corresponding to received keystrokes or to `insert` or `delete` edit operations. 
+Whatever the edge type is, this transition is modeled by the `SearchPath` type, +extending all `SearchNode` paths passing through it via the specified edge type +in order to reach the next submodule. + +`SearchPath` itself _also_ implements `SearchSpace`; for cases where only a +single parent node and edge exists that may transition to a new submodule, +`SearchPath` is sufficient to module the destination submodule. ## The `SearchCluster` type -Represents the search-graph subspace corresponding to ALL inbound paths to a -single quotient graph module +As there are many cases where more than one parent node + edge combination may +transition to a child submodule, `SearchCluster` exists to connect all such +combinations (and their corresponding `SearchPath` representations) together. +It then exposes them as a single common instance to represent the destination +submodule. + +Revisiting the prior quotient graph visualization, the following graph +represents the full range of representation for submodule `K2C3` as a +`SearchCluster`: + +```mermaid +--- +title: Submodule inspection: all quotient-graph paths to K2C3 +--- +flowchart LR; + subgraph Start + start{Empty token} -- _insert_ --> SC1(SC1: Codepoint length 1) + SC1 -- _insert_ --> SC2@{ shape: processes, label: "..." 
} + end + + subgraph After Key 1 + start -- _delete_ --> K1C0(K1C0: Codepoint length 0) + + SC1 -- _delete_ --> K1C1 + start -- [a, b] --> K1C1(K1C1: Codepoint length 1) + K1C0 -- _insert_ --> K1C1 + + SC1 -- [a, b] --> K1C2 + start -- [cd] --> K1C2(K1C2: Codepoint length 2) + K1C1 -- _insert_ --> K1C2 + SC2 -.-> K1C2 + + + SC1 -- [cd] --> K1C3(K1C3: Codepoint length 3) + K1C2 -- _insert_ --> K1C3 + SC2 -.-> K1C3 + end + + subgraph After Key 2 + K1C0 -- _delete_ --> K2C0(K2C0: Codepoint length 0) + + K1C1 -- _delete_ --> K2C1 + K1C0 -- [e, f] --> K2C1(K2C1: Codepoint length 1) + K2C0 -- _insert_ --> K2C1 + + K1C2 -- _delete_ --> K2C2 + K1C1 -- [e, f] --> K2C2 + K1C0 -- [gh] --> K2C2(K2C2: Codepoint length 2) + K2C1 -- _insert_ --> K2C2 + + K1C3 -- _delete_ --> K2C3 + K1C2 -- [e, f] --> K2C3 + K1C1 -- [gh] --> K2C3(K2C3: Codepoint length 3) + K2C2 -- _insert_ --> K2C3 + end +``` -Is a superset of SearchPath. \ No newline at end of file +Each of the inbound paths to the final quotient-path graph node for the K2C3 +submodule may each individually be modeled as `SearchPath`s. Each such +`SearchPath` has a single parent submodule, represented by an earlier +`SearchSpace` instance, whose paths are extended by an edge representing a +single keystroke input type (all with matching insertion codepoint length and +left-deletion count) or edit operation type. 
\ No newline at end of file From ca96212ab3340274074e620a9789468526182b2e Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Mon, 15 Dec 2025 14:53:47 -0600 Subject: [PATCH 06/11] docs(web): enhance left-deletion doc comment --- .../docs/correction-search-graph.md | 16 ++++++++++------ 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 45823901872..63733f91160 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -492,12 +492,16 @@ slightly different paths that all exhibit the same critical qualities: they produce the same codepoint length with the same set of processed keystrokes. We maintain the graph in this manner in order to properly handle left-deletions -for all cases. Should a later left-deletion erase _all_ of the search path's -codepoint length, or worse - go negative - there will be special handling -required. For cases where the left-deletions exceed currently-modeled codepoint -length, the most straightforward model for excess left-deletions is to edit and -correct text that lands before the caret after the final left-deletion is -applied. +for all cases. If any input keystrokes include left-deletion effects, it is +possible to have paths that _decrease_ the total represented codepoint length. + +Of particular note: should a later left-deletion eventually erase _all_ of the +search path's codepoint length, or worse - go negative - there will be special +handling required. (This is the specific reason that the submodules require +matching codepoint lengths.) 
For cases where the left-deletions exceed +currently-modeled codepoint length, the most straightforward model for excess +left-deletions is to edit and correct text that lands before the caret after the +final left-deletion is applied. ### Placing Edit Operations From dc1cd3b79234e8daba4af915c995f6249d6cf46e Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Mon, 15 Dec 2025 15:15:36 -0600 Subject: [PATCH 07/11] docs(web): fix graph title --- .../worker-thread/docs/correction-search-graph.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 63733f91160..05f5962ae17 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -736,7 +736,7 @@ represents the full range of representation for submodule `K2C3` as a ```mermaid --- -title: Submodule inspection: all quotient-graph paths to K2C3 +title: Submodule inspection - all quotient-graph paths to K2C3 --- flowchart LR; subgraph Start From be59706d379e114d44633c2944c2b523b1f317b3 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Fri, 9 Jan 2026 09:17:12 -0600 Subject: [PATCH 08/11] docs(web): update graph-doc names for upcoming classes --- .../docs/correction-search-graph.md | 39 ++++++++++--------- 1 file changed, 20 insertions(+), 19 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 05f5962ae17..d1e3928a4ae 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -631,7 +631,7 @@ leverage each node's association with specific modules. 
-## The `SearchSpace` type +## The `SearchQuotientNode` type Let us examine the manner in which correction-search nodes are visited and search paths are built: @@ -706,33 +706,34 @@ search path in order to yield a more complete correction-search search-path result. As the way paths are extended is dependent upon which submodule contains their -final node, the `SearchSpace` interface exists to model individual submodules, +final node, the `SearchQuotientNode` interface exists to model individual submodules, manage the extension of paths that pass through them, and cache intermediate calculations for future reuse. -## The `SearchPath` type +## The `SearchQuotientSpur` type The transition from one submodule to another is marked by specific edge types corresponding to received keystrokes or to `insert` or `delete` edit operations. -Whatever the edge type is, this transition is modeled by the `SearchPath` type, -extending all `SearchNode` paths passing through it via the specified edge type -in order to reach the next submodule. +Whatever the edge type is, this transition is modeled by the +`SearchQuotientSpur` type, extending all `SearchNode` paths passing through it +via the specified edge type in order to reach the next submodule. -`SearchPath` itself _also_ implements `SearchSpace`; for cases where only a -single parent node and edge exists that may transition to a new submodule, -`SearchPath` is sufficient to module the destination submodule. +`SearchQuotientSpur` itself _also_ implements `SearchQuotientNode`; for cases +where only a single parent node and edge exists that may transition to a new +submodule, `SearchQuotientSpur` is sufficient to module the destination +submodule. 
-## The `SearchCluster` type +## The `SearchQuotientCluster` type As there are many cases where more than one parent node + edge combination may -transition to a child submodule, `SearchCluster` exists to connect all such -combinations (and their corresponding `SearchPath` representations) together. -It then exposes them as a single common instance to represent the destination -submodule. +transition to a child submodule, `SearchQuotientCluster` exists to connect all such +combinations (and their corresponding `SearchQuotientSpur` representations) +together. It then exposes them as a single common instance to represent the +destination submodule. Revisiting the prior quotient graph visualization, the following graph represents the full range of representation for submodule `K2C3` as a -`SearchCluster`: +`SearchQuotientCluster`: ```mermaid --- @@ -782,8 +783,8 @@ flowchart LR; ``` Each of the inbound paths to the final quotient-path graph node for the K2C3 -submodule may each individually be modeled as `SearchPath`s. Each such -`SearchPath` has a single parent submodule, represented by an earlier -`SearchSpace` instance, whose paths are extended by an edge representing a -single keystroke input type (all with matching insertion codepoint length and +submodule may each individually be modeled as `SearchQuotientSpur`s. Each such +`SearchQuotientSpur` has a single parent submodule, represented by an earlier +`SearchQuotientNode` instance, whose paths are extended by an edge representing +a single keystroke input type (all with matching insertion codepoint length and left-deletion count) or edit operation type. 
\ No newline at end of file From d493f607c2536a465f40418ab8863ef8bfc55a46 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Fri, 9 Jan 2026 09:36:50 -0600 Subject: [PATCH 09/11] docs(web): address review comments --- .../docs/correction-search-graph.md | 45 ++++++++++++------- 1 file changed, 29 insertions(+), 16 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index d1e3928a4ae..84f951d495e 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -4,7 +4,7 @@ The Keyman predictive-text correction-search process is designed to consider all of the most-likely possible input corrections when suggesting words from the active lexical-model. To do so, it dynamically builds portions of the search graph as needed to generate corrections to the most recent token in the context. -This token lies immediately before the caret. +This token lies immediately before the text insertion point (caret). There is one major, notable simplifying assumption in the current text-correction design: we assume that each keystroke's `Transform` is 100% @@ -116,6 +116,18 @@ correction-search path: 2. Modify by the cost of correcting the current keystroke with the specified `Transform` or applying a keystroke-level edit. +Cost has two main components: +1. Edit-distance cost - the cost of: + - inserting characters not typed + - deleting keystrokes that should not have been typed + - substituting one output character with another + - Each of these three edit types has the same constant cost per instance. +2. Fat-finger cost: the cost of selecting one variant of keystroke output from + the input + - Here, the cost is the negative log-probability of the keystroke. 
+ - Note that any keystroke output with higher cost than the edit-distance + cost will not be considered. + ### Optimal Substructure Let us examine the effects of adding different types of keystroke edits to the @@ -741,50 +753,51 @@ title: Submodule inspection - all quotient-graph paths to K2C3 --- flowchart LR; subgraph Start - start{Empty token} -- _insert_ --> SC1(SC1: Codepoint length 1) + start{Empty token} -- _insert_ --> SC1(Codepoint length 1) SC1 -- _insert_ --> SC2@{ shape: processes, label: "..." } end subgraph After Key 1 - start -- _delete_ --> K1C0(K1C0: Codepoint length 0) + start -- _delete_ --> K1C0(Codepoint length 0) SC1 -- _delete_ --> K1C1 - start -- [a, b] --> K1C1(K1C1: Codepoint length 1) + start -- [a, b] --> K1C1(Codepoint length 1) K1C0 -- _insert_ --> K1C1 SC1 -- [a, b] --> K1C2 - start -- [cd] --> K1C2(K1C2: Codepoint length 2) + start -- [cd] --> K1C2(Codepoint length 2) K1C1 -- _insert_ --> K1C2 SC2 -.-> K1C2 - SC1 -- [cd] --> K1C3(K1C3: Codepoint length 3) + SC1 -- [cd] --> K1C3(Codepoint length 3) K1C2 -- _insert_ --> K1C3 SC2 -.-> K1C3 end subgraph After Key 2 - K1C0 -- _delete_ --> K2C0(K2C0: Codepoint length 0) + K1C0 -- _delete_ --> K2C0(Codepoint length 0) K1C1 -- _delete_ --> K2C1 - K1C0 -- [e, f] --> K2C1(K2C1: Codepoint length 1) + K1C0 -- [e, f] --> K2C1(Codepoint length 1) K2C0 -- _insert_ --> K2C1 K1C2 -- _delete_ --> K2C2 K1C1 -- [e, f] --> K2C2 - K1C0 -- [gh] --> K2C2(K2C2: Codepoint length 2) + K1C0 -- [gh] --> K2C2(Codepoint length 2) K2C1 -- _insert_ --> K2C2 K1C3 -- _delete_ --> K2C3 K1C2 -- [e, f] --> K2C3 - K1C1 -- [gh] --> K2C3(K2C3: Codepoint length 3) + K1C1 -- [gh] --> K2C3(Destination: Codepoint length 3) K2C2 -- _insert_ --> K2C3 end ``` -Each of the inbound paths to the final quotient-path graph node for the K2C3 -submodule may each individually be modeled as `SearchQuotientSpur`s. 
Each such -`SearchQuotientSpur` has a single parent submodule, represented by an earlier -`SearchQuotientNode` instance, whose paths are extended by an edge representing -a single keystroke input type (all with matching insertion codepoint length and -left-deletion count) or edit operation type. \ No newline at end of file +Each of the inbound paths to the final quotient-path graph node for the +submodule labeled "Destination" may each individually be modeled as +`SearchQuotientSpur`s. Each such `SearchQuotientSpur` has a single parent +submodule, represented by an earlier `SearchQuotientNode` instance, whose paths +are extended by an edge representing a single keystroke input type (all with +matching insertion codepoint length and left-deletion count) or edit operation +type. \ No newline at end of file From ac68b5de92e3d4b6432e73f871a42c6d777b98cf Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Fri, 9 Jan 2026 09:37:53 -0600 Subject: [PATCH 10/11] docs(web): remove excess TODO --- .../worker-thread/docs/correction-search-graph.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 84f951d495e..9d46846923c 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -641,8 +641,6 @@ For this, we need a way to track each node's "path state" during the search process. To do so, we turn to the quotient-graph representations above and leverage each node's association with specific modules. 
- - ## The `SearchQuotientNode` type Let us examine the manner in which correction-search nodes are visited and From 1bbbd6f79cdbb95a82c614542be822120d183938 Mon Sep 17 00:00:00 2001 From: Joshua Horton Date: Fri, 9 Jan 2026 09:54:51 -0600 Subject: [PATCH 11/11] docs(web): fix typo --- .../worker-thread/docs/correction-search-graph.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md index 9d46846923c..4708dc06b21 100644 --- a/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md +++ b/web/src/engine/predictive-text/worker-thread/docs/correction-search-graph.md @@ -730,7 +730,7 @@ via the specified edge type in order to reach the next submodule. `SearchQuotientSpur` itself _also_ implements `SearchQuotientNode`; for cases where only a single parent node and edge exists that may transition to a new -submodule, `SearchQuotientSpur` is sufficient to module the destination +submodule, `SearchQuotientSpur` is sufficient to model the destination submodule. ## The `SearchQuotientCluster` type
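
To make the cluster idea described in this patch series concrete, here is a minimal TypeScript sketch of how a cluster might present several inbound spurs as a single lowest-cost-first source of paths. Every name in it (`PathEntry`, `QuotientNode`, `Spur`, `Cluster`, `peekCost`, `popNext`) is hypothetical and chosen only for illustration. Only `resultKey` and `currentCost` echo properties the document attributes to `SearchNode`, and the engine's real `SearchQuotientSpur` and `SearchQuotientCluster` types may look quite different.

```typescript
// Illustrative sketch only: these types are assumptions, not the engine's API.

/** Stand-in for a SearchNode: the expanded-graph node reached and its path cost. */
interface PathEntry {
  resultKey: string;   // identifies the expanded-graph node that was reached
  currentCost: number; // accumulated cost of the path that reached it
}

/** Stand-in for the shared interface of spurs and clusters. */
interface QuotientNode {
  /** Cost of the cheapest path not yet returned, or Infinity when exhausted. */
  peekCost(): number;
  /** Removes and returns the cheapest path not yet returned. */
  popNext(): PathEntry | undefined;
}

/** One inbound edge type from one parent submodule. */
class Spur implements QuotientNode {
  private readonly queue: PathEntry[];

  constructor(entries: PathEntry[]) {
    // A sorted array stands in for a real priority queue.
    this.queue = [...entries].sort((a, b) => a.currentCost - b.currentCost);
  }

  peekCost(): number {
    const head = this.queue[0];
    return head ? head.currentCost : Infinity;
  }

  popNext(): PathEntry | undefined {
    return this.queue.shift();
  }
}

/** All inbound spurs for one destination submodule, exposed as one source. */
class Cluster implements QuotientNode {
  private readonly spurs: QuotientNode[];
  private readonly seen = new Set<string>();

  constructor(spurs: QuotientNode[]) {
    this.spurs = spurs;
  }

  /** Finds the spur whose next path is cheapest, if any spur has paths left. */
  private cheapestSpur(): QuotientNode | undefined {
    let best: QuotientNode | undefined;
    let bestCost = Infinity;
    for (const spur of this.spurs) {
      const cost = spur.peekCost();
      if (cost < bestCost) {
        best = spur;
        bestCost = cost;
      }
    }
    return best;
  }

  peekCost(): number {
    // May name a duplicate that popNext() would skip, so this is a lower bound.
    const best = this.cheapestSpur();
    return best ? best.peekCost() : Infinity;
  }

  popNext(): PathEntry | undefined {
    for (;;) {
      const spur = this.cheapestSpur();
      if (!spur) return undefined;
      const entry = spur.popNext();
      if (!entry) return undefined;
      // A node already reached at lower cost is discarded, as with `.resultKey`.
      if (this.seen.has(entry.resultKey)) continue;
      this.seen.add(entry.resultKey);
      return entry;
    }
  }
}

// Two inbound spurs for a length-3 submodule; keys and costs are made up.
const viaSingleChar = new Spur([
  { resultKey: "cde", currentCost: 1.2 },
  { resultKey: "shared", currentCost: 2.0 },
]);
const viaTwoChars = new Spur([
  { resultKey: "agh", currentCost: 0.7 },
  { resultKey: "shared", currentCost: 2.5 },
]);
const destination = new Cluster([viaSingleChar, viaTwoChars]);

const visited: string[] = [];
for (let e = destination.popNext(); e !== undefined; e = destination.popNext()) {
  visited.push(e.resultKey);
}
console.log(visited.join(", "));
```

Because the cluster satisfies the same interface as a single spur, a caller in this sketch does not need to know how many inbound edges feed a destination submodule, which mirrors the document's point that a spur alone suffices when only one parent and edge type exists.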