-
-
Notifications
You must be signed in to change notification settings - Fork 138
feat(web): implement tokenization convergence and associated unit test 🚂 #15834
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: refactor/web/split-analyze-transition
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -9,7 +9,9 @@ | |
|
|
||
| import { LexicalModelTypes } from '@keymanapp/common-types'; | ||
|
|
||
| import { ContextToken } from './context-token.js'; | ||
| import { ContextTokenization } from './context-tokenization.js'; | ||
| import { SearchQuotientCluster } from './search-quotient-cluster.js'; | ||
| import { legacySubsetKeyer, TokenizationSubset, TokenizationSubsetBuilder } from './tokenization-subsets.js'; | ||
|
|
||
| import Distribution = LexicalModelTypes.Distribution; | ||
|
|
@@ -80,11 +82,9 @@ export function precomputeTransitions( | |
| /** | ||
| * Given results from `precomputeTransitions`, this function generates the | ||
| * context variations that should result from the current context variants and | ||
| * input. The one best matching the user's visible context will be set at index | ||
| * 0. | ||
| * input. | ||
| * @param precomputedTransitionSubsets | ||
| * @param transformDistribution | ||
| * @param keyMatchingUserContext | ||
| * @returns | ||
| */ | ||
| export function transitionTokenizations( | ||
|
|
@@ -124,10 +124,61 @@ export function transitionTokenizations( | |
| return; | ||
| } | ||
|
|
||
| // Of particular note - there may no longer be a specific, single set of edits | ||
| // for the transition; there will be different paths to reach a tokenization! | ||
| throw new Error("Multi-tokenization transitions not yet supported."); | ||
| finalTokenizations.set(key, mergeAlignedTokenizations(independentTransitionResults)); | ||
| }); | ||
|
|
||
| return finalTokenizations; | ||
| } | ||
|
|
||
| /** | ||
| * Given two or more instances of ContextTokenization, this function will | ||
| * attempt to merge each token's SearchQuotientNodes as necessary to result in a | ||
| * single instance. | ||
| * | ||
| * An error will be thrown if the instances do not sufficiently converge to the | ||
| * same tokenization pattern. | ||
|
Comment on lines
+138
to
+139
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not seeing any error thrown in this function |
||
| * @param tokenizations | ||
| * @returns | ||
| */ | ||
| export function mergeAlignedTokenizations(tokenizations: ContextTokenization[]): ContextTokenization { | ||
| const finalizedTokens: ContextToken[] = []; | ||
|
|
||
| // Iterate through the token indices as long as at least one tokenization | ||
| // remains with that index. | ||
| // | ||
| // Assumption: all have at least one token. | ||
| for( | ||
| let i = 0; | ||
| tokenizations.length > 0; | ||
| // Two co-related iteration steps. Could be expressed as just one with ++i in functor conditional. | ||
| i++, tokenizations = tokenizations.filter((tokenization) => tokenization.tokens.length > i) | ||
| ) { | ||
| const bucket: ContextToken[] = []; | ||
|
|
||
| tokenizations.forEach((tokenization) => { | ||
| // Check for matches already within the bucket. | ||
| const token = tokenization.tokens[i]; | ||
|
|
||
| if(!bucket.find((t) => t.searchModule.isSameNode(token.searchModule))) { | ||
| bucket.push(token); | ||
| } | ||
| }); | ||
|
|
||
| if(bucket.length == 1) { | ||
| finalizedTokens.push(bucket[0]); | ||
| } else { | ||
| const constituentSpurs = bucket.flatMap((token) => { | ||
| const quotientNode = token.searchModule; | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't really understand all the terminology here: Even this function refers to a lot of concepts which I am struggling to put together into a coherent model: tokens (vs tokenizations), buckets, nodes, spurs, [search?]modules, clusters, quotients.
Contributor
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I probably should do a quality pass for consistency on many of the properties, class names, and such once all is said and done. For one, some things were put in place before #15161 was developed and put in place. I haven't enforced the new terms in all of the pre-existing code. (In regard to this specific case: quotient graphs are graphs based on "modules" of a more detailed graph, to visualize the higher-level patterns found within.) Not aiming to dismiss your concerns here whatsoever; it's been a journey working out the terms as work proceeds, and I hope there will be sufficient time to "polish things up" in this regard before release. |
||
|
|
||
| if(quotientNode instanceof SearchQuotientCluster) { | ||
| return quotientNode.parents; | ||
| } else { | ||
| return quotientNode; | ||
| } | ||
| }) | ||
| finalizedTokens.push(new ContextToken(new SearchQuotientCluster(constituentSpurs))); | ||
| } | ||
| } | ||
|
|
||
| return new ContextTokenization(finalizedTokens); | ||
| } | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not seeing any error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See
SearchQuotientCluster's constructor. This comment is regarding the error that will be generated within that constructor for such cases.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keyman/web/src/engine/predictive-text/worker-thread/src/main/correction/search-quotient-cluster.ts
Lines 73 to 80 in ce7f642