User:Ryuki4716/P15

bookSeqThemeD(bSTD)
bSTD traverses every Sequitur in the nonNull AnteSeqD Glossa. It analyzes every Token in each such Sequitur and accumulates the Tokens that qualify as Themes, that is, Tokens with a Frequency under 500 (relatively inFrequent). L/R Hinges cannot qualify as Themes, though no specific test disqualifies Margin Tokens: all Hinges are extremely Frequent, with Frequencies well over 500, so they always fail the inFrequency requirement anyway. Certain other Tokens are also disqualified, such as QuestionMarks and any Tokens in ExcludeList (mostly Pair Symbols and Trash). If any Theme Tokens have accumulated for a Sequitur, SeqThemeD is booked with the Sequitur as Lemma and the Themes as Glossa. No entries are booked for Null Glossa.
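The booking pass described above can be sketched roughly as follows. This is a minimal illustration, not the actual bSTD code: the function name book_seq_theme_d, the dict shapes (AnteSeqD as Ante -> list of Sequiturs, each Sequitur a tuple of Tokens), the literal '?' for QuestionMarks and the sample data are all assumptions made for the example.

```python
THEME_MAX_FREQ = 500  # a Token qualifies as a Theme only if its Frequency is under 500


def book_seq_theme_d(ante_seq_d, freq_dict, exclude_list):
    """Return a SeqThemeD-like dict: Sequitur (Lemma) -> list of Theme Tokens (Glossa)."""
    seq_theme_d = {}
    for sequiturs in ante_seq_d.values():
        for sequitur in sequiturs:  # a Null (empty) Glossa simply yields nothing
            themes = [
                tok for tok in sequitur
                if tok != '?'                                # QuestionMarks are disqualified
                and tok not in exclude_list                  # Pair Symbols, Trash, ...
                and freq_dict.get(tok, 0) < THEME_MAX_FREQ   # assumption: unknown Tokens count as 0
            ]
            if themes:  # no entry is booked when no Themes accumulated
                seq_theme_d[sequitur] = themes
    return seq_theme_d


# Hypothetical sample data, invented for illustration only.
ante_seq_d = {
    ('the', 'cat'): [('sat', 'on', 'Schnabel', '.')],
    ('a', 'dog'): [],
}
freq_dict = {'sat': 600, 'on': 900, 'Schnabel': 1, '.': 5000}
exclude_list = {'(', ')'}

seq_theme_d = book_seq_theme_d(ante_seq_d, freq_dict, exclude_list)
```

Note that no Hinge test appears in the sketch: as described above, the Frequency threshold alone keeps the highly Frequent Hinges out.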

SeqThemeD(STD)
STD Lemma are the Sequiturs present in AnteSeqD Glossa, provided their STD Glossa is nonNull. An STD Glossa is a list of Theme Tokens. To build STD, AnteSeqD is consulted to find every Sequitur, and FreqDict is consulted for each Token's Frequency rating.

FreqDict
There is a FreqDict entry for each Token in DKDTTL, the combined Dubliners/DasKapital Training Text, totalling 12279 distinct FreqDict Lemma, probably including significant Trash and Issues. compileFrequencyDict(CFD) disregards the ExcludeList for now. The Gloss that CFD books is the Frequency: how many times the Lemma Token occurs in DKDTTL. HighFreq Tokens include ('and','the','.',','), and all Hinges are amongst the Highest Frequency Tokens. There are very few such HighFreqs, yet each occurs very many times. Conversely, LowFreqs are extremely numerous, but each occurs rarely. Examples: ('Schnabel', 'GuavaPlace', '1923'). Uniques (Hapax) occur only 1 time in the Text, yet a very large share of FreqDict entries are Uniques. Only single Tokens are counted for Frequency for now, though CFD can calculate Frequency for sequences of N Tokens (such as for Collocations and Entities).
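A frequency count of this kind can be sketched with the standard library Counter. The name compile_frequency_dict, the n parameter and the toy Token list are assumptions for illustration; the real CFD and its tokenizer are not shown on this page.

```python
from collections import Counter


def compile_frequency_dict(tokens, n=1):
    """Count how often each Token (n=1) or each sequence of n Tokens occurs."""
    if n == 1:
        return Counter(tokens)
    # Slide a window of n Tokens over the text; each window is one Lemma.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


# Toy Training Text, invented for illustration only.
tokens = ['the', 'cat', 'and', 'the', 'dog', '.']

freq_dict = compile_frequency_dict(tokens)          # single Tokens
bigram_freq = compile_frequency_dict(tokens, n=2)   # sequences of 2 Tokens

# Uniques (Hapax): Lemma that occur exactly once.
hapax = sorted(tok for tok, freq in freq_dict.items() if freq == 1)
```

Even in this tiny sample the pattern described above shows up: one Token ('the') repeats, while most Lemma are Uniques.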

AnteThemeD(ATD)
At Extension time, ATD can provide Alt choices that are Thematic with the original Sequitur (in addition to matching the Left Margin, as CycloD Alts do). Thus SynTex (hopefully) displays greater Topic Coherence than when Extending only CycloD Alts, which match only by Left Margin, not by Theme. But often no such ATD Alts exist, and then the SynTexGenerator relies on CycloD Extensions. When no ATD Alts exist, it is usually because there were no Thematic Tokens in the Sequitur. Less often, the Sequitur had Themes but there were no Thematic Alts in the TT.

ATD Lemma are the AnteSeqD Antes that are also present as Lemma in CycloD for each nonNull Glossa Sequitur in AnteSeqD. An ATD Glossa is a list of Thematic Tokens representing the Sequitur the Themes were extracted from. ATD is used to select Alts that cohere Thematically with their Ante, since such Alts share Thematic Tokens with the original AnteSeqD Sequitur. Thus (hopefully) such Alts, when Extended, are more likely to preserve Topic Coherence. The original Ante/Sequitur sequence presumably had Topic Coherence because a human author wrote it. SynTexGenerator repurposes that Topic Coherence by Extending an Alt that is Thematic with the Sequitur.

bookATD(bATD)
bATD reviews every Ante in CycloD. If the Ante has a nonNull Sequitur, SeqThemeD is consulted to accumulate that Sequitur's Thematic Tokens. Then, for each Alt of that CycloD entry, if the Alt contains any of the Thematic Tokens, that Alt is accumulated. Finally AnteThemeD is booked with the CycloD Ante as Lemma and the Theme cumulate as Gloss. Null cumulate cases are not booked.
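One possible reading of this pass is sketched below. The dict shapes are assumptions (CycloD as Ante -> list of Alts, AnteSeqD as Ante -> list of Sequiturs, SeqThemeD as Sequitur -> list of Themes, with Alts and Sequiturs as tuples of Tokens), and the sketch also assumes that the booked cumulate holds the Alts that share a Theme. The name book_ante_theme_d and the sample data are hypothetical.

```python
def book_ante_theme_d(cyclo_d, ante_seq_d, seq_theme_d):
    """Return an AnteThemeD-like dict: Ante (Lemma) -> Alts sharing a Theme with its Sequitur."""
    ante_theme_d = {}
    for ante, alts in cyclo_d.items():
        # Accumulate the Thematic Tokens of every Sequitur that follows this Ante.
        themes = set()
        for sequitur in ante_seq_d.get(ante, []):
            themes.update(seq_theme_d.get(sequitur, []))
        if not themes:
            continue  # no Thematic Tokens in the Sequitur
        # Keep each Alt that contains at least one Theme Token.
        cumulate = [alt for alt in alts if themes.intersection(alt)]
        if cumulate:  # Null cumulate cases are not booked
            ante_theme_d[ante] = cumulate
    return ante_theme_d


# Hypothetical sample data, invented for illustration only.
cyclo_d = {
    ('the', 'cat'): [('cat', 'likes', 'Schnabel'), ('cat', 'sat', 'down')],
    ('a', 'dog'): [('dog', 'ran')],
}
ante_seq_d = {
    ('the', 'cat'): [('sat', 'on', 'Schnabel', '.')],
    ('a', 'dog'): [],
}
seq_theme_d = {('sat', 'on', 'Schnabel', '.'): ['Schnabel']}

ante_theme_d = book_ante_theme_d(cyclo_d, ante_seq_d, seq_theme_d)
```

Here only the Alt containing 'Schnabel' is kept for ('the', 'cat'), and ('a', 'dog') gets no entry because its Sequitur is Null.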

Generator3 (G3)[P15 4General]
G3 prefers the tightest Alt for Extension. From tightest to loosest:

Dictionary    Matches
LRTD          Left Margin, Right Margin, Theme
LRD           Left Margin, Right Margin
AnteThemeD    Left Margin, Theme
CycloD        Left Margin
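The tightest-first preference can be sketched as a simple fallback loop. This assumes, purely for illustration, that all four Dictionaries can be looked up with the same Ante key and return a list of Alts; how G3 actually keys LRTD and LRD is not stated here. The name pick_alts and the sample data are hypothetical.

```python
# Tightest to loosest, following the order given above.
PREFERENCE = ['LRTD', 'LRD', 'AnteThemeD', 'CycloD']


def pick_alts(ante, dictionaries):
    """Return (Dictionary name, Alts) from the tightest Dictionary that has Alts for this Ante."""
    for name in PREFERENCE:
        alts = dictionaries.get(name, {}).get(ante)
        if alts:
            return name, alts
    return None, []  # no Dictionary offers an Alt; Extension is not possible


# Hypothetical sample data, invented for illustration only.
dictionaries = {
    'LRTD': {},
    'LRD': {},
    'AnteThemeD': {('the', 'cat'): [('cat', 'likes', 'Schnabel')]},
    'CycloD': {
        ('the', 'cat'): [('cat', 'sat', 'down')],
        ('a', 'dog'): [('dog', 'ran')],
    },
}

thematic_choice = pick_alts(('the', 'cat'), dictionaries)  # served by AnteThemeD
fallback_choice = pick_alts(('a', 'dog'), dictionaries)    # falls through to CycloD
```

The second call illustrates the fallback to CycloD Extensions when no tighter Alts exist.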