User:Ryuki4716/SynTex/Tokens

Tokens, Words, Punctuation in Synthetic Text (ST)
Text contains more than words. Punctuation is abundant and important, and it influences meaning. TOKENs are the words plus the punctuation marks in a text. When generating Synthetic Text (ST) we manipulate TOKENs, so throughout this discussion we operate on TOKENs, not just words.

In this example: '$1.50 If they can, you are in for a treat. Does' there are 11 WORDS: ['$1.50', 'If', 'they', 'can', 'you', 'are', 'in', 'for', 'a', 'treat', 'Does'] and 2 PUNCTUATION marks: [',', '.'], totalling 13 TOKENS. (Of course, different tokenizers may interpret '$1.50' differently.) The definition of a token is pragmatic: a token is whatever a given tokenizer considers to be a token.
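The counts above can be reproduced with a small regular-expression tokenizer. This is only a minimal sketch, not the tokenizer used for Synthetic Text: the names PRICE_AWARE, NAIVE, tokenize and is_word are hypothetical, and the rule that a token containing any letter or digit counts as a WORD is an assumption made for illustration.

```python
import re

TEXT = "$1.50 If they can, you are in for a treat. Does"

# Assumed rule set: a number with optional '$' and decimals, else a run of
# word characters, else a single punctuation character.
PRICE_AWARE = re.compile(r"\$?\d+(?:\.\d+)?|\w+|[^\w\s]")

# A simpler rule set with no special handling of prices.
NAIVE = re.compile(r"\w+|[^\w\s]")


def tokenize(text, pattern=PRICE_AWARE):
    """Return every token the given pattern finds, in text order."""
    return pattern.findall(text)


def is_word(token):
    """Assumption: a token with at least one letter or digit is a WORD."""
    return any(ch.isalnum() for ch in token)


tokens = tokenize(TEXT)
words = [t for t in tokens if is_word(t)]
punctuation = [t for t in tokens if not is_word(t)]

# The naive pattern splits '$1.50' into '$', '1', '.', '50'.
naive_tokens = tokenize(TEXT, NAIVE)

print(len(words), len(punctuation), len(tokens))  # 11 2 13
print(len(naive_tokens))                          # 16
```

The same text yields 13 TOKENs under one pattern and 16 under the other, which is the pragmatic point: the token count depends on the tokenizer chosen.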