User:PerfektesChaos/js/WikiSyntaxTextMod/flow/format

WikiSyntaxTextMod → Syntax polishing → Step 6

In the sixth step, the readability of the source text for human beings is improved, and a uniform format makes constructs detectable for scripts and bots as well as for human beings evaluating dumps or searching the source code day to day.

Normally this does not affect the rendering of the page.

Character Entities

 * Named HTML4 entities for graphical characters are replaced. They confuse less technically experienced users, and the characters are available from toolbars nowadays. If an author without access to editing aids entered a character as an entity, it is converted automatically and without loss of information into the single Unicode character.
 * An exception is made for  and ML syntax escapes   and other invisible codes like.
 * Numerical entities (hexadecimal hhhh or decimal ddd) for graphical (visible) characters are replaced,
 ** with the same exclusion list as for named entities,
 ** excluding entities that escape wikisyntax characters,
 ** never for control codes,
 ** only below x2800 = 10,240 decimal. These characters originate from the European region, including Greek, Russian and mathematical neighbours; such fonts are widely distributed and little effort is needed to enter such a character.
 * On the other hand, it seems legitimate to keep Vietnamese, Tamil or Korean glyphs entered as numerical entities from x2800 = 10,240 decimal (Braille) upward readable and modifiable. It should be taken into account that other authors may not have such fonts installed and see only ￭.
 * Unlike this behaviour, which targets Latin and other alphabet-based languages, entities within text sequences written in CJK (Japanese, Korean, Chinese), such as an [interlanguage] link or an entire page, are converted into ideograms if the sequence is recognized.
 * Since an entity may be protected by  or  , or a comment might clarify that   is the real meaning of “ΤΑΧΕ”, entities are not replaced in the first step but only after the unchangeable areas have been identified.
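The replacement rules above can be sketched in Python as follows. This is a minimal illustration, not the tool's implementation: it assumes that protected areas have already been cut out, uses the HTML4 name table from Python's html.entities module, and the exclusion set KEEP, the constant LIMIT and the function decode_entities are hypothetical names. The special handling of CJK sequences is left out.

```python
import re
import unicodedata
from html.entities import name2codepoint

# Hypothetical exclusion list: characters kept as entities because they are
# invisible or would be read as markup or wikisyntax if written literally.
KEEP = set("<>&\"'[]{}|=*#:;~") | {"\u00a0", "\u00ad"}

LIMIT = 0x2800  # first code point (Braille block) that stays an entity

ENTITY = re.compile(r"&(?:#[xX]([0-9A-Fa-f]+)|#([0-9]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_entities(text: str) -> str:
    """Replace named and numerical entities of visible characters below LIMIT."""
    def replace(match: re.Match) -> str:
        hex_digits, dec_digits, name = match.groups()
        if name is not None:
            if name not in name2codepoint:
                return match.group(0)      # unknown name: leave untouched
            code = name2codepoint[name]
        elif hex_digits is not None:
            code = int(hex_digits, 16)
        else:
            code = int(dec_digits)
        if code >= LIMIT:
            return match.group(0)          # e.g. Braille, Hangul, CJK: keep the entity
        char = chr(code)
        if char in KEEP or unicodedata.category(char)[0] in "CZ":
            return match.group(0)          # syntax escapes, controls, invisible spaces
        return char

    return ENTITY.sub(replace, text)
```

Checking the code point limit before calling chr() also keeps absurdly large numerical entities from raising an error.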

Percent sign
Since 2007, the MediaWiki software automatically inserts  as a non-breaking space between a digit and a percent sign. Where older texts contain , or well-meaning authors have recently inserted such an entity or the equivalent Unicode character, it is replaced by an ASCII space.
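A minimal sketch of the percent-sign rule, assuming the non-breaking space appears as &nbsp;, as &#160; or &#xA0;, or as the literal U+00A0 character; the pattern and the function name normalize_percent_spacing are hypothetical.

```python
import re

# Digit, non-breaking space (named entity, numerical entity or the literal
# U+00A0 character), percent sign.
NBSP_BEFORE_PERCENT = re.compile(r"(?<=\d)(?:&nbsp;|&#160;|&#[xX]0*[aA]0;|\u00a0)(?=%)")


def normalize_percent_spacing(text: str) -> str:
    """Replace the non-breaking space between a digit and '%' by an ASCII space."""
    return NBSP_BEFORE_PERCENT.sub(" ", text)
```

Since the software adds the non-breaking space on its own, the ASCII space in the source is sufficient.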

Line break

 * Outside protected areas ( etc.), more than two consecutive line breaks are reduced to two.
 * Every  and every interlanguage link (if not yet on Wikidata) is put on a line of its own.
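Both line-break rules might be sketched like this. The sketch ignores protected areas, treats only category links (the prefixes Category and Kategorie are assumptions standing in for the project's localised names) and leaves interlanguage links aside; all function names are hypothetical.

```python
import re

# Hypothetical category prefixes; a real list holds the project's localised names.
# "[[:Category:...]]" (leading colon) is an inline link and is not matched.
CATEGORY = re.compile(r"\[\[(?:Category|Kategorie):[^\[\]\n]*\]\]")


def collapse_line_breaks(text: str) -> str:
    """Reduce three or more consecutive line breaks to two (one empty line)."""
    return re.sub(r"\n{3,}", "\n\n", text)


def categories_on_own_lines(text: str) -> str:
    """Split lines so that every category link stands on a line of its own."""
    result = []
    for line in text.split("\n"):
        if not CATEGORY.search(line):
            result.append(line)
            continue
        pos = 0
        for match in CATEGORY.finditer(line):
            before = line[pos:match.start()].strip()
            if before:
                result.append(before)
            result.append(match.group(0))
            pos = match.end()
        rest = line[pos:].strip()
        if rest:
            result.append(rest)
    return "\n".join(result)
```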

Headline text separated by spaces
In many projects it is common to put one space between the equal signs of the wikisyntax headline markup and the headline text, which makes the headline easier to perceive. Depending on the project, this is standardized.
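For a project that wants the spaced form, the headline rule could be sketched with one regular expression; the name space_headlines is hypothetical, and lines with a trailing comment after the closing equal signs are simply left alone.

```python
import re

# One to six equal signs, the headline text, the same equal signs, then only
# blanks up to the end of the line.
HEADLINE = re.compile(r"^(={1,6})[ \t]*([^=\s](?:.*?\S)?)[ \t]*\1[ \t]*$", re.MULTILINE)


def space_headlines(text: str) -> str:
    """Write headlines as '== Text ==' with exactly one space on each side."""
    return HEADLINE.sub(lambda m: f"{m.group(1)} {m.group(2)} {m.group(1)}", text)
```

Requiring the text to start with something other than an equal sign keeps a bare line of equal signs from being misread as an empty headline.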

In picture galleries the following rules are applied:
 * If indentation is found, all lines are indented by the maximum number of spaces detected.
 * The namespace prefix (mostly ) is no longer required (79639) and is discarded as redundant.
 * The name of the image file is decoded like a wikilink.
 * Any user-defined wikilink modification is applied.
 * If necessary, the name of the image file is protected against changes.
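The gallery rules that concern plain text might be sketched as follows, working on the text between the gallery tags only. The namespace prefixes listed are assumptions (the text above leaves the exact prefix open), and decoding is reduced to percent-decoding plus underscores to spaces; user-defined modifications and protection are not covered. format_gallery is a hypothetical name.

```python
import re
from urllib.parse import unquote

# Hypothetical namespace prefixes; a real list comes from the project settings.
FILE_PREFIX = re.compile(r"^(?:File|Image|Datei|Bild)\s*:\s*", re.IGNORECASE)


def format_gallery(body: str) -> str:
    """Normalise the text between <gallery> and </gallery>."""
    lines = body.split("\n")
    indent = max((len(ln) - len(ln.lstrip(" ")) for ln in lines if ln.strip()), default=0)
    result = []
    for line in lines:
        if not line.strip():
            result.append(line)                         # keep empty lines untouched
            continue
        name, bar, caption = line.strip().partition("|")
        name = FILE_PREFIX.sub("", name.strip())        # namespace is redundant here
        name = unquote(name).replace("_", " ").strip()  # decode like a wikilink
        result.append(" " * indent + name + bar + caption)
    return "\n".join(result)
```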

It is common practice to start the content immediately after the opening  in running text, without any spaces or even line breaks in between. Likewise, the closing  follows the content without any space or line break. This formatting is enforced.

This is invisible on the rendered page. Beyond that, there are typographic rules on how to attach the resulting footnote marker to the surrounding text, sentence or word. That goes beyond syntax polishing and may be implemented with user-defined rules.
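Assuming the tags in question are <ref> and </ref> (the footnote marker mentioned above suggests so), the whitespace rule could be sketched like this; self-closing <ref … /> tags are skipped, and the name tighten_refs is hypothetical.

```python
import re

# Opening <ref> or <ref name="..."> (not self-closing), content, </ref>.
REF = re.compile(r"(<ref(?:\s[^>]*)?(?<!/)>)\s*(.*?)\s*(</ref\s*>)",
                 re.DOTALL | re.IGNORECASE)


def tighten_refs(text: str) -> str:
    """Remove spaces and line breaks right after <ref> and right before </ref>."""
    return REF.sub(lambda m: m.group(1) + m.group(2) + m.group(3), text)
```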

within references has no effect, depending on skin and style preferences, or may lead to an illegibly small font. Therefore  tags are deleted.
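A sketch of the deletion, assuming the tag meant is <small> (the remark on letter size points that way) and that references are <ref>…</ref> elements; drop_small_in_refs is a hypothetical name.

```python
import re

REF = re.compile(r"(<ref(?:\s[^>]*)?(?<!/)>)(.*?)(</ref\s*>)", re.DOTALL | re.IGNORECASE)
SMALL = re.compile(r"</?small(?:\s[^>]*)?>", re.IGNORECASE)


def drop_small_in_refs(text: str) -> str:
    """Delete <small> and </small> inside <ref>...</ref>, nowhere else."""
    return REF.sub(lambda m: m.group(1) + SMALL.sub("", m.group(2)) + m.group(3), text)
```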

Within ………  blocks, each  …  and   is put on a line of its own, to make it easier to distinguish the individual references (especially when cite templates are used).
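Assuming the blocks are <references>…</references> containing <ref>…</ref> elements, the line layout might be sketched as below. Anything between two references, such as a comment, is kept on a line of its own, and the content of each reference is left untouched; one_ref_per_line is a hypothetical name.

```python
import re

# Opening <references> (not self-closing <references />), content, closing tag.
REFERENCES = re.compile(r"(<references(?:\s[^>]*)?(?<!/)>)(.*?)(</references\s*>)",
                        re.DOTALL | re.IGNORECASE)
REF = re.compile(r"<ref(?:\s[^>]*)?(?<!/)>.*?</ref\s*>", re.DOTALL | re.IGNORECASE)


def one_ref_per_line(text: str) -> str:
    def format_block(m: re.Match) -> str:
        body = m.group(2)
        if not REF.search(body):
            return m.group(0)                 # nothing to rearrange
        parts, pos = [], 0
        for ref in REF.finditer(body):
            gap = body[pos:ref.start()].strip()
            if gap:
                parts.append(gap)             # e.g. a comment between references
            parts.append(ref.group(0))        # reference content stays untouched
            pos = ref.end()
        tail = body[pos:].strip()
        if tail:
            parts.append(tail)
        return m.group(1) + "\n" + "\n".join(parts) + "\n" + m.group(3)

    return REFERENCES.sub(format_block, text)
```

Running the function twice gives the same result, so it can be applied repeatedly to an already formatted page.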

Table attributes
For the entire table, for table rows and for leading cells, the attribute syntax is formatted similarly to tags.
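What formatting similarly to tags means is not spelled out here; the sketch below assumes it means lowercase attribute names, quoted values and single spaces between attributes, applied to table starts and row separators only. Attribute strings it cannot fully parse, for example a template, are left as they are. All names are hypothetical.

```python
import re

# name = value, with the value in double quotes, single quotes or unquoted.
ATTRIBUTE = re.compile(r"""([A-Za-z][\w:-]*)\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'>]+))""")
# Table start "{|" and row separator "|-" with whatever attributes follow.
TABLE_LINE = re.compile(r"^(\{\||\|-+)[ \t]*(.*?)[ \t]*$", re.MULTILINE)


def format_attributes(attrs: str) -> str:
    """Lowercase names, quote values, separate by single spaces."""
    if ATTRIBUTE.sub("", attrs).strip():
        return attrs.strip()        # unknown content (e.g. a template): leave it
    parts = []
    for m in ATTRIBUTE.finditer(attrs):
        value = next(g for g in m.group(2, 3, 4) if g is not None)
        quote = "'" if '"' in value else '"'
        parts.append(f"{m.group(1).lower()}={quote}{value}{quote}")
    return " ".join(parts)


def format_table_attributes(text: str) -> str:
    def repl(m: re.Match) -> str:
        attrs = format_attributes(m.group(2))
        return m.group(1) + (" " + attrs if attrs else "")
    return TABLE_LINE.sub(repl, text)
```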

Tags, templates, links
These have already been formatted and adapted in previous steps:
 * tags
 * templates
 * links

Localized syntax elements in unique format
In non-English projects like the German Wikipedia, the following are replaced according to project-specific rules:
 * or localised variant – instead of   or   or
 * or localised variant – instead of   etc.
 * or localised variant – instead of   or others.
 * image (media) parameters: downcased and replaced by the localised standard variant
 * or localised variant – instead of.

For more on keywords, see localisation.
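For image parameters, a project-specific rule might be sketched as a lookup table. The German forms below are an illustrative assumption and must be checked against the project's configuration; the function localise_parameter (hypothetical) handles one parameter that the caller has already split off a file link.

```python
# Illustrative table: accepted spellings mapped to one preferred German form.
# It is an assumption for this sketch; check the project's configuration.
IMAGE_KEYWORDS = {
    "thumb": "mini", "thumbnail": "mini", "miniatur": "mini", "mini": "mini",
    "right": "rechts", "rechts": "rechts",
    "left": "links", "links": "links",
    "center": "zentriert", "centre": "zentriert", "zentriert": "zentriert",
    "none": "ohne", "ohne": "ohne",
    "frameless": "rahmenlos", "rahmenlos": "rahmenlos",
    "upright": "hochkant", "hochkant": "hochkant",
}


def localise_parameter(param: str) -> str:
    """Downcase and localise one image parameter; keep everything else as is."""
    key, eq, value = param.strip().partition("=")
    local = IMAGE_KEYWORDS.get(key.strip().lower())
    if local is None:
        return param                  # caption, size such as 200px, alt=..., link=...
    return f"{local}={value.strip()}" if eq else local
```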

Examples of user defined modifications
Users may define their own cosmetic changes, at their own responsibility, to extend the automatic polishing described above.

HTML markup
checkwiki #26, checkwiki #38

When copying from external text sources, authors sometimes put HTML markup ……  or  ……  into wikitext. This should be wikified. For brief passages this can be done automatically, but an additional apostrophe »'«, line breaks, other HTML elements and protected regions pose harder problems and need manual interpretation. Also,  is rendered differently.
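For the brief, unproblematic cases, a user-defined rule could look like this sketch. It assumes the markup in question is <b>…</b> and <i>…</i>, and it deliberately skips any span with an apostrophe inside or next to it, a line break or nested markup; protected regions are not considered. wikify_emphasis is a hypothetical name.

```python
import re

# <b>...</b> or <i>...</i> whose content has no apostrophe, no line break and no
# further markup, and which is not directly next to an apostrophe.
SIMPLE = re.compile(r"(?<!')<(b|i)>([^<>'\n]+)</\1>(?!')")


def wikify_emphasis(text: str) -> str:
    def repl(m: re.Match) -> str:
        quotes = "'''" if m.group(1) == "b" else "''"
        return quotes + m.group(2) + quotes
    return SIMPLE.sub(repl, text)
```

Everything the pattern refuses is left for manual interpretation, as described above.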

Exponents
The well-known ANSI characters can be inserted easily:

However, for fractions and in music the  format is common and looks better; for measurement units like m², cm³ or m/s², in general only the small exponent is meaningful.

Unicode provides further superscript digits and algebraic signs at 8304–8319 (U+2070 to U+207F) as well as subscripts at 8320–8334 (U+2080 to U+208E), e.g. H₂O, CO₂. However, it cannot currently be presumed that these characters are present in the font the reader uses for rendering. Therefore formulas should be built with  or   as shown.
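A user-defined rule following this advice could rewrite the Unicode characters from 8304 to 8334 as <sup> and <sub> runs, as in this sketch. It relies on the Unicode compatibility decomposition from Python's unicodedata module to find the base character, leaves ¹ ² ³ alone, and uses the hypothetical name unicode_scripts_to_markup.

```python
import unicodedata

# Unicode compatibility decompositions start with a tag naming the variant.
TAGS = {"<super>": "sup", "<sub>": "sub"}


def unicode_scripts_to_markup(text: str) -> str:
    """Rewrite characters U+2070..U+208E as <sup>/<sub> runs of base characters."""
    out: list[str] = []
    run: list[str] = []        # base characters of the current run
    tag = None                 # "sup", "sub" or None outside a run

    def flush() -> None:
        if run:
            out.append(f"<{tag}>{''.join(run)}</{tag}>")
            run.clear()

    for char in text:
        new_tag, base = None, char
        if 0x2070 <= ord(char) <= 0x208E:
            kind, _, code = unicodedata.decomposition(char).partition(" ")
            if kind in TAGS:
                new_tag, base = TAGS[kind], chr(int(code, 16))
        if new_tag != tag:
            flush()
            tag = new_tag
        if new_tag:
            run.append(base)
        else:
            out.append(char)
    flush()
    return "".join(out)
```

Consecutive characters of the same kind are joined into one element, so "x⁴⁵" becomes one <sup> run rather than two.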

Wikisyntax bullets separated by spaces from content
At the beginning of a line, bullet characters like  and others should be separated from the content by a space, to make them easier to recognize: The second term re-establishes table indentation, which would otherwise not be interpreted correctly. In general, formatting tables this way is not recommended.
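The bullet rule, including the exception for indented tables, might be sketched as below. It assumes that lines starting with #REDIRECT or #WEITERLEITUNG must stay unchanged, since they are redirects rather than list items, and it ignores protected areas; space_bullets is a hypothetical name.

```python
import re

# A run of list markers at line start, followed (after optional blanks) by
# content; redirect lines are excluded because they are not list items.
BULLET = re.compile(
    r"^(?!#\s*(?:REDIRECT|WEITERLEITUNG)\b)([*#:;]+)(?![*#:;])[ \t]*(?=\S)",
    re.MULTILINE | re.IGNORECASE,
)


def space_bullets(text: str) -> str:
    def repl(m: re.Match) -> str:
        markers = m.group(1)
        # Per the rule above, an indented table ":{|" gets no space.
        if m.string.startswith("{|", m.end()):
            return markers
        return markers + " "
    return BULLET.sub(repl, text)
```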

Sometimes a compact format of definition lists is used, like  Formally this is correct, and for very brief terms and explanations it is less questionable. However, human interpretation may be supported by

Remarks
[German page]