User:PerfektesChaos/js/WikiSyntaxTextMod/flow/link


In the fourth step of syntax polishing, all links are processed. Possible links are detected by  and afterwards by   string search.

One goal is to adapt link targets; another is to format links in a common, readable manner that other scripts and bots can detect easily.

Wikilink
If not explicitly mentioned, in this section the term “bracket” means square brackets.

Syntax correction

 * In certain unambiguous cases, a missing single bracket of a wikilink is added and superfluous brackets are removed.
 * More than two opening brackets prevent link rendering and are reduced to two. With   an intended visible opening bracket can be provided.
 * If there are multiple adjacent pipe symbols within a wikilink instead of a single one, they are reduced to one.
 * If any other additional pipe symbol is found within the link title, the intended separation between link target and link title cannot be guessed; only an error message is issued.
 * A line break (which is not permitted) within the bracketed region of meaningful extension is turned into a space character.
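The bracket and pipe rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code of WikiSyntaxTextMod: the function name `fixWikilinkSyntax` is hypothetical, and the ambiguous cases (for which the script only issues an error message) are not handled.

```javascript
// Hypothetical helper for illustration only.
function fixWikilinkSyntax(wikitext) {
  return wikitext
    // Three or more opening brackets are reduced to two: "[[[X]]" -> "[[X]]"
    .replace(/\[{3,}/g, "[[")
    // Inside a single-line [[...]] region, runs of pipes become one pipe
    .replace(/\[\[([^\[\]\n]*)\]\]/g, (whole, inner) =>
      "[[" + inner.replace(/\|{2,}/g, "|") + "]]");
}
```

Pipes outside of double brackets (for example table cell separators) are not touched, because the second replacement only looks inside `[[...]]`.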

Wikilink by http
Sometimes an external URL is used, like

as well as  and protocol relative URL.

This is turned into wikilink format if possible.

Links by URL do not appear on WhatLinksHere and GlobalUsage.

Wikilink with scripting direction (left-right)
If a (usually invisible) bidi character is present directly before or after a wikilink target, it will be discarded. This does not affect the functionality. On a link or an old-fashioned interlanguage link into the Arabic-language Wikipedia, the link target begins with  and is not affected anyway.
 * Other zero-width characters are kept in scripts where they are used but made invisible.
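Removing bidi marks around a link target might look like the sketch below. It assumes that "bidi character" means U+200E (left-to-right mark) and U+200F (right-to-left mark); the actual set handled by the script may be larger, and the name `stripBidiAroundTarget` is hypothetical.

```javascript
// Assumption: only LRM (U+200E) and RLM (U+200F) are treated as bidi marks.
const BIDI_AT_EDGES = /^[\u200E\u200F]+|[\u200E\u200F]+$/g;

function stripBidiAroundTarget(wikitext) {
  return wikitext.replace(
    /\[\[([^\[\]|\n]+)(\|[^\[\]\n]*)?\]\]/g,
    // Only the target is cleaned; the link title (if any) is kept as it is.
    (whole, target, title = "") =>
      "[[" + target.replace(BIDI_AT_EDGES, "") + title + "]]");
}
```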

Wikipedia in other languages and major sister projects
Correct external links like

are not enclosed in <ref> or moved as external links into other sections by this script.

Not only Wikipedia, but also other major sister projects (with a shortcut) linked by URL are detected and transformed into wikilink format.

A leading colon ahead of the project identifier is used by some authors, but it is redundant and will be discarded. A uniform format with a project shortcut p (1 letter or  or  ) is used:
 * p:Lemma – same language, other project type
 * p:lang:Lemma – other language, other project type
 * :lang:Lemma – other language, same project

The inverted order :lang:p:Lemma is quite rare and will be brought into the usual sequence, although it works both ways.
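The three prefix forms and the reordering of :lang:p:Lemma can be illustrated with a small sketch. Everything named here is an assumption for illustration: the function names, the `home` parameter, the restriction to two- or three-letter language codes, and the shortcut table, which lists only a few well-known interwiki prefixes. The lemma is passed through without decoding.

```javascript
// Illustrative subset of interwiki shortcuts, keyed by project domain name.
const SHORTCUT_BY_PROJECT = new Map([
  ["wikipedia", "w"], ["wiktionary", "wikt"], ["wikibooks", "b"],
  ["wikiquote", "q"], ["wikisource", "s"], ["wikinews", "n"],
  ["wikiversity", "v"], ["wikivoyage", "voy"],
]);
const SHORTCUTS = new Set(SHORTCUT_BY_PROJECT.values());

// Build a wikilink from a project URL, as seen from the "home" wiki.
function urlToWikilink(url, home = { lang: "en", project: "wikipedia" }) {
  const m = /^(?:https?:)?\/\/([a-z]{2,3})\.([a-z]+)\.org\/wiki\/([^?#\s]+)$/.exec(url);
  if (!m || !SHORTCUT_BY_PROJECT.has(m[2])) return null; // not convertible here
  const [, lang, project, lemma] = m;
  const p = SHORTCUT_BY_PROJECT.get(project);
  if (project !== home.project) {
    return lang === home.lang ? `[[${p}:${lemma}]]` : `[[${p}:${lang}:${lemma}]]`;
  }
  return lang === home.lang ? `[[${lemma}]]` : `[[:${lang}:${lemma}]]`;
}

// Bring ":lang:p:Lemma" into the order "p:lang:Lemma" and drop a
// redundant leading colon ahead of a project shortcut.
function normalizePrefixOrder(target) {
  const inverted = /^:([a-z]{2,3}):([a-z]+):(.+)$/.exec(target);
  if (inverted && SHORTCUTS.has(inverted[2])) {
    return `${inverted[2]}:${inverted[1]}:${inverted[3]}`;
  }
  const colon = /^:([a-z]+):(.+)$/.exec(target);
  if (colon && SHORTCUTS.has(colon[1])) return `${colon[1]}:${colon[2]}`;
  return target; // e.g. ":de:Berlin" needs its leading colon and is kept
}
```

A URL that does not match a known project yields `null` in this sketch, so the caller can leave the original text alone.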

URL as wikilink
This means something like

This brew of URL escapes in UTF-8 is made more readable.

As is generally known, this arises when authors copy the URL of the target page into a wikilink. Underscores are replaced by spaces. Escape sequences are identified and replaced by UCS characters.
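A minimal sketch of this cleanup, assuming the target is percent-encoded UTF-8. The name `prettifyTarget` is hypothetical, and the guard against characters that carry meaning in wikitext is an added precaution of this sketch rather than a documented rule of the script.

```javascript
function prettifyTarget(target) {
  let decoded;
  try {
    decoded = decodeURIComponent(target); // "%C3%B6" -> "ö"
  } catch (e) {
    decoded = target; // malformed escape sequence: keep the text as written
  }
  // Do not introduce characters that would change the link syntax.
  if (/[\[\]{}|<>]/.test(decoded)) decoded = target;
  return decoded.replace(/_/g, " ");
}
```

For example, `prettifyTarget("M%C3%BCnchen_Hauptbahnhof")` yields `München Hauptbahnhof`, while a target such as `A%7CB` keeps its escape because decoding it would insert a pipe symbol.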

Wikilink on itself
This means a wikilink targeting the current page (self):
 * self

will be unlinked, a differing link title
 * self

shall become

Often as
 * self section

to be replaced by
 * section

Within a  or   region a link to the page itself is permitted, required, and kept.

Simplify your wikilink
Titled wikilinks to other pages like

are simplified as

The same rules as implemented in the parser are applied here, so that the appearance does not change.

This goes especially for which is just

Sometimes the coinciding target word splits the matching link title at positions that look strange to a human reader and do not follow syllable boundaries.
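The simplification of titled links can be sketched as below. The sketch assumes a wiki where the first letter of a page name is case-insensitive and where the link trail consists of lowercase ASCII letters (the English default); both differ between projects and languages. The function names are hypothetical.

```javascript
function firstLower(s) {
  return s.charAt(0).toLowerCase() + s.slice(1);
}

// Takes exactly one titled link such as "[[apple|apples]]".
function simplifyWikilink(link) {
  const m = /^\[\[([^\[\]|]+)\|([^\[\]|]+)\]\]$/.exec(link);
  if (!m) return link;
  const [, target, title] = m;
  const head = title.slice(0, target.length);  // part that may equal the target
  const trail = title.slice(target.length);    // remainder, rendered as link trail
  // The title must start with the target (first letter case-insensitive),
  // and the remainder must be empty or a plain lowercase link trail.
  if (firstLower(head) === firstLower(target) && /^[a-z]*$/.test(trail)) {
    return "[[" + head + "]]" + trail;
  }
  return link;
}
```

The spelling of the visible title is kept, so `[[apple|Apple]]` becomes `[[Apple]]` and `[[apple|apples]]` becomes `[[apple]]s`; a title that does not start with the target is returned unchanged.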

For titled links the resulting clickable (blue) part shall be the same as the bracketed title, merging

into

Pipe trick
In the early days of Wikipedia the pipe trick was invented: if a link target contains an expression in round parentheses …  or a comma, the part before it will be displayed as link title if an empty link title is given, that is, if the pipe symbol is immediately followed by the closing brackets  .

This was supposed to reduce typing. However, only a few authors are familiar with this notation, and the small pipe symbol might be overlooked easily. This script evaluates the construct by the same rules as the parser does and inserts the resulting displayed link title explicitly.

Even authors who swear by the abbreviated format often do not know that the pipe trick does not work within “tag extensions” like  or   (and other niceties won’t work there either). In this case the explicit title produces the intended behaviour for the first time.
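Expanding the pipe trick might be sketched as follows. This is an approximation of the parser rules, not a copy of them: it covers only a trailing parenthetical and a comma followed by a space, and it deliberately skips targets containing a colon, because prefixes are handled differently by the parser. The name `expandPipeTrick` is hypothetical.

```javascript
function expandPipeTrick(wikitext) {
  // Matches "[[Target|]]" where the target contains no colon.
  return wikitext.replace(/\[\[([^\[\]|:\n]+)\|\]\]/g, (whole, target) => {
    // "Name (context)" -> "Name"
    const paren = /^(.+?) ?\(.+\)$/.exec(target);
    // "Name, context" or "Name (context), more" -> "Name"
    const comma = /^(.+?)(?: ?\(.+\))?, .+$/.exec(target);
    const title = paren ? paren[1] : comma ? comma[1] : null;
    // Targets without parentheses or comma are left untouched in this sketch.
    return title === null ? whole : `[[${target}|${title}]]`;
  });
}
```

With the title written out, the link no longer depends on the save-time expansion and therefore also behaves as intended inside tag extensions.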

Formatting
One of the general rules later text search may rely on:
 * There is no remaining space between  and link target or around pipe symbol   or ahead of.
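A sketch of this spacing rule for simple single-line wikilinks; the name `tightenWikilink` is hypothetical. Links with an empty or blank title are skipped on purpose, since trimming them would create a pipe trick.

```javascript
function tightenWikilink(wikitext) {
  return wikitext.replace(
    /\[\[([^\[\]|\n]+)(?:\|([^\[\]\n]*))?\]\]/g,
    (whole, target, title) => {
      const t = target.trim();
      if (t === "") return whole;                 // nothing sensible to link
      if (title === undefined) return `[[${t}]]`; // untitled link
      const label = title.trim();
      if (label === "") return whole;             // would turn into "[[X|]]"
      return `[[${t}|${label}]]`;
    });
}
```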

Weblink (external link)
For recognition of URLs only the following protocols are used:  and protocol-relative. Other schemes are permitted in wikitext but are quite rare.

If not explicitly mentioned, in this section the term “bracket” means square brackets.

Weblink correction
If a URL after the opening bracket is immediately followed by a line break, the break is replaced by a space, since the link won’t be displayed if it is spread over multiple lines. If double square brackets enclose a URL starting with a protocol like  or  , the brackets are reduced to single ones. This is unambiguous and a common mistake.
 * Weblink with
 * If anything else follows after the link title but the closing bracket is missing, nothing will be changed, since it cannot be determined where the link title is intended to end. The closing bracket might be absent until the end of the paragraph. An error message is displayed.
 * Weblink in double square brackets.
 * If pairs of square brackets are detected within a URL, they will be escaped automatically if there is no doubt:
 * etc. result from TYPO3.
 * The entities …  are used rather than URL encoding  …  – this keeps the original notation of the web server. Not every server (especially applications of the last century) supports percent decoding, nor is any server obliged to obey URL rules for its GET access. Therefore the functionality is not endangered, but an escaped URL would need to be tested. The MediaWiki software does change the encoding when displaying the page, but this is not the business of the underlying wikitext source.
 * An error message is always issued. If change appears to be unsafe nothing is modified.
 * If a URL contains or adjoins special characters, a warning message is issued:
 * will break the link; they need to be escaped.
 * Pipe symbol  or   might originate from wikisyntax with a different intention: separation of the link title, and italic or bold decoration, when a space character got lost.
 * If a URL is terminated by a punctuation character  this is suspicious, since without brackets the MediaWiki software assumes that the character does not belong to the URL. Links without brackets should be enclosed in brackets and given an appropriate title to make this absolutely clear. Inside brackets, the character might have been copied by mistake along with everything up to the adjacent space.
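Two of the warning conditions above can be sketched as a simple check. The function name `checkUrl`, the warning texts, and the exact set of punctuation characters are assumptions of this sketch; the script itself may test more characters and phrase its messages differently.

```javascript
// Returns a list of warning labels; an empty list means "nothing suspicious".
function checkUrl(url) {
  const warnings = [];
  // Assumed punctuation set: characters commonly cut off from bare URLs.
  if (/[,;.:!?]$/.test(url)) warnings.push("trailing punctuation");
  // A pipe may be a leftover wikilink title separator.
  if (url.includes("|")) warnings.push("pipe symbol");
  return warnings;
}
```

The check only reports; in line with the text above, the URL itself is not modified.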

Weblink formatting
Two of the general rules later text search may rely on:
 * There is no remaining space between  and   etc.
 * There is exactly one space between URL and link title.
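Both rules can be illustrated for bracketed links on a single line. This sketch handles only URLs starting with http or https, and the name `tightenWeblink` is hypothetical.

```javascript
function tightenWeblink(wikitext) {
  return wikitext.replace(
    // "[", optional blanks, URL, optional blanks, title, optional blanks, "]"
    /\[[ \t]*(https?:\/\/[^\s\[\]]+)[ \t]*([^\[\]\n]*?)[ \t]*\]/g,
    (whole, url, title) => (title ? `[${url} ${title}]` : `[${url}]`));
}
```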

URL formatting

 * In general, a URL that points to a domain only is terminated by a slash. It also works without the slash, since the slash path is the HTTP default, but this slash is the path of the “home” resource. Web servers return their own URL in this format. For search processes it may make clearer where the host part ends.
 * The domain name (host) is turned into lowercase, as is the protocol.
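These two rules might be sketched with a plain regular expression. The built-in `URL` class is avoided here on purpose, because it normalizes more than the two rules above (for example percent-encoding), which would alter the notation of the source. The name `formatUrl` is hypothetical, and URLs with user information are skipped as a precaution of this sketch.

```javascript
function formatUrl(url) {
  // optional scheme, "//", host (up to "/", "?" or "#"), rest
  const m = /^([a-z][a-z0-9+.-]*:)?\/\/([^\/?#\s]+)(.*)$/i.exec(url);
  if (!m) return url;
  const [, scheme = "", host, rest] = m;
  if (host.includes("@")) return url; // user info is case-sensitive: leave alone
  // Lowercase scheme and host; add "/" only when nothing follows the host.
  return scheme.toLowerCase() + "//" + host.toLowerCase() + (rest === "" ? "/" : rest);
}
```

Path, query, and fragment are copied unchanged, since they may be case-sensitive on the server.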

Weblink on wiki project
For weblinks with brackets related to wiki projects the following action is taken:
 * If conversion into a wikilink is possible, this will be done.
 * Otherwise, on many known WMF domains a protocol-relative form is built. If certain subdomains are available by https only, the protocol is changed to secure access.
 * The  domain has been obsolete since fall 2011 and an equivalent URL will be created.

On a WMF URL without brackets which might be formatted as a wikilink nothing is changed, but a warning will be issued.

Modification of link target or environment
User-defined modifications of wikilinks, URLs, or the adjoining text segments are applied immediately to any detected link target.

If needed, the link target is protected against textual modification.
